sherpa_ai.connectors package

Overview

The connectors package provides interfaces for Sherpa AI to connect with external systems and databases. It includes specialized connectors for vector stores and other data persistence mechanisms required for retrieval-augmented generation and knowledge storage.

Key Components

  • Base: Abstract interface for connector implementations

  • ChromaVectorStore: Implementation for the Chroma vector database

  • VectorStores: Generic interfaces for vector database interactions

Example Usage

from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore

# Initialize a vector store.
# `embedding_fn` is assumed to be an embedding function defined elsewhere.
vector_store = ChromaVectorStore(
    collection_name="documents",
    embedding_function=embedding_fn,
)

# Store documents
documents = [
    "Sherpa AI is a framework for building intelligent agents.",
    "Vector databases store vector embeddings for semantic search."
]
vector_store.add_texts(documents, metadatas=[{"source": "docs"} for _ in documents])

# Retrieve similar documents
results = vector_store.similarity_search("How do I build an agent?", k=2)
print(results)

Submodules

Module                                      Description
sherpa_ai.connectors.base                   Abstract base classes defining the connector interface.
sherpa_ai.connectors.chroma_vector_store    Implementation for the Chroma vector database with document storage.
sherpa_ai.connectors.vectorstores           Generic interfaces and utilities for vector database interactions.

sherpa_ai.connectors.base module

class sherpa_ai.connectors.base.BaseVectorDB(**data)

Bases: ABC, BaseModel

Abstract base class for vector database connectors with Pydantic validation.

This class defines the interface that all vector database connectors must implement, providing methods for similarity search operations with automatic data validation.

db

The underlying database connection or client.

Example

>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> # ChromaVectorStore implements BaseVectorDB
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("query", number_of_results=5)

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

db: Any

abstract similarity_search(query, number_of_results, k, session_id=None)

Perform a similarity search in the vector database.

This method searches for documents that are semantically similar to the query. All parameters are automatically validated by Pydantic.

Parameters:
  • query (str) – The search query.

  • number_of_results (int) – The number of results to return.

  • k (int) – The number of nearest neighbors to consider.

  • session_id (Optional[str]) – Session ID to filter results. Defaults to None.

Returns:

A list of documents that match the query.

Return type:

List[Document]
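
The interface above can be illustrated with a small, self-contained sketch. It does not import sherpa_ai or Pydantic: Document, VectorDBInterface, and InMemoryVectorDB are hypothetical stand-ins, word overlap replaces real embeddings, and reading k as "candidates considered" and number_of_results as "results returned" is an assumption based on the parameter descriptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Document:
    """Stand-in for the Document type returned by the real connectors."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class VectorDBInterface(ABC):
    """Hypothetical mirror of the interface described above (no Pydantic)."""

    def __init__(self, db: Any):
        self.db = db

    @abstractmethod
    def similarity_search(self, query: str, number_of_results: int, k: int,
                          session_id: Optional[str] = None) -> List[Document]:
        ...


class InMemoryVectorDB(VectorDBInterface):
    """Toy connector: `db` is a plain list of Document objects."""

    @staticmethod
    def _score(query: str, text: str) -> float:
        # Jaccard word overlap stands in for embedding similarity.
        q, t = set(query.lower().split()), set(text.lower().split())
        return len(q & t) / len(q | t) if q | t else 0.0

    def similarity_search(self, query, number_of_results, k, session_id=None):
        docs = self.db
        if session_id is not None:
            # Assumed behavior: keep only documents tagged with this session.
            docs = [d for d in docs if d.metadata.get("session_id") == session_id]
        ranked = sorted(docs, key=lambda d: self._score(query, d.page_content),
                        reverse=True)
        # Consider the k nearest candidates, then return at most number_of_results.
        return ranked[:k][:number_of_results]


store = InMemoryVectorDB(db=[
    Document("machine learning builds models from data", {"session_id": "a"}),
    Document("vector databases store embeddings", {"session_id": "a"}),
    Document("machine learning needs data", {"session_id": "b"}),
])
hits = store.similarity_search("what is machine learning",
                               number_of_results=1, k=2, session_id="a")
```

Instantiating VectorDBInterface directly raises TypeError because similarity_search is abstract, which is the same contract an ABC-based connector interface enforces on its subclasses.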

Example

>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("What is machine learning?", number_of_results=5)
>>> for doc in results:
...     print(doc.page_content[:100])

sherpa_ai.connectors.chroma_vector_store module

sherpa_ai.connectors.vectorstores module

class sherpa_ai.connectors.vectorstores.ConversationStore(namespace, db, embeddings, text_key)

Bases: VectorStore

A vector store for storing and retrieving conversation data.

This class provides methods to store conversation data in a vector database and retrieve similar conversations based on queries.

db

The underlying database connection.

namespace

The namespace for the vector store.

Type:

str

embeddings_func

The embedding function to use.

text_key

The key used to store the text in metadata.

Type:

str

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> store.add_text("This is a conversation", {"user": "user1"})
>>> results = store.similarity_search("conversation", top_k=5)

classmethod from_index(namespace, openai_api_key, index_name, text_key='text')

Create a ConversationStore from a Pinecone index.

This method initializes a Pinecone client and creates a ConversationStore instance connected to the specified index.

Parameters:
  • namespace (str) – The namespace for the vector store.

  • openai_api_key (str) – The OpenAI API key.

  • index_name (str) – The name of the Pinecone index.

  • text_key (str, optional) – The key used to store the text in metadata. Defaults to “text”.

Returns:

A new ConversationStore instance.

Return type:

ConversationStore

Raises:

ImportError – If the pinecone-client package is not installed.

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")

add_text(text, metadata={})

Add a single text to the vector store.

This method embeds the text, adds it to the database with the provided metadata, and returns the ID of the added text.

Parameters:
  • text (str) – The text to add.

  • metadata (dict, optional) – Metadata to associate with the text. Defaults to {}.

Returns:

The ID of the added text.

Return type:

str
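
The signature above uses a mutable default, metadata={}. In Python a default dict is created once and shared by every call that omits the argument. Whether the real add_text mutates it is not shown on this page, so the sketch below only demonstrates the general pitfall with two hypothetical functions; storing the text under a "text" key is likewise an assumption modeled on text_key.

```python
def add_text_shared(text, metadata={}):
    # The same default dict object is reused on every call.
    metadata["text"] = text
    return metadata


def add_text_safe(text, metadata=None):
    # Copy into a fresh dict so neither the default nor the caller's dict is mutated.
    metadata = dict(metadata or {})
    metadata["text"] = text
    return metadata


first = add_text_shared("hello")
second = add_text_shared("world")
shared_leak = first is second  # both calls returned the very same dict

caller_meta = {"user": "user1"}
safe_meta = add_text_safe("hello", caller_meta)
```

Passing an explicit dict on every call, as the documented examples do, avoids relying on the shared default.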

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> text_id = store.add_text("This is a conversation", {"user": "user1"})
>>> print(text_id)
123e4567-e89b-12d3-a456-426614174000

property embeddings: Embeddings | None

Access the query embedding object if available.

add_texts(texts, metadatas)

Add multiple texts to the vector store.

This method adds each text with its corresponding metadata to the vector store.

Parameters:
  • texts (Iterable[str]) – The texts to add.

  • metadatas (List[dict]) – The metadata for each text.

Return type:

List[str]
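
A minimal sketch of the pairing described above, assuming the returned list holds one ID per text and that IDs are UUID strings as in the add_text example. ToyStore is a hypothetical in-memory stand-in, not the real ConversationStore, and the length check is an added safeguard rather than documented behavior.

```python
import uuid
from typing import Dict, Iterable, List, Optional


class ToyStore:
    """Hypothetical in-memory stand-in for a text store."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def add_text(self, text: str, metadata: Optional[dict] = None) -> str:
        doc_id = str(uuid.uuid4())
        # Store a copy of the metadata together with the text itself.
        self.rows[doc_id] = {**(metadata or {}), "text": text}
        return doc_id

    def add_texts(self, texts: Iterable[str], metadatas: List[dict]) -> List[str]:
        texts = list(texts)  # an Iterable may be a one-shot generator
        if len(texts) != len(metadatas):
            raise ValueError("texts and metadatas must have the same length")
        return [self.add_text(t, m) for t, m in zip(texts, metadatas)]


toy = ToyStore()
ids = toy.add_texts(["Text 1", "Text 2"], [{"user": "user1"}, {"user": "user2"}])
```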

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> texts = ["Text 1", "Text 2"]
>>> metadatas = [{"user": "user1"}, {"user": "user2"}]
>>> store.add_texts(texts, metadatas)

similarity_search(text, top_k=5, filter=None, threshold=0.7)

Perform a similarity search in the vector store.

This method searches for texts that are semantically similar to the query.

Parameters:
  • text (str) – The search query.

  • top_k (int, optional) – The number of results to return. Defaults to 5.

  • filter (Optional[dict], optional) – Filter criteria for the search. Defaults to None.

  • threshold (float, optional) – The similarity threshold. Defaults to 0.7.

Returns:

A list of documents that match the query.

Return type:

list[Document]
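
As a rough illustration of how top_k, filter, and threshold might interact, the sketch below ranks a few hand-written vectors. It assumes threshold is a minimum cosine similarity and that filter requires exact metadata matches; neither is confirmed by this page. The search function and the sample vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: Sequence[float],
           items: List[Tuple[str, Sequence[float], dict]],
           top_k: int = 5,
           filter: Optional[dict] = None,  # name mirrors the documented parameter
           threshold: float = 0.7) -> List[str]:
    """Return up to top_k texts whose similarity is at least threshold."""
    scored = []
    for text, vec, meta in items:
        # Skip items whose metadata does not match every filter entry.
        if filter and any(meta.get(key) != value for key, value in filter.items()):
            continue
        score = cosine(query_vec, vec)
        if score >= threshold:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


items = [
    ("about ML", [1.0, 0.0], {"user": "user1"}),
    ("mostly ML", [0.9, 0.1], {"user": "user2"}),
    ("unrelated", [0.0, 1.0], {"user": "user1"}),
]
results = search([1.0, 0.0], items, top_k=5)
```

Under these assumptions a high threshold can return fewer than top_k results, or none at all, even when the store is not empty.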

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> results = store.similarity_search("What is machine learning?", top_k=5)
>>> for doc in results:
...     print(doc.page_content[:100])

classmethod delete(namespace, index_name)

Delete all vectors in a namespace.

This method deletes all vectors in the specified namespace of the Pinecone index.

Parameters:
  • namespace (str) – The namespace to delete.

  • index_name (str) – The name of the Pinecone index.

Returns:

The result of the delete operation.

Raises:

ImportError – If the pinecone-client package is not installed.

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> ConversationStore.delete("my_namespace", "my_index")

classmethod get_vector_retrieval(namespace, openai_api_key, index_name, search_type='similarity', search_kwargs={})

Create a vector store retriever.

This method creates a ConversationStore and returns a VectorStoreRetriever for it.

Parameters:
  • namespace (str) – The namespace for the vector store.

  • openai_api_key (str) – The OpenAI API key.

  • index_name (str) – The name of the Pinecone index.

  • search_type (str, optional) – The type of search to perform. Defaults to “similarity”.

  • search_kwargs (dict, optional) – Additional keyword arguments for the search. Defaults to {}.

Returns:

A retriever for the vector store.

Return type:

VectorStoreRetriever

Example

>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> retriever = ConversationStore.get_vector_retrieval("my_namespace", "api_key", "my_index")
>>> results = retriever.get_relevant_documents("What is machine learning?")

classmethod from_texts(texts, embedding, metadatas)

Create a ConversationStore from a list of texts.

This method is not implemented for ConversationStore.

Parameters:
  • texts (List[str]) – The texts to add.

  • embedding (Embeddings) – The embedding function to use.

  • metadatas (list[dict]) – The metadata for each text.

Raises:

NotImplementedError – This method is not implemented for ConversationStore.

class sherpa_ai.connectors.vectorstores.LocalChromaStore(*args, **kwargs)

Bases: object

A local Chroma-based vector store.

This class extends the Chroma vector store to provide additional functionality for working with local files.

Example

>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)

classmethod from_folder(file_path, openai_api_key, index_name='chroma')

Create a Chroma DB from a folder of files.

This method creates a ChromaDB from a folder of files, currently supporting PDFs and markdown files.

Parameters:
  • file_path (str) – Path to the folder containing files.

  • openai_api_key (str) – The OpenAI API key.

  • index_name (str, optional) – Name of the index. Defaults to “chroma”.

Returns:

A new LocalChromaStore instance.

Return type:

LocalChromaStore
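
Before calling from_folder it can help to check which files in a folder would qualify. The sketch below lists PDF and markdown files with pathlib. The suffix set, the recursive scan, and the helper name list_supported_files are assumptions for illustration; this page does not say how from_folder itself discovers files.

```python
import tempfile
from pathlib import Path
from typing import List

SUPPORTED_SUFFIXES = {".pdf", ".md"}  # assumed from "PDFs and markdown files"


def list_supported_files(folder: str) -> List[Path]:
    """Return supported files under `folder`, sorted for stable ordering."""
    root = Path(folder)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


# Demonstration on a throwaway folder with one unsupported file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("guide.md", "paper.PDF", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder")
    found = [path.name for path in list_supported_files(tmp)]
```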

Example

>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)

sherpa_ai.connectors.vectorstores.configure_chroma(host, port, index_name, openai_api_key)

Configure a ChromaDB instance.

This function creates a ChromaDB instance connected to a remote server.

Parameters:
  • host (str) – The host of the ChromaDB server.

  • port (int) – The port of the ChromaDB server.

  • index_name (str) – The name of the index.

  • openai_api_key (str) – The OpenAI API key.

Returns:

A configured ChromaDB instance.

Return type:

Chroma

Raises:

ImportError – If the chromadb package is not installed.

Example

>>> from sherpa_ai.connectors.vectorstores import configure_chroma
>>> chroma = configure_chroma("localhost", 8000, "my_index", "api_key")
>>> results = chroma.similarity_search("query", k=5)

sherpa_ai.connectors.vectorstores.get_vectordb()

Get a vector database retriever based on configuration.

This function returns a vector database retriever based on the configuration in the config module. It supports Pinecone, Chroma, and local ChromaDB.

Returns:

A retriever for the vector store.

Return type:

VectorStoreRetriever
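
Configuration-driven selection of this kind is often written as a lookup from a setting to a factory. The sketch below shows that pattern only; the Settings class, its backend field, and the three keys are hypothetical and do not reflect the actual names in the sherpa_ai config module.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Settings:
    """Hypothetical stand-in for the config module; field names are assumptions."""
    backend: str = "local_chroma"  # "pinecone", "chroma", or "local_chroma"


def select_backend(settings: Settings,
                   factories: Dict[str, Callable[[], object]]) -> object:
    """Look up and call the factory registered for the configured backend."""
    try:
        factory = factories[settings.backend]
    except KeyError:
        raise ValueError(f"Unsupported vector database: {settings.backend!r}") from None
    return factory()


# Placeholder factories return labels; real ones would build retrievers.
factories = {
    "pinecone": lambda: "pinecone-retriever",
    "chroma": lambda: "remote-chroma-retriever",
    "local_chroma": lambda: "local-chroma-retriever",
}
retriever = select_backend(Settings(backend="chroma"), factories)
```

Failing loudly on an unknown backend keeps a misconfigured deployment from silently falling back to a different store.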

Example

>>> from sherpa_ai.connectors.vectorstores import get_vectordb
>>> retriever = get_vectordb()
>>> results = retriever.get_relevant_documents("What is machine learning?")

Module contents