sherpa_ai.connectors package
Overview
The connectors package provides interfaces for Sherpa AI to connect with external systems
and databases. It includes specialized connectors for vector stores and other data persistence
mechanisms required for retrieval-augmented generation and knowledge storage.
Key Components
- Base: Abstract interface for connector implementations
- ChromaVectorStore: Implementation for the Chroma vector database
- VectorStores: Generic interfaces for vector database interactions
Example Usage
from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
# embedding_fn is assumed to be an embedding function created beforehand;
# it is not defined in this snippet.
# Initialize a vector store
vector_store = ChromaVectorStore(
    collection_name="documents",
    embedding_function=embedding_fn,
)
# Store documents
documents = [
    "Sherpa AI is a framework for building intelligent agents.",
    "Vector databases store vector embeddings for semantic search.",
]
vector_store.add_texts(documents, metadatas=[{"source": "docs"} for _ in documents])
# Retrieve similar documents
results = vector_store.similarity_search("How do I build an agent?", k=2)
print(results)
Submodules
| Module | Description |
|---|---|
| sherpa_ai.connectors.base | Abstract base classes defining the connector interface. |
| sherpa_ai.connectors.chroma_vector_store | Implementation for the Chroma vector database with document storage. |
| sherpa_ai.connectors.vectorstores | Generic interfaces and utilities for vector database interactions. |
sherpa_ai.connectors.base module
- class sherpa_ai.connectors.base.BaseVectorDB(**data)
Bases: ABC, BaseModel
Abstract base class for vector database connectors with Pydantic validation.
This class defines the interface that all vector database connectors must implement, providing methods for similarity search operations with automatic data validation.
- db
The underlying database connection or client.
Example
>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> # ChromaVectorStore implements BaseVectorDB
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("query", number_of_results=5)
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
- db: Any
- abstractmethod similarity_search(query, number_of_results, k, session_id=None)
Perform a similarity search in the vector database.
This method searches for documents that are semantically similar to the query. All parameters are automatically validated by Pydantic.
- Parameters:
query (str) – The search query.
number_of_results (int) – The number of results to return.
k (int) – The number of nearest neighbors to consider.
session_id (Optional[str]) – Session ID to filter results. Defaults to None.
- Returns:
A list of documents that match the query.
- Return type:
List[Document]
Example
>>> from sherpa_ai.connectors.base import BaseVectorDB
>>> from sherpa_ai.connectors.chroma_vector_store import ChromaVectorStore
>>> vector_db = ChromaVectorStore(db=some_db)
>>> results = vector_db.similarity_search("What is machine learning?", number_of_results=5)
>>> for doc in results:
...     print(doc.page_content[:100])
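To make the abstract contract concrete, here is a minimal sketch of a connector that implements the documented similarity_search signature. It is an illustration under stated assumptions, not part of sherpa_ai: it uses plain abc instead of the Pydantic BaseModel mix-in, a hypothetical Document dataclass stands in for the real document type, and a toy word-overlap score replaces real embeddings. The reading of k ("consider the k nearest") versus number_of_results ("return this many") is also an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for the Document type returned by the connectors."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class VectorDBInterface(ABC):
    """Mirrors BaseVectorDB.similarity_search, using plain abc instead of Pydantic."""

    @abstractmethod
    def similarity_search(
        self,
        query: str,
        number_of_results: int,
        k: int,
        session_id: Optional[str] = None,
    ) -> List[Document]:
        ...


class InMemoryVectorDB(VectorDBInterface):
    """Hypothetical connector that ranks documents by word overlap with the query."""

    def __init__(self, db: Any) -> None:
        self.db = db  # here: a plain list of Document objects

    def similarity_search(self, query, number_of_results, k, session_id=None):
        query_words = set(query.lower().split())
        # Optionally restrict the search to documents tagged with this session.
        candidates = [
            doc
            for doc in self.db
            if session_id is None or doc.metadata.get("session_id") == session_id
        ]
        # Toy score: number of words shared with the query (no real embeddings).
        ranked = sorted(
            candidates,
            key=lambda doc: len(query_words & set(doc.page_content.lower().split())),
            reverse=True,
        )
        # Assumed semantics: consider the k nearest, then return number_of_results.
        return ranked[:k][:number_of_results]
```

Because VectorDBInterface declares similarity_search as abstract, instantiating it directly raises TypeError; only subclasses that implement the method can be created.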
sherpa_ai.connectors.chroma_vector_store module
sherpa_ai.connectors.vectorstores module
- class sherpa_ai.connectors.vectorstores.ConversationStore(namespace, db, embeddings, text_key)
Bases: VectorStore
A vector store for storing and retrieving conversation data.
This class provides methods to store conversation data in a vector database and retrieve similar conversations based on queries.
- db
The underlying database connection.
- namespace
The namespace for the vector store.
- Type:
str
- embeddings_func
The embedding function to use.
- text_key
The key used to store the text in metadata.
- Type:
str
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> store.add_text("This is a conversation", {"user": "user1"})
>>> results = store.similarity_search("conversation", top_k=5)
- classmethod from_index(namespace, openai_api_key, index_name, text_key='text')
Create a ConversationStore from a Pinecone index.
This method initializes a Pinecone client and creates a ConversationStore instance connected to the specified index.
- Parameters:
namespace (str) – The namespace for the vector store.
openai_api_key (str) – The OpenAI API key.
index_name (str) – The name of the Pinecone index.
text_key (str, optional) – The key used to store the text in metadata. Defaults to “text”.
- Returns:
A new ConversationStore instance.
- Return type:
ConversationStore
- Raises:
ImportError – If the pinecone-client package is not installed.
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
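from_index is documented to raise ImportError when pinecone-client is missing. One common way to implement such a guard is a lazy import that re-raises with an install hint, sketched below. The helper name require_package is hypothetical, and the actual sherpa_ai code may perform the check differently.

```python
import importlib
from types import ModuleType


def require_package(module_name: str, pip_name: str) -> ModuleType:
    """Import module_name, or raise an ImportError that names the pip package."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught too.
        raise ImportError(
            f"{module_name!r} is required; install it with `pip install {pip_name}`."
        ) from exc
```

A call such as require_package("pinecone", "pinecone-client") inside the classmethod defers the dependency until the Pinecone path is actually used, so importing the module itself does not require the package.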
- add_text(text, metadata={})
Add a single text to the vector store.
This method embeds the text, adds it to the database with the provided metadata, and returns the ID of the added text.
- Parameters:
text (str) – The text to add.
metadata (dict, optional) – Metadata to associate with the text. Defaults to {}.
- Returns:
The ID of the added text.
- Return type:
str
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> text_id = store.add_text("This is a conversation", {"user": "user1"})
>>> print(text_id)
123e4567-e89b-12d3-a456-426614174000
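The description of add_text (embed the text, store it with its metadata under text_key, return an ID) can be sketched with an in-memory dictionary in place of Pinecone. ToyConversationStore, the embed callable, and the use of uuid4 for IDs are assumptions for illustration. The documented signature uses metadata={}; this sketch defaults to None and copies the dict, which sidesteps Python's shared-mutable-default pitfall.

```python
import uuid
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


class ToyConversationStore:
    """Hypothetical in-memory stand-in for ConversationStore (no Pinecone)."""

    def __init__(
        self, namespace: str, embed: Callable[[str], Vector], text_key: str = "text"
    ) -> None:
        self.namespace = namespace
        self.embed = embed
        self.text_key = text_key
        # Maps each generated ID to its (embedding, metadata) pair.
        self.records: Dict[str, Tuple[Vector, dict]] = {}

    def add_text(self, text: str, metadata: Optional[dict] = None) -> str:
        # Copy the caller's metadata so it is not mutated, then store the raw
        # text under text_key so it can be recovered from search results.
        record_metadata = dict(metadata or {})
        record_metadata[self.text_key] = text
        vector_id = str(uuid.uuid4())
        self.records[vector_id] = (self.embed(text), record_metadata)
        return vector_id
```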
- property embeddings: Embeddings | None
Access the query embedding object if available.
- add_texts(texts, metadatas)
Add multiple texts to the vector store.
This method adds each text with its corresponding metadata to the vector store.
- Parameters:
texts (Iterable[str]) – The texts to add.
metadatas (List[dict]) – The metadata for each text.
- Return type:
List[str]
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> texts = ["Text 1", "Text 2"]
>>> metadatas = [{"user": "user1"}, {"user": "user2"}]
>>> store.add_texts(texts, metadatas)
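add_texts pairs each text with the metadata at the same position and returns List[str], presumably the IDs produced by adding each text individually. The sketch below shows that pairing as a free function that takes the single-text adder as a callback; the function and the length check are assumptions, not confirmed sherpa_ai behavior. The check matters because plain zip() silently drops any unmatched items.

```python
from typing import Callable, Iterable, List

AddText = Callable[[str, dict], str]


def add_texts(texts: Iterable[str], metadatas: List[dict], add_text: AddText) -> List[str]:
    """Add each text with its matching metadata and collect the returned IDs."""
    text_list = list(texts)  # materialize so the length can be checked
    if len(text_list) != len(metadatas):
        # Plain zip() would silently drop the unmatched tail, so fail loudly.
        raise ValueError(
            f"got {len(text_list)} texts but {len(metadatas)} metadata dicts"
        )
    return [add_text(text, metadata) for text, metadata in zip(text_list, metadatas)]
```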
- similarity_search(text, top_k=5, filter=None, threshold=0.7)
Perform a similarity search in the vector store.
This method searches for texts that are semantically similar to the query.
- Parameters:
text (str) – The search query.
top_k (int, optional) – The number of results to return. Defaults to 5.
filter (Optional[dict], optional) – Filter criteria for the search. Defaults to None.
threshold (float, optional) – The similarity threshold. Defaults to 0.7.
- Returns:
A list of documents that match the query.
- Return type:
list[Document]
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> store = ConversationStore.from_index("my_namespace", "api_key", "my_index")
>>> results = store.similarity_search("What is machine learning?", top_k=5)
>>> for doc in results:
...     print(doc.page_content[:100])
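The interaction of top_k, filter, and threshold can be illustrated with a brute-force search over stored vectors. This is a sketch under assumptions: the score is taken to be cosine similarity, the filter is an exact key/value match (Pinecone supports richer filter operators), and metadata_filter is used as the parameter name to avoid shadowing Python's built-in filter. Since the threshold only removes low scores, applying it before or after the top_k cut yields the same results.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[List[float], dict]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def thresholded_search(
    query_vector: Sequence[float],
    records: Dict[str, Record],
    top_k: int = 5,
    metadata_filter: Optional[dict] = None,
    threshold: float = 0.7,
) -> List[Tuple[float, dict]]:
    """Return up to top_k (score, metadata) pairs whose score is >= threshold."""
    scored = []
    for vector, metadata in records.values():
        # Exact-match filter: every key in metadata_filter must match.
        if metadata_filter and any(
            metadata.get(key) != value for key, value in metadata_filter.items()
        ):
            continue
        score = cosine_similarity(query_vector, vector)
        if score >= threshold:
            scored.append((score, metadata))
    # Highest similarity first; keep at most top_k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With the default threshold of 0.7, a vector at 45 degrees to the query (cosine of about 0.707) is kept, while an orthogonal one (cosine 0) is dropped.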
- classmethod delete(namespace, index_name)
Delete all vectors in a namespace.
This method deletes all vectors in the specified namespace of the Pinecone index.
- Parameters:
namespace (str) – The namespace to delete.
index_name (str) – The name of the Pinecone index.
- Returns:
The result of the delete operation.
- Raises:
ImportError – If the pinecone-client package is not installed.
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> ConversationStore.delete("my_namespace", "my_index")
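Namespaces partition a single Pinecone index, so deleting one namespace leaves vectors in the others untouched. The in-memory sketch below models that behavior only; NamespacedIndex and its methods are hypothetical and do not reproduce the Pinecone client calls that the real classmethod makes.

```python
from collections import defaultdict
from typing import Dict, List


class NamespacedIndex:
    """Hypothetical in-memory index whose vectors are partitioned by namespace."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[str, List[float]]] = defaultdict(dict)

    def upsert(self, namespace: str, vector_id: str, vector: List[float]) -> None:
        self._namespaces[namespace][vector_id] = vector

    def delete_all(self, namespace: str) -> int:
        """Remove every vector in namespace and return how many were deleted."""
        removed = self._namespaces.pop(namespace, {})
        return len(removed)

    def count(self, namespace: str) -> int:
        # Use .get so that counting does not create an empty namespace entry.
        return len(self._namespaces.get(namespace, {}))
```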
- classmethod get_vector_retrieval(namespace, openai_api_key, index_name, search_type='similarity', search_kwargs={})
Create a vector store retriever.
This method creates a ConversationStore and returns a VectorStoreRetriever for it.
- Parameters:
namespace (str) – The namespace for the vector store.
openai_api_key (str) – The OpenAI API key.
index_name (str) – The name of the Pinecone index.
search_type (str, optional) – The type of search to perform. Defaults to “similarity”.
search_kwargs (dict, optional) – Additional keyword arguments for the search. Defaults to {}.
- Returns:
A retriever for the vector store.
- Return type:
VectorStoreRetriever
Example
>>> from sherpa_ai.connectors.vectorstores import ConversationStore
>>> retriever = ConversationStore.get_vector_retrieval("my_namespace", "api_key", "my_index")
>>> results = retriever.get_relevant_documents("What is machine learning?")
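A retriever binds a store's search function to a fixed search_type and search_kwargs, so callers only pass the query. SimpleRetriever below is a simplified, hypothetical stand-in for LangChain's VectorStoreRetriever that supports only "similarity". It uses field(default_factory=dict) rather than the {} default shown in the documented signature, so separate instances never share one kwargs dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SimpleRetriever:
    """Hypothetical retriever that binds a search function to fixed settings."""

    search_fn: Callable[..., List[Any]]
    search_type: str = "similarity"
    search_kwargs: Dict[str, Any] = field(default_factory=dict)

    def get_relevant_documents(self, query: str) -> List[Any]:
        if self.search_type != "similarity":
            raise ValueError(f"Unsupported search_type: {self.search_type!r}")
        # Forward the stored keyword arguments (for example top_k) on every call.
        return self.search_fn(query, **self.search_kwargs)
```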
- classmethod from_texts(texts, embedding, metadatas)
Create a ConversationStore from a list of texts.
This method is not implemented for ConversationStore.
- Parameters:
texts (List[str]) – The texts to add.
embedding (Embeddings) – The embedding function to use.
metadatas (list[dict]) – The metadata for each text.
- Raises:
NotImplementedError – This method is not implemented for ConversationStore.
- class sherpa_ai.connectors.vectorstores.LocalChromaStore(*args, **kwargs)
Bases: object
A local Chroma-based vector store.
This class extends the Chroma vector store to provide additional functionality for working with local files.
Example
>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)
- classmethod from_folder(file_path, openai_api_key, index_name='chroma')
Create a Chroma vector store from a folder of files.
This method builds a ChromaDB store from the files in a folder; PDF and Markdown files are currently supported.
- Parameters:
file_path (str) – Path to the folder containing files.
openai_api_key (str) – The OpenAI API key.
index_name (str, optional) – Name of the index. Defaults to “chroma”.
- Returns:
A new LocalChromaStore instance.
- Return type:
LocalChromaStore
Example
>>> from sherpa_ai.connectors.vectorstores import LocalChromaStore
>>> store = LocalChromaStore.from_folder("path/to/files", "api_key")
>>> results = store.similarity_search("query", k=5)
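Before any loading or embedding, from_folder has to select the PDF and Markdown files in the folder. The sketch below covers only that selection step with pathlib. The function name, the recursive search, and the exact suffix set (.pdf and .md, case-insensitive) are assumptions; the documentation does not say whether subfolders are included.

```python
from pathlib import Path
from typing import List

# Assumed set of supported extensions, compared case-insensitively.
SUPPORTED_SUFFIXES = {".pdf", ".md"}


def find_supported_files(folder: str) -> List[Path]:
    """Return PDF and Markdown files under folder (recursively), sorted."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    # Sorting gives a deterministic ingestion order across runs.
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```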
- sherpa_ai.connectors.vectorstores.configure_chroma(host, port, index_name, openai_api_key)
Configure a ChromaDB instance.
This function creates a ChromaDB instance connected to a remote server.
- Parameters:
host (str) – The host of the ChromaDB server.
port (int) – The port of the ChromaDB server.
index_name (str) – The name of the index.
openai_api_key (str) – The OpenAI API key.
- Returns:
A configured ChromaDB instance.
- Return type:
Chroma
- Raises:
ImportError – If the chromadb package is not installed.
Example
>>> from sherpa_ai.connectors.vectorstores import configure_chroma
>>> chroma = configure_chroma("localhost", 8000, "my_index", "api_key")
>>> results = chroma.similarity_search("query", k=5)
- sherpa_ai.connectors.vectorstores.get_vectordb()
Get a vector database retriever based on configuration.
This function returns a vector database retriever based on the configuration in the config module. It supports Pinecone, Chroma, and local ChromaDB.
- Returns:
A retriever for the vector store.
- Return type:
VectorStoreRetriever
Example
>>> from sherpa_ai.connectors.vectorstores import get_vectordb
>>> retriever = get_vectordb()
>>> results = retriever.get_relevant_documents("What is machine learning?")
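get_vectordb chooses between Pinecone, Chroma, and local ChromaDB based on configuration. That dispatch pattern can be sketched as a lookup table of lazy builders. The config key VECTORDB, its values, and the fallback to "local" are hypothetical choices for illustration and are not confirmed against the sherpa_ai config module.

```python
from typing import Callable, Dict, Mapping

Builder = Callable[[], object]


def get_retriever(config: Mapping[str, str], builders: Dict[str, Builder]) -> object:
    """Pick a retriever builder using config["VECTORDB"] (hypothetical key)."""
    backend = config.get("VECTORDB", "local").lower()
    try:
        build = builders[backend]
    except KeyError:
        raise ValueError(
            f"Unsupported VECTORDB {backend!r}; expected one of {sorted(builders)}"
        ) from None
    # Builders are called only when selected, so an unused backend's optional
    # dependencies are never imported.
    return build()
```

Keeping each backend behind a zero-argument builder means that a deployment using only local ChromaDB never has to install the Pinecone client.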