sherpa_ai.output_parsers package#
Overview#
The output_parsers package provides tools for validating, formatting, and transforming model outputs in Sherpa AI. These parsers ensure that responses meet specific criteria and formats before being presented to users.
Key Components
Citation Validation: Ensures proper citation formatting and accuracy
Number Validation: Verifies numerical responses for correctness
Link Parsing: Extracts and validates hyperlinks from responses
Self-Consistency: Implements self-consistency improvement for complex Pydantic outputs
Format Conversion: Converts between different formats (e.g., Markdown to Slack)
Example Usage#
from sherpa_ai.output_parsers.citation_validation import CitationValidation
from sherpa_ai.output_parsers.number_validation import NumberValidation
# `belief` is assumed to be an existing Belief instance that holds the source material
# Add citations to a response based on the sources in the belief
citation_validator = CitationValidation()
citation_result = citation_validator.process_output(
    "According to Smith et al. (2023), AI has made significant progress.",
    belief,
)
print(f"Citation valid: {citation_result.is_valid}")
# Check that the numbers in an answer appear in the source material
number_validator = NumberValidation()
number_result = number_validator.process_output("The answer is 42.5 meters.", belief)
print(f"Processed text: {number_result.result}")
Submodules#
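| Module | Description |
|---|---|
| sherpa_ai.output_parsers.base | Provides abstract base classes for all output parsers. |
| sherpa_ai.output_parsers.citation_validation | Implements validation for proper citation formatting and accuracy. |
| sherpa_ai.output_parsers.link_parse | Contains tools for extracting and validating hyperlinks in responses. |
| sherpa_ai.output_parsers.md_to_slack_parse | Provides conversion from Markdown to Slack message formatting. |
| sherpa_ai.output_parsers.number_validation | Implements validation for numerical answers and calculations. |
| sherpa_ai.output_parsers.validation_result | Contains the ValidationResult class for representing validation outcomes. |
| sherpa_ai.output_parsers.self_consistency | Implements self-consistency improvement for complex Pydantic outputs. |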
sherpa_ai.output_parsers.base module#
Base classes for output parsing and processing in Sherpa AI.
This module provides abstract base classes for output parsing and processing. It defines the core interfaces that all parsers and processors must implement, ensuring consistent behavior across different implementations.
- class sherpa_ai.output_parsers.base.BaseOutputParser[source]#
Bases: ABC
Abstract base class for output parsers.
This class defines the interface that all output parsers must implement. Output parsers are responsible for transforming raw text output into a structured or modified format.
Example
>>> class MyParser(BaseOutputParser):
...     def parse_output(self, text: str) -> str:
...         return text.upper()
>>> parser = MyParser()
>>> result = parser.parse_output("hello")
>>> print(result)
HELLO
- abstractmethod parse_output(**kwargs)[source]#
Abstract method to be implemented by subclasses for parsing output text.
- Parameters:
text (str) – The input text to be parsed.
**kwargs – Additional arguments for parsing.
- Returns:
The parsed output text.
- Return type:
str
Example
>>> parser = MyParser()
>>> result = parser.parse_output("hello world")
>>> print(result)
HELLO WORLD
- class sherpa_ai.output_parsers.base.BaseOutputProcessor[source]#
Bases: ABC
Abstract base class for output processors.
This class defines the interface that all output processors must implement. Output processors validate and transform text output, tracking validation failures and providing detailed feedback.
- count#
Number of failed validations since last reset.
- Type:
int
Example
>>> class MyProcessor(BaseOutputProcessor):
...     def process_output(self, text: str) -> ValidationResult:
...         valid = len(text) > 5
...         return ValidationResult(is_valid=valid, result=text, feedback="Length check")
>>> processor = MyProcessor()
>>> result = processor("hello world")
>>> print(result.is_valid)
True
- count: int = 0#
- reset_state()[source]#
Reset the validation failure counter.
This method resets the count of failed validations back to zero.
Example
>>> processor = MyProcessor()
>>> processor.count = 5
>>> processor.reset_state()
>>> print(processor.count)
0
- abstractmethod process_output(text, **kwargs)[source]#
Process and validate the input text.
- Parameters:
text (str) – The input text to be processed.
**kwargs – Additional arguments for processing.
- Returns:
Result containing validity status, processed text, and optional feedback.
- Return type:
ValidationResult
Example
>>> processor = MyProcessor()
>>> result = processor.process_output("hello world")
>>> print(result.is_valid)
True
>>> print(result.feedback)
Length check
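The processor interface described above can be illustrated without the library. The sketch below uses stand-in classes (`SketchResult`, `SketchProcessor`, and `LengthCheck` are hypothetical names, not part of Sherpa AI) and assumes that the counter goes up whenever a call yields an invalid result; the actual bookkeeping in BaseOutputProcessor may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchResult:
    """Stand-in for ValidationResult (hypothetical, for illustration only)."""
    is_valid: bool
    result: str
    feedback: str = ""


class SketchProcessor(ABC):
    """Stand-in for an output processor that counts failed validations."""

    def __init__(self) -> None:
        self.count = 0  # failed validations since the last reset

    def reset_state(self) -> None:
        self.count = 0

    @abstractmethod
    def process_output(self, text: str, **kwargs) -> SketchResult:
        ...

    def __call__(self, text: str, **kwargs) -> SketchResult:
        # Assumption: every invalid result increments the counter.
        outcome = self.process_output(text, **kwargs)
        if not outcome.is_valid:
            self.count += 1
        return outcome


class LengthCheck(SketchProcessor):
    def process_output(self, text: str, **kwargs) -> SketchResult:
        valid = len(text) > 5
        return SketchResult(valid, text, "" if valid else "Text is too short")


processor = LengthCheck()
processor("hi")            # fails the length check
processor("hello world")   # passes
failures_before_reset = processor.count
processor.reset_state()
```

Keeping the counter in the base class lets a caller decide when to stop retrying without each subclass tracking its own failures.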
sherpa_ai.output_parsers.citation_validation module#
Citation validation and addition module for Sherpa AI.
This module provides functionality for validating and adding citations to text. It defines the CitationValidation class which analyzes text against source materials and adds appropriate citations using various similarity metrics.
- class sherpa_ai.output_parsers.citation_validation.CitationValidation(sequence_threshold=0.7, jaccard_threshold=0.7, token_overlap=0.7)[source]#
Bases: BaseOutputProcessor
Validator and citation adder for text content.
This class analyzes text against source materials to validate content and add appropriate citations. It uses multiple similarity metrics to determine when citations are needed and which sources to cite.
- sequence_threshold#
Minimum ratio of common subsequence length to text length for citation. Default is 0.7.
- Type:
float
- jaccard_threshold#
Minimum Jaccard similarity for citation. Default is 0.7.
- Type:
float
- token_overlap#
Minimum token overlap ratio for citation. Default is 0.7.
- Type:
float
Example
>>> validator = CitationValidation(sequence_threshold=0.8)
>>> belief = Belief()  # Contains source about "Python is great"
>>> result = validator.process_output("Python is great!", belief)
>>> print("[1]" in result.result)  # Has citation
True
- calculate_token_overlap(sentence1, sentence2)[source]#
Calculates the percentage of token overlap between two sentences.
This method tokenizes both sentences and calculates the percentage of shared tokens relative to each sentence’s length.
- Parameters:
sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.
- Returns:
(overlap_ratio_1, overlap_ratio_2), where each ratio is the proportion of shared tokens to total tokens in that sentence.
- Return type:
tuple
Example
>>> validator = CitationValidation()
>>> ratio1, ratio2 = validator.calculate_token_overlap(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{ratio1:.2f}, {ratio2:.2f}")
0.75, 0.75
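As a rough illustration of the two overlap ratios, the following standalone sketch computes them with plain Python. It assumes lowercase whitespace tokenization and counts distinct shared tokens; the tokenizer actually used by the library is not stated here, and `token_overlap` is a hypothetical helper name.

```python
def token_overlap(sentence1: str, sentence2: str) -> tuple:
    """Return (ratio for sentence1, ratio for sentence2)."""
    # Assumption: whitespace tokenization on lowercased text.
    tokens1 = sentence1.lower().split()
    tokens2 = sentence2.lower().split()
    if not tokens1 or not tokens2:
        return 0.0, 0.0
    shared = set(tokens1) & set(tokens2)
    # Each ratio divides the shared tokens by that sentence's own length.
    return len(shared) / len(tokens1), len(shared) / len(tokens2)


ratio1, ratio2 = token_overlap("The cat is black", "The cat is white")
print(f"{ratio1:.2f}, {ratio2:.2f}")
```

Returning one ratio per sentence matters when the two texts differ in length: a short sentence can be fully covered by a long source while the source is only partly covered by the sentence.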
- jaccard_index(sentence1, sentence2)[source]#
Calculates the Jaccard index between two sentences.
This method computes the Jaccard index (intersection over union) between the sets of tokens from both sentences.
- Parameters:
sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.
- Returns:
Jaccard similarity score between 0 and 1.
- Return type:
float
Example
>>> validator = CitationValidation()
>>> score = validator.jaccard_index(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{score:.2f}")
0.60
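The intersection-over-union computation can be sketched in a few lines. This version assumes lowercase whitespace tokenization, which may not match the library's tokenizer; the function below is a standalone illustration rather than the method's source.

```python
def jaccard_index(sentence1: str, sentence2: str) -> float:
    # Assumption: tokens are lowercased, whitespace-separated words.
    set1 = set(sentence1.lower().split())
    set2 = set(sentence2.lower().split())
    union = set1 | set2
    if not union:
        return 0.0  # two empty sentences: define the score as 0
    return len(set1 & set2) / len(union)


score = jaccard_index("The cat is black", "The cat is white")
print(f"{score:.2f}")
```

For the two example sentences the token sets share three words out of five distinct words overall, which gives 3 / 5 = 0.60.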
- longest_common_subsequence(text1, text2)[source]#
Calculate length of longest common subsequence.
This method finds the length of the longest subsequence of characters that appear in both texts in the same order.
- Parameters:
text1 (str) – First text to compare.
text2 (str) – Second text to compare.
- Returns:
Length of longest common subsequence.
- Return type:
int
Example
>>> validator = CitationValidation()
>>> length = validator.longest_common_subsequence(
...     "hello world",
...     "hello there"
... )
>>> print(length)
7
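A character-level longest common subsequence is usually computed with dynamic programming. The sketch below is a generic textbook version that keeps only one row of the table at a time; it is not taken from the library source, and `lcs_length` is a hypothetical name.

```python
def lcs_length(text1: str, text2: str) -> int:
    """Length of the longest common subsequence of two strings."""
    # previous[j] holds the LCS length of the processed prefix of text1
    # and the first j characters of text2.
    previous = [0] * (len(text2) + 1)
    for ch1 in text1:
        current = [0]
        for j, ch2 in enumerate(text2, start=1):
            if ch1 == ch2:
                # Extend the best match that ends just before both characters.
                current.append(previous[j - 1] + 1)
            else:
                # Otherwise drop one character from either string.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


print(lcs_length("ABCBDAB", "BDCABA"))
```

A ratio such as `lcs_length(sentence, source) / len(sentence)` could then be compared with sequence_threshold; which text length the library divides by is not specified in this section.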
- flatten_nested_list(nested_list)[source]#
Flatten a nested list of strings.
- Parameters:
nested_list (list[list[str]]) – List of lists of strings.
- Returns:
Single list containing all non-empty strings.
- Return type:
list[str]
Example
>>> validator = CitationValidation()
>>> flat = validator.flatten_nested_list([["a", "b"], ["c", ""]])
>>> print(flat)
['a', 'b', 'c']
- split_paragraph_into_sentences(paragraph)[source]#
Split paragraph into sentences using NLTK.
- Parameters:
paragraph (str) – Text to split into sentences.
- Returns:
List of sentences from the paragraph.
- Return type:
list[str]
Example
>>> validator = CitationValidation()
>>> sentences = validator.split_paragraph_into_sentences(
...     "Hello there. How are you?"
... )
>>> print(sentences)
['Hello there.', 'How are you?']
- resources_from_belief(belief)[source]#
Extract resources from belief state actions.
- Parameters:
belief (Belief) – Agent’s belief state containing actions.
- Returns:
List of resources from retrieval actions.
- Return type:
list[ActionResource]
Example
>>> validator = CitationValidation()
>>> belief = Belief()  # Contains retrieval action with resource
>>> resources = validator.resources_from_belief(belief)
>>> print(len(resources))
1
- process_output(text, belief, **kwargs)[source]#
Process text and add citations from belief resources.
This method analyzes the input text against resources in the belief state and adds citations where appropriate based on similarity metrics.
- Parameters:
text (str) – Text to process and add citations to.
belief (Belief) – Agent’s belief state containing resources.
**kwargs – Additional arguments for processing.
- Returns:
Result containing text with citations added.
- Return type:
ValidationResult
Example
>>> validator = CitationValidation()
>>> belief = Belief()  # Contains source about "Python"
>>> result = validator.process_output(
...     "Python is a great language.",
...     belief
... )
>>> print("[1]" in result.result)  # Has citation
True
- add_citation_to_sentence(sentence, resources)[source]#
Add citations to a single sentence.
This method checks the sentence against each resource using similarity metrics to determine which sources to cite.
- Parameters:
sentence (str) – Sentence to add citations to.
resources (list[ActionResource]) – Available citation sources.
- Returns:
citation_ids: a list of citation identifiers. citation_links: a list of citation links (URLs).
- Return type:
tuple
Example
>>> validator = CitationValidation()
>>> resource = ActionResource(
...     source="http://example.com",
...     content="Python is great"
... )
>>> ids, urls = validator.add_citation_to_sentence(
...     "Python is great!",
...     [resource]
... )
>>> print(len(ids), urls[0])
1 http://example.com
- format_sentence_with_citations(sentence, ids, links)[source]#
Format a sentence with its citations.
This method adds citation references to the end of a sentence in the format [id](url).
- Parameters:
sentence (str) – Sentence to add citations to.
ids (list[int]) – Citation ID numbers.
links (list[str]) – Citation URLs.
- Returns:
Sentence with citations added.
- Return type:
str
Example
>>> validator = CitationValidation()
>>> result = validator.format_sentence_with_citations(
...     "Python is great.",
...     [1],
...     ["http://example.com"]
... )
>>> print(result)
Python is great [1](http://example.com).
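The formatting step can be pictured with a small standalone function. This is only a sketch that reproduces the example output above: it assumes that trailing sentence punctuation stays at the very end and that multiple citations are joined with a comma and a space, neither of which is confirmed by this page. `format_with_citations` is a hypothetical name.

```python
def format_with_citations(sentence: str, ids: list, links: list) -> str:
    """Append [id](url) references to a sentence."""
    # Assumption: several citations are separated by ", ".
    citations = ", ".join(f"[{i}]({link})" for i, link in zip(ids, links))
    if not citations:
        return sentence
    stripped = sentence.rstrip()
    # Assumption: keep a closing ".", "!" or "?" after the citations.
    if stripped and stripped[-1] in ".!?":
        return f"{stripped[:-1]} {citations}{stripped[-1]}"
    return f"{stripped} {citations}"


print(format_with_citations("Python is great.", [1], ["http://example.com"]))
```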
sherpa_ai.output_parsers.link_parse module#
Link parsing and symbol substitution module for Sherpa AI.
This module provides functionality for parsing and transforming links in text. It defines the LinkParser class which can convert between links and symbolic references, maintaining a consistent mapping between them.
- class sherpa_ai.output_parsers.link_parse.LinkParser[source]#
Bases: BaseOutputParser
Parser for converting between links and symbolic references.
This class handles the conversion of URLs to symbolic references and vice versa, maintaining a consistent mapping between them. It can process both raw URLs and tool-generated output containing links.
- Attributes:
links (list): List of unique links encountered during parsing.
link_to_id (dict): Mapping of links to their symbolic references.
count (int): Counter for generating unique symbol IDs.
output_counter (int): Counter for reindexing output symbols.
reindex_mapping (dict): Mapping of original IDs to reindexed IDs.
url_pattern (str): Regex pattern for identifying links.
doc_id_pattern (str): Regex pattern for identifying document IDs.
link_symbol (str): Format string for link symbols.
- Example:
>>> parser = LinkParser()
>>> text = "Check Link:example.com and Link:test.com"
>>> result = parser.parse_output(text, tool_output=True)
>>> print(result)
'DocID:[1]\nDocID:[2]\n'
>>> back = parser.parse_output("[1] and [2]")
>>> print(back)
'<http://example.com|[1]> and <http://test.com|[2]>'
- parse_output(text, tool_output=False)[source]#
Parse and transform links in text.
This method either converts URLs to symbolic references (when tool_output is True) or converts symbolic references back to clickable links (when tool_output is False).
- Parameters:
text (str) – Text containing either URLs or symbolic references.
tool_output (bool) – Whether the input is from a tool (True) or user-facing text (False).
- Returns:
Text with either URLs converted to symbols or symbols converted to clickable links.
- Return type:
str
Example
>>> parser = LinkParser()
>>> # Convert URLs to symbols
>>> result = parser.parse_output("Link:example.com", tool_output=True)
>>> print(result)
'DocID:[1]\n'
>>> # Convert symbols back to links
>>> result = parser.parse_output("[1]")
>>> print(result)
'<http://example.com|[1]>'
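The two directions of the conversion can be sketched with a much simpler stand-in. The class below is not the LinkParser implementation: it ignores the `Link:` and `DocID:` markers and the reindexing counters, and its name, method names, and URL regex are assumptions made for illustration.

```python
import re


class SketchLinkMapper:
    """Simplified stand-in for a link/symbol mapper (hypothetical)."""

    def __init__(self) -> None:
        self.links = []        # unique links, in order of first appearance
        self.link_to_id = {}   # link -> symbol such as "[1]"

    def to_symbols(self, text: str) -> str:
        """Replace each URL with a stable symbolic reference."""
        def replace(match):
            link = match.group(0)
            if link not in self.link_to_id:
                self.links.append(link)
                self.link_to_id[link] = f"[{len(self.links)}]"
            return self.link_to_id[link]

        return re.sub(r"https?://[^\s<>|]+", replace, text)

    def to_slack_links(self, text: str) -> str:
        """Turn known symbols back into Slack-style <url|[n]> links."""
        id_to_link = {symbol: link for link, symbol in self.link_to_id.items()}

        def replace(match):
            symbol = match.group(0)
            link = id_to_link.get(symbol)
            # Unknown symbols are left untouched.
            return f"<{link}|{symbol}>" if link else symbol

        return re.sub(r"\[\d+\]", replace, text)


mapper = SketchLinkMapper()
symbolic = mapper.to_symbols("See http://example.com and http://test.com")
restored = mapper.to_slack_links("[1] and [2]")
```

Replacing long URLs with short symbols keeps them out of the text a model has to reproduce, and the stored mapping lets the final answer be rewritten with working links.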
sherpa_ai.output_parsers.md_to_slack_parse module#
Markdown to Slack format conversion module for Sherpa AI.
This module provides functionality for converting Markdown-formatted text to Slack-compatible format. It defines the MDToSlackParse class which handles the conversion of Markdown links to Slack’s link format.
- class sherpa_ai.output_parsers.md_to_slack_parse.MDToSlackParse[source]#
Bases: BaseOutputParser
Parser for converting Markdown links to Slack format.
This class converts Markdown-style links ([text](url)) to Slack’s link format (<url|text>). It maintains the link text and URL while changing only the syntax to match Slack’s requirements.
- pattern#
Regex pattern for identifying Markdown links.
- Type:
str
Example
>>> parser = MDToSlackParse()
>>> text = "Check out [this link](http://example.com)!"
>>> result = parser.parse_output(text)
>>> print(result)
Check out <http://example.com|this link>!
- parse_output(text)[source]#
Convert Markdown links to Slack format.
This method finds all Markdown-style links in the input text and converts them to Slack’s link format while preserving the link text and URL.
- Parameters:
text (str) – Text containing Markdown-style links.
- Returns:
Text with links converted to Slack format.
- Return type:
str
Example
>>> parser = MDToSlackParse()
>>> text = "See [docs](https://docs.com) and [code](https://code.com)"
>>> result = parser.parse_output(text)
>>> print(result)
See <https://docs.com|docs> and <https://code.com|code>
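A conversion of this kind is typically a single regular-expression substitution. The sketch below shows one way to do it with the standard `re` module; the pattern is an assumption and may differ from the `pattern` attribute of MDToSlackParse, and `md_links_to_slack` is a hypothetical name.

```python
import re

# Group 1 captures the link text, group 2 the URL.
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def md_links_to_slack(text: str) -> str:
    """Rewrite [text](url) as <url|text>, leaving other text unchanged."""
    return MARKDOWN_LINK.sub(r"<\2|\1>", text)


print(md_links_to_slack("See [docs](https://docs.com) and [code](https://code.com)"))
```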
sherpa_ai.output_parsers.number_validation module#
Number validation module for Sherpa AI.
This module provides functionality for validating numerical information in text. It defines the NumberValidation class which verifies that numbers mentioned in generated text exist in the source material.
- class sherpa_ai.output_parsers.number_validation.NumberValidation[source]#
Bases: BaseOutputProcessor
Validates the presence or absence of numerical information in a given piece of text.
This class validates that any numbers mentioned in generated text can be found in the source material, helping ensure numerical accuracy and prevent hallucination of numbers.
Example
>>> validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42 items"
>>> result = validator.process_output("There are 42 items.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("There are 100 items.", belief)
>>> print(result.is_valid)
False
- process_output(text, belief, **kwargs)[source]#
Verifies that all numbers within text exist in the belief source text.
- Parameters:
text (str) – Text containing numbers to validate.
belief (Belief) – Agent’s belief state containing source material.
**kwargs – Additional arguments for processing.
- Returns:
Result indicating whether all numbers are valid, with feedback if validation fails.
- Return type:
ValidationResult
Example
>>> validator = NumberValidation()
>>> belief = Belief()  # Contains "The price is $50"
>>> result = validator.process_output("It costs $50", belief)
>>> print(result.is_valid)
True
>>> result.feedback
''
- get_failure_message()[source]#
Get a message describing validation failures.
- Returns:
Warning message about potential numerical inaccuracies.
- Return type:
str
Example
>>> validator = NumberValidation()
>>> print(validator.get_failure_message())
The numeric value results might not be fully reliable...
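The core check, that every number in the generated text also occurs in the source, can be sketched with a regular expression and a set lookup. This is a simplified illustration: the number pattern and the exact-string comparison are assumptions, the source is passed as a plain string instead of a Belief, and `numbers_not_in_source` is a hypothetical helper.

```python
import re

# Assumption: a number is a run of digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_not_in_source(text: str, source: str) -> list:
    """Return the numbers in `text` that never appear in `source`."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(text) if n not in source_numbers]


source = "The warehouse holds 42 items and the price is $50."
print(numbers_not_in_source("There are 42 items.", source))
print(numbers_not_in_source("There are 100 items.", source))
```

An empty list corresponds to a valid result; any leftover numbers are candidates for the feedback message.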
sherpa_ai.output_parsers.validation_result module#
Validation result model for Sherpa AI output processors.
This module provides the ValidationResult class which represents the outcome of content validation operations. It includes status, result content, and optional feedback information.
- class sherpa_ai.output_parsers.validation_result.ValidationResult(**data)[source]#
Bases: BaseModel
Result of content validation operations.
This class represents the outcome of validating content, including whether the validation passed, the processed content, and any feedback about the validation process.
- is_valid#
Whether the validation passed (True) or failed (False).
- Type:
bool
- result#
The processed or validated content.
- Type:
str
- feedback#
Additional information about the validation result.
- Type:
str
Example
>>> result = ValidationResult(
...     is_valid=True,
...     result="Validated text",
...     feedback="All checks passed"
... )
>>> print(result.is_valid)
True
>>> print(result.feedback)
All checks passed
- is_valid: bool#
- result: str#
- feedback: str#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
sherpa_ai.output_parsers.self_consistency module#
- class sherpa_ai.output_parsers.self_consistency.MaximumLikelihoodConcretizer(config=None)[source]#
Bases: Concretizer
- concretize(abstract_object, return_dict=False)[source]#
Concretize an abstract object by selecting the most likely value for each attribute. For list attributes, uses top-k or threshold-based selection.
- Parameters:
abstract_object (AbstractObject) – The abstract object to concretize.
- Returns:
A concrete object with the most likely values for each attribute.
- Return type:
BaseModel
- class sherpa_ai.output_parsers.self_consistency.ObjectAggregator(obj_schema, *, value_weight_map: dict[str, dict | float] = {}, obj_dict: dict[str, list | dict] = {})[source]#
Bases: BaseModel
Class representing an aggregation of objects that captures their attribute values as lists.
- value_weight_map: dict[str, dict | float]#
Dictionary mapping each field to a dictionary of values and their weights. If the field is a primitive type, it will be mapped to a dictionary with values as keys and their weights as values. If the field is a nested model, it will be mapped to a dictionary for storing object values. The default weight is 1.0.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- obj_dict: dict[str, list | dict]#
Dictionary representing the object aggregator, where each field is mapped to a list or a dictionary. If the field is a primitive type, it will be mapped to a list for storing object values. If the field is a nested model, it will be mapped to a dictionary for storing object values.
- class sherpa_ai.output_parsers.self_consistency.AbstractObject(**data)[source]#
Bases: BaseModel
Abstract object that maps each attribute into a distribution.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- obj_dict: dict[str, Distribution | dict]#
- classmethod from_aggregator(obj_aggregator)[source]#
Create an AbstractObject from an ObjectAggregator.
- Parameters:
obj_aggregator (ObjectAggregator) – The ObjectAggregator to convert.
- Returns:
An instance of AbstractObject with the aggregated data.
- Return type:
AbstractObject
- sherpa_ai.output_parsers.self_consistency.run_self_consistency(objects, schema, aggregator_cls=<class 'sherpa_ai.output_parsers.self_consistency.object_aggregator.ObjectAggregator'>, concretizer=None, value_weight_map={}, config=None)[source]#
Run self-consistency on a list of objects using the provided schema and configuration.
- Parameters:
objects (list[BaseModel]) – List of objects to process.
schema (type[BaseModel]) – Pydantic schema for validation.
aggregator_cls (type[ObjectAggregator], optional) – Class to use for aggregation. Defaults to ObjectAggregator.
concretizer (Optional[Concretizer], optional) – Concretizer to use for final output. Defaults to MaximumLikelihoodConcretizer.
value_weight_map (dict[str, Union[dict, float]], optional) – Weight map for each attribute of the object. Defaults to {}.
config (Optional[SelfConsistencyConfig], optional) – Configuration for self-consistency processing. If None, default configuration will be used.
- Returns:
The final concrete object after self-consistency processing (instance of schema).
- Return type:
BaseModel
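The idea behind self-consistency, aggregating several candidate objects and keeping the most likely value for each attribute, can be sketched with plain dictionaries. The function below is a simplified stand-in, not the library's aggregator or concretizer: it handles only flat, hashable field values, treats the weight map as per-value weights with a default of 1.0, and its name is hypothetical.

```python
from collections import defaultdict


def most_likely_values(samples: list, value_weight_map: dict = None) -> dict:
    """Pick the highest-weighted value of every field across samples."""
    value_weight_map = value_weight_map or {}
    # totals[field][value] accumulates the weight of each observed value.
    totals = defaultdict(lambda: defaultdict(float))
    for sample in samples:
        for field, value in sample.items():
            # Assumption: values without an explicit weight count as 1.0.
            weight = value_weight_map.get(field, {}).get(value, 1.0)
            totals[field][value] += weight
    # Ties go to the value that was seen first.
    return {field: max(votes, key=votes.get) for field, votes in totals.items()}


samples = [
    {"city": "Paris", "year": 2020},
    {"city": "Paris", "year": 2021},
    {"city": "Lyon", "year": 2021},
]
print(most_likely_values(samples))
```

Voting per attribute rather than per object means the final result can combine fields from different candidates, which is what makes the approach useful for structured outputs.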
- class sherpa_ai.output_parsers.self_consistency.SelfConsistencyConfig(**data)[source]#
Bases: BaseModel
Configuration for self-consistency processing.
This class provides a structured way to configure self-consistency behavior, particularly for list attributes. It replaces the previous dict-based configuration approach.
- list_config#
Dictionary mapping field names to their list processing configurations. Defaults to an empty dictionary.
- list_config: dict[str, ListConfig]#
- get_list_config(field_path)[source]#
Get the list configuration for a specific field path.
- Parameters:
field_path (str) – The path to the field (e.g., “tags” or “nested.field”).
- Returns:
The configuration for the field, or the default if not specified.
- Return type:
ListConfig
- has_list_config(field_path)[source]#
Check if a field has specific list configuration.
- Parameters:
field_path (str) – The path to the field.
- Returns:
True if the field has specific configuration, False otherwise
- Return type:
bool
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class sherpa_ai.output_parsers.self_consistency.ListConfig(**data)[source]#
Bases: BaseModel
Configuration for individual list attributes in self-consistency processing.
This class defines how list attributes should be processed during the self-consistency aggregation and concretization process.
- top_k#
Number of top items to select when using “top_k” strategy. Defaults to 0 (which means use default behavior).
- threshold#
Minimum frequency threshold when using “threshold” strategy. Defaults to 2.0.
- strategy#
The strategy to use for selecting items from list attributes. Either “top_k” or “threshold”. Defaults to “top_k”.
- top_k: int#
- threshold: float#
- strategy: Literal['top_k', 'threshold']#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Module contents#
Output parsing and validation module for Sherpa AI.
This module provides various parsers and validators for processing model outputs. It includes parsers for links, Markdown to Slack conversion, and validators for citations, numbers, and entities.
Example
>>> from sherpa_ai.output_parsers import LinkParser, NumberValidation
>>> link_parser = LinkParser()
>>> text = link_parser.parse_output("Check out https://example.com", tool_output=True)
>>> number_validator = NumberValidation()
>>> result = number_validator.process_output("The answer is 42", belief)  # belief: an existing Belief instance
- class sherpa_ai.output_parsers.LinkParser[source]#
Bases: BaseOutputParser
Parser for converting between links and symbolic references.
- class sherpa_ai.output_parsers.MDToSlackParse[source]#
Bases: BaseOutputParser
Parser for converting Markdown links to Slack format.
- class sherpa_ai.output_parsers.CitationValidation(sequence_threshold=0.7, jaccard_threshold=0.7, token_overlap=0.7)[source]#
Bases: BaseOutputProcessor
Validator and citation adder for text content.
This class analyzes text against source materials to validate content and add appropriate citations. It uses multiple similarity metrics to determine when citations are needed and which sources to cite.
- sequence_threshold#
Minimum ratio of common subsequence length to text length for citation. Default is 0.7.
- Type:
float
- jaccard_threshold#
Minimum Jaccard similarity for citation. Default is 0.7.
- Type:
float
- token_overlap#
Minimum token overlap ratio for citation. Default is 0.7.
- Type:
float
Example
>>> validator = CitationValidation(sequence_threshold=0.8) >>> belief = Belief() # Contains source about "Python is great" >>> result = validator.process_output("Python is great!", belief) >>> print("[1]" in result.result) # Has citation True
- calculate_token_overlap(sentence1, sentence2)[source]#
Calculates the percentage of token overlap between two sentences.
This method tokenizes both sentences and calculates the percentage of shared tokens relative to each sentence’s length.
- Parameters:
sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.
- Returns:
- (overlap_ratio_1, overlap_ratio_2) where each ratio is the
proportion of shared tokens to total tokens in that sentence.
- Return type:
tuple
Example
>>> validator = CitationValidation() >>> ratio1, ratio2 = validator.calculate_token_overlap( ... "The cat is black", ... "The cat is white" ... ) >>> print(f"{ratio1:.2f}, {ratio2:.2f}") '0.75, 0.75'
- jaccard_index(sentence1, sentence2)[source]#
Calculates the Jaccard index between two sentences.
This method computes the Jaccard index (intersection over union) between the sets of tokens from both sentences.
- Parameters:
sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.
- Returns:
Jaccard similarity score between 0 and 1.
- Return type:
float
Example
>>> validator = CitationValidation() >>> score = validator.jaccard_index( ... "The cat is black", ... "The cat is white" ... ) >>> print(f"{score:.2f}") '0.60'
- longest_common_subsequence(text1, text2)[source]#
Calculate length of longest common subsequence.
This method finds the length of the longest subsequence of characters that appear in both texts in the same order.
- Parameters:
text1 (str) – First text to compare.
text2 (str) – Second text to compare.
- Returns:
Length of longest common subsequence.
- Return type:
int
Example
>>> validator = CitationValidation() >>> length = validator.longest_common_subsequence( ... "hello world", ... "hello there" ... ) >>> print(length) 6
- flatten_nested_list(nested_list)[source]#
Flatten a nested list of strings.
- Parameters:
nested_list (list[list[str]]) – List of lists of strings.
- Returns:
Single list containing all non-empty strings.
- Return type:
list[str]
Example
>>> validator = CitationValidation() >>> flat = validator.flatten_nested_list([["a", "b"], ["c", ""]]) >>> print(flat) ['a', 'b', 'c']
- split_paragraph_into_sentences(paragraph)[source]#
Split paragraph into sentences using NLTK.
- Parameters:
paragraph (str) – Text to split into sentences.
- Returns:
List of sentences from the paragraph.
- Return type:
list[str]
Example
>>> validator = CitationValidation() >>> sentences = validator.split_paragraph_into_sentences( ... "Hello there. How are you?" ... ) >>> print(sentences) ['Hello there.', 'How are you?']
- resources_from_belief(belief)[source]#
Extract resources from belief state actions.
- Parameters:
belief (Belief) – Agent’s belief state containing actions.
- Returns:
List of resources from retrieval actions.
- Return type:
list[ActionResource]
Example
>>> validator = CitationValidation() >>> belief = Belief() # Contains retrieval action with resource >>> resources = validator.resources_from_belief(belief) >>> print(len(resources)) 1
- process_output(text, belief, **kwargs)[source]#
Process text and add citations from belief resources.
This method analyzes the input text against resources in the belief state and adds citations where appropriate based on similarity metrics.
- Parameters:
text (str) – Text to process and add citations to.
belief (Belief) – Agent’s belief state containing resources.
**kwargs – Additional arguments for processing.
- Returns:
Result containing text with citations added.
- Return type:
Example
>>> validator = CitationValidation() >>> belief = Belief() # Contains source about "Python" >>> result = validator.process_output( ... "Python is a great language.", ... belief ... ) >>> print("[1]" in result.result) # Has citation True
- add_citation_to_sentence(sentence, resources)[source]#
Add citations to a single sentence.
This method checks the sentence against each resource using similarity metrics to determine which sources to cite.
- Parameters:
sentence (str) – Sentence to add citations to.
resources (list[ActionResource]) – Available citation sources.
- Returns:
citation_ids, a list of citation identifiers, and citation_links, a list of citation links (URLs).
- Return type:
tuple
Example
>>> validator = CitationValidation()
>>> resource = ActionResource(
...     source="http://example.com",
...     content="Python is great"
... )
>>> ids, urls = validator.add_citation_to_sentence(
...     "Python is great!",
...     [resource]
... )
>>> print(len(ids), urls[0])
1 http://example.com
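The idea of citing every resource that is similar enough to a sentence can be sketched as follows. This is an illustration under stated assumptions rather than the library's algorithm: word-level Jaccard similarity stands in for whatever metrics `CitationValidation` actually uses, resources are modelled as plain `(source, content)` tuples instead of `ActionResource` objects, and the `threshold` value and the names `tokenize`, `jaccard`, and `pick_citations` are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings (0.0 to 1.0)."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def pick_citations(sentence, resources, threshold=0.5):
    """Return (ids, links) for resources similar enough to the sentence.

    `resources` is a list of (source_url, content) pairs. `threshold` is a
    hypothetical cutoff, not a value taken from the library.
    """
    ids, links = [], []
    # Citation ids are 1-based positions in the resource list.
    for index, (source, content) in enumerate(resources, start=1):
        if jaccard(sentence, content) >= threshold:
            ids.append(index)
            links.append(source)
    return ids, links


resources = [
    ("http://example.com", "Python is great"),
    ("http://other.example", "Cooking pasta at home"),
]
print(pick_citations("Python is great!", resources))
```

Only the first resource shares words with the sentence, so it is the only one cited.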
- format_sentence_with_citations(sentence, ids, links)[source]#
Format a sentence with its citations.
This method adds citation references to the end of a sentence in the format [id](url).
- Parameters:
sentence (str) – Sentence to add citations to.
ids (list[int]) – Citation ID numbers.
links (list[str]) – Citation URLs.
- Returns:
Sentence with citations added.
- Return type:
str
Example
>>> validator = CitationValidation()
>>> result = validator.format_sentence_with_citations(
...     "Python is great.",
...     [1],
...     ["http://example.com"]
... )
>>> print(result)
Python is great [1](http://example.com).
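The formatting rule described above, where references are placed before the sentence's final punctuation, might look like the sketch below. `format_with_citations` is a hypothetical standalone function, and the comma separator between multiple citations is an assumption; the documentation only shows the single-citation case.

```python
def format_with_citations(sentence, ids, links):
    """Append [id](url) references, keeping the final punctuation last."""
    # Separator between multiple citations is an assumption.
    citations = ", ".join(f"[{i}]({link})" for i, link in zip(ids, links))
    if not citations:
        return sentence
    if sentence and sentence[-1] in ".!?":
        # Insert the references just before the closing punctuation mark.
        return f"{sentence[:-1]} {citations}{sentence[-1]}"
    return f"{sentence} {citations}"


print(format_with_citations("Python is great.", [1], ["http://example.com"]))
```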
- class sherpa_ai.output_parsers.NumberValidation[source]#
Bases:
BaseOutputProcessor
Validates the presence or absence of numerical information in a given piece of text.
This class validates that any numbers mentioned in generated text can be found in the source material, helping ensure numerical accuracy and prevent hallucination of numbers.
Example
>>> validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42 items"
>>> result = validator.process_output("There are 42 items.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("There are 100 items.", belief)
>>> print(result.is_valid)
False
- process_output(text, belief, **kwargs)[source]#
Verifies that all numbers within text exist in the belief source text.
- Parameters:
text (str) – Text containing numbers to validate.
belief (Belief) – Agent’s belief state containing source material.
**kwargs – Additional arguments for processing.
- Returns:
Result indicating whether all numbers are valid, with feedback if validation fails.
- Return type:
Example
>>> validator = NumberValidation()
>>> belief = Belief()  # Contains "The price is $50"
>>> result = validator.process_output("It costs $50", belief)
>>> print(result.is_valid)
True
>>> print(result.feedback)
''
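The check that every number in the generated text also appears in the source can be illustrated with a small standalone sketch. It makes several assumptions: numbers are extracted with a simple regular expression, the source is passed as a plain string rather than a `Belief`, and the feedback wording and the names `extract_numbers` and `validate_numbers` are hypothetical.

```python
import re

# Matches integers and simple decimals such as "42" or "42.5".
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return the set of numeric tokens found in the text."""
    return set(NUMBER_PATTERN.findall(text))


def validate_numbers(text, source):
    """Return (is_valid, feedback) for the numbers mentioned in text."""
    missing = sorted(extract_numbers(text) - extract_numbers(source))
    if missing:
        # Feedback wording is illustrative, not the library's message.
        return False, "Numbers not found in the source: " + ", ".join(missing)
    return True, ""


print(validate_numbers("It costs $50", "The price is $50"))
print(validate_numbers("There are 100 items.", "There are 42 items."))
```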
- get_failure_message()[source]#
Get a message describing validation failures.
- Returns:
Warning message about potential numerical inaccuracies.
- Return type:
str
Example
>>> validator = NumberValidation()
>>> print(validator.get_failure_message())
The numeric value results might not be fully reliable...
- class sherpa_ai.output_parsers.EntityValidation[source]#
Bases:
BaseOutputProcessor
Validator for named entities in text.
This class validates that entities mentioned in generated text can be found in the source material, using progressively more sophisticated similarity comparison methods if initial validation fails.
Example
>>> validator = EntityValidation()
>>> belief = Belief()  # Contains source text about "John Smith"
>>> result = validator.process_output("John Smith is CEO.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("Jane Doe is CEO.", belief)
>>> print(result.is_valid)
False
- process_output(text, belief, llm=None, **kwargs)[source]#
Validate entities in text against source material.
This method checks that entities mentioned in the input text can be found in the source material stored in the belief state. After each validation failure, it moves to a more sophisticated comparison method.
- Parameters:
text (str) – Text containing entities to validate.
belief (Belief) – Agent’s belief state containing source material.
llm (BaseLanguageModel, optional) – Language model for advanced comparison.
**kwargs – Additional arguments for processing.
- Returns:
Result indicating whether all entities are valid, with feedback if validation fails.
- Return type:
Example
>>> validator = EntityValidation()
>>> belief = Belief()  # Contains text about "Microsoft"
>>> result = validator.process_output("Microsoft announced...", belief)
>>> print(result.is_valid)
True
>>> print(result.feedback)
''
- similarity_picker(value)[source]#
Select text similarity comparison method.
This method determines which similarity comparison method to use based on the number of previous validation attempts.
- Parameters:
value (int) – Iteration count used to select the text similarity method: 0 selects BASIC text similarity, 1 selects text similarity BY_METRICS, and any other value selects text similarity BY_LLM.
- Returns:
Selected comparison method.
- Return type:
TextSimilarityMethod
Example
>>> validator = EntityValidation()
>>> method = validator.similarity_picker(0)
>>> print(method)
TextSimilarityMethod.BASIC
>>> method = validator.similarity_picker(2)
>>> print(method)
TextSimilarityMethod.LLM
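The mapping from attempt count to comparison method can be sketched with a small enum. `SimilarityMethod` is a hypothetical stand-in for `TextSimilarityMethod`; its member names and values are assumptions made for illustration and may not match the library's definition.

```python
from enum import Enum


class SimilarityMethod(Enum):
    # Hypothetical mirror of TextSimilarityMethod; names and values are assumed.
    BASIC = 0
    METRICS = 1
    LLM = 2


def pick_similarity(attempt: int) -> SimilarityMethod:
    """Escalate from the cheapest comparison to the most expensive one."""
    if attempt == 0:
        return SimilarityMethod.BASIC
    if attempt == 1:
        return SimilarityMethod.METRICS
    # Every later attempt falls through to the LLM-based comparison.
    return SimilarityMethod.LLM


for attempt in range(4):
    print(attempt, pick_similarity(attempt))
```

Ordering the methods by cost means the LLM is only consulted after the cheaper string and metric comparisons have already failed.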
- get_failure_message()[source]#
Get a message describing validation failures.
- Returns:
Warning message about potential missing entities.
- Return type:
str
Example
>>> validator = EntityValidation()
>>> print(validator.get_failure_message())
Some enitities from the source might not be mentioned.
- check_entities_match(result, source, stage, llm)[source]#
Check if entities in result match those in source.
This method compares entities between result and source text using the specified similarity comparison method.
- Parameters:
result (str) – Text containing entities to validate.
source (str) – Source text to validate against.
stage (TextSimilarityMethod) – Comparison method to use.
llm (BaseLanguageModel) – Language model for LLM-based comparison.
- Returns:
Whether entities match and error message if not.
- Return type:
Tuple[bool, str]
Example
>>> validator = EntityValidation()
>>> match, msg = validator.check_entities_match(
...     "Apple released...",
...     "Apple announced...",
...     TextSimilarityMethod.BASIC,
...     None
... )
>>> print(match)
True
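A basic, exact-match version of this comparison is sketched below. It rests on several assumptions: a naive regular expression that treats capitalized words as entities stands in for real named-entity recognition, the check follows the class description (entities in the result must appear in the source), and the names `extract_entities` and `entities_match` and the error message wording are hypothetical.

```python
import re


def extract_entities(text):
    """Naive stand-in for NER: treat capitalized words as entities."""
    return set(re.findall(r"\b[A-Z][a-zA-Z]+\b", text))


def entities_match(result, source):
    """Return (match, message) comparing entities by exact string equality."""
    missing = sorted(extract_entities(result) - extract_entities(source))
    if missing:
        # Message wording is illustrative, not the library's.
        return False, "Entities not found in the source: " + ", ".join(missing)
    return True, ""


print(entities_match("Apple released a phone.", "Apple announced a phone."))
print(entities_match("Jane Doe is CEO.", "John Smith is CEO."))
```

The heuristic also picks up capitalized sentence-initial words, which is one reason a real implementation would rely on a proper entity recognizer and on the fuzzier metric-based or LLM-based stages.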