sherpa_ai.output_parsers package

Overview#

The output_parsers package provides tools for validating, formatting, and transforming model outputs in Sherpa AI. These parsers ensure that responses meet specific criteria and formats before being presented to users.

Key Components

  • Citation Validation: Checks text against source materials and adds citations where similarity metrics indicate a match

  • Number Validation: Verifies that numbers in a response appear in the source material

  • Link Parsing: Converts links to symbolic references and back to clickable links

  • Self-Consistency: Implements self-consistency improvement for complex Pydantic outputs

  • Format Conversion: Converts between different formats (e.g., Markdown to Slack)

Example Usage#

from sherpa_ai.memory import Belief  # import path assumed; adjust to your installation
from sherpa_ai.output_parsers.citation_validation import CitationValidation
from sherpa_ai.output_parsers.number_validation import NumberValidation

# Belief state holding the source material gathered by the agent
belief = Belief()

# Add citations to a response based on the resources in the belief
citation_validation = CitationValidation()
citation_result = citation_validation.process_output(
    "According to Smith et al. (2023), AI has made significant progress.", belief
)
print(f"Citation valid: {citation_result.is_valid}")

# Check that the numbers in a response appear in the source material
number_validation = NumberValidation()
number_result = number_validation.process_output("The answer is 42.5 meters.", belief)
print(f"Validated output: {number_result.result}")

Submodules#

Module – Description

  • sherpa_ai.output_parsers.base – Provides abstract base classes for all output parsers.

  • sherpa_ai.output_parsers.citation_validation – Implements validation for proper citation formatting and accuracy.

  • sherpa_ai.output_parsers.link_parse – Contains tools for extracting and validating hyperlinks in responses.

  • sherpa_ai.output_parsers.md_to_slack_parse – Provides conversion from Markdown to Slack message formatting.

  • sherpa_ai.output_parsers.number_validation – Implements validation for numerical answers and calculations.

  • sherpa_ai.output_parsers.validation_result – Contains the ValidationResult class for representing validation outcomes.

  • sherpa_ai.output_parsers.self_consistency – Implements self-consistency improvement for complex Pydantic outputs.

sherpa_ai.output_parsers.base module#

Base classes for output parsing and processing in Sherpa AI.

This module provides abstract base classes for output parsing and processing. It defines the core interfaces that all parsers and processors must implement, ensuring consistent behavior across different implementations.

class sherpa_ai.output_parsers.base.BaseOutputParser[source]#

Bases: ABC

Abstract base class for output parsers.

This class defines the interface that all output parsers must implement. Output parsers are responsible for transforming raw text output into a structured or modified format.

Example

>>> class MyParser(BaseOutputParser):
...     def parse_output(self, text: str) -> str:
...         return text.upper()
>>> parser = MyParser()
>>> result = parser.parse_output("hello")
>>> print(result)
HELLO

abstractmethod parse_output(**kwargs)[source]#

Abstract method to be implemented by subclasses for parsing output text.

Parameters:
  • text (str) – The input text to be parsed.

  • **kwargs – Additional arguments for parsing.

Returns:

The parsed output text.

Return type:

str

Example

>>> parser = MyParser()
>>> result = parser.parse_output("hello world")
>>> print(result)
HELLO WORLD

class sherpa_ai.output_parsers.base.BaseOutputProcessor[source]#

Bases: ABC

Abstract base class for output processors.

This class defines the interface that all output processors must implement. Output processors validate and transform text output, tracking validation failures and providing detailed feedback.

count#

Number of failed validations since last reset.

Type:

int

Example

>>> class MyProcessor(BaseOutputProcessor):
...     def process_output(self, text: str) -> ValidationResult:
...         valid = len(text) > 5
...         return ValidationResult(is_valid=valid, result=text, feedback="Length check")
>>> processor = MyProcessor()
>>> result = processor("hello world")
>>> print(result.is_valid)
True

count: int = 0#
reset_state()[source]#

Reset the validation failure counter.

This method resets the count of failed validations back to zero.

Example

>>> processor = MyProcessor()
>>> processor.count = 5
>>> processor.reset_state()
>>> print(processor.count)
0
abstractmethod process_output(text, **kwargs)[source]#

Process and validate the input text.

Parameters:
  • text (str) – The input text to be processed.

  • **kwargs – Additional arguments for processing.

Returns:

Result containing validity status, processed text, and optional feedback.

Return type:

ValidationResult

Example

>>> processor = MyProcessor()
>>> result = processor.process_output("hello world")
>>> print(result.is_valid)
True
>>> print(result.feedback)
Length check
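
The failure counter described for this class can be pictured with a small stand-in. This is a sketch only: `SketchResult`, `SketchProcessor`, and `MinLengthProcessor` are hypothetical names, and the idea that calling the processor increments `count` on each failed validation is inferred from the `count` description, not taken from the Sherpa AI source.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchResult:
    """Hypothetical stand-in for ValidationResult."""

    is_valid: bool
    result: str
    feedback: str = ""


class SketchProcessor(ABC):
    """Hypothetical stand-in for BaseOutputProcessor."""

    def __init__(self) -> None:
        self.count = 0  # failed validations since the last reset

    def reset_state(self) -> None:
        self.count = 0

    @abstractmethod
    def process_output(self, text: str, **kwargs) -> SketchResult:
        ...

    def __call__(self, text: str, **kwargs) -> SketchResult:
        # Assumption: every failed validation bumps the counter.
        result = self.process_output(text, **kwargs)
        if not result.is_valid:
            self.count += 1
        return result


class MinLengthProcessor(SketchProcessor):
    def process_output(self, text: str, **kwargs) -> SketchResult:
        valid = len(text) > 5
        feedback = "" if valid else "Text is too short"
        return SketchResult(valid, text, feedback)


processor = MinLengthProcessor()
processor("hi")           # fails the length check
processor("hello world")  # passes
failures_before_reset = processor.count
processor.reset_state()
```

A caller could use such a counter to stop retrying after a fixed number of failed validations, then call `reset_state()` before the next task.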

sherpa_ai.output_parsers.citation_validation module#

Citation validation and addition module for Sherpa AI.

This module provides functionality for validating and adding citations to text. It defines the CitationValidation class which analyzes text against source materials and adds appropriate citations using various similarity metrics.

class sherpa_ai.output_parsers.citation_validation.CitationValidation(sequence_threshold=0.7, jaccard_threshold=0.7, token_overlap=0.7)[source]#

Bases: BaseOutputProcessor

Validator and citation adder for text content.

This class analyzes text against source materials to validate content and add appropriate citations. It uses multiple similarity metrics to determine when citations are needed and which sources to cite.

sequence_threshold#

Minimum ratio of common subsequence length to text length for citation. Default is 0.7.

Type:

float

jaccard_threshold#

Minimum Jaccard similarity for citation. Default is 0.7.

Type:

float

token_overlap#

Minimum token overlap ratio for citation. Default is 0.7.

Type:

float

Example

>>> validator = CitationValidation(sequence_threshold=0.8)
>>> belief = Belief()  # Contains source about "Python is great"
>>> result = validator.process_output("Python is great!", belief)
>>> print("[1]" in result.result)  # Has citation
True
calculate_token_overlap(sentence1, sentence2)[source]#

Calculates the percentage of token overlap between two sentences.

This method tokenizes both sentences and calculates the percentage of shared tokens relative to each sentence’s length.

Parameters:
  • sentence1 (str) – First sentence to compare.

  • sentence2 (str) – Second sentence to compare.

Returns:

(overlap_ratio_1, overlap_ratio_2) where each ratio is the proportion of shared tokens to total tokens in that sentence.

Return type:

tuple

Example

>>> validator = CitationValidation()
>>> ratio1, ratio2 = validator.calculate_token_overlap(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{ratio1:.2f}, {ratio2:.2f}")
0.75, 0.75
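
The two ratios can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming lowercase whitespace tokenization; the function name `token_overlap` is hypothetical and the library may tokenize differently.

```python
def token_overlap(sentence1: str, sentence2: str) -> tuple[float, float]:
    """Share of each sentence's tokens that also occur in the other sentence."""
    # Assumption: lowercase whitespace tokenization stands in for the real tokenizer.
    tokens1 = sentence1.lower().split()
    tokens2 = sentence2.lower().split()
    shared = set(tokens1) & set(tokens2)
    ratio1 = sum(t in shared for t in tokens1) / len(tokens1) if tokens1 else 0.0
    ratio2 = sum(t in shared for t in tokens2) / len(tokens2) if tokens2 else 0.0
    return ratio1, ratio2


ratio1, ratio2 = token_overlap("The cat is black", "The cat is white")
```

Three of the four tokens are shared in each sentence, so both ratios come out at 0.75.
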
jaccard_index(sentence1, sentence2)[source]#

Calculates the Jaccard index between two sentences.

This method computes the Jaccard index (intersection over union) between the sets of tokens from both sentences.

Parameters:
  • sentence1 (str) – First sentence to compare.

  • sentence2 (str) – Second sentence to compare.

Returns:

Jaccard similarity score between 0 and 1.

Return type:

float

Example

>>> validator = CitationValidation()
>>> score = validator.jaccard_index(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{score:.2f}")
0.60
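
Intersection over union is easy to check by hand. The sketch below assumes lowercase whitespace tokenization and uses a hypothetical helper name; it illustrates the metric and is not the library's implementation.

```python
def jaccard(sentence1: str, sentence2: str) -> float:
    """Jaccard index of the two sentences' token sets."""
    # Assumption: lowercase whitespace tokenization.
    set1 = set(sentence1.lower().split())
    set2 = set(sentence2.lower().split())
    union = set1 | set2
    if not union:
        return 0.0  # both sentences empty: define the similarity as 0
    return len(set1 & set2) / len(union)


score = jaccard("The cat is black", "The cat is white")
```

Here the intersection has three tokens ("the", "cat", "is") and the union has five, which gives 3/5.
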
longest_common_subsequence(text1, text2)[source]#

Calculate length of longest common subsequence.

This method finds the length of the longest subsequence of characters that appear in both texts in the same order.

Parameters:
  • text1 (str) – First text to compare.

  • text2 (str) – Second text to compare.

Returns:

Length of longest common subsequence.

Return type:

int

Example

>>> validator = CitationValidation()
>>> length = validator.longest_common_subsequence(
...     "hello world",
...     "hello there"
... )
>>> print(length)
7
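
A character-level longest common subsequence is usually computed with dynamic programming. The following is a sketch under that assumption, with a hypothetical function name; the ratio at the end divides by the length of the first text, which is also an assumption about how a sequence threshold might be applied.

```python
def lcs_length(text1: str, text2: str) -> int:
    """Length of the longest common subsequence of characters."""
    # previous[j] holds the LCS length of the processed prefix of text1 and text2[:j].
    previous = [0] * (len(text2) + 1)
    for ch1 in text1:
        current = [0]
        for j, ch2 in enumerate(text2, start=1):
            if ch1 == ch2:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


length = lcs_length("hello world", "hello there")
# Assumption: the ratio is taken relative to the first text's length.
ratio = length / len("hello world")
```

Keeping only two rows of the table reduces memory use from O(n·m) to O(m) without changing the result.
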
flatten_nested_list(nested_list)[source]#

Flatten a nested list of strings.

Parameters:

nested_list (list[list[str]]) – List of lists of strings.

Returns:

Single list containing all non-empty strings.

Return type:

list[str]

Example

>>> validator = CitationValidation()
>>> flat = validator.flatten_nested_list([["a", "b"], ["c", ""]])
>>> print(flat)
['a', 'b', 'c']
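
The flattening step amounts to a nested comprehension that skips empty strings. A minimal sketch, using a hypothetical standalone function in place of the method:

```python
def flatten(nested_list: list[list[str]]) -> list[str]:
    """Flatten one level of nesting and drop empty strings."""
    return [item for sublist in nested_list for item in sublist if item]


flat = flatten([["a", "b"], ["c", ""]])
```
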
split_paragraph_into_sentences(paragraph)[source]#

Split paragraph into sentences using NLTK.

Parameters:

paragraph (str) – Text to split into sentences.

Returns:

List of sentences from the paragraph.

Return type:

list[str]

Example

>>> validator = CitationValidation()
>>> sentences = validator.split_paragraph_into_sentences(
...     "Hello there. How are you?"
... )
>>> print(sentences)
['Hello there.', 'How are you?']
resources_from_belief(belief)[source]#

Extract resources from belief state actions.

Parameters:

belief (Belief) – Agent’s belief state containing actions.

Returns:

List of resources from retrieval actions.

Return type:

list[ActionResource]

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains retrieval action with resource
>>> resources = validator.resources_from_belief(belief)
>>> print(len(resources))
1
process_output(text, belief, **kwargs)[source]#

Process text and add citations from belief resources.

This method analyzes the input text against resources in the belief state and adds citations where appropriate based on similarity metrics.

Parameters:
  • text (str) – Text to process and add citations to.

  • belief (Belief) – Agent’s belief state containing resources.

  • **kwargs – Additional arguments for processing.

Returns:

Result containing text with citations added.

Return type:

ValidationResult

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains source about "Python"
>>> result = validator.process_output(
...     "Python is a great language.",
...     belief
... )
>>> print("[1]" in result.result)  # Has citation
True
add_citation_to_sentence(sentence, resources)[source]#

Add citations to a single sentence.

This method checks the sentence against each resource using similarity metrics to determine which sources to cite.

Parameters:
  • sentence (str) – Sentence to add citations to.

  • resources (list[ActionResource]) – Available citation sources.

Returns:

(citation_ids, citation_links), where citation_ids is a list of citation identifiers and citation_links is a list of citation links (URLs).

Return type:

tuple

Example

>>> validator = CitationValidation()
>>> resource = ActionResource(
...     source="http://example.com",
...     content="Python is great"
... )
>>> ids, urls = validator.add_citation_to_sentence(
...     "Python is great!",
...     [resource]
... )
>>> print(len(ids), urls[0])
1 http://example.com
format_sentence_with_citations(sentence, ids, links)[source]#

Format a sentence with its citations.

This method adds citation references to the end of a sentence in the format [id](url).

Parameters:
  • sentence (str) – Sentence to add citations to.

  • ids (list[int]) – Citation ID numbers.

  • links (list[str]) – Citation URLs.

Returns:

Sentence with citations added.

Return type:

str

Example

>>> validator = CitationValidation()
>>> result = validator.format_sentence_with_citations(
...     "Python is great.",
...     [1],
...     ["http://example.com"]
... )
>>> print(result)
Python is great [1](http://example.com).
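
One way to obtain the output shown in the example is to place the citations before the sentence's closing punctuation. The sketch below assumes that behavior and a comma separator between multiple citations; the function name is hypothetical and the library's exact formatting may differ.

```python
def with_citations(sentence: str, ids: list[int], links: list[str]) -> str:
    """Append [id](url) references, keeping any final punctuation at the end."""
    if not ids:
        return sentence
    # Assumption: multiple citations are joined with a comma and a space.
    citations = ", ".join(f"[{i}]({link})" for i, link in zip(ids, links))
    body, trailing = sentence.rstrip(), ""
    if body and body[-1] in ".!?":
        body, trailing = body[:-1], body[-1]
    return f"{body} {citations}{trailing}"


formatted = with_citations("Python is great.", [1], ["http://example.com"])
```
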
add_citations(text, resources)[source]#
Return type:

ValidationResult

get_failure_message()[source]#
Return type:

str

sherpa_ai.output_parsers.md_to_slack_parse module#

Markdown to Slack format conversion module for Sherpa AI.

This module provides functionality for converting Markdown-formatted text to Slack-compatible format. It defines the MDToSlackParse class which handles the conversion of Markdown links to Slack’s link format.

class sherpa_ai.output_parsers.md_to_slack_parse.MDToSlackParse[source]#

Bases: BaseOutputParser

Parser for converting Markdown links to Slack format.

This class converts Markdown-style links ([text](url)) to Slack’s link format (<url|text>). It maintains the link text and URL while changing only the syntax to match Slack’s requirements.

pattern#

Regex pattern for identifying Markdown links.

Type:

str

Example

>>> parser = MDToSlackParse()
>>> text = "Check out [this link](http://example.com)!"
>>> result = parser.parse_output(text)
>>> print(result)
Check out <http://example.com|this link>!

parse_output(text)[source]#

Convert Markdown links to Slack format.

This method finds all Markdown-style links in the input text and converts them to Slack’s link format while preserving the link text and URL.

Parameters:

text (str) – Text containing Markdown-style links.

Returns:

Text with links converted to Slack format.

Return type:

str

Example

>>> parser = MDToSlackParse()
>>> text = "See [docs](https://docs.com) and [code](https://code.com)"
>>> result = parser.parse_output(text)
>>> print(result)
See <https://docs.com|docs> and <https://code.com|code>
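
The conversion can be expressed as a single regular-expression substitution. This is a sketch, not the class's actual `pattern`: the regex and the function name below are assumptions chosen to handle simple links without nested brackets or parentheses.

```python
import re

# Assumption: link text has no "]" and the URL has no ")".
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def md_links_to_slack(text: str) -> str:
    """Rewrite [text](url) as <url|text>, leaving everything else untouched."""
    return MARKDOWN_LINK.sub(r"<\2|\1>", text)


converted = md_links_to_slack(
    "See [docs](https://docs.com) and [code](https://code.com)"
)
```

The two capture groups are swapped in the replacement because Slack puts the URL before the link text.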

sherpa_ai.output_parsers.number_validation module#

Number validation module for Sherpa AI.

This module provides functionality for validating numerical information in text. It defines the NumberValidation class which verifies that numbers mentioned in generated text exist in the source material.

class sherpa_ai.output_parsers.number_validation.NumberValidation[source]#

Bases: BaseOutputProcessor

Validates the presence or absence of numerical information in a given piece of text.

This class validates that any numbers mentioned in generated text can be found in the source material, helping ensure numerical accuracy and prevent hallucination of numbers.

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42 items"
>>> result = validator.process_output("There are 42 items.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("There are 100 items.", belief)
>>> print(result.is_valid)
False
process_output(text, belief, **kwargs)[source]#

Verifies that all numbers within text exist in the belief source text.

Parameters:
  • text (str) – Text containing numbers to validate.

  • belief (Belief) – Agent’s belief state containing source material.

  • **kwargs – Additional arguments for processing.

Returns:

Result indicating whether all numbers are valid, with feedback if validation fails.

Return type:

ValidationResult

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains "The price is $50"
>>> result = validator.process_output("It costs $50", belief)
>>> print(result.is_valid)
True
>>> result.feedback
''

get_failure_message()[source]#

Get a message describing validation failures.

Returns:

Warning message about potential numerical inaccuracies.

Return type:

str

Example

>>> validator = NumberValidation()
>>> print(validator.get_failure_message())
The numeric value results might not be fully reliable...
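
The underlying check, that every number in the generated text also occurs in the source, can be sketched with a regular expression. The pattern and function name are assumptions; the library may normalize numbers (thousands separators, words such as "forty-two") in ways this sketch does not.

```python
import re

# Assumption: numbers are plain digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(text: str, source: str) -> list[str]:
    """Return the numbers in text that do not appear in source."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(text) if n not in source_numbers]


source = "The warehouse holds 42 items priced at $50."
missing_ok = unsupported_numbers("There are 42 items.", source)
missing_bad = unsupported_numbers("There are 100 items.", source)
```

An empty list corresponds to a valid result; a non-empty list names the numbers that would trigger the failure message.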

sherpa_ai.output_parsers.validation_result module#

Validation result model for Sherpa AI output processors.

This module provides the ValidationResult class which represents the outcome of content validation operations. It includes status, result content, and optional feedback information.

class sherpa_ai.output_parsers.validation_result.ValidationResult(**data)[source]#

Bases: BaseModel

Result of content validation operations.

This class represents the outcome of validating content, including whether the validation passed, the processed content, and any feedback about the validation process.

is_valid#

Whether the validation passed (True) or failed (False).

Type:

bool

result#

The processed or validated content.

Type:

str

feedback#

Additional information about the validation result.

Type:

str

Example

>>> result = ValidationResult(
...     is_valid=True,
...     result="Validated text",
...     feedback="All checks passed"
... )
>>> print(result.is_valid)
True
>>> print(result.feedback)
All checks passed

is_valid: bool#
result: str#
feedback: str#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

sherpa_ai.output_parsers.self_consistency module#

class sherpa_ai.output_parsers.self_consistency.MaximumLikelihoodConcretizer(config=None)[source]#

Bases: Concretizer

concretize(abstract_object, return_dict=False)[source]#

Concretize an abstract object by selecting the most likely value for each attribute. For list attributes, uses top-k or threshold-based selection.

Parameters:

abstract_object (AbstractObject) – The abstract object to concretize.

Returns:

A concrete object with the most likely values for each attribute.

Return type:

BaseModel

class sherpa_ai.output_parsers.self_consistency.ObjectAggregator(obj_schema, *, value_weight_map: dict[str, dict | float] = {}, obj_dict: dict[str, list | dict] = {})[source]#

Bases: BaseModel

Class representing an aggregation of objects, capturing the values of each attribute as a list.

obj_schema: type[BaseModel]#

Schema of the object, used to validate the object.

value_weight_map: dict[str, dict | float]#

Dictionary mapping each field to a dictionary of values and their weights. If the field is a primitive type, it will be mapped to a dictionary with values as keys and their weights as values. If the field is a nested model, it will be mapped to a dictionary for storing object values. The default weight is 1.0.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

obj_dict: dict[str, list | dict]#

Dictionary representing the object aggregator, where each field is mapped to a list or a dictionary. If the field is a primitive type, it will be mapped to a list for storing object values. If the field is a nested model, it will be mapped to a dictionary for storing object values.

add_object(obj)[source]#

Add an object to the aggregation of objects.

Parameters:

obj (BaseModel) – The object to add, must conform to the schema defined by obj_schema.

class sherpa_ai.output_parsers.self_consistency.AbstractObject(**data)[source]#

Bases: BaseModel

Abstract object that maps each attribute into a distribution

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

obj_schema: type[BaseModel]#

Schema of the object, used to validate the object.

obj_dict: dict[str, Distribution | dict]#
classmethod from_aggregator(obj_aggregator)[source]#

Create an AbstractObject from an ObjectAggregator.

Parameters:

obj_aggregator (ObjectAggregator) – The ObjectAggregator to convert.

Returns:

An instance of AbstractObject with the aggregated data.

Return type:

AbstractObject

sherpa_ai.output_parsers.self_consistency.run_self_consistency(objects, schema, aggregator_cls=ObjectAggregator, concretizer=None, value_weight_map={}, config=None)[source]#

Run self-consistency on a list of objects using the provided schema and configuration.

Parameters:
  • objects (list[BaseModel]) – List of objects to process.

  • schema (type[BaseModel]) – Pydantic schema for validation.

  • aggregator_cls (type[ObjectAggregator], optional) – Class to use for aggregation. Defaults to ObjectAggregator.

  • concretizer (Optional[Concretizer], optional) – Concretizer to use for final output. Defaults to MaximumLikelihoodConcretizer.

  • value_weight_map (dict[str, Union[dict, float]], optional) – Weight map for each attribute of the object. Defaults to {}.

  • config (Optional[SelfConsistencyConfig], optional) – Configuration for self-consistency processing. If None, default configuration will be used.

Returns:

The final concrete object after self-consistency processing (instance of schema).

Return type:

BaseModel
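
The general idea of self-consistency over structured outputs is to sample several objects and keep, for each attribute, the value that occurs most often. The sketch below illustrates that idea with an unweighted majority vote. It uses dataclasses instead of Pydantic models to stay dependency-free, `Answer` and `majority_vote` are hypothetical names, and it does not reproduce the library's aggregator, distributions, or value weights.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass
class Answer:
    """Hypothetical schema for the sampled objects."""

    city: str
    population_millions: float


def majority_vote(objects: list, schema: type):
    """Build one object by taking the most common value of each field."""
    chosen = {}
    for field in fields(schema):
        votes = Counter(getattr(obj, field.name) for obj in objects)
        chosen[field.name] = votes.most_common(1)[0][0]
    return schema(**chosen)


samples = [
    Answer("Paris", 2.1),
    Answer("Paris", 2.2),
    Answer("Lyon", 2.1),
]
consensus = majority_vote(samples, Answer)
```

Because each field is voted on independently, the resulting object may combine values that never appeared together in a single sample.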

class sherpa_ai.output_parsers.self_consistency.SelfConsistencyConfig(**data)[source]#

Bases: BaseModel

Configuration for self-consistency processing.

This class provides a structured way to configure self-consistency behavior, particularly for list attributes. It replaces the previous dict-based configuration approach.

list_config#

Dictionary mapping field names to their list processing configurations. Defaults to an empty dictionary.

list_config: dict[str, ListConfig]#
get_list_config(field_path)[source]#

Get the list configuration for a specific field path.

Parameters:

field_path (str) – The path to the field (e.g., “tags” or “nested.field”)

Returns:

The configuration for the field, or default if not specified

Return type:

ListConfig

has_list_config(field_path)[source]#

Check if a field has specific list configuration.

Parameters:

field_path (str) – The path to the field

Returns:

True if the field has specific configuration, False otherwise

Return type:

bool

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class sherpa_ai.output_parsers.self_consistency.ListConfig(**data)[source]#

Bases: BaseModel

Configuration for individual list attributes in self-consistency processing.

This class defines how list attributes should be processed during the self-consistency aggregation and concretization process.

top_k#

Number of top items to select when using “top_k” strategy. Defaults to 0 (which means use default behavior).

threshold#

Minimum frequency threshold when using “threshold” strategy. Defaults to 2.0.

strategy#

The strategy to use for selecting items from list attributes. Either “top_k” or “threshold”. Defaults to “top_k”.

top_k: int#
threshold: float#
strategy: Literal['top_k', 'threshold']#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
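
The difference between the "top_k" and "threshold" strategies can be shown on a toy example. This is a sketch under stated assumptions: frequency is taken to mean the number of samples containing an item, `top_k <= 0` is treated as "keep every item", and `select_list_items` is a hypothetical helper, not part of the package.

```python
from collections import Counter


def select_list_items(samples, strategy="top_k", top_k=0, threshold=2.0):
    """Pick list items that recur across sampled outputs (illustrative only)."""
    # Count in how many samples each item appears; repeats within a sample count once.
    counts = Counter(item for sample in samples for item in set(sample))
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    if strategy == "threshold":
        return [item for item in ranked if counts[item] >= threshold]
    if strategy == "top_k":
        # Assumption: top_k <= 0 keeps every ranked item; the library's default may differ.
        return ranked[:top_k] if top_k > 0 else ranked
    raise ValueError(f"Unknown strategy: {strategy}")


samples = [["ai", "ml"], ["ai", "nlp"], ["ai", "ml", "cv"]]
top_two = select_list_items(samples, strategy="top_k", top_k=2)
frequent = select_list_items(samples, strategy="threshold", threshold=2.0)
```

With "top_k" the output length is fixed in advance, while "threshold" lets the length vary with how much the samples agree.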

Module contents#

Output parsing and validation module for Sherpa AI.

This module provides various parsers and validators for processing model outputs. It includes parsers for links, Markdown to Slack conversion, and validators for citations, numbers, and entities.

Example

>>> from sherpa_ai.output_parsers import LinkParser, NumberValidation
>>> link_parser = LinkParser()
>>> parsed = link_parser.parse_output("Link:example.com", tool_output=True)
>>> number_validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42"
>>> result = number_validator.process_output("The answer is 42", belief)

class sherpa_ai.output_parsers.LinkParser[source]#

Bases: BaseOutputParser

Parser for converting between links and symbolic references.

This class handles the conversion of URLs to symbolic references and vice versa, maintaining a consistent mapping between them. It can process both raw URLs and tool-generated output containing links.

Attributes:
  • links (list) – List of unique links encountered during parsing.

  • link_to_id (dict) – Mapping of links to their symbolic references.

  • count (int) – Counter for generating unique symbol IDs.

  • output_counter (int) – Counter for reindexing output symbols.

  • reindex_mapping (dict) – Mapping of original IDs to reindexed IDs.

  • url_pattern (str) – Regex pattern for identifying links.

  • doc_id_pattern (str) – Regex pattern for identifying document IDs.

  • link_symbol (str) – Format string for link symbols.

Example

>>> parser = LinkParser()
>>> text = "Check Link:example.com and Link:test.com"
>>> result = parser.parse_output(text, tool_output=True)
>>> print(result)
DocID:[1]
DocID:[2]
>>> back = parser.parse_output("[1] and [2]")
>>> print(back)
<http://example.com|[1]> and <http://test.com|[2]>

parse_output(text, tool_output=False)[source]#

Parse and transform links in text.

This method either converts URLs to symbolic references (when tool_output is True) or converts symbolic references back to clickable links (when tool_output is False).

Parameters:
  • text (str) – Text containing either URLs or symbolic references.

  • tool_output (bool) – Whether the input is from a tool (True) or user-facing text (False).

Returns:

Text with either URLs converted to symbols or symbols converted to clickable links.

Return type:

str

Example

>>> parser = LinkParser()
>>> # Convert URLs to symbols
>>> result = parser.parse_output("Link:example.com", tool_output=True)
>>> print(result)
DocID:[1]
>>> # Convert symbols back to links
>>> result = parser.parse_output("[1]")
>>> print(result)
<http://example.com|[1]>
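
The round trip between links and symbolic references can be sketched as a two-way lookup table. Everything below is hypothetical: the class name `LinkSymbolTable`, its methods, and both regular expressions are assumptions for illustration, and the sketch ignores the `Link:` and `DocID:` prefixes and the reindexing that the real parser handles.

```python
import re


class LinkSymbolTable:
    """Hypothetical stand-in that maps URLs to [n] symbols and back."""

    def __init__(self) -> None:
        self.link_to_id: dict[str, str] = {}
        self.id_to_link: dict[str, str] = {}

    def to_symbols(self, text: str) -> str:
        """Replace each URL with a stable symbol such as [1]."""

        def replace(match: re.Match) -> str:
            link = match.group(0)
            if link not in self.link_to_id:
                symbol = f"[{len(self.link_to_id) + 1}]"
                self.link_to_id[link] = symbol
                self.id_to_link[symbol] = link
            return self.link_to_id[link]

        # Assumption: a URL runs until whitespace or a Slack delimiter.
        return re.sub(r"https?://[^\s<>|]+", replace, text)

    def to_links(self, text: str) -> str:
        """Replace known symbols with Slack-style <url|symbol> links."""

        def replace(match: re.Match) -> str:
            symbol = match.group(0)
            link = self.id_to_link.get(symbol)
            return f"<{link}|{symbol}>" if link else symbol

        return re.sub(r"\[\d+\]", replace, text)


table = LinkSymbolTable()
compact = table.to_symbols(
    "See https://example.com and https://test.com then https://example.com again"
)
restored = table.to_links("Sources: [1] and [2]")
```

Replacing long URLs with short symbols keeps tool output compact for the model, and the stored mapping lets the final answer point back to the original links.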

class sherpa_ai.output_parsers.MDToSlackParse[source]#

Bases: BaseOutputParser

Parser for converting Markdown links to Slack format.

This is the same class documented in the sherpa_ai.output_parsers.md_to_slack_parse module section above, exposed at the package level.

class sherpa_ai.output_parsers.CitationValidation(sequence_threshold=0.7, jaccard_threshold=0.7, token_overlap=0.7)[source]#

Bases: BaseOutputProcessor

Validator and citation adder for text content.

This class analyzes text against source materials to validate content and add appropriate citations. It uses multiple similarity metrics to determine when citations are needed and which sources to cite.

sequence_threshold#

Minimum ratio of common subsequence length to text length for citation. Default is 0.7.

Type:

float

jaccard_threshold#

Minimum Jaccard similarity for citation. Default is 0.7.

Type:

float

token_overlap#

Minimum token overlap ratio for citation. Default is 0.7.

Type:

float

Example

>>> validator = CitationValidation(sequence_threshold=0.8)
>>> belief = Belief()  # Contains source about "Python is great"
>>> result = validator.process_output("Python is great!", belief)
>>> print("[1]" in result.result)  # Has citation
True
calculate_token_overlap(sentence1, sentence2)[source]#

Calculates the percentage of token overlap between two sentences.

This method tokenizes both sentences and calculates the percentage of shared tokens relative to each sentence’s length.

Parameters:
  • sentence1 (str) – First sentence to compare.

  • sentence2 (str) – Second sentence to compare.

Returns:

(overlap_ratio_1, overlap_ratio_2) where each ratio is the

proportion of shared tokens to total tokens in that sentence.

Return type:

tuple

Example

>>> validator = CitationValidation()
>>> ratio1, ratio2 = validator.calculate_token_overlap(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{ratio1:.2f}, {ratio2:.2f}")
'0.75, 0.75'
jaccard_index(sentence1, sentence2)[source]#

Calculates the Jaccard index between two sentences.

This method computes the Jaccard index (intersection over union) between the sets of tokens from both sentences.

Parameters:
  • sentence1 (str) – First sentence to compare.

  • sentence2 (str) – Second sentence to compare.

Returns:

Jaccard similarity score between 0 and 1.

Return type:

float

Example

>>> validator = CitationValidation()
>>> score = validator.jaccard_index(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{score:.2f}")
'0.60'
longest_common_subsequence(text1, text2)[source]#

Calculate length of longest common subsequence.

This method finds the length of the longest subsequence of characters that appear in both texts in the same order.

Parameters:
  • text1 (str) – First text to compare.

  • text2 (str) – Second text to compare.

Returns:

Length of longest common subsequence.

Return type:

int

Example

>>> validator = CitationValidation()
>>> length = validator.longest_common_subsequence(
...     "hello world",
...     "hello there"
... )
>>> print(length)
6
flatten_nested_list(nested_list)[source]#

Flatten a nested list of strings.

Parameters:

nested_list (list[list[str]]) – List of lists of strings.

Returns:

Single list containing all non-empty strings.

Return type:

list[str]

Example

>>> validator = CitationValidation()
>>> flat = validator.flatten_nested_list([["a", "b"], ["c", ""]])
>>> print(flat)
['a', 'b', 'c']
split_paragraph_into_sentences(paragraph)[source]#

Split paragraph into sentences using NLTK.

Parameters:

paragraph (str) – Text to split into sentences.

Returns:

List of sentences from the paragraph.

Return type:

list[str]

Example

>>> validator = CitationValidation()
>>> sentences = validator.split_paragraph_into_sentences(
...     "Hello there. How are you?"
... )
>>> print(sentences)
['Hello there.', 'How are you?']
resources_from_belief(belief)[source]#

Extract resources from belief state actions.

Parameters:

belief (Belief) – Agent’s belief state containing actions.

Returns:

List of resources from retrieval actions.

Return type:

list[ActionResource]

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains retrieval action with resource
>>> resources = validator.resources_from_belief(belief)
>>> print(len(resources))
1
process_output(text, belief, **kwargs)[source]#

Process text and add citations from belief resources.

This method analyzes the input text against resources in the belief state and adds citations where appropriate based on similarity metrics.

Parameters:
  • text (str) – Text to process and add citations to.

  • belief (Belief) – Agent’s belief state containing resources.

  • **kwargs – Additional arguments for processing.

Returns:

Result containing text with citations added.

Return type:

ValidationResult

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains source about "Python"
>>> result = validator.process_output(
...     "Python is a great language.",
...     belief
... )
>>> print("[1]" in result.result)  # Has citation
True
add_citation_to_sentence(sentence, resources)[source]#

Add citations to a single sentence.

This method checks the sentence against each resource using similarity metrics to determine which sources to cite.

Parameters:
  • sentence (str) – Sentence to add citations to.

  • resources (list[ActionResource]) – Available citation sources.

Returns:

a list of citation identifiers citation_links: a list of citation links (URLs)

Return type:

citation_ids

Example

>>> validator = CitationValidation()
>>> resource = ActionResource(
...     source="http://example.com",
...     content="Python is great"
... )
>>> ids, urls = validator.add_citation_to_sentence(
...     "Python is great!",
...     [resource]
... )
>>> print(len(ids), urls[0])
1 http://example.com
format_sentence_with_citations(sentence, ids, links)[source]#

Format a sentence with its citations.

This method adds citation references to the end of a sentence in the format [id](url).

Parameters:
  • sentence (str) – Sentence to add citations to.

  • ids (list[int]) – Citation ID numbers.

  • links (list[str]) – Citation URLs.

Returns:

Sentence with citations added.

Return type:

str

Example

>>> validator = CitationValidation()
>>> result = validator.format_sentence_with_citations(
...     "Python is great.",
...     [1],
...     ["http://example.com"]
... )
>>> print(result)
'Python is great [1](http://example.com).'
add_citations(text, resources)[source]#
Return type:

ValidationResult

get_failure_message()
Return type:

str

class sherpa_ai.output_parsers.NumberValidation

Bases: BaseOutputProcessor

Validates the presence or absence of numerical information in a given piece of text.

This class validates that any numbers mentioned in generated text can be found in the source material, helping ensure numerical accuracy and prevent hallucination of numbers.

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42 items"
>>> result = validator.process_output("There are 42 items.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("There are 100 items.", belief)
>>> print(result.is_valid)
False
process_output(text, belief, **kwargs)

Verifies that all numbers within text exist in the belief source text.

Parameters:
  • text (str) – Text containing numbers to validate.

  • belief (Belief) – Agent’s belief state containing source material.

  • **kwargs – Additional arguments for processing.

Returns:

Result indicating whether all numbers are valid, with feedback if validation fails.

Return type:

ValidationResult

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains "The price is $50"
>>> result = validator.process_output("It costs $50", belief)
>>> print(result.is_valid)
True
>>> result.feedback
''
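
The check can be approximated without the library by extracting numbers from both texts and comparing the sets. The sketch below is an assumption-laden illustration: the regular expression, the helper names extract_numbers and check_numbers, and the feedback wording are hypothetical, and the source text is passed as a plain string rather than read from a Belief.

```python
import re

# Integers and simple decimals; thousands separators and units are not handled.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    # Collect numbers as strings so "50" in "$50" compares equal to "50".
    return set(NUMBER_PATTERN.findall(text))


def check_numbers(text, source):
    """Return (is_valid, feedback): every number in text must occur in source."""
    missing = sorted(extract_numbers(text) - extract_numbers(source))
    if missing:
        return False, "Numbers not found in the source: " + ", ".join(missing)
    return True, ""


source = "The warehouse holds 42 items priced at $50 each."
print(check_numbers("There are 42 items.", source))
print(check_numbers("There are 100 items.", source))
```

Comparing numbers as strings is a deliberate simplification: "42" and "42.0" would be treated as different values here.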
get_failure_message()

Get a message describing validation failures.

Returns:

Warning message about potential numerical inaccuracies.

Return type:

str

Example

>>> validator = NumberValidation()
>>> validator.get_failure_message()
'The numeric value results might not be fully reliable...'
class sherpa_ai.output_parsers.EntityValidation

Bases: BaseOutputProcessor

Validator for named entities in text.

This class validates that entities mentioned in generated text can be found in the source material, using progressively more sophisticated similarity comparison methods if initial validation fails.

Example

>>> validator = EntityValidation()
>>> belief = Belief()  # Contains source text about "John Smith"
>>> result = validator.process_output("John Smith is CEO.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("Jane Doe is CEO.", belief)
>>> print(result.is_valid)
False
process_output(text, belief, llm=None, **kwargs)

Validate entities in text against source material.

This method checks that entities mentioned in the input text can be found in the source material stored in the belief state. Each time validation fails, the next attempt uses a more sophisticated comparison method.

Parameters:
  • text (str) – Text containing entities to validate.

  • belief (Belief) – Agent’s belief state containing source material.

  • llm (BaseLanguageModel, optional) – Language model for advanced comparison.

  • **kwargs – Additional arguments for processing.

Returns:

Result indicating whether all entities are valid, with feedback if validation fails.

Return type:

ValidationResult

Example

>>> validator = EntityValidation()
>>> belief = Belief()  # Contains text about "Microsoft"
>>> result = validator.process_output("Microsoft announced...", belief)
>>> print(result.is_valid)
True
>>> result.feedback
''
similarity_picker(value)

Select text similarity comparison method.

This method determines which similarity comparison method to use based on the number of previous validation attempts.

Parameters:

  • value (int) – Iteration count used to select the text similarity method:

      0: use BASIC text similarity.
      1: use text similarity BY_METRICS.
      Any other value (default): use text similarity BY_LLM.

Returns:

Selected comparison method.

Return type:

TextSimilarityMethod

Example

>>> validator = EntityValidation()
>>> method = validator.similarity_picker(0)
>>> print(method)
TextSimilarityMethod.BASIC
>>> method = validator.similarity_picker(2)
>>> print(method)
TextSimilarityMethod.LLM
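
The selection rule is a plain mapping from attempt count to method. The sketch below shows it in isolation. SimilarityMethod is a hypothetical stand-in for TextSimilarityMethod, and its member names (in particular METRICS) and values are assumptions, since this page names only BASIC and LLM explicitly.

```python
from enum import Enum


class SimilarityMethod(Enum):
    """Hypothetical stand-in for TextSimilarityMethod; names are assumed."""
    BASIC = 0
    METRICS = 1
    LLM = 2


def pick_similarity(attempt):
    # 0 -> BASIC, 1 -> METRICS, anything else -> LLM (the fallback).
    if attempt == 0:
        return SimilarityMethod.BASIC
    if attempt == 1:
        return SimilarityMethod.METRICS
    return SimilarityMethod.LLM


for attempt in range(4):
    print(attempt, pick_similarity(attempt).name)
```

Under this rule the cheapest comparison runs first, and the LLM-based comparison is reserved for the second and later retries.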
get_failure_message()

Get a message describing validation failures.

Returns:

Warning message about potential missing entities.

Return type:

str

Example

>>> validator = EntityValidation()
>>> validator.get_failure_message()
'Some enitities from the source might not be mentioned.'
check_entities_match(result, source, stage, llm)

Check if entities in result match those in source.

This method compares entities between result and source text using the specified similarity comparison method.

Parameters:
  • result (str) – Text containing entities to validate.

  • source (str) – Source text to validate against.

  • stage (TextSimilarityMethod) – Comparison method to use.

  • llm (BaseLanguageModel) – Language model for LLM-based comparison.

Returns:

Whether entities match and error message if not.

Return type:

Tuple[bool, str]

Example

>>> validator = EntityValidation()
>>> match, msg = validator.check_entities_match(
...     "Apple released...",
...     "Apple announced...",
...     TextSimilarityMethod.BASIC,
...     None
... )
>>> print(match)
True
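
For intuition, a BASIC-style comparison might reduce to a set check on extracted entities. The sketch below is a rough illustration only: naive_entities is a crude, hypothetical substitute for real entity extraction (it treats any run of capitalized words as an entity, including sentence-initial words), and the direction of the check (entities in the result must appear in the source) follows the class description above rather than confirmed library behavior.

```python
import re


def naive_entities(text):
    # Crude stand-in for entity extraction: runs of capitalized words.
    return set(re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text))


def basic_entities_match(result, source):
    """Return (match, message); every entity in result must appear in source."""
    source_entities = {entity.lower() for entity in naive_entities(source)}
    missing = sorted(
        entity
        for entity in naive_entities(result)
        if entity.lower() not in source_entities
    )
    if missing:
        return False, "Entities not found in the source: " + ", ".join(missing)
    return True, ""


print(basic_entities_match("Apple released a phone.", "Apple announced a phone."))
print(basic_entities_match("Jane Doe is CEO.", "John Smith is CEO."))
```

The later stages (metric-based and LLM-based comparison) would replace the exact lowercase match with a fuzzier notion of equivalence.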