sherpa_ai.output_parsers package

Example

>>> processor = MyProcessor()
>>> result = processor.process_output("hello world")
>>> print(result.is_valid)
True
>>> print(result.feedback)
Length check

sherpa_ai.output_parsers.citation_validation module#

Citation validation and addition module for Sherpa AI.

This module provides functionality for validating and adding citations to text. It defines the CitationValidation class which analyzes text against source materials and adds appropriate citations using various similarity metrics.

class sherpa_ai.output_parsers.citation_validation.CitationValidation(sequence_threshold=0.7, jaccard_threshold=0.7, token_overlap=0.7)[source]#

Validator and citation adder for text content.

This class analyzes text against source materials to validate content and add appropriate citations. It uses multiple similarity metrics to determine when citations are needed and which sources to cite.

sequence_threshold#

Minimum ratio of common subsequence length to text length for citation. Default is 0.7.

Type:: float

jaccard_threshold#

Minimum Jaccard similarity for citation. Default is 0.7.

Type:: float

token_overlap#

Minimum token overlap ratio for citation. Default is 0.7.

Type:: float

Example

>>> validator = CitationValidation(sequence_threshold=0.8)
>>> belief = Belief()  # Contains source about "Python is great"
>>> result = validator.process_output("Python is great!", belief)
>>> print("[1]" in result.result)  # Has citation
True

calculate_token_overlap(sentence1, sentence2)[source]#

Calculates the percentage of token overlap between two sentences.

This method tokenizes both sentences and calculates the percentage of shared tokens relative to each sentence’s length.

Parameters:

sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.

Returns:

(overlap_ratio_1, overlap_ratio_2), where each ratio is the proportion of shared tokens to total tokens in that sentence.

Return type:

tuple

Example

>>> validator = CitationValidation()
>>> ratio1, ratio2 = validator.calculate_token_overlap(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{ratio1:.2f}, {ratio2:.2f}")
0.75, 0.75
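
The overlap calculation described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function name `token_overlap` is hypothetical, and whitespace splitting is assumed in place of whatever tokenizer the class actually uses.

```python
def token_overlap(sentence1: str, sentence2: str) -> tuple[float, float]:
    """Return, for each sentence, the share of its tokens found in the other."""
    tokens1 = sentence1.split()  # assumption: plain whitespace tokenization
    tokens2 = sentence2.split()
    shared = set(tokens1) & set(tokens2)
    # Guard against empty input to avoid division by zero.
    ratio1 = sum(t in shared for t in tokens1) / len(tokens1) if tokens1 else 0.0
    ratio2 = sum(t in shared for t in tokens2) / len(tokens2) if tokens2 else 0.0
    return ratio1, ratio2


ratios = token_overlap("The cat is black", "The cat is white")
print(ratios)  # three of four tokens are shared in each sentence
```

Returning one ratio per sentence matters when the two texts differ in length: a short sentence can be fully covered by a long source while the source is only partly covered by the sentence.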

jaccard_index(sentence1, sentence2)[source]#

Calculates the Jaccard index between two sentences.

This method computes the Jaccard index (intersection over union) between the sets of tokens from both sentences.

Parameters:

sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.

Returns:

Jaccard similarity score between 0 and 1.

Return type:

float

Example

>>> validator = CitationValidation()
>>> score = validator.jaccard_index(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{score:.2f}")
0.60
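
For reference, intersection over union on token sets can be written as follows. This is a sketch under the assumption of whitespace tokenization; the helper name `jaccard` is illustrative and not part of the package.

```python
def jaccard(sentence1: str, sentence2: str) -> float:
    """Jaccard index of the two sentences' token sets."""
    set1 = set(sentence1.split())  # assumption: whitespace tokenization
    set2 = set(sentence2.split())
    union = set1 | set2
    if not union:
        return 0.0  # two empty sentences: define the score as 0
    return len(set1 & set2) / len(union)


score = jaccard("The cat is black", "The cat is white")
print(f"{score:.2f}")  # 3 shared tokens out of 5 distinct tokens
```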

longest_common_subsequence(text1, text2)[source]#

Calculate length of longest common subsequence.

This method finds the length of the longest subsequence of characters that appear in both texts in the same order.

Parameters:

text1 (str) – First text to compare.
text2 (str) – Second text to compare.

Returns:

Length of longest common subsequence.

Return type:

int

Example

>>> validator = CitationValidation()
>>> length = validator.longest_common_subsequence(
...     "hello world",
...     "hello there"
... )
>>> print(length)
7
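
A character-level longest common subsequence is usually computed with dynamic programming. The sketch below shows that standard technique; the name `lcs_length` is hypothetical, and the method's own implementation may differ in detail.

```python
def lcs_length(text1: str, text2: str) -> int:
    """Length of the longest common subsequence of two strings."""
    # dp[i][j] holds the LCS length of text1[:i] and text2[:j].
    dp = [[0] * (len(text2) + 1) for _ in range(len(text1) + 1)]
    for i, ch1 in enumerate(text1, start=1):
        for j, ch2 in enumerate(text2, start=1):
            if ch1 == ch2:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


length = lcs_length("ABCBDAB", "BDCABA")
print(length)  # one longest common subsequence is "BCBA"
```

Unlike a common substring, a subsequence may skip characters, so every matching character that preserves order counts toward the length.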

flatten_nested_list(nested_list)[source]#

Flatten a nested list of strings.

Parameters:: nested_list (list[list[str]]) – List of lists of strings.
Returns:: Single list containing all non-empty strings.
Return type:: list[str]

Example

>>> validator = CitationValidation()
>>> flat = validator.flatten_nested_list([["a", "b"], ["c", ""]])
>>> print(flat)
['a', 'b', 'c']
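
The behaviour shown in this example, flattening one level and dropping empty strings, is equivalent to a nested comprehension. The function name `flatten` below is illustrative only.

```python
def flatten(nested: list[list[str]]) -> list[str]:
    """Flatten one level of nesting and skip empty strings."""
    return [item for inner in nested for item in inner if item]


flat = flatten([["a", "b"], ["c", ""]])
print(flat)
```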

split_paragraph_into_sentences(paragraph)[source]#

Split paragraph into sentences using NLTK.

Parameters:: paragraph (str) – Text to split into sentences.
Returns:: List of sentences from the paragraph.
Return type:: list[str]

Example

>>> validator = CitationValidation()
>>> sentences = validator.split_paragraph_into_sentences(
...     "Hello there. How are you?"
... )
>>> print(sentences)
['Hello there.', 'How are you?']

resources_from_belief(belief)[source]#

Extract resources from belief state actions.

Parameters:: belief (Belief) – Agent’s belief state containing actions.
Returns:: List of resources from retrieval actions.
Return type:: list[ActionResource]

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains retrieval action with resource
>>> resources = validator.resources_from_belief(belief)
>>> print(len(resources))
1

process_output(text, belief, **kwargs)[source]#

Process text and add citations from belief resources.

This method analyzes the input text against resources in the belief state and adds citations where appropriate based on similarity metrics.

Parameters:

text (str) – Text to process and add citations to.
belief (Belief) – Agent’s belief state containing resources.
**kwargs – Additional arguments for processing.

Returns:

Result containing text with citations added.

Return type:

ValidationResult

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains source about "Python"
>>> result = validator.process_output(
...     "Python is a great language.",
...     belief
... )
>>> print("[1]" in result.result)  # Has citation
True

add_citation_to_sentence(sentence, resources)[source]#

Add citations to a single sentence.

This method checks the sentence against each resource using similarity metrics to determine which sources to cite.

Parameters:

sentence (str) – Sentence to add citations to.
resources (list[ActionResource]) – Available citation sources.

Returns:

(citation_ids, citation_links), where citation_ids is a list of citation identifiers and citation_links is a list of citation links (URLs).

Return type:

tuple

Example

>>> validator = CitationValidation()
>>> resource = ActionResource(
...     source="http://example.com",
...     content="Python is great"
... )
>>> ids, urls = validator.add_citation_to_sentence(
...     "Python is great!",
...     [resource]
... )
>>> print(len(ids), urls[0])
1 http://example.com
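
How the three thresholds might combine into a single cite-or-not decision is sketched below. The rule used here (cite when any one metric reaches its threshold) is an assumption for illustration, as are the helper names `lcs_length` and `should_cite` and the whitespace tokenization; consult the source for the exact rule.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length, keeping only one DP row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def should_cite(sentence: str, source: str, sequence_threshold: float = 0.7,
                jaccard_threshold: float = 0.7, token_overlap: float = 0.7) -> bool:
    """Assumed rule: cite when at least one similarity metric meets its threshold."""
    if not sentence:
        return False
    seq_ratio = lcs_length(sentence, source) / len(sentence)
    s_tokens, r_tokens = set(sentence.split()), set(source.split())
    shared, union = s_tokens & r_tokens, s_tokens | r_tokens
    jaccard = len(shared) / len(union) if union else 0.0
    overlap = len(shared) / len(s_tokens) if s_tokens else 0.0
    return (seq_ratio >= sequence_threshold
            or jaccard >= jaccard_threshold
            or overlap >= token_overlap)


print(should_cite("Python is great!", "Python is great"))
print(should_cite("Completely unrelated words", "Python is great"))
```

In the first call the trailing "!" breaks the token match on "great", so the token-based metrics fall below 0.7, but the character-level subsequence ratio (15 of 16 characters) still triggers a citation.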

format_sentence_with_citations(sentence, ids, links)[source]#

Format a sentence with its citations.

This method adds citation references to the end of a sentence in the format [id](url).

Parameters:

sentence (str) – Sentence to add citations to.
ids (list[int]) – Citation ID numbers.
links (list[str]) – Citation URLs.

Returns:

Sentence with citations added.

Return type:

str

Example

>>> validator = CitationValidation()
>>> result = validator.format_sentence_with_citations(
...     "Python is great.",
...     [1],
...     ["http://example.com"]
... )
>>> print(result)
Python is great [1](http://example.com).
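
A formatting step consistent with this example places the references before the closing punctuation. The sketch below is hypothetical: the function name and the comma separator between multiple citations are assumptions, not confirmed library behaviour.

```python
def format_with_citations(sentence: str, ids: list[int], links: list[str]) -> str:
    """Append [id](url) references, keeping terminal punctuation at the end."""
    if not ids:
        return sentence
    # Assumption: multiple references are separated by a comma and a space.
    refs = ", ".join(f"[{i}]({url})" for i, url in zip(ids, links))
    body, end = sentence, ""
    if sentence and sentence[-1] in ".!?":
        body, end = sentence[:-1], sentence[-1]
    return f"{body} {refs}{end}"


formatted = format_with_citations("Python is great.", [1], ["http://example.com"])
print(formatted)
```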

add_citations(text, resources)[source]#

Return type:: ValidationResult

get_failure_message()[source]#

Return type:: str

sherpa_ai.output_parsers.link_parse module#

Link parsing and symbol substitution module for Sherpa AI.

This module provides functionality for parsing and transforming links in text. It defines the LinkParser class which can convert between links and symbolic references, maintaining a consistent mapping between them.

class sherpa_ai.output_parsers.link_parse.LinkParser[source]#

Parser for converting between links and symbolic references.

This class handles the conversion of URLs to symbolic references and vice versa, maintaining a consistent mapping between them. It can process both raw URLs and tool-generated output containing links.

Attributes:

links (list) – List of unique links encountered during parsing.
link_to_id (dict) – Mapping of links to their symbolic references.
count (int) – Counter for generating unique symbol IDs.
output_counter (int) – Counter for reindexing output symbols.
reindex_mapping (dict) – Mapping of original IDs to reindexed IDs.
url_pattern (str) – Regex pattern for identifying links.
doc_id_pattern (str) – Regex pattern for identifying document IDs.
link_symbol (str) – Format string for link symbols.

Example

>>> parser = LinkParser()
>>> text = "Check Link:example.com and Link:test.com"
>>> result = parser.parse_output(text, tool_output=True)
>>> print(result)
DocID:[1]

DocID:[2]

>>> back = parser.parse_output("[1] and [2]")
>>> print(back)
<http://example.com|[1]> and <http://test.com|[2]>

parse_output(text, tool_output=False)[source]#

Parse and transform links in text.

This method either converts URLs to symbolic references (when tool_output is True) or converts symbolic references back to clickable links (when tool_output is False).

Parameters:

text (str) – Text containing either URLs or symbolic references.
tool_output (bool) – Whether the input is from a tool (True) or user-facing text (False).

Returns:

Text with either URLs converted to symbols or symbols converted to clickable links.

Example

>>> parser = LinkParser()
>>> # Convert URLs to symbols
>>> result = parser.parse_output("Link:example.com", tool_output=True)
>>> print(result)
DocID:[1]

>>> # Convert symbols back to links
>>> result = parser.parse_output("[1]")
>>> print(result)
<http://example.com|[1]>

Return type:: str
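
The two-way mapping between links and symbols can be illustrated with a small stand-in class. `TinyLinkParser`, its method names, and its regex patterns are all hypothetical simplifications chosen for this sketch; they do not reproduce LinkParser's actual patterns, whitespace handling, or reindexing.

```python
import re


class TinyLinkParser:
    """Simplified illustration of link-to-symbol substitution."""

    def __init__(self) -> None:
        self.links: list[str] = []           # links in order of first appearance
        self.link_to_id: dict[str, int] = {}  # link -> numeric symbol

    def to_symbols(self, text: str) -> str:
        """Replace each 'Link:<url>' with a 'DocID:[n]' symbol."""
        def replace(match: re.Match) -> str:
            link = match.group(1)
            if link not in self.link_to_id:
                self.links.append(link)
                self.link_to_id[link] = len(self.links)
            return f"DocID:[{self.link_to_id[link]}]"
        return re.sub(r"Link:\s*(\S+)", replace, text)

    def to_links(self, text: str) -> str:
        """Replace each known '[n]' symbol with a Slack-style '<url|[n]>' link."""
        def replace(match: re.Match) -> str:
            doc_id = int(match.group(1))
            if 1 <= doc_id <= len(self.links):
                return f"<{self.links[doc_id - 1]}|[{doc_id}]>"
            return match.group(0)  # unknown symbol: leave untouched
        return re.sub(r"\[(\d+)\]", replace, text)


parser = TinyLinkParser()
symbols = parser.to_symbols("See Link:http://example.com and Link:http://test.com")
restored = parser.to_links("Sources: [1] and [2]")
print(symbols)
print(restored)
```

Keeping the mapping on the instance is what lets a later call resolve symbols that an earlier call produced, so the same parser object should be reused across both directions.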

sherpa_ai.output_parsers.md_to_slack_parse module#

Markdown to Slack format conversion module for Sherpa AI.

This module provides functionality for converting Markdown-formatted text to Slack-compatible format. It defines the MDToSlackParse class which handles the conversion of Markdown links to Slack’s link format.

class sherpa_ai.output_parsers.md_to_slack_parse.MDToSlackParse[source]#

Parser for converting Markdown links to Slack format.

This class converts Markdown-style links ([text](url)) to Slack’s link format (<url|text>). It maintains the link text and URL while changing only the syntax to match Slack’s requirements.

pattern#

Regex pattern for identifying Markdown links.

Type:: str

Example

>>> parser = MDToSlackParse()
>>> text = "Check out [this link](http://example.com)!"
>>> result = parser.parse_output(text)
>>> print(result)
Check out <http://example.com|this link>!

parse_output(text)[source]#

Convert Markdown links to Slack format.

This method finds all Markdown-style links in the input text and converts them to Slack’s link format while preserving the link text and URL.

Parameters:: text (str) – Text containing Markdown-style links.
Returns:: Text with links converted to Slack format.
Return type:: str

Example

>>> parser = MDToSlackParse()
>>> text = "See [docs](https://docs.com) and [code](https://code.com)"
>>> result = parser.parse_output(text)
>>> print(result)
See <https://docs.com|docs> and <https://code.com|code>
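
The conversion amounts to a single regular-expression substitution. The pattern and the function name `md_to_slack` below are assumptions for illustration; the class's own `pattern` attribute may be written differently.

```python
import re

# Capture the link text and the URL of a Markdown link: [text](url)
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def md_to_slack(text: str) -> str:
    """Rewrite [text](url) as <url|text>, leaving other text unchanged."""
    return MD_LINK.sub(r"<\2|\1>", text)


converted = md_to_slack("See [docs](https://docs.com) and [code](https://code.com)")
print(converted)
```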

sherpa_ai.output_parsers.number_validation module#

Number validation module for Sherpa AI.

This module provides functionality for validating numerical information in text. It defines the NumberValidation class which verifies that numbers mentioned in generated text exist in the source material.

class sherpa_ai.output_parsers.number_validation.NumberValidation[source]#

Validates the presence or absence of numerical information in a given piece of text.

This class validates that any numbers mentioned in generated text can be found in the source material, helping ensure numerical accuracy and prevent hallucination of numbers.

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42 items"
>>> result = validator.process_output("There are 42 items.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("There are 100 items.", belief)
>>> print(result.is_valid)
False

process_output(text, belief, **kwargs)[source]#

Verifies that all numbers within text exist in the belief source text.

Parameters:

text (str) – Text containing numbers to validate.
belief (Belief) – Agent’s belief state containing source material.
**kwargs – Additional arguments for processing.

Returns:

Result indicating whether all numbers are valid, with feedback if validation fails.

Return type:

ValidationResult

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains "The price is $50"
>>> result = validator.process_output("It costs $50", belief)
>>> print(result.is_valid)
True
>>> result.feedback
''
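
The core check, that every number in the text also occurs in the source, can be sketched with a regular expression. The function name `numbers_supported` and the number pattern are assumptions; the library may extract numbers differently (for example, handling thousands separators or spelled-out numbers).

```python
import re

# Assumption: a number is a run of digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_supported(text: str, source: str) -> tuple[bool, list[str]]:
    """Return (all_found, missing) for the numbers mentioned in text."""
    source_numbers = set(NUMBER.findall(source))
    missing = [n for n in NUMBER.findall(text) if n not in source_numbers]
    return (not missing, missing)


ok, missing = numbers_supported("There are 42 items.", "The box holds 42 items")
bad, missing_bad = numbers_supported("There are 100 items.", "The box holds 42 items")
print(ok, missing)
print(bad, missing_bad)
```

Returning the missing numbers alongside the boolean makes it straightforward to build a feedback message that names the unsupported values.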

get_failure_message()[source]#

Get a message describing validation failures.

Returns:: Warning message about potential numerical inaccuracies.
Return type:: str

Example

>>> validator = NumberValidation()
>>> print(validator.get_failure_message())
The numeric value results might not be fully reliable...

sherpa_ai.output_parsers.validation_result module#

Validation result model for Sherpa AI output processors.

This module provides the ValidationResult class which represents the outcome of content validation operations. It includes status, result content, and optional feedback information.

class sherpa_ai.output_parsers.validation_result.ValidationResult(**data)[source]#

Bases: BaseModel

Result of content validation operations.

This class represents the outcome of validating content, including whether the validation passed, the processed content, and any feedback about the validation process.

is_valid#

Whether the validation passed (True) or failed (False).

Type:: bool

result#

The processed or validated content.

Type:: str

feedback#

Additional information about the validation result.

Type:: str

Example

>>> result = ValidationResult(
...     is_valid=True,
...     result="Validated text",
...     feedback="All checks passed"
... )
>>> print(result.is_valid)
True
>>> print(result.feedback)
All checks passed

is_valid: bool#

result: str#

feedback: str#

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

sherpa_ai.output_parsers.self_consistency module#

class sherpa_ai.output_parsers.self_consistency.MaximumLikelihoodConcretizer(config=None)[source]#

Bases: Concretizer

concretize(abstract_object, return_dict=False)[source]#

Concretize an abstract object by selecting the most likely value for each attribute. For list attributes, uses top-k or threshold-based selection.

Parameters:: abstract_object (AbstractObject) – The abstract object to concretize.
Returns:: A concrete object with the most likely values for each attribute.
Return type:: BaseModel

class sherpa_ai.output_parsers.self_consistency.ObjectAggregator(obj_schema, *, value_weight_map: dict[str, dict | float] = {}, obj_dict: dict[str, list | dict] = {})[source]#

Bases: BaseModel

Class representing an aggregation of objects that captures their attribute values as lists.

obj_schema: type[BaseModel]#: Schema of the object, used to validate the object.

value_weight_map: dict[str, dict | float]#: Dictionary mapping each field to a dictionary of values and their weights. If the field is a primitive type, it will be mapped to a dictionary with values as keys and their weights as values. If the field is a nested model, it will be mapped to a dictionary for storing object values. The default weight is 1.0.

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

obj_dict: dict[str, list | dict]#: Dictionary representing the object aggregator, where each field is mapped to a list or a dictionary. If the field is a primitive type, it will be mapped to a list for storing object values. If the field is a nested model, it will be mapped to a dictionary for storing object values.

add_object(obj)[source]#

Add an object to the aggregation of objects.

Parameters:: obj (BaseModel) – The object to add, must conform to the schema defined by obj_schema.

class sherpa_ai.output_parsers.self_consistency.AbstractObject(**data)[source]#

Bases: BaseModel

Abstract object that maps each attribute to a distribution.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

obj_schema: type[BaseModel]#: Schema of the object, used to validate the object.

obj_dict: dict[str, Distribution | dict]#

classmethod from_aggregator(obj_aggregator)[source]#

Create an AbstractObject from an ObjectAggregator.

Parameters:: obj_aggregator (ObjectAggregator) – The ObjectAggregator to convert.
Returns:: An instance of AbstractObject with the aggregated data.
Return type:: AbstractObject

sherpa_ai.output_parsers.self_consistency.run_self_consistency(objects, schema, aggregator_cls=<class 'sherpa_ai.output_parsers.self_consistency.object_aggregator.ObjectAggregator'>, concretizer=None, value_weight_map={}, config=None)[source]#

Run self-consistency on a list of objects using the provided schema and configuration.

Parameters:

objects (list[BaseModel]) – List of objects to process.
schema (type[BaseModel]) – Pydantic schema for validation.
aggregator_cls (type[ObjectAggregator], optional) – Class to use for aggregation. Defaults to ObjectAggregator.
concretizer (Optional[Concretizer], optional) – Concretizer to use for final output. Defaults to MaximumLikelihoodConcretizer.
value_weight_map (dict[str, Union[dict, float]], optional) – Weight map for each attribute of the object. Defaults to {}.
config (Optional[SelfConsistencyConfig], optional) – Configuration for self-consistency processing. If None, default configuration will be used.

Returns:

The final concrete object after self-consistency processing (instance of schema).

Return type:

BaseModel
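
The aggregate-then-concretize flow can be illustrated on plain dictionaries. This is a simplified sketch, not the library's pipeline: `majority_vote` is a hypothetical helper, it ignores nested models, list attributes, and `value_weight_map`, and it assumes every value is hashable.

```python
from collections import Counter


def majority_vote(samples: list[dict]) -> dict:
    """Pick the most frequent value of each field across several samples."""
    # Aggregation step: collect every observed value per field.
    aggregated: dict[str, list] = {}
    for sample in samples:
        for field, value in sample.items():
            aggregated.setdefault(field, []).append(value)
    # Concretization step: keep the most likely (most frequent) value.
    return {
        field: Counter(values).most_common(1)[0][0]
        for field, values in aggregated.items()
    }


samples = [
    {"city": "Paris", "population_m": 2.1},
    {"city": "Paris", "population_m": 2.2},
    {"city": "Lyon", "population_m": 2.1},
]
consensus = majority_vote(samples)
print(consensus)
```

Voting per attribute rather than per object means the final result can combine values that never appeared together in any single sample, as it does here.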

class sherpa_ai.output_parsers.self_consistency.SelfConsistencyConfig(**data)[source]#

Bases: BaseModel

Configuration for self-consistency processing.

This class provides a structured way to configure self-consistency behavior, particularly for list attributes. It replaces the previous dict-based configuration approach.

list_config#: Dictionary mapping field names to their list processing configurations. Defaults to an empty dictionary.

list_config: dict[str, ListConfig]#

get_list_config(field_path)[source]#

Get the list configuration for a specific field path.

Parameters:: field_path (str) – The path to the field (e.g., “tags” or “nested.field”)
Returns:: The configuration for the field, or default if not specified
Return type:: ListConfig

has_list_config(field_path)[source]#

Check if a field has specific list configuration.

Parameters:: field_path (str) – The path to the field
Returns:: True if the field has specific configuration, False otherwise
Return type:: bool

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class sherpa_ai.output_parsers.self_consistency.ListConfig(**data)[source]#

Bases: BaseModel

Configuration for individual list attributes in self-consistency processing.

This class defines how list attributes should be processed during the self-consistency aggregation and concretization process.

top_k#: Number of top items to select when using “top_k” strategy. Defaults to 0 (which means use default behavior).

threshold#: Minimum frequency threshold when using “threshold” strategy. Defaults to 2.0.

strategy#: The strategy to use for selecting items from list attributes. Either “top_k” or “threshold”. Defaults to “top_k”.

top_k: int#

threshold: float#

strategy: Literal['top_k', 'threshold']#

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
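
The difference between the two strategies can be shown with a small frequency count over sampled lists. Everything in this sketch is an assumption made for illustration: the `ListRule` dataclass stands in for ListConfig, an item is counted once per sample, "threshold" keeps items whose count is at least the threshold, and a `top_k` of 0 keeps every item.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ListRule:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    strategy: str = "top_k"
    top_k: int = 0
    threshold: float = 2.0


def select_items(sampled_lists: list[list[str]], rule: ListRule) -> list[str]:
    """Choose list items by frequency across samples."""
    # Count each item at most once per sample, preserving first-seen order.
    counts = Counter(
        item for items in sampled_lists for item in dict.fromkeys(items)
    )
    ranked = [item for item, _ in counts.most_common()]
    if rule.strategy == "threshold":
        return [item for item in ranked if counts[item] >= rule.threshold]
    # Assumed default behaviour: top_k <= 0 keeps all items.
    return ranked[: rule.top_k] if rule.top_k > 0 else ranked


samples = [["ml", "nlp"], ["ml", "cv"], ["ml", "nlp"]]
by_threshold = select_items(samples, ListRule(strategy="threshold", threshold=2.0))
by_top_k = select_items(samples, ListRule(strategy="top_k", top_k=1))
print(by_threshold, by_top_k)
```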

Module contents#

Output parsing and validation module for Sherpa AI.

This module provides various parsers and validators for processing model outputs. It includes parsers for links, Markdown to Slack conversion, and validators for citations, numbers, and entities.

Example

>>> from sherpa_ai.output_parsers import LinkParser, NumberValidation
>>> link_parser = LinkParser()
>>> symbols = link_parser.parse_output("Check out Link:example.com", tool_output=True)
>>> number_validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42"
>>> result = number_validator.process_output("The answer is 42", belief)

class sherpa_ai.output_parsers.LinkParser[source]#

Parser for converting between links and symbolic references.

This class handles the conversion of URLs to symbolic references and vice versa, maintaining a consistent mapping between them. It can process both raw URLs and tool-generated output containing links.

Attributes:

links (list) – List of unique links encountered during parsing.
link_to_id (dict) – Mapping of links to their symbolic references.
count (int) – Counter for generating unique symbol IDs.
output_counter (int) – Counter for reindexing output symbols.
reindex_mapping (dict) – Mapping of original IDs to reindexed IDs.
url_pattern (str) – Regex pattern for identifying links.
doc_id_pattern (str) – Regex pattern for identifying document IDs.
link_symbol (str) – Format string for link symbols.

Example

>>> parser = LinkParser()
>>> text = "Check Link:example.com and Link:test.com"
>>> result = parser.parse_output(text, tool_output=True)
>>> print(result)
DocID:[1]

DocID:[2]

>>> back = parser.parse_output("[1] and [2]")
>>> print(back)
<http://example.com|[1]> and <http://test.com|[2]>

parse_output(text, tool_output=False)[source]#

Parse and transform links in text.

This method either converts URLs to symbolic references (when tool_output is True) or converts symbolic references back to clickable links (when tool_output is False).

Parameters:

text (str) – Text containing either URLs or symbolic references.
tool_output (bool) – Whether the input is from a tool (True) or user-facing text (False).

Returns:

Text with either URLs converted to symbols or symbols converted to clickable links.

Example

>>> parser = LinkParser()
>>> # Convert URLs to symbols
>>> result = parser.parse_output("Link:example.com", tool_output=True)
>>> print(result)
DocID:[1]

>>> # Convert symbols back to links
>>> result = parser.parse_output("[1]")
>>> print(result)
<http://example.com|[1]>

Return type:: str

class sherpa_ai.output_parsers.MDToSlackParse[source]#

Parser for converting Markdown links to Slack format.

This class converts Markdown-style links ([text](url)) to Slack’s link format (<url|text>). It maintains the link text and URL while changing only the syntax to match Slack’s requirements.

pattern#

Regex pattern for identifying Markdown links.

Type:: str

Example

>>> parser = MDToSlackParse()
>>> text = "Check out [this link](http://example.com)!"
>>> result = parser.parse_output(text)
>>> print(result)
Check out <http://example.com|this link>!

parse_output(text)[source]#

Convert Markdown links to Slack format.

This method finds all Markdown-style links in the input text and converts them to Slack’s link format while preserving the link text and URL.

Parameters:: text (str) – Text containing Markdown-style links.
Returns:: Text with links converted to Slack format.
Return type:: str

Example

>>> parser = MDToSlackParse()
>>> text = "See [docs](https://docs.com) and [code](https://code.com)"
>>> result = parser.parse_output(text)
>>> print(result)
See <https://docs.com|docs> and <https://code.com|code>

class sherpa_ai.output_parsers.CitationValidation(sequence_threshold=0.7, jaccard_threshold=0.7, token_overlap=0.7)[source]#

Validator and citation adder for text content.

This class analyzes text against source materials to validate content and add appropriate citations. It uses multiple similarity metrics to determine when citations are needed and which sources to cite.

sequence_threshold#

Minimum ratio of common subsequence length to text length for citation. Default is 0.7.

Type:: float

jaccard_threshold#

Minimum Jaccard similarity for citation. Default is 0.7.

Type:: float

token_overlap#

Minimum token overlap ratio for citation. Default is 0.7.

Type:: float

Example

>>> validator = CitationValidation(sequence_threshold=0.8)
>>> belief = Belief()  # Contains source about "Python is great"
>>> result = validator.process_output("Python is great!", belief)
>>> print("[1]" in result.result)  # Has citation
True

calculate_token_overlap(sentence1, sentence2)[source]#

Calculates the percentage of token overlap between two sentences.

This method tokenizes both sentences and calculates the percentage of shared tokens relative to each sentence’s length.

Parameters:

sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.

Returns:

(overlap_ratio_1, overlap_ratio_2), where each ratio is the proportion of shared tokens to total tokens in that sentence.

Return type:

tuple

Example

>>> validator = CitationValidation()
>>> ratio1, ratio2 = validator.calculate_token_overlap(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{ratio1:.2f}, {ratio2:.2f}")
0.75, 0.75

jaccard_index(sentence1, sentence2)[source]#

Calculates the Jaccard index between two sentences.

This method computes the Jaccard index (intersection over union) between the sets of tokens from both sentences.

Parameters:

sentence1 (str) – First sentence to compare.
sentence2 (str) – Second sentence to compare.

Returns:

Jaccard similarity score between 0 and 1.

Return type:

float

Example

>>> validator = CitationValidation()
>>> score = validator.jaccard_index(
...     "The cat is black",
...     "The cat is white"
... )
>>> print(f"{score:.2f}")
0.60

longest_common_subsequence(text1, text2)[source]#

Calculate length of longest common subsequence.

This method finds the length of the longest subsequence of characters that appear in both texts in the same order.

Parameters:

text1 (str) – First text to compare.
text2 (str) – Second text to compare.

Returns:

Length of longest common subsequence.

Return type:

int

Example

>>> validator = CitationValidation()
>>> length = validator.longest_common_subsequence(
...     "hello world",
...     "hello there"
... )
>>> print(length)
7

flatten_nested_list(nested_list)[source]#

Flatten a nested list of strings.

Parameters:: nested_list (list[list[str]]) – List of lists of strings.
Returns:: Single list containing all non-empty strings.
Return type:: list[str]

Example

>>> validator = CitationValidation()
>>> flat = validator.flatten_nested_list([["a", "b"], ["c", ""]])
>>> print(flat)
['a', 'b', 'c']

split_paragraph_into_sentences(paragraph)[source]#

Split paragraph into sentences using NLTK.

Parameters:: paragraph (str) – Text to split into sentences.
Returns:: List of sentences from the paragraph.
Return type:: list[str]

Example

>>> validator = CitationValidation()
>>> sentences = validator.split_paragraph_into_sentences(
...     "Hello there. How are you?"
... )
>>> print(sentences)
['Hello there.', 'How are you?']

resources_from_belief(belief)[source]#

Extract resources from belief state actions.

Parameters:: belief (Belief) – Agent’s belief state containing actions.
Returns:: List of resources from retrieval actions.
Return type:: list[ActionResource]

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains retrieval action with resource
>>> resources = validator.resources_from_belief(belief)
>>> print(len(resources))
1

process_output(text, belief, **kwargs)[source]#

Process text and add citations from belief resources.

This method analyzes the input text against resources in the belief state and adds citations where appropriate based on similarity metrics.

Parameters:

text (str) – Text to process and add citations to.
belief (Belief) – Agent’s belief state containing resources.
**kwargs – Additional arguments for processing.

Returns:

Result containing text with citations added.

Return type:

ValidationResult

Example

>>> validator = CitationValidation()
>>> belief = Belief()  # Contains source about "Python"
>>> result = validator.process_output(
...     "Python is a great language.",
...     belief
... )
>>> print("[1]" in result.result)  # Has citation
True

add_citation_to_sentence(sentence, resources)[source]#

Add citations to a single sentence.

This method checks the sentence against each resource using similarity metrics to determine which sources to cite.

Parameters:

sentence (str) – Sentence to add citations to.
resources (list[ActionResource]) – Available citation sources.

Returns:

(citation_ids, citation_links), where citation_ids is a list of citation identifiers and citation_links is a list of citation links (URLs).

Return type:

tuple

Example

>>> validator = CitationValidation()
>>> resource = ActionResource(
...     source="http://example.com",
...     content="Python is great"
... )
>>> ids, urls = validator.add_citation_to_sentence(
...     "Python is great!",
...     [resource]
... )
>>> print(len(ids), urls[0])
1 http://example.com

format_sentence_with_citations(sentence, ids, links)[source]#

Format a sentence with its citations.

This method adds citation references to the end of a sentence in the format [id](url).

Parameters:

sentence (str) – Sentence to add citations to.
ids (list[int]) – Citation ID numbers.
links (list[str]) – Citation URLs.

Returns:

Sentence with citations added.

Return type:

str

Example

>>> validator = CitationValidation()
>>> result = validator.format_sentence_with_citations(
...     "Python is great.",
...     [1],
...     ["http://example.com"]
... )
>>> print(result)
Python is great [1](http://example.com).

add_citations(text, resources)[source]#

Return type:: ValidationResult

get_failure_message()[source]#

Return type:: str

class sherpa_ai.output_parsers.NumberValidation[source]#

Validates the presence or absence of numerical information in a given piece of text.

This class validates that any numbers mentioned in generated text can be found in the source material, helping ensure numerical accuracy and prevent hallucination of numbers.

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains source text with "42 items"
>>> result = validator.process_output("There are 42 items.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("There are 100 items.", belief)
>>> print(result.is_valid)
False

process_output(text, belief, **kwargs)[source]#

Verifies that all numbers within text exist in the belief source text.

Parameters:

text (str) – Text containing numbers to validate.
belief (Belief) – Agent’s belief state containing source material.
**kwargs – Additional arguments for processing.

Returns:

Result indicating whether all numbers are valid, with feedback if validation fails.

Return type:

ValidationResult

Example

>>> validator = NumberValidation()
>>> belief = Belief()  # Contains "The price is $50"
>>> result = validator.process_output("It costs $50", belief)
>>> print(result.is_valid)
True
>>> result.feedback
''

get_failure_message()[source]#

Get a message describing validation failures.

Returns:: Warning message about potential numerical inaccuracies.
Return type:: str

Example

>>> validator = NumberValidation()
>>> print(validator.get_failure_message())
The numeric value results might not be fully reliable...

class sherpa_ai.output_parsers.EntityValidation[source]#

Validator for named entities in text.

This class validates that entities mentioned in generated text can be found in the source material, using progressively more sophisticated similarity comparison methods if initial validation fails.

Example

>>> validator = EntityValidation()
>>> belief = Belief()  # Contains source text about "John Smith"
>>> result = validator.process_output("John Smith is CEO.", belief)
>>> print(result.is_valid)
True
>>> result = validator.process_output("Jane Doe is CEO.", belief)
>>> print(result.is_valid)
False

process_output(text, belief, llm=None, **kwargs)[source]#

Validate entities in text against source material.

This method checks that entities mentioned in the input text can be found in the source material stored in the belief state. It uses increasingly sophisticated comparison methods on validation failures.

Parameters:

text (str) – Text containing entities to validate.
belief (Belief) – Agent’s belief state containing source material.
llm (BaseLanguageModel, optional) – Language model for advanced comparison.
**kwargs – Additional arguments for processing.

Returns:

Result indicating whether all entities are valid, with feedback if validation fails.

Return type: