Self-consistency with Pydantic Objects using Sherpa

7. Self-consistency with Pydantic Objects using Sherpa

Self-consistency is a technique that can improve the quality of LLM outputs by sampling multiple generations for the same task and aggregating them into a single answer. The original paper on self-consistency focuses on simple outputs such as multiple-choice answers and numbers. Sherpa extends this idea to arbitrary Pydantic objects.

7.1. Running Self-Consistency

Let’s say we have the following Pydantic schema:

from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(..., description="The name of the person")
    age: int = Field(..., description="The age of the person")
    city: str = Field(..., description="The city where the person lives")

Suppose an LLM has generated several objects for the task you want to accomplish:

objects = [
    Person(name="Alice", age=30, city="New York"),
    Person(name="Alice", age=31, city="New York"),
    Person(name="Alice", age=30, city="Los Angeles"),
]

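The current version identifies the most common value of each field independently and returns a new object built from those values. In the example above, age=30 and city="New York" each appear in two of the three objects, so they are selected: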

from sherpa_ai.output_parsers.self_consistency import run_self_consistency

# Run self-consistency
result = run_self_consistency(objects, schema=Person)

print(result)
# Output: Person(name='Alice', age=30, city='New York')
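
To make the per-field voting concrete, here is a minimal sketch of the idea in plain Python. It is not Sherpa's implementation: `majority_vote` is a hypothetical helper written for illustration, it assumes every field value is hashable, and it breaks ties in favor of the value seen first.

```python
from collections import Counter
from typing import List, Type, TypeVar

from pydantic import BaseModel, Field

T = TypeVar("T", bound=BaseModel)


class Person(BaseModel):
    name: str = Field(..., description="The name of the person")
    age: int = Field(..., description="The age of the person")
    city: str = Field(..., description="The city where the person lives")


def majority_vote(objects: List[T], schema: Type[T]) -> T:
    """Hypothetical helper: build one object from the most common value of each field."""
    # Field names come from the schema (Pydantic v2 `model_fields`, v1 `__fields__`).
    field_names = getattr(schema, "model_fields", None) or getattr(schema, "__fields__")
    merged = {}
    for name in field_names:
        # Count each candidate value; values must be hashable (no list fields here).
        counts = Counter(getattr(obj, name) for obj in objects)
        # most_common(1) returns the top value; ties go to the value seen first.
        merged[name] = counts.most_common(1)[0][0]
    return schema(**merged)


objects = [
    Person(name="Alice", age=30, city="New York"),
    Person(name="Alice", age=31, city="New York"),
    Person(name="Alice", age=30, city="Los Angeles"),
]

result = majority_vote(objects, schema=Person)
print(result)
```

Because each field is voted on separately, the returned object may combine values that never appeared together in any single candidate.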

7.2. Advanced Configuration

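You can also configure how list attributes are aggregated by passing the config parameter. In the example below, tags and scores stand for list fields on your own schema; the Person schema above does not define them: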

from sherpa_ai.output_parsers.self_consistency.config import SelfConsistencyConfig, ListConfig

config = SelfConsistencyConfig(
    list_config={
        "tags": ListConfig(strategy="top_k", top_k=2),
        "scores": ListConfig(strategy="threshold", threshold=3.0)
    }
)

result = run_self_consistency(objects, schema=Person, config=config)
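
This page does not spell out what the top_k and threshold strategies do, so the sketch below shows only one plausible reading, not Sherpa's actual behavior. It assumes that each candidate list casts one vote per distinct item, that top_k keeps the k items with the most votes, and that threshold keeps every item whose vote count reaches the threshold. The helper names `count_votes`, `aggregate_top_k`, and `aggregate_threshold` are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def count_votes(lists: Sequence[Sequence[Hashable]]) -> Counter:
    """Count in how many candidate lists each item appears."""
    counts: Counter = Counter()
    for lst in lists:
        # dict.fromkeys removes duplicates within one list while keeping order,
        # so each list contributes at most one vote per item.
        for item in dict.fromkeys(lst):
            counts[item] += 1
    return counts


def aggregate_top_k(lists: Sequence[Sequence[Hashable]], k: int) -> List[Hashable]:
    """Assumed 'top_k' reading: keep the k items with the most votes."""
    return [item for item, _ in count_votes(lists).most_common(k)]


def aggregate_threshold(lists: Sequence[Sequence[Hashable]], threshold: float) -> List[Hashable]:
    """Assumed 'threshold' reading: keep items with at least `threshold` votes."""
    return [item for item, votes in count_votes(lists).items() if votes >= threshold]


# Three hypothetical LLM generations of a `tags` list field.
tag_lists = [["ml", "nlp", "llm"], ["ml", "llm"], ["ml", "vision"]]

top_two = aggregate_top_k(tag_lists, k=2)
unanimous = aggregate_threshold(tag_lists, threshold=3.0)
print(top_two, unanimous)
```

Under these assumptions, top_k bounds the size of the result, while threshold bounds how much agreement each kept item needs.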

For more details on the self-consistency process, see the self-consistency module documentation.

Note

A future version will support relational constraints between fields, so that self-consistency can be applied to more complex use cases.