Cobjectric: Measuring Parsing Quality for Structured Data

A Python library for fill rate, fill-rate accuracy, and fuzzy similarity on structured payloads such as CV parses or LLM JSON.

TL;DR

Cobjectric scores structured objects with compute_fill_rate, compute_fill_rate_accuracy, and compute_similarity.

It started as a benchmark harness for curriculum vitae (CV) parsing, but the same recipe generalizes to API payloads, configs, migration QA, and any nested dict / JSON you care about.

For Specs, pandas export, and the API reference, read the docs.

Where this came from

I built Cobjectric because I kept comparing parsed CVs against a schema and a labeled extract. The painful bit is rarely strict equality everywhere. Outputs are almost right: extra spaces, different casing, harmless punctuation, lists in another order, or a field present on one side but missing on the other.

That pattern is not CV-specific. Once you model your payload as a BaseModel, you get repeatable metrics you can log, aggregate, and compare across prompts or pipelines.

Fill rate: how complete is one object?

Fill rate answers a simple question: which fields look filled vs missing for a single instance?

from cobjectric import BaseModel


class Person(BaseModel):
    name: str
    age: int
    email: str


person = Person.from_dict(
    {
        "name": "John Doe",
        "age": 30,
    }
)

result = person.compute_fill_rate()
print(result.fields.name.value)
print(result.fields.age.value)
print(result.fields.email.value)
print(result.mean())
1.0
1.0
0.0
0.6666666666666666

You land at ~66.7% mean completeness because 2 of 3 fields are present.
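
Per-instance scores get more useful once you aggregate them over a batch. A minimal sketch, reusing the Person model above; raw_batch is a hypothetical stand-in for your parser's output, and statistics is just the standard library:

import statistics

# Hypothetical batch of raw dicts straight from a parser or LLM.
raw_batch = [
    {"name": "John Doe", "age": 30},
    {"name": "Jane Roe", "age": 25, "email": "jane@example.com"},
]

# One mean fill rate per instance, then one batch-level number to log.
per_instance = [
    Person.from_dict(raw).compute_fill_rate().mean() for raw in raw_batch
]
print(statistics.mean(per_instance))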

📝
Note

Think of per-field scores as 1.0 when the field is present and valid, and 0.0 when it is missing or fails validation. If you need weighted summaries, Spec weights apply here too.

Fill rate accuracy: did we miss the same fields?

Fill rate accuracy compares two objects, but still focuses on presence, not semantic equality. That is useful when you want to know whether your extractor skipped the same sections as your reference label.

got = Person.from_dict({"name": "John", "age": 30})
expected = Person.from_dict(
    {
        "name": "Jane",
        "age": 25,
        "email": "[email protected]",
    }
)

accuracy = got.compute_fill_rate_accuracy(expected)
print(accuracy.fields.name.value)
print(accuracy.fields.age.value)
print(accuracy.fields.email.value)
print(accuracy.mean())
1.0
1.0
0.0
0.6666666666666666

Here 66.7% means 2 of 3 fields share the same filled-or-missing pattern (both sides fill name and age; only expected fills email).
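
When the mean dips, you usually want to know which slot diverged. A small sketch, assuming per-field results stay reachable by attribute exactly as in the prints above (getattr just generalizes that access):

# Flag fields whose filled-or-missing status differs between the two sides.
for field_name in ("name", "age", "email"):
    score = getattr(accuracy.fields, field_name).value
    if score < 1.0:
        print(f"presence mismatch on {field_name!r}")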

Similarity: near matches for noisy text

When both sides have text, you usually care about near matches, not character-by-character identity. Think casing edits, spacing, light paraphrases, or abbreviations, not only literal typos.

from cobjectric import BaseModel
from cobjectric.specs import TextSpec


class Article(BaseModel):
    title: str = TextSpec(scorer="WRatio")
    content: str = TextSpec(scorer="WRatio")


reference = Article.from_dict(
    {
        "title": "Introduction to Machine Learning",
        "content": (
            "Machine learning is a subset of artificial intelligence."
        ),
    }
)

parsed = Article.from_dict(
    {
        "title": "Introduction to machine learning",
        "content": "Machine learning is a subset of AI.",
    }
)

similarity = parsed.compute_similarity(reference)
print(similarity.fields.title.value)
print(similarity.fields.content.value)
print(similarity.mean())
1.0
0.8735294117647059
0.9367647058823529

That is the practical win: casing changes can score 100%, while light paraphrases still land around 87.4% on content, so the overall score stays near 93.7%.

Other slots should stay exact once normalization runs: IDs, enums, fixed taxonomy labels, SKUs. KeywordSpec uses exact similarity on those strings (with preprocessing such as stripping whitespace and optional int-to-string coercion), so you do not get partial fuzzy credit when the value must match.

💡
Tip

Use TextSpec for free-form prose so normalization (case, spacing, accents) and RapidFuzz-backed similarity stay consistent. Tune scorer (for example WRatio) when you need stricter or looser fuzzy behavior.

Use KeywordSpec when the contract is effectively "equal or wrong": matching normalized tokens must score 1.0, anything else scores 0.0. For fully custom rules you can still attach similarity_func or use helpers such as exact_similarity from cobjectric.similarity; see Similarity and Pre-defined Specs.
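
Mixing the two contracts in one schema looks roughly like this. Treat the Spec(similarity_func=...) keyword as my reading of "attach similarity_func" above, not a confirmed signature:

from cobjectric import BaseModel, Spec
from cobjectric.similarity import exact_similarity
from cobjectric.specs import KeywordSpec, TextSpec


class Product(BaseModel):
    sku: str = KeywordSpec()  # equal-or-wrong contract
    name: str = TextSpec(scorer="WRatio")  # fuzzy prose
    # Assumed keyword: a fully custom similarity hook on a plain Spec.
    status: str = Spec(similarity_func=exact_similarity)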

Lists: match items even when order shifts

Models often emit arrays of nested objects. Pairwise index alignment works when order is stable. When it is not, ListCompareStrategy.OPTIMAL_ASSIGNMENT finds a strong one-to-one pairing.

from cobjectric import BaseModel, Spec, ListCompareStrategy
from cobjectric.specs import KeywordSpec


class Skill(BaseModel):
    name: str = KeywordSpec()
    level: str = KeywordSpec()


class Developer(BaseModel):
    skills: list[Skill] = Spec(
        list_compare_strategy=ListCompareStrategy.OPTIMAL_ASSIGNMENT
    )


reference = Developer.from_dict(
    {
        "skills": [
            {"name": "Python", "level": "Expert"},
            {"name": "JavaScript", "level": "Intermediate"},
            {"name": "SQL", "level": "Advanced"},
        ]
    }
)

parsed = Developer.from_dict(
    {
        "skills": [
            {"name": "JavaScript", "level": "Intermediate"},
            {"name": "SQL", "level": "Advanced"},
            {"name": "Python", "level": "Expert"},
        ]
    }
)

similarity = parsed.compute_similarity(reference)
print(similarity.mean())
1.0

100% here means every aligned pair matches on structured fields, even though the incoming list was rotated.

⚠️
Warning

Default pairwise alignment compares index i on both sides. If your generator shuffles sections (skills, roles, bullet lists), pairwise similarity will look unfairly bad even when the content is right. Reach for Levenshtein when order is mostly stable but items insert or drop, or optimal assignment when order is unreliable (SciPy required for that strategy).
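
As a decision sketch, the two non-default strategies might be declared like this. OPTIMAL_ASSIGNMENT appears in the example above; LEVENSHTEIN is my assumed member name for the Levenshtein strategy the warning mentions:

from cobjectric import BaseModel, Spec, ListCompareStrategy
from cobjectric.specs import KeywordSpec


class Role(BaseModel):
    title: str = KeywordSpec()


class Resume(BaseModel):
    # Order mostly stable, items insert or drop: Levenshtein-style alignment.
    # LEVENSHTEIN is an assumption, not a confirmed enum member.
    roles: list[Role] = Spec(
        list_compare_strategy=ListCompareStrategy.LEVENSHTEIN
    )
    # Order unreliable: one-to-one optimal assignment (needs SciPy).
    highlights: list[Role] = Spec(
        list_compare_strategy=ListCompareStrategy.OPTIMAL_ASSIGNMENT
    )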

Case study: a CV-shaped schema with all three metrics

This mirrors what I wanted first: nested Experience rows plus fuzzy summary text. Assume Experience, CV, reference_cv, and llm_cv match the expanded snippet below.

On this toy pair, completeness lines up, but wording still drifts, which is exactly when you want all three APIs:

fill_only = llm_cv.compute_fill_rate()
presence_match = llm_cv.compute_fill_rate_accuracy(reference_cv)
value_match = llm_cv.compute_similarity(reference_cv)

print(fill_only.mean())
print(presence_match.mean())
print(value_match.mean())
1.0
1.0
0.9755555555555555

Readout: fill rate is 100% because the model output is fully populated. Fill-rate accuracy is also 100% because the same slots are filled on both sides. Similarity lands at ~97.6% because the mean averages every leaf field: name (~0.944) and summary (~0.933) drift, while company, title, and description match exactly and pull the mean up.

Show full CV example (models, payloads, all metrics)
from cobjectric import BaseModel, Spec, ListCompareStrategy
from cobjectric.specs import KeywordSpec, TextSpec


class Experience(BaseModel):
    company: str = KeywordSpec()
    title: str = TextSpec()
    description: str = TextSpec(scorer="WRatio")


class CV(BaseModel):
    name: str = TextSpec()
    summary: str = TextSpec(scorer="WRatio")
    experiences: list[Experience] = Spec(
        list_compare_strategy=ListCompareStrategy.OPTIMAL_ASSIGNMENT
    )


reference_cv = CV.from_dict(
    {
        "name": "Jean-Pierre Dupont",
        "summary": (
            "Senior Software Engineer with 10 years of experience "
            "in Python and ML."
        ),
        "experiences": [
            {
                "company": "TechCorp",
                "title": "Senior Software Engineer",
                "description": (
                    "Led development of ML pipelines. "
                    "Managed team of 5 engineers."
                ),
            }
        ],
    }
)

llm_cv = CV.from_dict(
    {
        "name": "Jean Pierre Dupont",
        "summary": (
            "Senior Software Engineer with 10+ years experience "
            "in Python & ML"
        ),
        "experiences": [
            {
                "company": "TechCorp",
                "title": "Senior Software Engineer",
                "description": (
                    "Led development of ML pipelines.  "
                    "Managed team of 5 engineers."
                ),
            }
        ],
    }
)

fill_only = llm_cv.compute_fill_rate()
presence_match = llm_cv.compute_fill_rate_accuracy(reference_cv)
value_match = llm_cv.compute_similarity(reference_cv)

print(fill_only.mean())
print(presence_match.mean())
print(value_match.fields.name.value)
print(value_match.fields.summary.value)
print(value_match.fields.experiences[0].fields.description.value)
print(value_match.mean())
1.0
1.0
0.9444444444444444
0.9333333333333332
1.0
0.9755555555555555

Beyond hiring CVs

Once the metrics exist, you can reuse them anywhere structured output shows up:

  • API contracts: did we return the same shaped object across versions?
  • LLM benchmarks: swap prompts or models, keep the schema fixed, log means (see the sketch after this list).
  • Data quality: measure completeness before pushing rows downstream.
  • Migration QA: compare legacy vs new serializers field by field.
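
For the benchmark bullet, the loop stays short. A sketch reusing the CV model from the case study; generate_parse is a hypothetical stand-in for your own prompt-plus-model call returning a raw dict:

import statistics


def score_prompt(prompt, labeled_cases):
    # labeled_cases: (raw_text, reference_cv) pairs you annotate once.
    scores = []
    for raw_text, reference in labeled_cases:
        parsed = CV.from_dict(generate_parse(prompt, raw_text))  # hypothetical
        scores.append(parsed.compute_similarity(reference).mean())
    return statistics.mean(scores)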

Built-in Specs (quick map)

If you want batteries-included normalizers and similarity defaults, Cobjectric ships KeywordSpec, TextSpec, NumericSpec, BooleanSpec, and DatetimeSpec.

Expand Spec cheat sheet
  • KeywordSpec: IDs, enums, codes (strip, optional int-to-string coercion)
  • TextSpec: long prose with normalization + RapidFuzz similarity
  • NumericSpec: JSON number quirks + tolerant similarity
  • BooleanSpec: loose truthy parsing
  • DatetimeSpec: ISO-ish timestamps with optional tolerance

For field-level weights, custom normalizers, and aggregation helpers, read Pre-defined Specs and Field Specifications.
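
Declaring the rest of the family with defaults is one line per field. Constructor options (tolerances, truthy sets) live in those docs, so this sketch sticks to zero-argument forms:

from cobjectric import BaseModel
from cobjectric.specs import BooleanSpec, DatetimeSpec, KeywordSpec, NumericSpec


class Order(BaseModel):
    sku: str = KeywordSpec()
    quantity: int = NumericSpec()
    gift_wrap: bool = BooleanSpec()
    placed_at: str = DatetimeSpec()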

📝
Limits

Fuzzy scores depend on RapidFuzz and your chosen scorer, so pin versions when you compare runs over time. Optimal assignment for lists needs SciPy. Similarity returns 0.0 when one side is missing while the other is filled, even if the gap is only on one nested field. When you need guarantees beyond strings and tolerances, pair these metrics with schema validation or task-specific checks.
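
The missing-vs-filled rule is easy to sanity-check by reusing the Person model from the first example (assuming default similarity applies to its plain str fields):

got = Person.from_dict({"name": "John Doe", "age": 30})
ref = Person.from_dict(
    {"name": "John Doe", "age": 30, "email": "john@example.com"}
)
# email is filled on one side only, so its similarity should read 0.0.
print(got.compute_similarity(ref).fields.email.value)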

Where to go next

Install from PyPI:

pip install cobjectric

Docs live at cobjectric.nigiva.com (quick start, list strategies, pandas export). Source and issues are on GitHub.