# SHOR — Grounding & Hallucination Classifier
A deterministic, non-LLM classifier that flags ungrounded entities in agent outputs before they reach a tool call or a user. Sub-50ms p99 on 50k-token contexts, zero runtime dependencies, no LLM in the loop.
## What SHOR does
SHOR sits between an agent's stated output and the world it is about to act on. Given the agent's text and the context it operated over — tool schemas, retrieved documents, conversation history — SHOR extracts every addressable entity in the output (numbers, identifiers, dates, quoted strings, citations, URLs, proper nouns) and verifies that each one actually appears in the context.
The output is a four-level classification you can gate on. Deterministic on the same input. No model in the loop. The narrow wedge is the point: SHOR catches the specific failure mode of grounded-looking fabrication with high precision, and is honest about what it does not catch.
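The extract-then-verify idea can be sketched in a few lines of stdlib Python. This is an illustration of the approach, not SHOR's actual extraction or normalization rules — the patterns and function names (`extract_entities`, `verify`) are hypothetical:

```python
import re

def extract_entities(output: str) -> list[str]:
    """Pull out checkable surface forms: money tokens, bare numbers,
    and dotted calls. SHOR's real extractor covers more types
    (dates, quoted strings, citations, URLs, proper nouns)."""
    patterns = [
        r"\$[\d.,]+[MBK]?",      # dollar figures like $4.2M
        r"\b\d+(?:\.\d+)?%?\b",  # bare numbers and percentages
        r"\b\w+\.\w+\(\)",       # dotted calls like db.query()
    ]
    entities: list[str] = []
    for p in patterns:
        entities.extend(re.findall(p, output))
    return entities

def verify(output: str, context: str) -> dict[str, bool]:
    """Each extracted entity must literally appear in the context."""
    return {e: e in context for e in extract_entities(output)}

checks = verify(
    "Q3 revenue was $4.2M from 47 customers.",
    "Q3 numbers: 47 customers signed up, revenue of $4.2M.",
)
# every entity is found in the context, so nothing is flagged
```

The real classifier layers per-type normalization and scoring on top of this, but the gate is the same shape: literal presence in the supplied context.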
## Install

```bash
npm install @reshimu/shor
```

```bash
pip install reshimu-shor
```
Both packages have zero runtime dependencies. The TypeScript build is a single ESM bundle; the Python package is stdlib-only. No model downloads, no service calls, no telemetry.
## Quick start
Same input, two languages. The TypeScript and Python implementations are kept at functional parity — same level, same score, same per-entity verdicts.
```ts
// TypeScript
import { classify } from '@reshimu/shor'

const result = classify({
  output: 'Q3 revenue was $4.2M from 47 customers.',
  context: 'Q3 numbers: 47 customers signed up, revenue of $4.2M for the quarter.',
})

// result.level === 'GROUNDED'
// result.score === 1
// result.flagForReview === false
// result.explanation === 'All extracted entities verified in context.'
```
```python
# Python
from reshimu_shor import classify

result = classify(
    output="Q3 revenue was $4.2M from 47 customers.",
    context="Q3 numbers: 47 customers signed up, revenue of $4.2M for the quarter.",
)

# result.level == 'GROUNDED'
# result.score == 1.0
# result.flag_for_review == False
# result.explanation == 'All extracted entities verified in context.'
```
Drop one entity out of the context and the same call returns PARTIAL with the unverified entity surfaced:
```ts
const result = classify({
  output: 'Q3 revenue was $4.2M from 47 customers.',
  context: 'Q3 numbers: 47 customers signed up, but revenue was not disclosed.',
})

// result.level === 'PARTIAL'
// result.score === 0.6666...
// result.flagForReview === true
// result.explanation === "number '$4.2M' not found in context."

for (const entity of result.entities) {
  console.log(`  - "${entity.text}" [${entity.type}] found=${entity.found}`)
}
// - "Q3" [date] found=true
// - "$4.2M" [number] found=false
// - "47 customers" [number] found=true
```
## Classification levels
Four levels, three gating behaviors. `flagForReview` is `true` for `PARTIAL` and `UNGROUNDED`, and `false` for `GROUNDED` and `INDETERMINATE` — the last one means SHOR could not check at all, which is different from SHOR checking and finding problems.
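One way to read "four levels, three behaviors" as pipeline policy — the level names come from this section, but the action names and the decision to let `INDETERMINATE` through unverified are an illustrative choice, not something SHOR prescribes:

```python
def gate(level: str) -> str:
    """Map SHOR's four levels onto three pipeline actions.
    Action names are illustrative policy, not part of SHOR's API."""
    if level == "GROUNDED":
        return "proceed"             # checked, everything verified
    if level in ("PARTIAL", "UNGROUNDED"):
        return "hold_for_review"     # checked, problems found (flagForReview is true)
    if level == "INDETERMINATE":
        return "proceed_unverified"  # could not check; not the same as a clean pass
    raise ValueError(f"unknown level: {level}")
```

A stricter pipeline might route `proceed_unverified` to review as well; the point is that the indeterminate case deserves its own branch rather than being folded into pass or fail.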
## What SHOR catches — and what it doesn't
These limits are features. Precise tools that know their scope beat fuzzy tools that pretend to do everything.
Catches:
- Fabricated specific values — dollar figures, percentages, counts, dates that do not appear in context.
- Invented function and method names — `db.fetchAll()` when the tool schema only defines `db.query()`.
- Misquoted strings — quoted text that does not appear verbatim in any source.
- Hallucinated proper nouns — invented names of people, products, or places that appear in the output but not the context.
- Referenced objects, files, or paths that were never in context — `src/lib/util.ts` when no such file appears anywhere upstream.
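The invented-function-name case reduces to a set-membership check against the tool schema. A minimal self-contained sketch — the regex and helper name (`undefined_calls`) are illustrative, not SHOR's internals:

```python
import re

def undefined_calls(output: str, schema_methods: set[str]) -> list[str]:
    """Find dotted calls in the output that the tool schema never
    defined. SHOR's real matcher handles more identifier shapes."""
    calls = re.findall(r"\b[\w.]+\(\)", output)
    return [c for c in calls if c.rstrip("()") not in schema_methods]

schema = {"db.query", "db.insert"}
flagged = undefined_calls("Call db.fetchAll() then db.query().", schema)
# db.fetchAll() is flagged; db.query() is in the schema
```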
Does not catch:
- Paraphrased hallucinations — the output rephrases a fabrication so no specific entity is matchable.
- Inferential overreach — extending a true premise to an unsupported conclusion using only words that exist in context.
- Semantic equivalents — `Q3` does not match `third quarter`; `$4.2M` does not match `four point two million dollars`. The number-expansion path is digit-only by design.
- Tone, style, sentiment, or values issues.
- Mesa-optimization, deceptive alignment, or other capability-level risks. SHOR is a runtime gate, not an alignment evaluation.
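The digit-only limitation is easy to see concretely. This sketch shows why a digit-only normalization matches numeric surface variants but has nothing to compare against a spelled-out number — `digits_only` is an illustrative stand-in, not SHOR's actual normalizer:

```python
import re

def digits_only(s: str) -> str:
    """Keep only digits and decimal points, in the spirit of a
    digit-only number-expansion path."""
    return re.sub(r"[^\d.]", "", s)

# '$4.2M' and '4.2 million' share the digits '4.2', so they can match:
assert digits_only("$4.2M") == digits_only("4.2 million") == "4.2"

# 'four point two million dollars' contains no digits at all,
# so a digit-only path has nothing to compare:
assert digits_only("four point two million dollars") == ""
```

This is the trade the section describes: digit normalization is cheap and deterministic, and spelled-out numbers are explicitly out of scope.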
## Full reference
The complete reference — performance benchmarks, every entity type's extraction and normalization rules, the no-LLM principle, integration examples for LangGraph and Claude Code, the comparison to alternatives, FAQ, and known edge cases — lives in the SHOR README on GitHub.