LLM Hallucination Detection: How to Validate AI Responses Before They Reach Users
LLM hallucinations are factually incorrect or fabricated responses that the model generates with high confidence. In enterprise applications, hallucinations are not just an inconvenience. They create legal liability, erode user trust, and can cause real business damage.
Why Hallucinations Happen
Large language models generate text by predicting the most probable next token. They do not have a concept of truth. When the model does not have sufficient information to answer a question accurately, it will often generate a plausible-sounding but incorrect response rather than admitting uncertainty.
Technique 1: RAG Source Verification
If your application uses Retrieval-Augmented Generation (RAG), you can verify the LLM response against the retrieved source documents.
```python
def verify_against_sources(response, sources):
    claims = extract_claims(response)
    verified = []
    for claim in claims:
        supported = any(
            semantic_similarity(claim, source) > 0.85
            for source in sources
        )
        verified.append({"claim": claim, "supported": supported})
    return verified
```
The key insight is that every factual claim in the response should be traceable to a specific source document. Claims that cannot be traced are likely hallucinations.
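The `extract_claims` and `semantic_similarity` helpers above are left undefined; what they do depends on your NLP stack. As a rough sketch, claim extraction can fall back to sentence splitting, and similarity to word-set (Jaccard) overlap. Note that a crude measure like this would need a much lower threshold than the 0.85 used above, which assumes embedding-based similarity:

```python
import re

def extract_claims(response: str) -> list[str]:
    # Naive claim extraction: treat each sentence as one claim.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]

def semantic_similarity(claim: str, source: str) -> float:
    # Jaccard overlap of lowercased word sets -- a crude stand-in
    # for an embedding-based similarity model.
    a, b = set(claim.lower().split()), set(source.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0
```

In production you would swap both stand-ins for a real model (an NLI-style claim extractor and an embedding similarity), but the verification loop stays the same.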
Technique 2: Model-as-Judge
Use a second LLM call (or a different model) to evaluate the accuracy and consistency of the first response.
JUDGE_PROMPT = """ Evaluate the following AI response for factual accuracy. Context provided: {context} Response to evaluate: {response} Rate accuracy from 0 to 1 and list any unsupported claims. """
This technique adds latency but catches subtle hallucinations that pattern matching misses. Use a smaller, faster model for the judge to minimize the overhead.
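Wiring the judge into code might look like the sketch below. `call_llm` is a hypothetical wrapper around whatever model API you use, and the prompt is a variant of the one above that asks for machine-readable JSON so the verdict can be parsed reliably:

```python
import json

# Variant of the judge prompt that requests JSON output (assumption:
# your judge model follows format instructions reliably enough to parse).
JUDGE_PROMPT = """Evaluate the following AI response for factual accuracy.
Context provided: {context}
Response to evaluate: {response}
Reply with JSON: {{"accuracy": <0 to 1>, "unsupported_claims": [...]}}"""

def judge_response(context: str, response: str, call_llm) -> dict:
    # call_llm(prompt) -> str is a placeholder for your model client.
    prompt = JUDGE_PROMPT.format(context=context, response=response)
    raw = call_llm(prompt)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fail closed: treat unparseable judge output as a failed check,
        # not a pass.
        return {"accuracy": 0.0, "unsupported_claims": [], "parse_error": True}
```

Failing closed on a parse error is deliberate: a judge that returns garbage should flag the response for review, not wave it through.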
Technique 3: Schema Enforcement
For structured outputs (JSON, API responses, database queries), enforce a strict schema on the LLM output.
```python
from pydantic import BaseModel, validator

class ProductRecommendation(BaseModel):
    product_id: str
    name: str
    confidence: float

    @validator("product_id")
    def must_exist_in_catalog(cls, v):
        if not catalog.exists(v):
            raise ValueError(f"Product {v} not in catalog")
        return v
```
Schema enforcement ensures the model cannot invent product IDs, customer names, or other structured data.
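The same guarantee can be sketched without pydantic, which makes the mechanism explicit. Here `CATALOG` is a hypothetical in-memory stand-in for the catalog lookup assumed above:

```python
CATALOG = {"sku-001", "sku-002"}  # hypothetical stand-in for a real catalog

def validate_recommendation(data: dict) -> dict:
    # Reject any product_id the model invented.
    if data.get("product_id") not in CATALOG:
        raise ValueError(f"Product {data.get('product_id')} not in catalog")
    # Clamp confidence into [0, 1] so downstream scoring stays sane.
    data["confidence"] = min(max(float(data.get("confidence", 0.0)), 0.0), 1.0)
    return data
```

Whichever route you take, the point is that the check runs against ground truth (the catalog), so a fabricated ID fails loudly instead of flowing downstream.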
Technique 4: Confidence Scoring
Implement a confidence scoring system that evaluates the model's certainty about its response.
Key signals include:
- Token probabilities (if the model API exposes them)
- Consistency across multiple generation attempts
- Length and specificity of the response
- Presence of hedging language ("I think", "possibly", "it seems")
When confidence drops below a configurable threshold, route the query to a human agent or return a "need more information" response.
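A minimal scorer combining these signals might look like the following sketch. The penalty weights are illustrative assumptions, and exact-match agreement across attempts is a crude proxy for semantic consistency:

```python
HEDGES = ("i think", "possibly", "it seems", "i'm not sure", "might be")

def confidence_score(response: str, attempts: list[str]) -> float:
    # Start from full confidence and subtract per-signal penalties
    # (weights here are illustrative, not tuned values).
    score = 1.0
    lowered = response.lower()
    # Hedging language suggests the model itself is uncertain.
    score -= 0.2 * sum(1 for h in HEDGES if h in lowered)
    # Inconsistency across repeated generations is a strong hallucination
    # signal; exact-match agreement is a crude stand-in for semantic agreement.
    if attempts:
        agreement = sum(1 for a in attempts if a == response) / len(attempts)
        score -= 0.4 * (1 - agreement)
    # Very short answers to factual questions are often low-information.
    if len(response.split()) < 5:
        score -= 0.1
    return max(score, 0.0)
```

Token probabilities, where the API exposes them, would feed in as another penalty term; the routing decision then reduces to comparing this score against your configured threshold.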
Production Architecture
In production, combine these techniques into a pipeline:
1. Generate the response.
2. Run schema validation (instant, catches structural issues).
3. Run source verification (fast, catches factual issues in RAG).
4. Run model-as-judge for high-stakes queries (adds latency, catches subtle issues).
5. Calculate a confidence score.
6. Block or flag low-confidence responses.
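The steps above can be sketched as a single orchestration function. Each check is injected as a callable, which is an assumption of this sketch rather than a prescribed interface, so the pipeline stays independent of any particular model or vector store:

```python
def validation_pipeline(response: str, sources: list[str], *,
                        validate_schema, verify_sources, run_judge,
                        score, high_stakes: bool = False,
                        threshold: float = 0.7) -> dict:
    # Cheap, deterministic checks run first; expensive ones only when needed.
    if not validate_schema(response):
        return {"status": "blocked", "reason": "schema"}
    if sources and not verify_sources(response, sources):
        return {"status": "blocked", "reason": "unsupported_claims"}
    if high_stakes and not run_judge(response):
        return {"status": "flagged", "reason": "judge"}
    confidence = score(response)
    if confidence < threshold:
        return {"status": "flagged", "reason": "low_confidence"}
    return {"status": "ok", "confidence": confidence}
```

Ordering the checks from cheapest to most expensive means most bad responses are rejected before the costly judge call ever runs.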
Conclusion
No single technique eliminates hallucinations entirely. The most effective approach is a layered pipeline that catches different types of errors at different stages. Start with schema enforcement and source verification, which are fast and deterministic. Add model-as-judge for high-stakes use cases where accuracy is critical.
The goal is not perfection. It is ensuring that when hallucinations occur, they are caught before reaching the end user.