Detection Without Hallucination
Why single-method detection fails
Regex-only detection is fast and predictable, but it's blind to context. The pattern
\d{3}-\d{2}-\d{4} matches 078-05-1120 (an actual Social
Security Number in a personnel record) and SSN: 000-00-0000 (a placeholder in
a schema file) identically. It also matches any other 3-2-4 digit grouping, such as
a part number or a mistyped phone number like 415-55-1234. False positive rates in
unstructured enterprise documents routinely run 15–25% with regex-only approaches.
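The context blindness described here is easy to reproduce with the standard-library `re` module. The sketch below is illustrative only: the sample strings and their labels are made up, and the pattern is the bare one from the paragraph, without any boundary checks.

```python
import re

# The bare SSN-shaped pattern from the text: no context, no boundary checks.
SSN_SHAPE = re.compile(r"\d{3}-\d{2}-\d{4}")

# Made-up sample lines; only the first contains a real-looking SSN in context.
samples = {
    "personnel record": "Employee SSN: 078-05-1120",
    "schema placeholder": "SSN: 000-00-0000",
    "other 3-2-4 string": "Reference 415-55-1234 on the invoice",
}

# Every sample yields exactly one match; the regex cannot tell them apart.
hits = {label: SSN_SHAPE.findall(line) for label, line in samples.items()}

for label, found in hits.items():
    print(f"{label}: {found}")
```

All three lines produce a match, which is why the pattern can only nominate candidates and cannot classify them.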
LLM-only detection solves context blindness but introduces two different problems. The first is hallucination: when summarizing dense tables, models frequently invent data that wasn't in the source, including PII-shaped strings. The second, and more dangerous, is missed detections: given a long document, models miss PII buried in footnotes, metadata blocks, or non-English text. Their recall on structured data (CSV rows, JSON fields) is especially inconsistent without explicit schema guidance.
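One inexpensive guard against invented spans is to accept a model-reported span only if it appears verbatim in the source at the offsets the model claims. The sketch below assumes the model returns text plus character offsets. `ModelSpan` and `is_grounded` are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpan:
    """A span as reported by a model (hypothetical shape, for illustration)."""
    text: str
    start: int
    end: int


def is_grounded(source: str, span: ModelSpan) -> bool:
    """Return True only if the reported text sits verbatim at the reported offsets."""
    if not (0 <= span.start < span.end <= len(source)):
        return False
    return source[span.start:span.end] == span.text


source = "Employee record: SSN 078-05-1120, hired 2019."
start = source.index("078-05-1120")

real = ModelSpan("078-05-1120", start, start + 11)
invented = ModelSpan("219-09-9999", start, start + 11)   # text not in source
out_of_range = ModelSpan("078-05-1120", 500, 511)        # offsets past the end

print(is_grounded(source, real), is_grounded(source, invented))
```

A check like this catches fabricated strings but does nothing for recall. Spans the model never reports still go unnoticed.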
The "column header" false positive problem: A schema migration file that
reads ALTER TABLE users ADD COLUMN ssn VARCHAR(11) will trigger most
regex-based and many LLM-based detectors. The word "ssn" appears, but there is no actual
Social Security Number in the document. Context windows larger than a single line are
required to distinguish field names from field values.
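A rough way to see the difference between a field name and a field value is to look for a value-shaped string near each keyword mention. The sketch below is a simplified heuristic, not a replacement for model confirmation. The function name, the `window` parameter, and the sample inputs are all assumptions made for illustration.

```python
import re

KEYWORD = re.compile(r"\bssn\b", re.IGNORECASE)
VALUE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def classify_mentions(text: str, window: int = 1) -> list[tuple[int, str]]:
    """Label each line that mentions 'ssn'.

    A mention is 'value' if an SSN-shaped string occurs within `window`
    lines of it, otherwise 'field_name_only'.
    """
    lines = text.splitlines()
    labels = []
    for i, line in enumerate(lines):
        if not KEYWORD.search(line):
            continue
        lo, hi = max(0, i - window), min(len(lines), i + window + 1)
        nearby = "\n".join(lines[lo:hi])
        label = "value" if VALUE.search(nearby) else "field_name_only"
        labels.append((i, label))
    return labels


# Hypothetical inputs: a migration file and a record with the value on the next line.
migration = (
    "ALTER TABLE users ADD COLUMN ssn VARCHAR(11);\n"
    "CREATE INDEX idx_users_ssn ON users (ssn);"
)
record = "ssn:\n078-05-1120"

print(classify_mentions(migration))
print(classify_mentions(record))
```

With `window=0` the record example would be misclassified, because the value sits on the line after the keyword. That is the single-line limitation in miniature.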
The two-pass architecture
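The two-pass approach treats detection as a pipeline, not a single step. Pass 1 is a fast deterministic sweep (regex plus NER) that flags candidate spans with high recall. Pass 2 sends each candidate, with its surrounding context, to a model that confirms or rejects it.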
Pass 1 handles 100% of the text. Pass 2 only handles the subset that Pass 1 flagged — typically 5–15% of tokens in a mixed enterprise document. This keeps model API costs proportional to actual sensitive content density, not document size.
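The shape of that pipeline can be sketched without any model call. In the example below, `stub_confirm` is a stand-in for Pass 2 and simply rejects an all-zero placeholder. The flagged fraction is measured in characters as a rough proxy for tokens. All names are hypothetical.

```python
import re
from typing import Callable

Span = tuple[int, int]  # (start, end) character offsets

SSN_SHAPE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def pass1(text: str) -> list[Span]:
    """Deterministic sweep over the whole text; returns candidate spans."""
    return [m.span() for m in SSN_SHAPE.finditer(text)]


def stub_confirm(text: str, span: Span) -> bool:
    """Stand-in for the model call: rejects the all-zero placeholder."""
    return text[span[0]:span[1]] != "000-00-0000"


def two_pass(
    text: str, confirm: Callable[[str, Span], bool]
) -> tuple[list[Span], float]:
    """Run both passes; also report the fraction of characters sent to Pass 2."""
    candidates = pass1(text)
    flagged_chars = sum(end - start for start, end in candidates)
    fraction = flagged_chars / len(text) if text else 0.0
    confirmed = [span for span in candidates if confirm(text, span)]
    return confirmed, fraction


text = "Template: SSN: 000-00-0000. Record: SSN 078-05-1120."
confirmed, fraction = two_pass(text, stub_confirm)
print([text[s:e] for s, e in confirmed], round(fraction, 2))
```

Only the two candidate spans reach the confirmation step, so the cost of Pass 2 scales with the number of candidates and not with the length of the document.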
Regex patterns for common PII types
The table below shows production-grade regex patterns for the most common PII types. Note that all of these require context validation in Pass 2 — they are candidate patterns, not final classifiers.
| PII Type | Regex Pattern | False Positive Rate (no context) |
|---|---|---|
| SSN (US) | (?<!\d)\d{3}-\d{2}-\d{4}(?!\d) | ~18% (phone collisions) |
| Email | [a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,} | ~3% (code strings, templates) |
| Phone (US/intl) | (?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4} | ~22% (IDs, version numbers) |
| Credit Card | (?:4\d{3}|5[1-5]\d{2}|6011|3[47]\d{2})[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4} | ~7% (account numbers) |
| Passport (US) | [A-Z]\d{8} | ~31% (product codes, IDs) |
| IP Address | (?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?) | ~5% (version strings) |
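As a sketch of how such candidate patterns might be applied together, the snippet below compiles three of the table's patterns into a small registry and scans a sample string. The registry name, the `find_candidates` helper, and the sample text are assumptions for illustration.

```python
import re

# Three patterns copied from the table; these only nominate candidates.
CANDIDATE_PATTERNS = {
    "SSN": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}"),
    "PHONE": re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
}


def find_candidates(text: str) -> list[tuple[str, int, int, str]]:
    """Return (type, start, end, matched_text) for every pattern hit, by position."""
    hits = [
        (name, m.start(), m.end(), m.group())
        for name, pattern in CANDIDATE_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda hit: hit[1])


# Made-up sample: one email, one phone number, one ten-digit build ID.
text = "Contact jane.doe@example.com or 415-555-1234. Build 2024051234 shipped."
candidates = find_candidates(text)

for name, start, end, matched in candidates:
    print(name, start, end, matched)
```

The ten-digit build ID is flagged as a phone number. This is the kind of collision the false positive column describes, and the reason every hit still needs confirmation in Pass 2.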
The structured output schema
Pass 2 sends each candidate to the model with a strict JSON schema response format. Forcing structured output removes a major source of LLM unreliability in detection tasks: the model can no longer hedge with vague language or insert extra commentary that breaks downstream parsing.
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["is_pii", "pii_type", "confidence", "reason"],
  "additionalProperties": false,
  "properties": {
    "is_pii": {
      "type": "boolean",
      "description": "True if the span is actual PII/CUI in context, not a placeholder or field name"
    },
    "pii_type": {
      "type": "string",
      "enum": ["SSN", "EMAIL", "PHONE", "CREDIT_CARD", "PASSPORT",
               "IP_ADDRESS", "NAME", "ADDRESS", "DATE_OF_BIRTH",
               "MEDICAL_RECORD", "CUI_EXPORT", "CUI_LAW_ENFORCEMENT",
               "CUI_GOVERNMENT_CONTRACT", "OTHER", "NOT_PII"],
      "description": "Most specific matching type, or NOT_PII if rejected"
    },
    "confidence": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0,
      "description": "Model's confidence that this is real PII/CUI in context"
    },
    "reason": {
      "type": "string",
      "maxLength": 200,
      "description": "One sentence explaining the classification decision"
    }
  }
}

Python implementation: two-pass detector
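The following implementation uses Microsoft Presidio for Pass 1 (it bundles production-grade regex patterns and NER) and a Claude API call with structured output, enforced through a forced tool call, for Pass 2. You can substitute an OpenAI-compatible endpoint for the confirmation pass, provided you adapt the request and the response parsing to that API's tool-calling format.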
from __future__ import annotations

from dataclasses import dataclass

import anthropic
from presidio_analyzer import AnalyzerEngine, RecognizerResult

# ── Data model ──────────────────────────────────────────────────────────────
@dataclass
class DetectedEntity:
    text: str
    start: int
    end: int
    pii_type: str
    confidence: float
    reason: str


# ── Pass 1: deterministic sweep via Presidio ─────────────────────────────────
# The default engine loads a spaCy model (en_core_web_lg); install it first.
_analyzer = AnalyzerEngine()


def _pass1_candidates(text: str) -> list[RecognizerResult]:
    """Fast regex + NER sweep. High recall, lower precision."""
    return _analyzer.analyze(text=text, language="en")

# ── Pass 2: model-grounded confirmation ──────────────────────────────────────
_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

# Tool definition whose input_schema mirrors the JSON schema above.
_SCHEMA = {
    "name": "pii_classification",
    "description": "Classify whether a text span is real PII/CUI in context",
    "input_schema": {
        "type": "object",
        "required": ["is_pii", "pii_type", "confidence", "reason"],
        "properties": {
            "is_pii": {"type": "boolean"},
            "pii_type": {"type": "string", "enum": [
                "SSN", "EMAIL", "PHONE", "CREDIT_CARD", "PASSPORT",
                "IP_ADDRESS", "NAME", "ADDRESS", "DATE_OF_BIRTH",
                "MEDICAL_RECORD", "CUI_EXPORT", "CUI_LAW_ENFORCEMENT",
                "CUI_GOVERNMENT_CONTRACT", "OTHER", "NOT_PII"
            ]},
            "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
            "reason": {"type": "string"}
        }
    }
}

def _pass2_confirm(
    full_text: str,
    span: str,
    start: int,
    end: int,
    candidate_type: str,
) -> dict:
    """Send candidate to LLM for context-aware confirmation."""
    # Include 200 chars of context on each side
    ctx_start = max(0, start - 200)
    ctx_end = min(len(full_text), end + 200)
    context_snippet = full_text[ctx_start:ctx_end]

    prompt = (
        f"You are a PII/CUI classifier. Given the following text context, "
        f"determine whether the highlighted span is real PII or CUI, "
        f"not a placeholder, field name, or example value.\n\n"
        f"Context:\n---\n{context_snippet}\n---\n\n"
        f"Highlighted span: '{span}'\n"
        f"Candidate type from regex: {candidate_type}\n\n"
        f"Classify the span using the pii_classification tool."
    )

    # tool_choice forces the model to answer through the classification tool.
    response = _client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=256,
        tools=[_SCHEMA],
        tool_choice={"type": "tool", "name": "pii_classification"},
        messages=[{"role": "user", "content": prompt}],
    )

    for block in response.content:
        if block.type == "tool_use":
            return block.input
    # Fallback: treat a missing tool call as a rejection.
    return {"is_pii": False, "pii_type": "NOT_PII", "confidence": 0.0, "reason": "No tool call"}

# ── Main detector ─────────────────────────────────────────────────────────────
def detect(
    text: str,
    confidence_threshold: float = 0.75,
) -> list[DetectedEntity]:
    """
    Run two-pass PII detection.
    Returns confirmed entities above the confidence threshold.
    """
    candidates = _pass1_candidates(text)
    confirmed: list[DetectedEntity] = []

    for cand in candidates:
        span_text = text[cand.start : cand.end]
        result = _pass2_confirm(
            full_text=text,
            span=span_text,
            start=cand.start,
            end=cand.end,
            candidate_type=cand.entity_type,
        )
        if result["is_pii"] and result["confidence"] >= confidence_threshold:
            confirmed.append(DetectedEntity(
                text=span_text,
                start=cand.start,
                end=cand.end,
                pii_type=result["pii_type"],
                confidence=result["confidence"],
                reason=result["reason"],
            ))

    # Deduplicate overlapping spans: keep highest-confidence
    confirmed.sort(key=lambda e: e.confidence, reverse=True)
    deduped: list[DetectedEntity] = []
    for entity in confirmed:
        overlaps = any(
            not (entity.end <= kept.start or entity.start >= kept.end)
            for kept in deduped
        )
        if not overlaps:
            deduped.append(entity)
    return deduped

Confidence thresholds and tuning
The confidence threshold is the primary tuning knob for the precision/recall tradeoff. There is no universally correct value — it depends on your risk tolerance:
| Threshold | Behavior | Best for |
|---|---|---|
| 0.60 | High recall, more false positives. Aggressive redaction. | Healthcare records, government contracts — miss nothing |
| 0.75 | Balanced. Most enterprise deployments start here. | General enterprise document processing |
| 0.90 | High precision, some misses. Minimal false positives. | Developer tooling, code analysis, low-risk documents |
Tuning tip: Run your detector against a labeled sample of 100–200 documents from your actual data (not synthetic data). Measure precision and recall at threshold values 0.60, 0.70, 0.75, 0.80, and 0.90. The right threshold is the one where your recall meets your compliance floor before your precision falls below what your agents can tolerate. Post 04 covers how to expose this threshold as a YAML config key with CLI override support.