
Detection Without Hallucination

PII detection fails in one of two directions: it misses real sensitive data, or it flags benign text as sensitive. Either failure is costly — one creates compliance exposure, the other breaks agent workflows by redacting legitimate content. The solution is a two-pass architecture that combines the speed of deterministic regex with the contextual intelligence of a language model — each doing what it does best.

Why single-method detection fails

Regex-only detection is fast and predictable, but it's blind to context. The pattern \d{3}-\d{2}-\d{4} matches 078-05-1120 (an actual Social Security Number in a personnel record) and the 000-00-0000 in SSN: 000-00-0000 (a placeholder in a schema file) identically. It also matches anything else written in the same 3-2-4 grouping, such as a mistyped phone number (415-55-1234) or an internal record ID. False positive rates in unstructured enterprise documents routinely run 15–25% with regex-only approaches.
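A minimal sketch of that context blindness, using only Python's `re` module. The pattern is the one quoted above; the two sample lines are invented for illustration.

```python
import re

# The bare SSN-shaped pattern discussed above.
SSN_SHAPE = re.compile(r"\d{3}-\d{2}-\d{4}")

samples = {
    "personnel record": "Employee SSN on file: 078-05-1120",
    "schema placeholder": "SSN: 000-00-0000  # example value",
}

for label, line in samples.items():
    match = SSN_SHAPE.search(line)
    # Both lines produce a match; the regex has no notion of which one is
    # a real identifier and which one is filler.
    print(f"{label:20} -> {match.group(0) if match else None}")
```

Both lines come back as matches, which is exactly why the pattern can only nominate candidates and cannot make the final call.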

LLM-only detection solves context blindness but introduces a different problem: hallucination. When asked to extract entities from dense tables, models can return data that was never in the source, including PII-shaped strings. More dangerously, in long documents they miss PII buried in footnotes, metadata blocks, or non-English text. Their recall on structured data (CSV rows, JSON fields) is especially inconsistent without explicit schema guidance.
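One cheap guard against invented spans is to check every model-reported span against the source text before accepting it. The sketch below assumes the model returns a dict with `text`, `start`, and `end` keys; that response shape and the function name `is_grounded` are illustrative, not taken from any particular API.

```python
def is_grounded(source: str, reported: dict) -> bool:
    """Return True only if the reported span is literally present at the
    reported offsets in the source text."""
    start, end = reported.get("start"), reported.get("end")
    if not isinstance(start, int) or not isinstance(end, int):
        return False
    if not (0 <= start < end <= len(source)):
        return False
    return source[start:end] == reported.get("text")


source = "Contact: jane@example.com, ext. 4421"

real = {"text": "jane@example.com", "start": 9, "end": 25}
invented = {"text": "555-12-3456", "start": 9, "end": 20}

print(is_grounded(source, real))      # True
print(is_grounded(source, invented))  # False
```

A check like this catches fabricated strings but does nothing for recall: PII the model never reported is still missed.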

⚠️ The "column header" false positive problem: A schema migration file that reads ALTER TABLE users ADD COLUMN ssn VARCHAR(11) will trigger keyword-anchored regex detectors and many LLM-based detectors. The word "ssn" appears, but the document contains no actual Social Security Number. To distinguish field names from field values, a detector has to look at the surrounding context, often more than the single line that matched.
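To make the field-name versus field-value distinction concrete, the sketch below contrasts a keyword-only check with one that also requires an SSN-shaped value shortly after the keyword. The helper names, the 80-character window, and the sample strings are all invented for illustration; a real detector would need richer context than this heuristic.

```python
import re

KEYWORD = re.compile(r"\bssn\b", re.IGNORECASE)
VALUE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def keyword_only(text: str) -> bool:
    """Naive detector: fires whenever the field name appears."""
    return KEYWORD.search(text) is not None


def keyword_with_value(text: str, window: int = 80) -> bool:
    """Fires only if an SSN-shaped value follows the keyword within
    `window` characters, which may span several lines."""
    for kw in KEYWORD.finditer(text):
        nearby = text[kw.end() : kw.end() + window]
        if VALUE.search(nearby):
            return True
    return False


ddl = "ALTER TABLE users ADD COLUMN ssn VARCHAR(11)"
record = "Name: J. Doe\nSSN:\n  078-05-1120"

print(keyword_only(ddl), keyword_with_value(ddl))        # True False
print(keyword_only(record), keyword_with_value(record))  # True True
```

Note that a placeholder such as 000-00-0000 would still pass the second check. Telling real values from filler is the part that needs a context-aware second opinion.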

The two-pass architecture

The two-pass approach treats detection as a pipeline, not a single step:

Pass 1: Deterministic sweep
Regex patterns + keyword anchors flag candidate spans. Fast, cheap, high recall, lower precision. Every potential match goes to Pass 2.
Pass 2: Model confirmation
Only flagged candidates are sent to the LLM with surrounding context. The model confirms or rejects each candidate with a confidence score.
Threshold gate
Candidates above the confidence threshold are confirmed detections. Below threshold, they're logged as uncertain and handled per policy.
Entity deduplication
Overlapping spans (e.g., both 'John Smith' and 'Smith' detected) are merged to the most specific match with highest confidence.

Pass 1 handles 100% of the text. Pass 2 only handles the subset that Pass 1 flagged — typically 5–15% of tokens in a mixed enterprise document. This keeps model API costs proportional to actual sensitive content density, not document size.
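The flow above can be sketched in a few lines. This is a skeleton under stated assumptions: `Candidate`, `sweep`, `two_pass`, and `stub_confirm` are hypothetical names, Pass 1 is reduced to a single SSN pattern, and the model call is replaced by a stub so the example runs offline.

```python
import re
from typing import Callable, NamedTuple


class Candidate(NamedTuple):
    start: int
    end: int
    kind: str


# Hypothetical signature for the Pass 2 confirmer: returns (is_pii, confidence).
Confirm = Callable[[str, Candidate], tuple[bool, float]]

SSN_SHAPE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def sweep(text: str) -> list[Candidate]:
    """Pass 1: scan the whole text and return every SSN-shaped span."""
    return [Candidate(m.start(), m.end(), "SSN") for m in SSN_SHAPE.finditer(text)]


def two_pass(text: str, confirm: Confirm, threshold: float = 0.75):
    """Run Pass 1 on everything, Pass 2 on candidates only, then gate."""
    confirmed, uncertain = [], []
    for cand in sweep(text):
        is_pii, confidence = confirm(text, cand)
        if is_pii and confidence >= threshold:
            confirmed.append(cand)
        elif is_pii:
            uncertain.append(cand)  # below threshold: handle per policy
    return confirmed, uncertain


def stub_confirm(text: str, cand: Candidate) -> tuple[bool, float]:
    """Stand-in for the model call: rejects the all-zero placeholder."""
    value = text[cand.start : cand.end]
    if value == "000-00-0000":
        return False, 0.95
    return True, 0.9


doc = "Template uses 000-00-0000. Jane's record lists 078-05-1120."
confirmed, uncertain = two_pass(doc, stub_confirm)
print([doc[c.start : c.end] for c in confirmed])  # ['078-05-1120']
```

The confirmer is called twice here, once per candidate, no matter how long `doc` is. That is the cost property the paragraph above describes.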

Regex patterns for common PII types

The table below shows production-grade regex patterns for the most common PII types. Note that all of these require context validation in Pass 2 — they are candidate patterns, not final classifiers.

PII Type | Regex Pattern | False Positive Rate (no context)
SSN (US) | (?<!\d)\d{3}-\d{2}-\d{4}(?!\d) | ~18% (phone collisions)
Email | [a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,} | ~3% (code strings, templates)
Phone (US/intl) | (?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4} | ~22% (IDs, version numbers)
Credit Card | (?:4\d{3}|5[1-5]\d{2}|6011|3[47]\d{2})[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4} | ~7% (account numbers)
Passport (US) | [A-Z]\d{8} | ~31% (product codes, IDs)
IP Address | (?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?) | ~5% (version strings)
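As a rough illustration of how such patterns feed Pass 1, the sketch below compiles three of them and collects every match in order of appearance. The `PATTERNS` dict layout, the `candidates` helper, and the sample sentence are assumptions made for this example.

```python
import re

# Three patterns copied from the table above; the dict layout is illustrative.
PATTERNS = {
    "SSN": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}"),
    "PASSPORT": re.compile(r"[A-Z]\d{8}"),
}


def candidates(text: str) -> list[tuple[str, str]]:
    """Return (type, matched_text) pairs in order of appearance."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), kind, m.group(0)))
    return [(kind, value) for _, kind, value in sorted(hits)]


text = "Ship SKU A12345678 to ops@example.com; ref 123-45-6789."
for kind, value in candidates(text):
    print(kind, value)
# PASSPORT A12345678
# EMAIL ops@example.com
# SSN 123-45-6789
```

The SKU is flagged as a passport number, which is the kind of product-code collision the table warns about and the reason every hit is only a candidate.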

The structured output schema

Pass 2 sends each candidate to the model and requires a response that conforms to a strict JSON schema. Forcing structured output removes a major source of LLM unreliability in detection tasks: the model can no longer hedge with vague language or add commentary that breaks downstream parsing. It does not make the confidence value calibrated, so the threshold still has to be tuned against labeled data.

pii_schema.json — JSON Schema for structured classification response
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["is_pii", "pii_type", "confidence", "reason"],
  "additionalProperties": false,
  "properties": {
    "is_pii": {
      "type": "boolean",
      "description": "True if the span is actual PII/CUI in context, not a placeholder or field name"
    },
    "pii_type": {
      "type": "string",
      "enum": ["SSN", "EMAIL", "PHONE", "CREDIT_CARD", "PASSPORT",
               "IP_ADDRESS", "NAME", "ADDRESS", "DATE_OF_BIRTH",
               "MEDICAL_RECORD", "CUI_EXPORT", "CUI_LAW_ENFORCEMENT",
               "CUI_GOVERNMENT_CONTRACT", "OTHER", "NOT_PII"],
      "description": "Most specific matching type, or NOT_PII if rejected"
    },
    "confidence": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0,
      "description": "Model's confidence that this is real PII/CUI in context"
    },
    "reason": {
      "type": "string",
      "maxLength": 200,
      "description": "One sentence explaining the classification decision"
    }
  }
}
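Even with a schema in the request, it is worth checking the reply before trusting it. The sketch below hand-codes the constraints from pii_schema.json using only the standard library. It is a stand-in for a full JSON Schema validator, and the function name `validate_response` is hypothetical.

```python
import json

ALLOWED_TYPES = {
    "SSN", "EMAIL", "PHONE", "CREDIT_CARD", "PASSPORT", "IP_ADDRESS", "NAME",
    "ADDRESS", "DATE_OF_BIRTH", "MEDICAL_RECORD", "CUI_EXPORT",
    "CUI_LAW_ENFORCEMENT", "CUI_GOVERNMENT_CONTRACT", "OTHER", "NOT_PII",
}
REQUIRED = {"is_pii", "pii_type", "confidence", "reason"}


def validate_response(raw: str) -> dict:
    """Parse a model reply and enforce the constraints from pii_schema.json.
    Raises ValueError on any violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    # required + additionalProperties: false => exactly these four keys
    if not isinstance(data, dict) or set(data) != REQUIRED:
        raise ValueError("keys must be exactly the four required fields")
    if not isinstance(data["is_pii"], bool):
        raise ValueError("is_pii must be a boolean")
    if not isinstance(data["pii_type"], str) or data["pii_type"] not in ALLOWED_TYPES:
        raise ValueError("pii_type is not in the allowed enum")
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if not isinstance(data["reason"], str) or len(data["reason"]) > 200:
        raise ValueError("reason must be a string of at most 200 characters")
    return data


reply = (
    '{"is_pii": false, "pii_type": "NOT_PII", "confidence": 0.97, '
    '"reason": "Column name in DDL, no value present."}'
)
print(validate_response(reply)["pii_type"])  # NOT_PII
```

A reply that fails validation is best treated like an uncertain candidate and routed through the same policy, rather than silently dropped.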

Python implementation: two-pass detector

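The following implementation uses Microsoft Presidio for Pass 1 (it bundles regex patterns and NER) and a Claude API call with forced tool use for Pass 2. Presidio's default AnalyzerEngine loads a spaCy English model, which has to be installed separately. You can substitute another provider for the confirmation pass, but the request and response shapes differ from Anthropic's tool-use format, so _pass2_confirm would need to be adapted.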

detector.py — Two-pass PII detector
from __future__ import annotations
from dataclasses import dataclass

import anthropic
from presidio_analyzer import AnalyzerEngine, RecognizerResult

# ── Data model ──────────────────────────────────────────────────────────────

@dataclass
class DetectedEntity:
    text: str
    start: int
    end: int
    pii_type: str
    confidence: float
    reason: str

# ── Pass 1: deterministic sweep via Presidio ─────────────────────────────────

_analyzer = AnalyzerEngine()

def _pass1_candidates(text: str) -> list[RecognizerResult]:
    """Fast regex + NER sweep. High recall, lower precision."""
    return _analyzer.analyze(text=text, language="en")

# ── Pass 2: model-grounded confirmation ──────────────────────────────────────

_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

_SCHEMA = {
    "name": "pii_classification",
    "description": "Classify whether a text span is real PII/CUI in context",
    "input_schema": {
        "type": "object",
        "required": ["is_pii", "pii_type", "confidence", "reason"],
        "properties": {
            "is_pii": {"type": "boolean"},
            "pii_type": {"type": "string", "enum": [
                "SSN", "EMAIL", "PHONE", "CREDIT_CARD", "PASSPORT",
                "IP_ADDRESS", "NAME", "ADDRESS", "DATE_OF_BIRTH",
                "MEDICAL_RECORD", "CUI_EXPORT", "CUI_LAW_ENFORCEMENT",
                "CUI_GOVERNMENT_CONTRACT", "OTHER", "NOT_PII"
            ]},
            "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
            "reason": {"type": "string"}
        }
    }
}

def _pass2_confirm(
    full_text: str,
    span: str,
    start: int,
    end: int,
    candidate_type: str,
) -> dict:
    """Send candidate to LLM for context-aware confirmation."""
    # Include 200 chars of context on each side and mark the candidate span
    # in place, so a value that occurs more than once stays unambiguous.
    ctx_start = max(0, start - 200)
    ctx_end = min(len(full_text), end + 200)
    context_snippet = (
        full_text[ctx_start:start] + "[[" + span + "]]" + full_text[end:ctx_end]
    )

    prompt = (
        "You are a PII/CUI classifier. Given the following text context, "
        "determine whether the span marked with [[ ]] is real PII or CUI, "
        "not a placeholder, field name, or example value.\n\n"
        f"Context:\n---\n{context_snippet}\n---\n\n"
        f"Marked span: '{span}'\n"
        f"Candidate type from Pass 1: {candidate_type}\n\n"
        "Classify the span using the pii_classification tool."
    )

    response = _client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=256,
        tools=[_SCHEMA],
        tool_choice={"type": "tool", "name": "pii_classification"},
        messages=[{"role": "user", "content": prompt}],
    )

    for block in response.content:
        if block.type == "tool_use":
            return block.input
    return {"is_pii": False, "pii_type": "NOT_PII", "confidence": 0.0, "reason": "No tool call"}

# ── Main detector ─────────────────────────────────────────────────────────────

def detect(
    text: str,
    confidence_threshold: float = 0.75,
) -> list[DetectedEntity]:
    """
    Run two-pass PII detection.
    Returns confirmed entities above the confidence threshold.
    """
    candidates = _pass1_candidates(text)
    confirmed: list[DetectedEntity] = []

    for cand in candidates:
        span_text = text[cand.start : cand.end]
        result = _pass2_confirm(
            full_text=text,
            span=span_text,
            start=cand.start,
            end=cand.end,
            candidate_type=cand.entity_type,
        )

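        # Treat missing fields as a rejection instead of raising KeyError.
        is_pii = bool(result.get("is_pii", False))
        confidence = float(result.get("confidence", 0.0))
        if is_pii and confidence >= confidence_threshold:
            confirmed.append(DetectedEntity(
                text=span_text,
                start=cand.start,
                end=cand.end,
                pii_type=result.get("pii_type", cand.entity_type),
                confidence=confidence,
                reason=result.get("reason", ""),
            ))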

    # Deduplicate overlapping spans: keep highest-confidence
    confirmed.sort(key=lambda e: e.confidence, reverse=True)
    deduped: list[DetectedEntity] = []
    for entity in confirmed:
        overlaps = any(
            not (entity.end <= kept.start or entity.start >= kept.end)
            for kept in deduped
        )
        if not overlaps:
            deduped.append(entity)

    return deduped
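The deduplication step at the end of detect() can be exercised on its own, without Presidio or an API key. The sketch below reimplements the same greedy rule on a hypothetical `Span` tuple; the offsets and confidence values are made up for the example.

```python
from typing import NamedTuple


class Span(NamedTuple):
    text: str
    start: int
    end: int
    confidence: float


def dedupe(spans: list[Span]) -> list[Span]:
    """Greedy overlap removal: visit spans from highest to lowest confidence
    and keep each one only if it does not overlap a span already kept."""
    kept: list[Span] = []
    for span in sorted(spans, key=lambda s: s.confidence, reverse=True):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


spans = [
    Span("John Smith", 8, 18, 0.92),
    Span("Smith", 13, 18, 0.80),
    Span("jsmith@example.com", 27, 45, 0.97),
]
print([s.text for s in dedupe(spans)])  # ['John Smith', 'jsmith@example.com']
```

Because the rule is confidence-first, a shorter span with a higher score would win over a longer one. If the most specific match should take priority, one option is to add span length as a tie-breaker or as the primary sort key.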

Confidence thresholds and tuning

The confidence threshold is the primary tuning knob for the precision/recall tradeoff. There is no universally correct value — it depends on your risk tolerance:

Threshold | Behavior | Best for
0.60 | High recall, more false positives. Aggressive redaction. | Healthcare records, government contracts (miss nothing)
0.75 | Balanced. Most enterprise deployments start here. | General enterprise document processing
0.90 | High precision, some misses. Minimal false positives. | Developer tooling, code analysis, low-risk documents

Tuning tip: Run your detector against a labeled sample of 100–200 documents from your actual data (not synthetic data). Measure precision and recall at threshold values 0.60, 0.70, 0.75, 0.80, and 0.90. Pick the highest threshold at which recall still meets your compliance floor, then check that precision at that value is something your agents can tolerate. Post 04 covers how to expose this threshold as a YAML config key with CLI override support.
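A threshold sweep like the one described can be sketched as follows. The scored list is toy data invented for this example: each pair is a Pass 2 confidence and the human label for that candidate. The `missed_by_pass1` parameter is an assumption added so that PII the deterministic sweep never flagged still counts against recall.

```python
def precision_recall(scored, threshold, missed_by_pass1=0):
    """Compute (precision, recall) when every candidate whose confidence is
    at or above `threshold` is treated as a confirmed detection."""
    tp = sum(1 for conf, truth in scored if conf >= threshold and truth)
    fp = sum(1 for conf, truth in scored if conf >= threshold and not truth)
    fn = sum(1 for conf, truth in scored if conf < threshold and truth)
    fn += missed_by_pass1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pick_threshold(scored, thresholds, recall_floor):
    """Return the highest threshold whose recall meets the floor, or None."""
    ok = [t for t in thresholds if precision_recall(scored, t)[1] >= recall_floor]
    return max(ok) if ok else None


# (model confidence, human label) for seven labeled candidates: toy data.
scored = [
    (0.95, True), (0.85, True), (0.78, False), (0.72, True),
    (0.65, False), (0.62, True), (0.40, False),
]
thresholds = [0.60, 0.70, 0.75, 0.80, 0.90]

for t in thresholds:
    p, r = precision_recall(scored, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")

print(pick_threshold(scored, thresholds, recall_floor=0.75))  # 0.7
```

On real data the candidate list would come from running both passes over the labeled sample. Seven rows are far too few to draw conclusions from and only show the mechanics.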
