
What is PII & CUI?

Before you can protect sensitive data in an AI agent pipeline, you need a precise vocabulary. PII and CUI are not synonyms, and treating them as such is the fastest path to either over-blocking legitimate work or under-protecting data that regulators care about. This post draws the line clearly — and maps both categories to what an agent actually sees in practice.

PII — Personally Identifiable Information

PII is any information that can be used to distinguish or trace an individual's identity, either alone or when combined with other information that is linked or linkable to a specific individual. This is the definition used in OMB Circular A-130 and NIST SP 800-122, and it is the baseline across US federal guidance. The legal framing varies by jurisdiction (GDPR calls it "personal data", CCPA adds "household" scope, HIPAA focuses on health context), but the engineering-level pattern is consistent.

Direct identifiers

These can identify someone without any other data:

Field type | Examples | Why an agent sees it
Government ID | SSN, passport number, driver's license | HR files, onboarding docs, form submissions
Contact info | Email, phone, home address | CRM exports, support tickets, email threads
Financial ID | Credit card number, bank account, IBAN | Invoice PDFs, payment receipts, order history
Biometric | Fingerprint hash, face embedding, voice print | Security audit logs, access control exports

Quasi-identifiers

These don't identify someone alone, but can when combined. This is why naive field-level redaction isn't enough — an agent that outputs "Female, 32, Oncology ward, rare condition" may have re-identified a patient even with the name removed.

💡

Latanya Sweeney's landmark study (Carnegie Mellon, 2000) estimated from 1990 Census data that 87% of Americans could be uniquely identified by just three fields: 5-digit ZIP code, full date of birth, and sex. The number has shifted with population growth, but the principle holds: quasi-identifiers matter in agent outputs as much as in raw data.
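One way to make the quasi-identifier risk concrete is to count how many records share each combination of values. The sketch below is illustrative only: the records are made up, and the function names `smallest_group` and `unique_fraction` are hypothetical helpers, not part of any library. A smallest group of size 1 means at least one person is unique on those fields alone.

```python
from collections import Counter

def smallest_group(records, quasi_ids):
    """Return (k, counts), where k is the size of the smallest group of
    records that share the same values for every quasi-identifier."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(counts.values()), counts

def unique_fraction(records, quasi_ids):
    """Fraction of records that are the only one with their combination."""
    _, counts = smallest_group(records, quasi_ids)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(records)

# Made-up sample data, for illustration only.
records = [
    {"zip": "15213", "dob": "1992-03-14", "sex": "F"},
    {"zip": "15213", "dob": "1992-03-14", "sex": "F"},
    {"zip": "15213", "dob": "1988-11-02", "sex": "M"},
    {"zip": "15217", "dob": "1975-06-30", "sex": "F"},
]
QUASI = ("zip", "dob", "sex")

k, _ = smallest_group(records, QUASI)
print(k)                                # 1
print(unique_fraction(records, QUASI))  # 0.5
```

The same check can be run on an agent's output before it leaves the pipeline: if the fields it emits put someone in a group of one, removing the name did not anonymize the record.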

CUI — Controlled Unclassified Information

CUI is a US federal category created by Executive Order 13556 (signed November 4, 2010) and administered by NARA's Information Security Oversight Office (ISOO) as Executive Agent. It covers government-created or government-handled information that requires safeguarding by law, regulation, or government-wide policy — but is not classified. The key word is controlled: access and handling rules are explicit, standardized across all federal agencies, and enforceable on contractors.

The CUI registry defines 20 category groupings. The ones most likely to surface in AI agent work:

CUI Category | What it covers | Key governing authority
Privacy | PII held by federal agencies (9 subcategories) | Privacy Act (5 USC 552a), OMB A-130, OMB M-17-12
Export Control | ITAR/EAR technical data | 22 CFR (ITAR), 15 CFR (EAR); DoS & DoC
Law Enforcement | Investigation records, informant data (17 subcategories) | 28 CFR, DOJ/FBI agency policy
Legal | Attorney-client privileged material, court-protected info (11 subcategories) | 5 USC 552, Fed. Rules of Civil Procedure, agency-specific statute
Critical Infrastructure | System schematics, vulnerability details (11 subcategories) | 6 USC 133, DHS & sector-specific agencies

⚠️

CUI applies to contractors too. If your agent processes documents on behalf of a federal agency or a prime contractor, NIST SP 800-171 Rev. 3 (final May 2024, 97 security requirements) applies to your nonfederal system — even if you're a small SaaS vendor with no direct government relationship. Subcontractors are also in scope.

Where PII and CUI overlap

A document can carry both. A federal employee's personnel file contains PII (name, SSN, address) AND CUI (security clearance level, access records, salary under certain conditions). Your detection pipeline needs to handle both simultaneously — and the handling rules can differ:

  • PII: minimize exposure. The goal is to prevent identity theft and privacy harm. Pseudonymization is often acceptable.
  • CUI: enforce access controls. The goal is regulatory compliance. An audit trail and access authorization are required, not just masking.
  • Overlap: strictest rule wins. When a field is both PII and CUI, apply both sets of controls; where they conflict, the more restrictive one takes precedence.
  • Document-level vs field-level. CUI often applies at document level, PII at field level. Your pipeline needs both granularities.
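A minimal sketch of how these handling rules might be combined in code. Everything here is an assumption made for illustration: the `Controls` fields, the `POLICY` mapping, and the idea of modelling "strictest rule wins" as the union of required controls. A real policy would be derived from the governing authority for each CUI category.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Controls:
    """Hypothetical set of handling controls attached to a field."""
    mask: bool = False                   # pseudonymize or redact the value
    require_authorization: bool = False  # check access rights before use
    audit: bool = False                  # write an audit-trail entry

    def merge(self, other):
        # A control required by either label is required for the field.
        return Controls(
            mask=self.mask or other.mask,
            require_authorization=(
                self.require_authorization or other.require_authorization
            ),
            audit=self.audit or other.audit,
        )

# Illustrative policy only, not a statement of what any regulation requires.
POLICY = {
    "PII": Controls(mask=True),
    "CUI": Controls(require_authorization=True, audit=True),
}

def controls_for(labels):
    """Merge the controls of every label that applies."""
    result = Controls()
    for label in labels:
        result = result.merge(POLICY[label])
    return result

def document_controls(doc_labels, field_labels):
    """Apply document-level labels to every field, plus field-level labels."""
    return {
        name: controls_for(set(doc_labels) | set(labels))
        for name, labels in field_labels.items()
    }

personnel_file = document_controls(
    doc_labels=["CUI"],
    field_labels={"ssn": ["PII"], "clearance_level": []},
)
print(personnel_file["ssn"])
print(personnel_file["clearance_level"])
```

In this sketch the SSN field ends up masked, authorized, and audited, while the clearance level inherits only the document-level CUI controls. That is the two-granularity behaviour described above.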

What an AI agent actually sees

An agent operating on a codebase, document set, or API response doesn't receive neatly labelled fields. It sees raw text, and sensitive data appears in unpredictable forms:

example_ticket.txt — what the agent reads
Customer: John Smith ([email protected])
SSN on file: 078-05-1120
Issue: Export shipment delayed — part no. EAR99-X44
Assigned analyst: [REDACTED per LEA-2024-0091]
Notes: Customer mentioned DOD contract #W911NF-24-C-0012

This single ticket contains PII (name, email, SSN), potential CUI-Export Control (EAR99 part), CUI-Law Enforcement (law enforcement case reference), and potential CUI-Procurement and Acquisition (DOD contract number). A naïve regex for SSN patterns would catch one field. A well-designed pipeline catches all six.
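As a rough sketch of what a multi-pattern deterministic pass over the ticket above could look like: the label names and regular expressions below are assumptions written for this one example, the email address is a placeholder because the original is obscured, and name detection by a `Customer:` anchor would not generalize to free text.

```python
import re

# The ticket from above. The email is a placeholder (original is obscured).
TICKET = """\
Customer: John Smith (john.smith@example.com)
SSN on file: 078-05-1120
Issue: Export shipment delayed - part no. EAR99-X44
Assigned analyst: [REDACTED per LEA-2024-0091]
Notes: Customer mentioned DOD contract #W911NF-24-C-0012
"""

# Hypothetical labels and patterns, tuned to this example only.
PATTERNS = {
    "pii.name": re.compile(r"^Customer:\s*([A-Z][a-z]+(?: [A-Z][a-z]+)+)", re.M),
    "pii.email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "pii.ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "cui.export": re.compile(r"\bEAR99-\w+\b"),
    "cui.law_enforcement": re.compile(r"\bLEA-\d{4}-\d{4}\b"),
    "cui.contract": re.compile(r"\b[A-Z]\d{3}[A-Z0-9]{2}-\d{2}-[A-Z]-\d{4}\b"),
}

def scan(text):
    """Return (label, matched value) pairs for every pattern hit."""
    findings = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Use the capture group when the pattern defines one.
            value = m.group(1) if pattern.groups else m.group(0)
            findings.append((label, value))
    return findings

findings = scan(TICKET)
for label, value in findings:
    print(f"{label:20} {value}")
```

These hits are candidates, not verdicts. Whether an EAR99 part reference or a contract number is actually CUI depends on context that a regex cannot see.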

Detection strategy preview

The next post dives deep into detection. But here's the decision tree that shapes the whole series:

Design principle: classify first, then decide. Run a fast deterministic pass (regex + keyword anchors) to flag candidates, then a model-grounded pass to confirm context-dependent ones. Never redact on pattern match alone: you'll break legitimate data, such as "SSN" used as a column header in a schema file.
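The two-pass idea can be sketched as follows. This is an illustration under stated assumptions, not the series' implementation: `rule_based_confirm` is a trivial stand-in for the model-grounded pass, and all function names are hypothetical.

```python
import re

SSN_VALUE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SSN_KEYWORD = re.compile(r"\bSSN\b", re.IGNORECASE)

def flag_candidates(line):
    """Pass 1: cheap deterministic pass that flags values and keyword anchors."""
    candidates = [("ssn_value", m.group(0)) for m in SSN_VALUE.finditer(line)]
    candidates += [("ssn_keyword", m.group(0)) for m in SSN_KEYWORD.finditer(line)]
    return candidates

def rule_based_confirm(kind, value, line):
    """Pass 2 stand-in. A real pipeline would ask a model about the context;
    this stub only encodes one rule: a bare keyword is a label, not data."""
    return kind == "ssn_value"

def redact(line, confirm=rule_based_confirm):
    """Redact only the candidates that the confirmation pass accepts."""
    for kind, value in flag_candidates(line):
        if confirm(kind, value, line):
            line = line.replace(value, "[REDACTED-SSN]")
    return line

print(redact("SSN on file: 078-05-1120"))
# SSN on file: [REDACTED-SSN]
print(redact("CREATE TABLE employees (id INT, ssn CHAR(11));"))
# CREATE TABLE employees (id INT, ssn CHAR(11));
```

Passing `confirm` as a parameter keeps the deterministic pass testable on its own and lets the model-backed check be swapped in later without touching the redaction logic.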

Key takeaways

  • PII = any data that identifies a natural person. Governed by privacy law. Handled by minimization and pseudonymization.
  • CUI = government-designated sensitive info that isn't classified. Governed by federal policy. Requires access control and audit trails.
  • A single document can contain both — your pipeline needs both detection modes simultaneously.
  • Quasi-identifiers matter as much as direct identifiers in agent outputs, not just inputs.
  • CUI applies to contractors under NIST SP 800-171, not just federal agencies directly.