What are PII & CUI?
PII — Personally Identifiable Information
PII is any information that can be used to distinguish or trace an individual's identity, either alone or when combined with other information that is linked or linkable to a specific individual. This is the definition in OMB Circular A-130, closely mirrored by NIST SP 800-122, and it is the baseline across US federal guidance. The legal framing varies by jurisdiction (GDPR calls it "personal data", CCPA adds "household" scope, HIPAA focuses on health context), but the engineering-level pattern is consistent.
Direct identifiers
These can identify someone without any other data:
| Field type | Examples | Why an agent sees it |
|---|---|---|
| Government ID | SSN, passport number, driver's license | HR files, onboarding docs, form submissions |
| Contact info | Email, phone, home address | CRM exports, support tickets, email threads |
| Financial ID | Credit card number, bank account, IBAN | Invoice PDFs, payment receipts, order history |
| Biometric | Fingerprint hash, face embedding, voice print | Security audit logs, access control exports |
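
To make the table concrete, here is a minimal sketch of a deterministic pass for three of the direct identifiers above. The function names and regex patterns are illustrative assumptions, not a complete detector: real-world formats vary far more widely. It pairs each pattern with a cheap validity check (the SSN structural rules and the Luhn checksum for card numbers) to cut obvious false positives.

```python
import re

# Illustrative patterns only; production detectors need broader coverage.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators


def ssn_plausible(area: str, group: str, serial: str) -> bool:
    """Structural SSN rules: no 000/666/9xx area, no 00 group, no 0000 serial."""
    return (
        area not in ("000", "666")
        and not area.startswith("9")
        and group != "00"
        and serial != "0000"
    )


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def find_direct_identifiers(text: str) -> list:
    """Return (kind, matched_text) pairs for SSNs, emails, and card numbers."""
    found = []
    for m in SSN_RE.finditer(text):
        if ssn_plausible(*m.groups()):
            found.append(("ssn", m.group()))
    for m in EMAIL_RE.finditer(text):
        found.append(("email", m.group()))
    for m in CARD_RE.finditer(text):
        if luhn_valid(m.group()):
            found.append(("card", m.group()))
    return found
```

The validity checks matter because a bare digit pattern will also match order numbers, phone fragments, and test data. They reduce noise but do not replace a context-aware confirmation step.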
Quasi-identifiers
These don't identify someone alone, but can when combined. This is why naive field-level redaction isn't enough — an agent that outputs "Female, 32, Oncology ward, rare condition" may have re-identified a patient even with the name removed.
Latanya Sweeney's landmark study (Carnegie Mellon, 2000) showed that 87% of Americans could be uniquely identified by just ZIP code, date of birth, and sex — using 1990 Census data. The number has shifted with population growth, but the principle holds: quasi-identifiers matter in agent outputs as much as in raw data.
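
One way to measure this risk on your own data is to count how many records are unique on a given combination of quasi-identifiers. The sketch below is a toy illustration with made-up records and hypothetical function names. The smallest group size it reports is the "k" in what is commonly called k-anonymity.

```python
from collections import Counter


def uniqueness_rate(records, quasi_fields):
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    keys = [tuple(r[f] for f in quasi_fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


def min_group_size(records, quasi_fields):
    """Size of the smallest group of records sharing one combination (the k value)."""
    counts = Counter(tuple(r[f] for f in quasi_fields) for r in records)
    return min(counts.values()) if counts else 0


# Toy records, invented for illustration only.
RECORDS = [
    {"zip": "02139", "dob": "1990-01-01", "sex": "F"},
    {"zip": "02139", "dob": "1990-01-01", "sex": "F"},
    {"zip": "02139", "dob": "1985-06-15", "sex": "M"},
    {"zip": "94105", "dob": "1990-01-01", "sex": "F"},
]

if __name__ == "__main__":
    fields = ("zip", "dob", "sex")
    print(uniqueness_rate(RECORDS, fields), min_group_size(RECORDS, fields))
```

A group size of 1 means at least one person is singled out by the combination, even though no field in the record is a direct identifier.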
CUI — Controlled Unclassified Information
CUI is a US federal category created by Executive Order 13556 (signed November 4, 2010) and administered by NARA's Information Security Oversight Office (ISOO) as Executive Agent. It covers government-created or government-handled information that requires safeguarding by law, regulation, or government-wide policy — but is not classified. The key word is controlled: access and handling rules are explicit, standardized across all federal agencies, and enforceable on contractors.
The CUI registry defines 20 category groupings. The ones most likely to surface in AI agent work:
| CUI Category | What it covers | Key governing authority |
|---|---|---|
| Privacy | PII held by federal agencies (9 subcategories) | Privacy Act (5 USC 552a), OMB A-130, OMB M-17-12 |
| Export Control | ITAR/EAR technical data | 22 CFR (ITAR), 15 CFR (EAR); DoS & DoC |
| Law Enforcement | Investigation records, informant data (17 subcategories) | 28 CFR, DOJ/FBI agency policy |
| Legal | Attorney-client privileged material, court-protected info (11 subcategories) | 5 USC 552, Fed. Rules of Civil Procedure, agency-specific statute |
| Critical Infrastructure | System schematics, vulnerability details (11 subcategories) | 6 USC 133, DHS & sector-specific agencies |
CUI applies to contractors too. If your agent processes documents on behalf of a federal agency or a prime contractor, NIST SP 800-171 Rev. 3 (final May 2024, 97 security requirements) applies to your nonfederal system — even if you're a small SaaS vendor with no direct government relationship. Subcontractors are also in scope.
Where PII and CUI overlap
A document can carry both. A federal employee's personnel file contains PII (name, SSN, address) and CUI (security clearance level, access records, salary under certain conditions). Your detection pipeline needs to handle both simultaneously, and the handling rules can differ: PII is typically handled through minimization and pseudonymization, while CUI requires access control and audit trails.
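
A simple way to model the overlap is to let each finding carry a set of labels and to take the union of the controls those labels require. This is a minimal sketch: the control names and the `HANDLING` mapping are hypothetical shorthand for the handling themes in this post, not an official mapping from any standard.

```python
from dataclasses import dataclass, field

# Hypothetical control names; adapt to your own policy vocabulary.
HANDLING = {
    "PII": {"minimize", "pseudonymize"},
    "CUI": {"access_control", "audit_trail"},
}


@dataclass
class Finding:
    text: str
    labels: set = field(default_factory=set)


def required_controls(findings):
    """Union of the controls required by every label on every finding."""
    controls = set()
    for finding in findings:
        for label in finding.labels:
            controls |= HANDLING.get(label, set())
    return controls


# Example: two findings from one personnel file, each with a different label.
personnel_file = [
    Finding("SSN", {"PII"}),
    Finding("security clearance level", {"CUI"}),
]

if __name__ == "__main__":
    print(sorted(required_controls(personnel_file)))
```

Taking the union means a document with both kinds of findings is never handled under the weaker of the two rule sets.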
What an AI agent actually sees
An agent operating on a codebase, document set, or API response doesn't receive neatly labelled fields. It sees raw text, and sensitive data appears in unpredictable forms:
Customer: John Smith ([email protected])
SSN on file: 078-05-1120
Issue: Export shipment delayed — part no. EAR99-X44
Assigned analyst: [REDACTED per LEA-2024-0091]
Notes: Customer mentioned DOD contract #W911NF-24-C-0012

This single ticket contains: PII (name, email, SSN), potential CUI-Export (EAR99 part), CUI-Law Enforcement (law enforcement case reference), and CUI-Gov Contracts (DOD contract number). A naïve regex for SSN patterns would catch one field. A well-designed pipeline catches all five.
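
As a rough sketch of what going beyond the SSN could look like, the snippet below runs several pattern detectors over the ticket, line by line. Everything here is an assumption for illustration: the email address is a placeholder, the category labels are informal, and the reference and contract-number formats are inferred from this one sample rather than from any official specification.

```python
import re

# The sample ticket, with a placeholder email address.
TICKET = """\
Customer: John Smith (john.smith@example.com)
SSN on file: 078-05-1120
Issue: Export shipment delayed - part no. EAR99-X44
Assigned analyst: [REDACTED per LEA-2024-0091]
Notes: Customer mentioned DOD contract #W911NF-24-C-0012
"""

# (category, pattern) pairs; formats are guesses based on the sample above.
DETECTORS = [
    ("PII", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),                 # SSN-shaped value
    ("PII", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),           # email address
    ("CUI-Export (potential)", re.compile(r"\b(?:EAR99|ITAR|ECCN)\b")),
    ("CUI-Law Enforcement", re.compile(r"\bLEA-\d{4}-\d{4}\b")),    # case reference
    ("CUI-Gov Contracts", re.compile(r"\b[A-Z0-9]{6}-\d{2}-[A-Z]-[A-Z0-9]{4}\b")),
]


def scan(text):
    """Map each 1-based line number to the (category, matched_text) pairs found on it."""
    hits = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, pattern in DETECTORS:
            for m in pattern.finditer(line):
                hits.setdefault(line_no, []).append((category, m.group()))
    return hits


if __name__ == "__main__":
    for line_no, found in scan(TICKET).items():
        print(line_no, found)
```

Note what the patterns still miss: the customer's name has no fixed shape, so it needs a named-entity or model-based pass. The keyword hit on `EAR99` is only a candidate that context has to confirm.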
Detection strategy preview
The next post dives deep into detection. For now, here is the principle behind the decision tree that shapes the whole series:
Design principle: classify first, then decide. Run a fast deterministic pass (regex + keyword anchors) to flag candidates, then a model-grounded pass to confirm context-dependent ones. Never redact on a pattern match alone: you'll break legitimate data, such as "SSN" used as a column header in a schema file.
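
The two-pass idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the series' actual pipeline: `stub_confirm` is a hypothetical keyword heuristic standing in for the model-grounded pass, and only SSN-shaped values are handled.

```python
import re
from typing import Callable, List, NamedTuple

SSN_VALUE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SSN_ANCHOR = re.compile(r"\b(?:ssn|social security)\b", re.IGNORECASE)


class Candidate(NamedTuple):
    line_no: int
    line: str
    reason: str  # "pattern" (value-shaped match) or "anchor" (keyword only)


def flag_candidates(text: str) -> List[Candidate]:
    """Pass 1: fast deterministic flagging by regex and keyword anchors."""
    out = []
    for n, line in enumerate(text.splitlines(), start=1):
        if SSN_VALUE.search(line):
            out.append(Candidate(n, line, "pattern"))
        elif SSN_ANCHOR.search(line):
            out.append(Candidate(n, line, "anchor"))
    return out


def stub_confirm(c: Candidate) -> bool:
    """Pass 2 placeholder: a real system would ask a model to judge the context."""
    if c.reason == "anchor":
        return False  # keyword with no value, e.g. a schema column name
    # Treat SSN-shaped values on part/order lines as false positives.
    return not re.search(r"\b(?:part|order|sku)\b", c.line, re.IGNORECASE)


def redact(text: str, confirm: Callable[[Candidate], bool] = stub_confirm) -> str:
    """Redact SSN-shaped values only on lines that pass 2 confirmed."""
    confirmed = {c.line_no for c in flag_candidates(text) if confirm(c)}
    lines = []
    for n, line in enumerate(text.splitlines(), start=1):
        lines.append(SSN_VALUE.sub("[REDACTED-SSN]", line) if n in confirmed else line)
    return "\n".join(lines)
```

Passing `confirm` as a parameter keeps the deterministic pass testable on its own and lets the confirmation step be swapped for a model call later without touching the redaction logic.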
Key takeaways
- PII = any data that identifies a natural person. Governed by privacy law. Handled by minimization and pseudonymization.
- CUI = government-designated sensitive info that isn't classified. Governed by federal policy. Requires access control and audit trails.
- A single document can contain both — your pipeline needs both detection modes simultaneously.
- Quasi-identifiers matter as much as direct identifiers, in agent outputs as well as inputs.
- CUI applies to contractors under NIST SP 800-171, not just federal agencies directly.