HIPAA De-Identification Standards
The two HIPAA methods for removing PHI from health data — and why Expert Determination is more defensible than Safe Harbor for most analytics use cases.
HIPAA's Privacy Rule at 45 CFR § 164.514(b) provides two methods for de-identifying protected health information; once de-identified, the data is no longer PHI and the Privacy Rule no longer applies. The Safe Harbor method requires removing 18 specific identifiers (names, geographic subdivisions smaller than a state, all elements of dates except year that relate directly to an individual plus all ages over 89, phone numbers, fax numbers, email addresses, SSNs, MRNs, health plan beneficiary numbers, account numbers, certificate/license numbers, VINs, device identifiers, URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number, characteristic, or code) and having no actual knowledge that the remaining information could identify an individual. The Expert Determination method requires a person with appropriate statistical and scientific knowledge and experience to apply generally accepted principles and methods, determine that the risk of identifying any individual is "very small," and document the methods and results of the analysis in writing.
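The mechanics of Safe Harbor are simple enough to sketch. The following is a minimal, hypothetical field-level scrubber (the field names and the ZIP padding convention are assumptions, not a standard); a production pipeline must also cover free-text notes, where identifiers hide far more often than in structured columns.

```python
from datetime import date

# Hypothetical field names for the direct-identifier categories; a real
# Safe Harbor pipeline must cover all 18 categories, including free text.
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "ssn", "mrn",
    "beneficiary_id", "account_number", "license_number",
    "vin", "device_id", "url", "ip_address",
}

def safe_harbor_scrub(record: dict) -> dict:
    """Drop direct identifiers, reduce dates to year, cap ages at 90+."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # remove the identifier entirely
        if isinstance(value, date):
            out[field] = value.year  # dates: year only
        elif field == "age" and isinstance(value, int) and value > 89:
            out[field] = "90+"  # ages over 89 collapse to one bucket
        elif field == "zip":
            # First 3 ZIP digits are retainable only where the 3-digit
            # area has population > 20,000; otherwise they become 000.
            out[field] = str(value)[:3] + "**"
        else:
            out[field] = value
    return out
```

Note that the ZIP rule requires a population lookup in practice; the sketch above assumes the 3-digit area qualifies.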
The engineering reality of de-identification is that Safe Harbor's 18-identifier removal is a floor, not a ceiling, of privacy protection, and is frequently inadequate for research or analytics use cases involving rare conditions, small geographic areas, or temporal data. The "very small" risk standard under Expert Determination is not defined numerically by HIPAA, but HHS guidance references a risk threshold of approximately 0.04 (1 in 25) as the boundary used in some expert analyses. Re-identification risk assessment techniques include population uniqueness analysis (k-anonymity, l-diversity, t-closeness), record linkage attacks using auxiliary datasets, and quasi-identifier combination analysis. The critical engineering failure mode is treating de-identification as a one-time data transformation rather than a contextual risk assessment: the same dataset may be de-identified relative to one auxiliary dataset and re-identifiable relative to another (e.g., public voter records, insurance enrollment files). Data use agreements and limited dataset frameworks (which retain some dates and geographic data under stricter access controls) are alternatives to full de-identification for certain research purposes.
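The k-anonymity check at the heart of uniqueness analysis is a small computation. This sketch measures sample uniqueness within the dataset itself (the cohort and quasi-identifier names are invented for illustration); true population uniqueness additionally requires modeling against census or other population-level data.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier combination.

    A dataset is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records; k == 1 means some record is
    unique on those attributes and is a prime re-identification candidate.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())

# Hypothetical cohort with (zip3, birth_year, sex) as quasi-identifiers.
cohort = [
    {"zip3": "902", "birth_year": 1960, "sex": "F"},
    {"zip3": "902", "birth_year": 1960, "sex": "F"},
    {"zip3": "902", "birth_year": 1960, "sex": "M"},  # unique combination
]
```

Generalizing away a quasi-identifier raises k: on this cohort, dropping sex from the combination moves k from 1 to 3, which is exactly the suppression/generalization trade-off a risk assessment has to quantify.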
De-identification interacts with several other regulatory frameworks in ways that create compliance complexity. The HITECH Act's breach safe harbor applies to "encrypted" PHI, not to de-identified data — the concepts are legally distinct. Under GDPR, de-identified health data may still be pseudonymous (and therefore subject to GDPR) if re-identification is reasonably possible by any party with access to auxiliary data, creating a stricter standard than HIPAA for European data. The CCPA/CPRA "deidentified" definition has its own requirements (technical safeguards, public commitment not to re-identify, contractual obligations on recipients) that differ from HIPAA. For AI/ML applications using health data, the de-identification question is particularly acute: model training data may be de-identified, but trained models can sometimes reproduce or reveal training data characteristics — the FDA's AI/ML guidance and emerging state laws are beginning to address this model-level privacy risk.
We conduct Expert Determination de-identification analyses using statistical methods including population uniqueness modeling, quasi-identifier risk scoring, and simulated re-identification attack testing against available auxiliary datasets. Our de-identification pipelines implement configurable suppression, generalization, and noise injection transformations with reproducible, auditable parameter sets documented for the expert's written certification. We maintain de-identification risk assessments as living documents, reassessing when data scope, geographic granularity, or auxiliary dataset availability changes.
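A reproducible, auditable parameter set can be as simple as a versioned document plus a fingerprint. The sketch below (parameter names, fields, and the 16-character fingerprint length are all assumptions for illustration) shows the pattern: one declarative parameter set drives suppression, generalization, and seeded noise injection, and its hash ties the transformed output back to the expert's written certification.

```python
import hashlib
import json
import math
import random

# Hypothetical parameter set for one data release; the expert's written
# determination would reference this exact document and its fingerprint.
PARAMS = {
    "suppress": ["mrn", "ssn"],        # fields removed outright
    "generalize": {"zip": 3},          # keep first N characters
    "noise": {"length_of_stay": 1.0},  # Laplace scale b, in days
    "seed": 20240115,                  # fixed seed => reproducible runs
}

def param_fingerprint(params: dict) -> str:
    """Stable hash of the parameter set, recorded in the audit trail."""
    canonical = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:16]

def transform(record: dict, params: dict) -> dict:
    """Apply suppression, generalization, and seeded noise injection."""
    rng = random.Random(params["seed"])
    out = dict(record)
    for field in params["suppress"]:
        out.pop(field, None)
    for field, keep in params["generalize"].items():
        if field in out:
            out[field] = str(out[field])[:keep]
    for field, scale in params["noise"].items():
        if field in out:
            # Laplace sample via inverse CDF: x = -b * sgn(u) * ln(1 - 2|u|)
            u = rng.random() - 0.5
            out[field] += -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return out
```

Because the noise source is seeded from the parameter set, two runs over the same input produce byte-identical output, which is what makes the transformation certifiable rather than merely described.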
Compliance-Native Architecture Guide
Design principles and a structured checklist for building software that is compliant by default — not compliant by retrofit. Covers data architecture, access controls, audit trails, and vendor due diligence.