The Algorithm · AI & Machine Learning · 11 min read · 2024-12-01

LLM Deployment in Regulated Industries: Data Residency and Privacy

Large language model deployment in regulated industries is not primarily a model selection problem. It is a data governance problem. HIPAA requires a Business Associate Agreement with every infrastructure provider that processes PHI, including inference endpoints. GDPR restricts transfers of personal data outside the EEA unless an adequacy decision applies or appropriate safeguards under Article 46 are in place. Financial services firms under DORA and MAS TRM face data residency obligations that constrain which cloud regions and which model APIs are legally permissible. The architecture that satisfies these obligations is buildable, but only if residency requirements are treated as design inputs, not deployment afterthoughts.

Before a single inference call is made against regulated data, the organisation must answer four questions: where does the data reside during inference, who processes it, under what legal basis, and what contractual protections govern that processing? The answers determine which model providers, which cloud regions, and which deployment architectures are legally available.

The Data Residency Constraint

Data residency requirements in regulated industries operate at two levels. The first is regulatory: HIPAA requires that PHI be processed under a Business Associate Agreement; GDPR permits transfers of personal data outside the EEA only where an adequacy decision applies or appropriate safeguards under Article 46 are in place; MAS Technology Risk Management guidelines in Singapore require customer data to remain in-country unless specific approvals are obtained. The second is contractual: financial services firms frequently have client agreements that restrict where client data can be processed, independent of any regulatory obligation.

For LLM deployments, both levels apply simultaneously. Sending a prompt containing PHI to a US-based LLM API endpoint from a UK-based system triggers GDPR transfer obligations. Sending UK financial customer data to a US inference endpoint may trigger both GDPR and contractual restrictions. The engineering architecture must enforce these boundaries at the infrastructure level, not rely on prompt engineering to redact sensitive fields.
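One way to enforce the boundary at the infrastructure level is an egress check in the inference gateway: every outbound call is validated against an allowlist of approved hosts per data classification before the prompt leaves the boundary. The sketch below is a minimal illustration; the hostnames, data-class labels, and allowlist contents are all hypothetical, and a production system would enforce the same policy in network egress controls as well as in code.

```python
from urllib.parse import urlparse

# Approved inference hosts per data classification (hypothetical values).
RESIDENCY_ALLOWLIST = {
    "uk-phi": {"llm.uksouth.internal.example.net"},
    "eu-personal": {"llm.uksouth.internal.example.net", "llm.westeurope.internal.example.net"},
    "non-personal": {"api.example-llm-provider.com"},
}

class ResidencyViolation(Exception):
    """Raised before any prompt is dispatched to a non-approved endpoint."""

def check_residency(data_class: str, endpoint_url: str) -> None:
    """Reject the call at the gateway if the target host is not approved
    for this data classification."""
    host = urlparse(endpoint_url).hostname
    if host not in RESIDENCY_ALLOWLIST.get(data_class, set()):
        raise ResidencyViolation(
            f"{host!r} is not an approved endpoint for data class {data_class!r}"
        )

check_residency("eu-personal", "https://llm.westeurope.internal.example.net/v1/chat")  # allowed
try:
    check_residency("uk-phi", "https://api.example-llm-provider.com/v1/chat")
except ResidencyViolation as exc:
    print("blocked:", exc)
```

The key design point is that the check runs in the request path, not in a periodic audit: a misconfigured data class fails closed rather than leaking a prompt.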

BAA Coverage and LLM API Providers

For healthcare organisations, the BAA question is the first gating requirement. Of the major LLM providers, Microsoft Azure OpenAI Service is available under the Microsoft HIPAA BAA. AWS Bedrock models are covered under the AWS BAA for HIPAA-eligible services. Google Cloud Vertex AI is covered under Google Cloud's HIPAA BAA. OpenAI's enterprise API tier includes BAA coverage. The consumer-tier APIs of any provider are not covered and cannot be used for PHI processing regardless of prompt construction.

BAA coverage is necessary but not sufficient. The BAA covers the API provider's obligations as a business associate. It does not address the data residency question, the logging configuration that determines whether prompt data is retained by the provider, or the subprocessor chain through which your data flows within the provider's infrastructure. Each of these requires separate verification against the provider's data processing documentation.

Deployment Architecture Patterns

Three primary deployment architectures exist for LLMs in regulated environments, each with distinct compliance characteristics. Self-hosted open-weight models deployed in the organisation's own infrastructure provide the highest control: data never leaves the organisational boundary, there is no third-party subprocessor, and no external BAA is required. The cost is infrastructure complexity and the operational burden of running GPU compute at scale. Models in this category include Meta Llama 3, Mistral, and Falcon.

Private deployment on dedicated cloud infrastructure provides a middle path: the model runs on cloud infrastructure, but prompt data is not shared with other tenants or used for model training, and the deployment mode is covered by the provider's BAA and data processing agreement. The third pattern is an inference gateway that applies PII redaction before the prompt reaches an external API, using tools such as Microsoft Presidio or AWS Comprehend Medical to identify and strip regulated fields.
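The shape of the third pattern can be sketched with plain regular expressions standing in for a real recognizer such as Presidio or Comprehend Medical, which detect far more entity types with far better recall. The patterns, labels, and example text below are illustrative only; the point is that redaction happens in the gateway, before the prompt crosses the boundary.

```python
import re

# Simplified stand-in patterns; a production gateway would use a dedicated
# PII recognizer (e.g. Microsoft Presidio) rather than hand-rolled regexes.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d\b"),
}

def redact(prompt: str) -> str:
    """Replace each detected field with a typed placeholder so the external
    model still sees the structure of the request, but not the identifiers."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt

print(redact("Patient jane.doe@example.com, SSN 123-45-6789, called +44 20 7946 0958"))
```

Typed placeholders, rather than blank deletions, matter in practice: they preserve enough context for the model to reason about the request while keeping the regulated values inside the boundary.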

Logging, Retention, and the Audit Obligation

LLM inference logging in regulated environments requires capturing the inputs and outputs of every model call, the model version, the timestamp, and the user or system identity that initiated the request. This audit trail satisfies HIPAA access audit requirements, SR 11-7 model outcome logging requirements, and EU AI Act Article 12 automatic logging requirements for high-risk AI systems. The default logging configurations of most LLM providers are designed for debugging and usage tracking, not regulatory audit evidence. Configuring logging to produce compliance-grade evidence is an explicit engineering task.

Retention periods for LLM inference logs must align with the regulatory obligations attached to the data processed. HIPAA requires audit documentation to be retained for six years. Broker-dealers under SEC Rule 17a-4 must retain electronic records for three to six years depending on record type. The log storage architecture must be append-only and tamper-evident to satisfy these requirements.
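A common way to make an append-only log tamper-evident is hash chaining: each record carries the SHA-256 digest of its predecessor, so any retroactive edit breaks every subsequent link. The sketch below shows the idea with the audit fields named earlier (model version, timestamp, caller identity, input, output); the field names and model identifiers are illustrative, and a production system would anchor the chain in WORM storage rather than a Python list.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # sentinel hash for the first record in the chain

def append_record(log: list, *, model: str, caller: str, prompt: str, output: str) -> dict:
    """Append one inference call as a hash-chained audit record."""
    body = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "caller": caller,
        "prompt": prompt,
        "output": output,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    # Hash a canonical serialisation of the record body (hash field excluded).
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)
    return body

def verify_chain(log: list) -> bool:
    """Recompute every hash and link; any edited record breaks the chain."""
    prev = GENESIS
    for rec in log:
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

audit_log: list = []
append_record(audit_log, model="llama-3-70b-instruct", caller="svc-claims-triage",
              prompt="[prompt text]", output="[model output]")
assert verify_chain(audit_log)
```

Because each record commits to its predecessor, an auditor only needs a trusted copy of the latest hash to detect any alteration of the history behind it.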

GDPR Transfer Mechanisms for LLM Providers

For organisations subject to GDPR, LLM inference that involves personal data requires a valid transfer mechanism for any processing outside the EEA. The EU-US Data Privacy Framework provides an adequacy decision for US-based providers that are certified under the framework. Standard Contractual Clauses under the 2021 European Commission decision provide an alternative where framework certification is not in place. The transfer mechanism must be documented in the organisation's Records of Processing Activities under GDPR Article 30 and in the Data Protection Impact Assessment required for high-risk processing activities.

The Practical Starting Point

The practical starting point for any regulated LLM deployment is a data classification exercise that maps the data types the model will process to the regulatory obligations that attach to each type. That mapping determines which deployment architectures are available and which are not. Teams that begin with model selection and discover the residency constraints during legal review lose weeks. Teams that begin with data classification design the architecture correctly from the start.
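The output of that classification exercise can be as simple as a mapping from data type to the deployment patterns that remain legal for it; the viable architectures for a workload are then the intersection across every data type the workload touches. The class names and architecture labels below are hypothetical placeholders for the organisation's own taxonomy.

```python
# Hypothetical classification output: data type -> legally available patterns.
CLASSIFICATION = {
    "phi": {"self_hosted", "dedicated_cloud_with_baa"},
    "eu_personal": {"self_hosted", "dedicated_cloud_eea", "redaction_gateway"},
    "public": {"self_hosted", "dedicated_cloud_eea", "redaction_gateway", "external_api"},
}

def permitted_architectures(data_types: set) -> set:
    """An architecture is viable only if every data type the workload
    processes permits it, i.e. the intersection of the allowed sets."""
    options = None
    for dt in data_types:
        allowed = CLASSIFICATION[dt]
        options = set(allowed) if options is None else options & allowed
    return options or set()

print(permitted_architectures({"phi", "eu_personal"}))
```

Running the lookup before model selection makes the constraint concrete: a workload that mixes PHI with EU personal data is already down to self-hosting in this illustrative mapping, whatever model the team prefers.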
