The Algorithm
AI in Regulated Industries · Healthcare · 10 min read · 2026-04-03

RAG Architecture for Regulated Industries: When Your Knowledge Base Is PHI

BAA
Required for every vector store that indexes PHI — most RAG implementations don't have one
RAG architectures in healthcare introduce compliance dimensions that generation-only LLM deployments do not have. When the document corpus contains protected health information (PHI), the vector store becomes a PHI data store requiring a Business Associate Agreement (BAA). The chunking strategy determines whether PHI from different patients can co-mingle in the same embedding. The retrieval layer requires patient-scoped access control. And every document retrieved and passed to the model context window is a use of PHI that falls under HIPAA's minimum necessary standard.

Retrieval-Augmented Generation is the architecture that makes large language models useful for enterprise knowledge work: instead of relying solely on training data, the model retrieves relevant documents from a corpus and incorporates them into the generation context. In a healthcare deployment where the document corpus contains Protected Health Information, the compliance surface area is substantial — and most current RAG implementations do not address it.

The Vector Store as a PHI Data Store

When a RAG system ingests clinical notes, discharge summaries, lab reports, or any other PHI-containing documents, the vector store that holds the embeddings becomes a PHI data store. The embeddings themselves — the numerical vectors that represent document chunks — have been demonstrated to be invertible under certain conditions, meaning it is possible to reconstruct approximations of the original text from the embedding. If the data store can be used to identify or reconstruct PHI, it must be protected as PHI.

This means the vector database provider must sign a Business Associate Agreement before any PHI-containing documents are ingested. Pinecone, Weaviate, Qdrant, and the other major vector database vendors each have different BAA postures — some offer them readily, others do not, and some offer them only at enterprise licensing tiers. Evaluating vector database vendors for healthcare RAG deployments must include BAA availability as a hard filter before evaluating technical capabilities.

The Engineering Reality

The chunking strategy in a PHI-containing RAG system has compliance implications that general RAG best practices do not address. When documents are concatenated before being split, a fixed 512-token chunk may span a document boundary and include PHI from two different patients. When that chunk is retrieved and included in the model context, it exposes PHI for a patient whose record was not relevant to the query. Patient-scoped chunking, which ensures that no chunk ever contains PHI from more than one patient, is the only approach that satisfies HIPAA's minimum necessary standard at the retrieval layer.
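A minimal sketch of patient-scoped chunking follows. It assumes each source document belongs to exactly one patient and carries that patient's identifier. The `Document` and `Chunk` types and the `chunk_documents` function are hypothetical names chosen for illustration, and whitespace splitting stands in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Document:
    doc_id: str
    patient_id: str  # assumption: one patient per source document
    text: str


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    chunk_id: str
    patient_id: str
    text: str


def chunk_documents(docs: Iterable[Document], max_tokens: int = 512) -> Iterator[Chunk]:
    """Split each document on its own so no chunk crosses a document boundary."""
    for doc in docs:
        # Whitespace tokens are a stand-in for the embedding model's tokenizer.
        tokens = doc.text.split()
        for n, start in enumerate(range(0, len(tokens), max_tokens)):
            yield Chunk(
                doc_id=doc.doc_id,
                chunk_id=f"{doc.doc_id}:{n}",
                patient_id=doc.patient_id,  # inherited, never mixed
                text=" ".join(tokens[start:start + max_tokens]),
            )
```

Because the splitter restarts at every document, a short final chunk is emitted rather than padded with text from the next record. Documents that themselves mention more than one patient would need additional handling that this sketch does not attempt.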

Access Control at the Retrieval Layer

Standard RAG architectures perform a vector similarity search against the entire corpus and return the most semantically relevant chunks regardless of who is asking. In a healthcare context, this means a clinician querying about one patient could retrieve chunks from other patients' records if those records are semantically similar to the query. The retrieval layer must enforce patient-scoped access control: the similarity search must be filtered by a patient identifier tied to the authenticated user's authorised patient list.

Most vector databases support metadata filtering — you can store a patient_id field alongside each chunk and filter search results to a specific patient_id value. The challenge is ensuring that the patient_id filter is always applied and cannot be bypassed. This requires the RAG orchestration layer to inject the filter as a mandatory query parameter derived from the authenticated user's session, not from user-supplied input.
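One way to make the filter non-optional is to expose a single search function that cannot run without a session. The sketch below uses a plain in-memory list in place of a vector database, so `Session`, `StoredChunk`, and `scoped_search` are hypothetical names rather than any vendor's API. It assumes the requested patient comes from application context and is always validated against the authorised list held in the authenticated session.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Session:
    user_id: str
    authorised_patient_ids: FrozenSet[str]  # populated at authentication time


@dataclass(frozen=True)
class StoredChunk:
    chunk_id: str
    patient_id: str
    vector: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(
    session: Session,
    patient_id: str,
    query_vector: Sequence[float],
    index: Sequence[StoredChunk],
    top_k: int = 5,
) -> List[StoredChunk]:
    """Similarity search restricted to one patient the user is authorised for."""
    if patient_id not in session.authorised_patient_ids:
        raise PermissionError(f"user {session.user_id} is not authorised for this patient")
    # The patient filter is applied before ranking, so chunks belonging to
    # other patients are never candidates, however similar they are.
    candidates = [c for c in index if c.patient_id == patient_id]
    candidates.sort(key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return candidates[:top_k]
```

With a real vector database the list comprehension would become that product's metadata filter, but the design point is the same: callers have no code path that reaches the index without the session check.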

Audit Logging of Retrieval

HIPAA's audit control requirement (45 CFR §164.312(b)) requires mechanisms that record and examine activity in information systems that contain or use electronic PHI. In a RAG system, that activity includes retrieval: every document chunk retrieved and passed to the model context constitutes a use of PHI. The audit log must record the identity of the authenticated user, the query that triggered retrieval, the specific chunks retrieved (identified by document and chunk identifier), the patient identifier scoped to the retrieval, and the timestamp. LangChain, LlamaIndex, and similar frameworks log model inputs and outputs but, by default, do not log retrieval provenance at the chunk level. Custom instrumentation of the retrieval step is required.
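The fields listed above can be captured in a small structured record written once per retrieval. This is a sketch only: `ChunkRef`, `RetrievalAuditRecord`, `build_audit_record`, and `to_log_line` are hypothetical names, and the JSON line format is an assumption rather than anything HIPAA prescribes.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ChunkRef:
    doc_id: str
    chunk_id: str


@dataclass(frozen=True)
class RetrievalAuditRecord:
    user_id: str            # identity of the authenticated user
    query: str              # query that triggered retrieval
    patient_id: str         # patient scope applied to the search
    chunks: List[ChunkRef]  # every chunk passed to the model context
    timestamp: str          # ISO 8601, UTC


def build_audit_record(
    user_id: str,
    query: str,
    patient_id: str,
    chunks: Iterable[ChunkRef],
    now: Optional[datetime] = None,
) -> RetrievalAuditRecord:
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return RetrievalAuditRecord(user_id, query, patient_id, list(chunks), ts)


def to_log_line(record: RetrievalAuditRecord) -> str:
    """Serialise the record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

The record would be emitted from the retrieval step itself, after the chunks are selected and before they are handed to the model. Since the query text may itself contain PHI, the log probably needs the same protections as the corpus.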

BAA Chain for the Full RAG Stack

A production RAG system typically involves multiple vendors: a cloud provider, a vector database vendor, an embedding model provider, and an LLM provider. Each of these vendors that handles PHI must sign a BAA. AWS, Azure, and GCP offer HIPAA BAAs. OpenAI offers a BAA for its API; the current eligibility terms and covered endpoints should be confirmed directly with the vendor. Cohere and other embedding providers have varying BAA postures. If the embedding model provider does not sign a BAA, PHI cannot be passed to the embedding API, which means the entire ingestion pipeline must operate without sending PHI to that endpoint. In practice, this drives healthcare RAG deployments toward self-hosted embedding models or cloud provider-native embedding APIs covered by the existing cloud provider BAA.
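The vendor mapping can be kept as data and checked before any ingestion job runs. The sketch below is illustrative: the `Vendor` type, the `baa_gaps` and `assert_ingestion_allowed` functions, and the vendor names in the usage note are all hypothetical, and the `baa_signed` flag is assumed to be maintained by whoever holds the executed agreements.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Vendor:
    name: str
    role: str          # e.g. "cloud", "vector_db", "embedding", "llm"
    handles_phi: bool  # does PHI reach this vendor in the current design?
    baa_signed: bool   # assumption: tracked from executed agreements


def baa_gaps(stack: Sequence[Vendor]) -> List[Vendor]:
    """Return vendors that would receive PHI without a signed BAA."""
    return [v for v in stack if v.handles_phi and not v.baa_signed]


def assert_ingestion_allowed(stack: Sequence[Vendor]) -> None:
    """Refuse to start ingestion while any PHI-handling vendor lacks a BAA."""
    gaps = baa_gaps(stack)
    if gaps:
        names = ", ".join(f"{v.name} ({v.role})" for v in gaps)
        raise RuntimeError(f"BAA missing for PHI-handling vendors: {names}")
```

For example, a stack containing `Vendor("ExampleEmbeddings", "embedding", True, False)` would fail the check until either a BAA is signed or the design changes so that PHI no longer reaches that vendor, for instance by moving to a self-hosted embedding model.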

  1. Evaluate vector database vendors on BAA availability before evaluating technical capabilities
  2. Implement patient-scoped chunking: no chunk should contain PHI from more than one patient
  3. Enforce patient-scoped retrieval filtering at the orchestration layer — not as an optional parameter
  4. Build retrieval audit logging at the chunk level: document ID, chunk ID, patient scope, user identity, timestamp
  5. Map every vendor in the RAG stack against HIPAA BAA requirements before ingesting PHI
  6. Consider self-hosted embedding models to avoid PHI transmission to third-party embedding APIs that lack BAAs