Retrieval-augmented generation is the dominant architecture for compliance and regulatory question-answering systems. The alternative -- relying on an LLM's parametric knowledge to answer compliance questions -- has too many failure modes for regulated use: the training data has a cutoff date, the model may have been trained on jurisdictionally incorrect or superseded versions of regulations, and hallucination on compliance questions carries specific legal risk. RAG grounds the model's responses in a retrieval corpus that is under the organisation's control and can be kept current, versioned, and audited.
The Retrieval Architecture
A production RAG system for compliance document retrieval has several components: a document corpus, an ingestion pipeline that processes new documents and updates, a chunking strategy that breaks documents into retrievable units, an embedding model that produces vector representations of chunks, a vector store for approximate nearest neighbour retrieval, a re-ranking layer that refines the retrieved set, and a prompt template that incorporates retrieved context into the LLM query. Each of these components has engineering decisions that affect retrieval quality and compliance reliability.
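The wiring of these components can be summarised in a small configuration sketch. Everything here is illustrative, not a specific framework's API; the field names and defaults are assumptions chosen to mirror the components listed above.

```python
from dataclasses import dataclass

# Hypothetical configuration object naming the pipeline's main decisions;
# values are illustrative placeholders, not recommendations.
@dataclass
class RagPipelineConfig:
    corpus_path: str                # document corpus location
    embedding_model: str            # embedding checkpoint for chunk vectors
    chunk_max_tokens: int = 512     # upper bound per retrievable unit
    top_k_retrieve: int = 20        # candidates from the vector store
    top_k_rerank: int = 5           # survivors after the re-ranking layer
    prompt_template: str = (
        "Answer using ONLY the context below. "
        "Cite the document and section for each claim.\n"
        "Context:\n{context}\n\nQuestion: {question}"
    )

config = RagPipelineConfig(
    corpus_path="/data/regulations",
    embedding_model="legal-bert-base",  # placeholder model name
)
```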
Chunking strategy is a commonly underestimated design decision. Splitting a regulatory document at fixed character intervals will frequently break the semantic unit that makes a regulation clause interpretable -- a requirement split from its exception, a definition separated from its reference. Semantic chunking that preserves clause boundaries and hierarchical chunking that maintains section context both perform better for compliance text than naive fixed-size splitting.
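A minimal sketch of clause-boundary chunking: split on markers that open a clause and merge small clauses up to a size budget, so a requirement is not separated from its exception. The clause-marker regex is an assumption for illustration; real corpora need a per-document-family pattern.

```python
import re

def chunk_by_clause(text, max_chars=1200):
    """Split regulatory text on clause boundaries rather than at fixed
    character intervals, merging adjacent clauses while they fit within
    max_chars. The boundary pattern below (numbered paragraphs, lettered
    sub-clauses, 'Article N' headings) is illustrative only."""
    clauses = re.split(r"\n(?=(?:\d+\.|\([a-z]\)|Article \d+))", text)
    chunks, current = [], ""
    for clause in clauses:
        if current and len(current) + len(clause) > max_chars:
            chunks.append(current.strip())
            current = clause
        else:
            current = current + "\n" + clause if current else clause
    if current:
        chunks.append(current.strip())
    return chunks
```

Because merging happens clause-by-clause, no chunk ever starts mid-clause, which is the failure mode fixed-size splitting produces.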
Embedding Model Selection for Legal and Compliance Text
General-purpose embedding models trained on web-scale text may not produce optimal representations for regulatory and legal language, which has terminology, structure, and semantic patterns that differ from general prose. Specialised embedding models fine-tuned on legal text -- including models in the LEGAL-BERT family -- can produce higher-quality retrieval for compliance corpora. The evaluation criterion for embedding model selection is retrieval recall on a domain-specific benchmark: for a given compliance question, does the embedding model retrieve the relevant regulatory clause in the top-k results?
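That selection criterion is straightforward to compute. A sketch of recall@k over a benchmark of questions paired with gold clause ids, assuming a `retrieve(question, k)` callable that wraps the candidate embedding model and vector store:

```python
def recall_at_k(benchmark, retrieve, k=5):
    """Fraction of benchmark questions for which at least one gold
    clause id appears in the top-k retrieved ids. `benchmark` is a list
    of (question, set_of_gold_ids); `retrieve` is assumed to return a
    ranked list of chunk ids."""
    hits = 0
    for question, gold_ids in benchmark:
        retrieved = retrieve(question, k)[:k]
        if any(g in retrieved for g in gold_ids):
            hits += 1
    return hits / len(benchmark)
```

Running this over the same benchmark for each candidate embedding model gives a like-for-like comparison before any generation-quality evaluation.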
Corpus Governance in Regulated Environments
The compliance value of a RAG system depends entirely on the quality and currency of the retrieval corpus. A compliance QA system that retrieves outdated regulatory text or superseded guidance will produce answers that are confidently wrong in ways that create legal risk. Corpus governance requires explicit processes for monitoring regulatory sources for updates, ingesting new documents on a defined schedule, version tagging documents with effective dates, and retiring superseded versions from the active retrieval index.
Access control in multi-tenant compliance systems requires that the retrieval layer enforce tenancy boundaries. A system serving multiple clients cannot allow one client's confidential documents to be retrievable in another client's queries. Vector stores must support namespace or metadata filtering that is enforced at query time, not applied as a post-retrieval filter. Post-retrieval filtering can still expose the existence of documents that should not be visible, which is a confidentiality violation even if the content is not returned.
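The distinction between query-time and post-retrieval filtering is easiest to see in code. This toy in-memory index applies the tenant filter before similarity ranking, so other tenants' documents are never scored, ranked, or counted; it is a sketch of the behaviour production vector stores with query-time metadata filtering provide, not any particular store's API.

```python
import math

class InMemoryIndex:
    """Toy vector index illustrating query-time metadata filtering."""
    def __init__(self):
        self.items = []  # (doc_id, vector, metadata) triples

    def add(self, doc_id, vector, metadata):
        self.items.append((doc_id, vector, metadata))

    def search(self, vector, filter, top_k=5):
        # Enforce the filter FIRST: only matching items become candidates,
        # so out-of-tenant documents never influence the result set.
        candidates = [
            (doc_id, vec) for doc_id, vec, meta in self.items
            if all(meta.get(k) == v for k, v in filter.items())
        ]
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        candidates.sort(key=lambda item: cosine(item[1], vector), reverse=True)
        return [doc_id for doc_id, _ in candidates[:top_k]]
```

A post-retrieval filter would instead rank all tenants' documents and drop the foreign ones afterwards, which leaks their existence through result counts and ranking gaps.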
Evaluating Retrieval Quality for Compliance Use Cases
Retrieval quality evaluation for a compliance RAG system requires a domain-specific benchmark dataset: a set of compliance questions with known correct source documents and acceptable answer ranges. Standard RAG evaluation frameworks like RAGAS measure answer faithfulness, answer relevance, and context precision and recall. For compliance applications, the most critical metric is context recall -- whether the relevant regulatory clause was retrieved. A system with high answer fluency but low context recall will produce well-formed answers that are not grounded in the correct regulatory text. This is the most dangerous failure mode for a compliance application.
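Context recall, in the sense frameworks like RAGAS use, can be approximated per question as the fraction of gold clauses that made it into the retrieved context, averaged over the benchmark. A minimal sketch, assuming gold clause ids are annotated on each benchmark question:

```python
def context_recall(examples):
    """Mean per-question context recall. Each example is
    (retrieved_ids, gold_ids): the chunk ids the retriever returned
    and the clause ids a correct answer must be grounded in."""
    scores = []
    for retrieved_ids, gold_ids in examples:
        found = sum(1 for g in gold_ids if g in retrieved_ids)
        scores.append(found / len(gold_ids))
    return sum(scores) / len(scores)
```

Tracking this metric per regulatory source, not just in aggregate, surfaces the case where retrieval works well for one regulation and silently fails for another.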
Hallucination Controls in Compliance RAG
RAG reduces hallucination but does not eliminate it. The LLM can misinterpret retrieved context, draw incorrect inferences across multiple retrieved clauses, or generate hallucinated text in portions of its response that go beyond what the retrieved context supports. Compliance RAG systems should include citation requirements in the prompt template -- the model must cite the specific document and section from which it is drawing each claim -- and output validation that checks whether the cited sources actually support the claim. Systems that include a structured output layer separating retrieved citations from model-generated synthesis make this validation tractable.
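A first validation layer on such structured output is purely mechanical: check that every cited (document, section) pair actually appears in the retrieved context. This sketch catches citations of sources the model was never shown; verifying that the cited text semantically supports the claim (e.g. with an entailment model) is a separate, harder step. The claim schema here is an assumption for illustration.

```python
def validate_citations(claims, retrieved):
    """Flag claims whose citation does not resolve to a retrieved chunk.
    `claims` is a list of {"text": ..., "doc_id": ..., "section": ...};
    `retrieved` maps (doc_id, section) -> chunk text."""
    failures = []
    for claim in claims:
        key = (claim["doc_id"], claim["section"])
        if key not in retrieved:
            failures.append(
                (claim["text"], "cited source not in retrieved context")
            )
    return failures
```

Any non-empty failure list should block the response from reaching the user, or route it to human review, rather than being logged and passed through.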