The Algorithm · AI in Regulated Industries · Financial Services
11 min read · 2026-03-28

SR 11-7 and AI Governance: What the Fed Expects From Your Model Risk Management

SR 11-7
Federal Reserve model risk management guidance — written in 2011, still the primary examination framework for AI in banking
The Federal Reserve's SR 11-7 guidance on model risk management remains the primary regulatory framework for AI and ML models in US banking — but it was written before transformers, before LLMs, and before foundation models. The model inventory, validation, conceptual soundness, and ongoing monitoring requirements of SR 11-7 all apply to LLMs deployed in financial services — and the guidance's silence on stochastic models creates interpretive gaps that examiners are currently filling with examination findings.

Federal Reserve Supervisory Letter SR 11-7 ("Guidance on Model Risk Management"), issued April 4, 2011, defines the Federal Reserve's expectations for model risk management at banking organizations. The OCC issued a parallel document (OCC 2011-12) the same day. SR 11-7 defines a "model" as a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates. Under this definition, every machine learning model deployed in financial services is a model under SR 11-7 — including large language models used for customer service, credit underwriting assistance, or regulatory compliance functions.

What SR 11-7 Actually Requires

SR 11-7's model risk management framework has three core requirements: model development, implementation, and use must be subject to rigorous model validation; there must be a model inventory that captures all models in use; and there must be ongoing monitoring of model performance. For traditional statistical models (logistic regression, linear regression, decision trees), these requirements are well-understood and documented. For machine learning models — particularly foundation models and LLMs — the SR 11-7 requirements create interpretive challenges that examiners are currently resolving through examination findings rather than updated guidance.

The model validation requirement under SR 11-7 has three components: evaluation of conceptual soundness, ongoing monitoring, and outcomes analysis. Conceptual soundness requires documentation of the model's theoretical underpinning, the assumptions that drive its behavior, and the limitations of those assumptions. For a logistic regression credit model, this is straightforward: the model assumes a linear relationship between input variables and the log-odds of default. For an LLM used in credit underwriting assistance, "conceptual soundness" documentation must address transformer architecture assumptions, training data biases, and the relationship between token prediction and the financial decision the model is supporting.
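One way to make that expanded documentation burden concrete is a checklist-style record. The structure and field names below are illustrative assumptions, not an official SR 11-7 template; the point is that each conceptual soundness element named above maps to a field a validator can inspect:

```python
# Illustrative conceptual soundness record for an LLM-assisted credit
# function. Field names and values are hypothetical, not a regulatory form.
conceptual_soundness_doc = {
    "model": "underwriting-assist-llm",
    "theoretical_basis": (
        "autoregressive transformer; next-token prediction, not a "
        "calibrated probability-of-default estimator"
    ),
    "key_assumptions": [
        "training corpus is representative of production input language",
        "prompt template constrains outputs to the decision-support task",
        "token-level likelihood is not treated as decision confidence",
    ],
    "known_limitations": [
        "training data biases may surface as disparate treatment of inputs",
        "stochastic outputs: identical inputs can yield different rationales",
    ],
    "decision_linkage": "advisory summary reviewed by a human underwriter",
}
```

A record like this does not satisfy the validation requirement by itself, but it forces the development team to state the token-prediction-to-financial-decision linkage explicitly, which is the piece examiners find missing most often.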

The Engineering Reality

The single most common SR 11-7 examination finding for ML models in 2024-2025 is an incomplete model inventory. SR 11-7 expects the model inventory to capture all models — and examiners are finding LLMs deployed in customer service, document processing, and compliance functions that were never included in model inventories because they were classified as "tools" rather than "models." If an LLM is making or materially assisting in a financial decision, it is a model under SR 11-7.
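That scoping test can be enforced mechanically. The sketch below is one way a platform team might encode it — the entry fields and the `requires_inventory` predicate are assumptions for illustration, not language from SR 11-7 itself, but the predicate captures the rule stated above: decision role, not the "tool" label, determines inventory status.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DecisionRole(Enum):
    """How the system participates in a financial decision."""
    MAKES_DECISION = "makes_decision"          # fully automated outcome
    MATERIALLY_ASSISTS = "materially_assists"  # human decides; model input is material
    INFORMATIONAL = "informational"            # convenience output, no decision weight


@dataclass
class InventoryEntry:
    """One row in a model inventory (illustrative fields only)."""
    name: str
    owner: str
    decision_role: DecisionRole
    upstream_foundation_model: Optional[str] = None
    last_validated: Optional[str] = None  # ISO date of last full validation


def requires_inventory(entry: InventoryEntry) -> bool:
    """Scoping test: anything making or materially assisting a
    financial decision belongs in the model inventory."""
    return entry.decision_role in (
        DecisionRole.MAKES_DECISION,
        DecisionRole.MATERIALLY_ASSISTS,
    )


# An underwriting-assist chatbot labelled a "tool" still qualifies:
chatbot = InventoryEntry(
    name="underwriting-assist-llm",
    owner="credit-risk-eng",
    decision_role=DecisionRole.MATERIALLY_ASSISTS,
    upstream_foundation_model="(vendor LLM)",
)
print(requires_inventory(chatbot))  # True
```

Running a predicate like this across a deployment catalog at CI time is a cheap way to surface "tools" that have drifted into decision-making roles between examinations.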

The Stochastic Model Problem

SR 11-7's validation requirement assumes deterministic models: given the same inputs, the model produces the same outputs. This assumption underlies the replication testing approach in SR 11-7 — an independent validator can rerun the model with the same inputs and verify the outputs match. LLMs are stochastic: the same prompt produces different outputs on different runs (at temperature > 0). This breaks the replication testing approach. The SR 11-7-compliant validation methodology for stochastic models requires: distributional testing (does the output distribution match expected characteristics across many runs?), adversarial testing (does the model behave predictably under adversarial prompts?), and bounded output validation (do all outputs fall within acceptable financial decision ranges?).
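The distributional and bounded-output checks described above can be sketched as a validation harness. Everything here is an assumption for illustration: `run_model` stands in for a wrapper that extracts a numeric estimate from the LLM, and the bounds and stability threshold are placeholders a validation team would set per use case:

```python
import random
import statistics


def validate_stochastic_model(run_model, prompt, n_runs=200,
                              output_bounds=(300, 850),
                              max_cv=0.05):
    """Replication testing fails at temperature > 0, so validate the
    output *distribution* over many runs instead of a single output.

    run_model(prompt) -> float : hypothetical wrapper returning the
                                 numeric estimate the LLM supports
    output_bounds : every individual output must fall in this range
    max_cv        : ceiling on the coefficient of variation across runs
    """
    outputs = [run_model(prompt) for _ in range(n_runs)]

    lo, hi = output_bounds
    out_of_bounds = [o for o in outputs if not (lo <= o <= hi)]

    mean = statistics.mean(outputs)
    cv = statistics.stdev(outputs) / mean if mean else float("inf")

    return {
        "bounded": not out_of_bounds,   # bounded output validation
        "out_of_bounds_count": len(out_of_bounds),
        "mean": mean,
        "coefficient_of_variation": cv,
        "stable": cv <= max_cv,         # distributional stability check
    }


# Stand-in for an LLM scoring wrapper: noisy but well-behaved estimates.
random.seed(0)

def fake_model(prompt):
    return random.gauss(mu=700, sigma=10)


report = validate_stochastic_model(fake_model, "assess applicant 123")
```

Adversarial testing would layer on top of this harness by swapping in a suite of adversarial prompts and asserting the same bounded, stable behaviour for each.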

Ongoing Monitoring Requirements

  1. Model performance tracking: for each LLM deployed in a financial decision function, define and track performance metrics that map to the financial outcome — not just accuracy on a test set, but downstream decision quality metrics
  2. Data drift monitoring: LLMs are sensitive to input distribution changes; monitor the distribution of inputs to production LLMs and trigger revalidation when the input distribution shifts materially from the training/validation distribution
  3. Output distribution monitoring: track the distribution of LLM outputs (decision classes, confidence scores, output lengths) and alert on anomalous shifts
  4. Fairness and disparate impact monitoring: SR 11-7 does not use the term "fairness," but the Equal Credit Opportunity Act and Fair Housing Act require that models used in credit and housing decisions not produce disparate impact — monitor for disparate impact continuously, not just at model deployment
  5. Revalidation triggers: define specific conditions that trigger model revalidation — input distribution shift beyond a threshold, output distribution shift, adverse outcomes rate change, or material update to the foundation model
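Two of the monitoring requirements above reduce to well-known statistics: input drift is commonly screened with the population stability index (conventional thresholds: above 0.1 warn, above 0.25 revalidate), and disparate impact with the four-fifths rule's adverse impact ratio. The sketch below implements both under simplifying assumptions (a single numeric input feature, equal-width binning on the baseline range); a production monitor would run per feature and per protected class:

```python
import math
from collections import Counter


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline (validation-time) sample and production
    inputs, using equal-width bins fitted on the baseline range."""
    lo, hi = min(expected), max(expected)
    step = (hi - lo) / bins or 1.0

    def bucket(xs):
        counts = Counter(min(int((x - lo) / step), bins - 1) for x in xs)
        total = len(xs)
        # small floor avoids log(0) when a bin is empty
        return [max(counts.get(i, 0) / total, 1e-6) for i in range(bins)]

    e, a = bucket(expected), bucket(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def adverse_impact_ratio(selection_rates):
    """Four-fifths rule screen used in disparate impact analysis:
    lowest group selection rate over the highest. Below 0.8 is the
    conventional flag for further review."""
    rates = list(selection_rates.values())
    return min(rates) / max(rates)


# A 30-unit shift in the input feature trips the PSI revalidation trigger:
baseline = [float(i) for i in range(100)]
shifted = [float(i) + 30 for i in range(100)]
psi = population_stability_index(baseline, shifted)   # well above 0.25

air = adverse_impact_ratio({"group_a": 0.50, "group_b": 0.35})  # 0.7, flagged
```

Wiring these two numbers into the revalidation triggers in item 5 gives the examiner-visible artifact SR 11-7's ongoing monitoring requirement asks for: a defined metric, a defined threshold, and a defined action when the threshold is crossed.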