Failed software implementations have signatures. By the end of the first week of a vendor recovery engagement, experienced engineers can identify whether the codebase is salvageable or whether a rebuild is the only path to production. The distinction matters because the answer determines the recovery architecture — and a misdiagnosis in the first two weeks is the most common reason a 12-week recovery takes 18 months.
These are the eight failure patterns we've seen across vendor recovery engagements, and the triage framework for each.
The Eight Failure Patterns
Pattern 1: The Scope Creep Collapse. The original scope was deliverable. The delivered scope is not. The codebase has 3-5x the original feature surface, none of it complete, and the core functionality is buried under half-built extensions. Diagnosis: map what was in the original SOW. Everything outside it is a write-off unless it's load-bearing for the core. Prognosis: salvageable if the core is intact.
Pattern 2: The Integration Fantasy. The vendor built the application assuming integrations would behave in ways they don't. The data model assumes a format the upstream system doesn't produce. The API calls assume response structures the provider never returns. Diagnosis: test every integration contract in the first week. Prognosis: usually salvageable if the application logic is sound, but integration rebuild can be 40-60% of recovery effort.
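The first-week contract test can be as small as a shape check against a captured upstream response. A minimal sketch, assuming a JSON payload; the field names and the `check_contract` helper are illustrative, not from any particular system:

```python
# Minimal integration contract check: does the upstream response actually
# match the shape the application's data model assumes?
import json

def check_contract(response_body: str, required_fields: dict) -> list:
    """Return a list of contract violations (missing or mistyped fields)."""
    payload = json.loads(response_body)
    violations = []
    for field, expected_type in required_fields.items():
        if field not in payload:
            violations.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            violations.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return violations

# The application assumes 'customer_id' is an int; the upstream sends a string.
upstream = '{"customer_id": "C-1042", "created_at": "2024-01-03"}'
print(check_contract(upstream, {"customer_id": int, "created_at": str}))
# → ['customer_id: expected int, got str']
```

Run one of these per integration in week one and you have a concrete inventory of which contracts are fantasy, before any application code is touched.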
Pattern 3: The Compliance Afterthought. The system works, but compliance was "for the next sprint" for 18 sprints. Audit logs don't exist. Access controls are role-based in the UI but not in the database. Encryption is present in production but not enforced in staging. Diagnosis: run the compliance checklist for the applicable framework in the first two days. Prognosis: salvageable, but compliance retrofit adds 4-6 weeks to any recovery timeline.
Pattern 4: The Performance Cliff. The system works with synthetic data in demo environments. It fails under realistic data volumes. Diagnosis: load test with production-scale data immediately. Prognosis: depends on whether the performance problem is architectural (data model, N+1 queries, synchronous processing where async is required) or implementation (missing indexes, unoptimised queries). Architectural performance problems often require partial rebuild.
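The architectural-versus-implementation distinction often shows up directly in query counts. A sketch of the N+1 signature using an in-memory SQLite database; table and column names are illustrative:

```python
# Demonstrates why N+1 queries are an architectural performance problem:
# query count grows linearly with data volume, so demos pass and production fails.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(100)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

query_count = 0
def q(sql, params=()):
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# N+1 shape: one query for the orders, then one per order for its customer.
orders = q("SELECT id, customer_id FROM orders")
for _oid, cid in orders:
    q("SELECT name FROM customers WHERE id = ?", (cid,))
n_plus_one = query_count  # 1001 queries for 1000 orders

query_count = 0
# The fix is a single join: query count stays flat as data grows.
q("SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id")
joined = query_count  # 1 query
print(n_plus_one, joined)
# → 1001 1
```

With 50 demo rows the N+1 version feels fine; with production volumes it falls off the cliff, which is why the load test must use production-scale data.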
Pattern 5: The Documentation Desert. The system runs. No one knows why. No architecture decisions are documented. No infrastructure is in code — it was clicked through the console. Diagnosis: reverse-engineer the architecture before touching anything. Prognosis: salvageable but high-risk. Any change has unknown blast radius.
Pattern 6: The Dependency Timebomb. The system uses pinned dependencies with known CVEs. Or it uses package versions that are no longer maintained. Or it runs on an EOL runtime version. Diagnosis: automated dependency scan in the first hour. Prognosis: usually fixable, but in regulated environments (PCI DSS, HIPAA, FedRAMP) the dependency update itself must go through a change management process.
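The first-hour scan can start as nothing more than matching pinned versions against an advisory feed. A sketch assuming a requirements.txt-style pin format; the advisory data below is illustrative, not real CVE data, and in practice you would feed this from a scanner such as pip-audit or an OSV database export:

```python
# First-hour dependency triage: flag pinned versions that appear on a
# known-vulnerable list. KNOWN_BAD stands in for a real advisory feed.
import re

KNOWN_BAD = {  # illustrative entries only, not actual advisories
    ("requests", "2.5.0"): "example CVE in requests < 2.20",
    ("pyyaml", "3.12"): "example unsafe-load advisory",
}

def scan(requirements_text: str) -> list:
    """Return (name, version, advisory) for every pinned, flagged package."""
    findings = []
    for line in requirements_text.splitlines():
        m = re.match(r"\s*([A-Za-z0-9_.-]+)==([\w.]+)", line)
        if m:
            key = (m.group(1).lower(), m.group(2))
            if key in KNOWN_BAD:
                findings.append((m.group(1), m.group(2), KNOWN_BAD[key]))
    return findings

reqs = "requests==2.5.0\nflask==2.3.2\nPyYAML==3.12\n"
for name, version, advisory in scan(reqs):
    print(f"{name}=={version}: {advisory}")
```

The output of this scan is also the input to the change management process in regulated environments: a written inventory of what must be updated and why.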
Pattern 7: The Test Void. The system has test files. The tests don't test anything meaningful — snapshot tests of UI state, tests that mock every dependency, tests that always pass. Coverage metrics are meaningless. Diagnosis: run the test suite and examine what it actually asserts. Prognosis: the test void is a symptom, not the problem. The problem it reveals is that the codebase was never validated by anyone who understood the domain.
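The "tests that always pass" signature is easy to demonstrate. A sketch with an illustrative domain function: both tests below run green today, but only the second encodes a business rule that would fail on regression:

```python
# The test-void signature: a suite that runs green without validating anything.
def calculate_late_fee(days_overdue: int) -> float:
    """Illustrative domain rule: $1.50 per day overdue, never negative."""
    return max(0, days_overdue) * 1.50

def test_void_style():
    # Asserts only that something came back: green no matter what the fee is.
    result = calculate_late_fee(10)
    assert result is not None

def test_meaningful():
    # Encodes the actual domain rule: fails if the logic regresses.
    assert calculate_late_fee(0) == 0.0
    assert calculate_late_fee(10) == 15.0
    assert calculate_late_fee(-3) == 0.0  # no negative fees

test_void_style()
test_meaningful()
print("both suites green")
```

A suite full of the first kind can report 90% coverage while validating nothing, which is why the diagnosis step reads the assertions, not the coverage number.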
Pattern 8: The Vendor Abandonment. The previous vendor has disappeared, become unresponsive, or is holding code or credentials hostage. Diagnosis: establish what you actually control — code, infrastructure credentials, domain registrations, SSL certificates, deployment pipelines. Prognosis: depends on what you own. If you own the code and credentials, recovery is straightforward. If you don't, recovery requires rebuilding from a dump of production data.
The salvageability decision is not primarily technical — it's economic. A codebase that would take 14 weeks to bring to production through recovery might take 10 weeks to rebuild. The rebuild gives you a clean architecture, proper compliance foundations, and maintainable code. The recovery gives you faster deployment of a system that still has the original technical debt. The right answer depends on the organisation's risk appetite and the cost of the additional 4 weeks.
The 12-Week Recovery Architecture
- Week 1-2: Triage and environment stabilisation. Own the credentials. Freeze deployments from the previous vendor. Document what exists.
- Week 2-3: Compliance and security gap analysis. Run the checklist. Identify what must be fixed before production can go live.
- Week 3-6: Core feature stabilisation. Fix the integration contracts. Fix the performance issues. Do not add features.
- Week 6-8: Compliance remediation. Implement the controls that were missing. This is non-negotiable in regulated environments.
- Week 8-10: Testing and validation. Build the test coverage that was missing. Not to hit a coverage number — to validate that the system does what it must do.
- Week 10-11: Staged deployment. Production with limited user population. Monitor aggressively.
- Week 11-12: Full production and handover. Documentation, runbooks, incident response procedures.