Northwind · B2B SaaS · Field-service · 8 months · ongoing

Took over a stalled platform. 99.99% uptime, 22 features shipped, technical debt halved — in 6 months.

Northwind builds field-service-management software for HVAC, plumbing, and electrical contractors (1,400 active customers). Their lead engineer left abruptly in summer 2025, and the platform stalled — incidents up 4×, deploys down 60%, customer-success tickets up 2.7×. They needed an agency to take ownership of operations AND ship the product roadmap, not just fix bugs. We took over in 6 weeks, cleared the incident backlog in 90 days, shipped 22 customer-facing features in 6 months, and halved their technical debt while maintaining 99.99% uptime.

Started: Aug 2025
Region: United States, Canada, UK
Team: 5 people
Stack: 7 technologies
Software Management · Software Development · Performance Engineering · Technical Debt Recovery
99.99%
Uptime · 6 months
0 P1 incidents
22
Customer-facing features shipped
vs 8 promised
−54%
Technical debt
measured by SonarQube
11/wk
Deploy frequency
from 0.4/wk · +2,650%
The problem

Northwind came to us with…

When their lead engineer left, Northwind had a 14-page Notion handover doc, two junior engineers who'd never owned production, and 1,400 customers who didn't know any of this. Within 6 weeks: P1 incidents went from 0.3/month to 1.2/month, dependency upgrades stopped, and the customer-success team was fielding 4× their usual ticket volume. The CTO needed to either rebuild the engineering team (12-month process) or hand off operations to a senior agency.

The four core challenges
CHALLENGE 01

Inherited a stalled platform

47 known bugs in the backlog (some open for 14 months). 8 dependency-vulnerability alerts. Test coverage at 22%. CI/CD broken. Deployment wiki was 18 months stale.

CHALLENGE 02

Two junior engineers, no senior

Both juniors were strong but had never owned production incidents. We needed to handle on-call AND mentor them up to mid-level competence.

CHALLENGE 03

1,400 customers, zero notice

Couldn't take downtime windows. Migration had to happen in-flight without customer-visible disruption.

CHALLENGE 04

Roadmap commitments

8 features had been promised to customers with committed delivery dates. Pushing them back would erode customer trust further. We had to ship AND clean up the platform simultaneously.

How we shipped it

The approach

Weeks 1–6
01

Discovery + emergency stabilization

Two staff engineers + one SRE shadowed every part of the platform for 2 weeks. Wrote 47 pages of runbooks. Set up OpenTelemetry + PagerDuty. Cleared the P1 backlog (4 critical bugs, all production-blocking) in week 5. Took over on-call in week 6.

Deliverables
  • 47-page runbook library
  • OpenTelemetry instrumentation across 12 services
  • PagerDuty rotation + escalation
  • P1 backlog cleared (4 bugs)
  • Dependency upgrade plan (8 vulns)
Months 2–3
02

Test coverage + CI/CD recovery

Got CI/CD back to green. Wrote tests for the 12 highest-risk modules. Test coverage went 22% → 64%. Deploy frequency went from 0.4/week to 11/week. Mean time to deploy a 1-line change went from 3 days to 22 minutes.

Deliverables
  • Test coverage 22% → 64%
  • CI/CD pipeline (GitHub Actions)
  • Canary deployment infrastructure
  • Feature flags (LaunchDarkly)
  • Automated dependency upgrades (Renovate)
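The page does not say how canary releases were gated, so the following is only a minimal sketch of one common approach: compare the canary's error rate with the stable fleet's and promote only when it stays within a tolerance. The type names, the `evaluateCanary` function, and every threshold value are assumptions made for illustration, not Northwind's actual configuration.

```typescript
// Hypothetical request/error counts collected over the same time window.
interface WindowStats {
  requests: number;
  errors: number;
}

// Hypothetical thresholds; real values would be tuned per service.
interface CanaryThresholds {
  minRequests: number;       // don't decide on too little canary traffic
  maxErrorRateRatio: number; // canary may be at most this multiple of baseline
  errorRateFloor: number;    // always tolerate at least this error rate
}

type CanaryDecision = "promote" | "rollback" | "wait";

const DEFAULT_THRESHOLDS: CanaryThresholds = {
  minRequests: 500,
  maxErrorRateRatio: 1.5,
  errorRateFloor: 0.001,
};

function evaluateCanary(
  baseline: WindowStats,
  canary: WindowStats,
  thresholds: CanaryThresholds = DEFAULT_THRESHOLDS
): CanaryDecision {
  // Not enough canary traffic yet to make a meaningful comparison.
  if (canary.requests < thresholds.minRequests) {
    return "wait";
  }
  const baselineRate =
    baseline.requests > 0 ? baseline.errors / baseline.requests : 0;
  const canaryRate = canary.errors / canary.requests;
  // The floor keeps a near-zero baseline from making every canary fail.
  const allowedRate = Math.max(
    baselineRate * thresholds.maxErrorRateRatio,
    thresholds.errorRateFloor
  );
  return canaryRate <= allowedRate ? "promote" : "rollback";
}

console.log(evaluateCanary({ requests: 10000, errors: 10 }, { requests: 1000, errors: 1 })); // promote
console.log(evaluateCanary({ requests: 10000, errors: 10 }, { requests: 1000, errors: 5 })); // rollback
console.log(evaluateCanary({ requests: 10000, errors: 10 }, { requests: 100, errors: 0 }));  // wait
```

In a setup like this the decision would typically run inside the deploy pipeline after each canary step, with a feature flag as a second kill switch if a regression slips past the gate.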
Months 3–6
03

Roadmap + customer-facing features

Shipped 22 customer-facing features against the 8 originally promised on the roadmap, plus 6 unplanned features driven by data from the new observability stack (the bugs we found revealed unmet customer needs). Customer-success ticket volume halved.

Deliverables
  • 22 customer-facing features shipped
  • 6 unplanned features (from observability data)
  • Mobile app v3 (React Native)
  • API platform v2 (rate-limited, versioned)
  • Performance audit + optimization
Months 6–8 · ongoing
04

Steady-state operations + roadmap velocity

Now in steady state: 11 deploys/week, 99.99% uptime, customer-success tickets at 60% of pre-stall volume. Both junior engineers have been promoted to mid-level and own discrete services. We're 8 weeks into the year-2 roadmap with zero incidents.

Deliverables
  • Steady-state operations playbook
  • Junior engineer mentorship track
  • Quarterly SLO + error-budget review
  • Customer-facing status page
  • Year-2 roadmap (scoped + estimated)
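To make the 99.99% figure concrete for an SLO and error-budget review, here is a rough sketch of the arithmetic. It assumes 30-day months and treats uptime as a simple time-based availability target. The function names and the 5-minute downtime figure are hypothetical and do not come from the engagement.

```typescript
// Allowed downtime in minutes for an availability target over a window.
function errorBudgetMinutes(sloPercent: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - sloPercent / 100);
}

// Fraction of the error budget still unspent after some observed downtime.
function budgetRemaining(
  sloPercent: number,
  windowDays: number,
  downtimeMinutes: number
): number {
  const budget = errorBudgetMinutes(sloPercent, windowDays);
  return 1 - downtimeMinutes / budget;
}

// 6 months at 99.99%, assuming 30-day months (180 days).
console.log(errorBudgetMinutes(99.99, 180).toFixed(2)); // "25.92"

// One quarter (90 days) at 99.99%.
console.log(errorBudgetMinutes(99.99, 90).toFixed(2)); // "12.96"

// Hypothetical: 5 minutes of downtime in a quarter leaves roughly 61% of budget.
console.log(budgetRemaining(99.99, 90, 5).toFixed(2)); // "0.61"
```

Under these assumptions, 99.99% over six months leaves about 26 minutes of total downtime, which is why a quarterly review of how much budget has been spent is a useful checkpoint.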
The receipts

Before / after — every metric

Numbers verifiable with the client. Audit trail available on request.

Metric | Before | After | Change
P1 incidents (monthly) | 1.2 | 0 | −100%
Mean time-to-resolve P1 | 8.4 hours | 1.1 hours | −87%
Test coverage | 22% | 64% | +191%
Deploy frequency (per week) | 0.4 | 11 | +2,650%
Mean time-to-deploy 1-line change | 3 days | 22 min | −99.5%
Open dependency vulnerabilities | 8 | 0 | −100%
Customer-success ticket volume | 240/wk | 118/wk | −51%
Features shipped (6 months) | | 22 |
Technical-debt score (SonarQube) | 47.2 | 21.8 | −54%
Uptime (6 months) | 99.2% | 99.99% | +0.79pp
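The change figures in the table above are relative changes against the "before" value, except uptime, which is a difference in percentage points. As a quick way to check them, here is a small sketch. The helper names are illustrative, and the inputs are the before/after values from the table.

```typescript
// Relative change from a baseline, in percent.
function percentChange(before: number, after: number): number {
  if (before === 0) {
    throw new Error("percent change is undefined when the baseline is 0");
  }
  return ((after - before) / before) * 100;
}

// For metrics that are already percentages (such as uptime), report the
// plain difference in percentage points instead.
function percentagePointChange(before: number, after: number): number {
  return after - before;
}

// Round to a fixed number of decimal places.
function round(value: number, digits: number = 0): number {
  const factor = Math.pow(10, digits);
  return Math.round(value * factor) / factor;
}

console.log(round(percentChange(0.4, 11)));                // 2650  (deploy frequency)
console.log(round(percentChange(47.2, 21.8)));             // -54   (technical-debt score)
console.log(round(percentChange(3 * 24 * 60, 22), 1));     // -99.5 (time-to-deploy, in minutes)
console.log(round(percentagePointChange(99.2, 99.99), 2)); // 0.79  (uptime, percentage points)
```

Converting both time-to-deploy values to minutes before comparing them matters here: mixing days and minutes would give a meaningless ratio.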
What we ran it on

Stack, team, and tools

Tech stack
  • Node.js + TypeScript
  • Postgres
  • Redis
  • React Native (mobile)
  • Next.js (web)
  • AWS (ECS + RDS)
  • OpenTelemetry
Team
  • 1 engineering manager (lead)
  • 2 staff engineers
  • 1 SRE / on-call lead
  • 1 mobile engineer (RN)
  • 1 QA engineer
Tools
  • GitHub Actions
  • PagerDuty
  • Datadog
  • SonarQube
  • LaunchDarkly
  • Renovate
  • Sentry

When our lead engineer left, I had three options: rebuild the team (12 months), accept slower delivery (board wouldn't), or find a senior agency to operate the platform. SERP Axis was option three. Six months later we have 99.99% uptime, 22 features shipped, and our two junior engineers have been promoted to mid-level. They didn't just operate it — they made our team better.

Helena Brodie
CTO, Northwind

I went from 4× ticket volume back to under our pre-stall baseline. The customer-success team noticed the difference within a month. The retention math alone paid for the engagement.

Tom Aldrich
VP Customer Success, Northwind
What's next

Year-2 plan: AI-assisted scheduling for field technicians (RAG over historical work-order data), plus a Power BI dashboard for ops + customer-success. Both scoped, kicking off month 9.
