Engineering · Mar 28, 2026 · 12 min read

Shipping vs. shipping fast: a senior engineer's three-variable heuristic

Velocity is one of three variables that matter. The other two are reversibility and observability. Here's how senior engineers trade them off against each other.

Aman Mathur
Founder, SERP Axis

1. The three variables — velocity is just one

The trap: optimizing only for velocity. You ship 10× faster, but every bad deploy costs you 8 hours of production firefighting because you can't roll back cleanly and you don't know what's wrong. Net velocity is lower than before.

The fix: invest in reversibility and observability first, then velocity follows naturally. Once a deploy is one-click reversible and you can detect a regression in 2 minutes, you can ship 20 times a day without fear.

  • Velocity: how fast a change gets from idea to production. Measured as lead time for changes (a DORA metric).
  • Reversibility: how cheaply you can undo a change that goes wrong. Measured as mean time to recovery (a DORA metric).
  • Observability: how quickly you learn that a change went wrong. Measured as mean time to detection (MTTD).
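
To make the three measurements concrete, here is a minimal sketch that computes them from deploy and incident records. The record shapes and field names are assumptions for illustration, not a standard schema, and MTTR is measured from the start of the regression, so it includes detection time.

```typescript
// Hypothetical record shapes; the field names are assumptions for illustration.
interface Change {
  committedAt: number; // epoch ms when the change was first committed
  deployedAt: number;  // epoch ms when it reached production
}

interface Incident {
  startedAt: number;  // epoch ms when the regression began (often the bad deploy)
  detectedAt: number; // epoch ms when an alert or a human noticed
  resolvedAt: number; // epoch ms when service was restored
}

const MINUTE = 60 * 1000;

function mean(values: number[]): number {
  if (values.length === 0) return NaN; // no data, no meaningful average
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Velocity: average commit-to-production time, in minutes.
function leadTimeMinutes(changes: Change[]): number {
  return mean(changes.map((c) => (c.deployedAt - c.committedAt) / MINUTE));
}

// Observability: average time from the start of a regression to detection.
function mttdMinutes(incidents: Incident[]): number {
  return mean(incidents.map((i) => (i.detectedAt - i.startedAt) / MINUTE));
}

// Reversibility: average time from the start of a regression to recovery.
function mttrMinutes(incidents: Incident[]): number {
  return mean(incidents.map((i) => (i.resolvedAt - i.startedAt) / MINUTE));
}
```

Because recovery is measured from the start of the regression, every minute of slow detection shows up in MTTR as well.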

2. Reversibility — the cheap insurance you don't take out

Reversibility is binary: either you can undo a change in under 5 minutes, or you can't. The investment to get to 'yes' is engineering work most teams skip until after their first major incident.

  • Feature flags. Every meaningful behavior change ships behind a flag. Flag toggle = 30 seconds to revert. No redeploy needed. LaunchDarkly, Statsig, Vercel Edge Config, or build your own with Postgres.
  • Database migrations: backwards-compatible by default. Add new columns as nullable, backfill them, and only then deploy code that reads them. Don't drop a column until at least two deploys after the last code that used it is gone.
  • Canary deploys. Roll out to 1% of traffic first. Promote to 10%, 50%, 100% based on error rates. If something's wrong, only 1% saw it.
  • Blue-green deploys. New version runs alongside old; traffic switches atomically. Rollback is the same switch in reverse.
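  • Atomic infra changes. Use Pulumi/Terraform with properly managed, versioned state. Neither tool rolls back a failed apply automatically, so the reversibility comes from being able to re-apply the last known-good configuration instead of hand-repairing a half-broken environment.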
Feature flag pattern: ship behind a flag, validate, then promote
typescript
import { Flags } from "@/lib/flags";

export async function getProductPrice(productId: string, userId: string) {
  const useDynamicPricing = await Flags.isEnabled(
    "dynamic-pricing-v2",
    { userId, default: false }
  );

  if (useDynamicPricing) {
    return await computeDynamicPrice(productId, userId);
  }
  return await getStaticPrice(productId);
}

// Day 1: ship with flag off (default false). 0% impact.
// Day 2: enable for internal users. Validate.
// Day 3: 5% rollout. Watch metrics.
// Day 5: 50% rollout.
// Day 7: 100% rollout. Remove flag in next sprint.

// If anything breaks at any percentage:
//   Flag toggle off = instant rollback. No redeploy needed.
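
The canary bullet above (1%, then 10%, 50%, 100%, gated on error rates) can be sketched as a pure decision function. This is one possible policy, not a standard one: the minimum sample size, the tolerance over baseline, and all names here are assumptions you would tune for your own traffic.

```typescript
// Traffic percentages taken from the canary steps described in the text.
const CANARY_STEPS = [1, 10, 50, 100] as const;

interface CanaryStats {
  requests: number; // requests served by the canary at the current step
  errors: number;   // failed requests among them
}

type CanaryDecision =
  | { action: "hold"; percent: number }
  | { action: "promote"; percent: number }
  | { action: "rollback"; percent: 0 };

// Assumed policy: wait for a minimum sample, roll back if the canary's error
// rate exceeds the baseline by more than `tolerance`, otherwise go one step up.
function nextCanaryStep(
  currentPercent: number,
  canary: CanaryStats,
  baselineErrorRate: number,
  minRequests = 500,
  tolerance = 0.005
): CanaryDecision {
  if (canary.requests < minRequests) {
    return { action: "hold", percent: currentPercent }; // not enough data yet
  }
  const errorRate = canary.errors / canary.requests;
  if (errorRate > baselineErrorRate + tolerance) {
    return { action: "rollback", percent: 0 };
  }
  const next = CANARY_STEPS.find((step) => step > currentPercent);
  return next === undefined
    ? { action: "hold", percent: currentPercent } // already at 100%
    : { action: "promote", percent: next };
}
```

Keeping the decision separate from the mechanism (load balancer weights, flag percentages) means the same function can gate either kind of rollout.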

3. Observability — knowing when you're wrong (in minutes, not days)

Observability is the difference between 'a customer emailed us about an outage' and 'PagerDuty paged us 90 seconds after deploy'. The infrastructure investment pays for itself the first time it catches a bug before a customer does.

  • Three pillars: logs (what happened), metrics (how often, how much), traces (request flow across services). All three, instrumented end-to-end.
  • OpenTelemetry as the default. Vendor-agnostic, supports all three pillars, and has instrumentation libraries for most mainstream languages and frameworks. Datadog, Honeycomb, or Grafana Cloud as the storage/UI.
  • Per-deploy comparison. Vercel Speed Insights, Sentry deploy markers — every deploy is annotated in your dashboards. Spike in error rate? You can see which deploy did it.
  • Real-user monitoring (RUM) for frontend. CrUX is sampled and aggregated over a rolling 28-day window, so it lags; RUM is near real-time and per-user. Vercel Speed Insights or Datadog RUM.
  • SLOs + error budgets. Define what 'available' means (e.g., 99.9% of requests under 500ms). Alert when error budget burns faster than expected.
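
The "error budget burns faster than expected" idea from the last bullet reduces to one ratio. Here is a minimal sketch, assuming you can count total and SLO-violating requests over some window; the function names and the alert threshold are illustrative, not taken from any particular vendor.

```typescript
interface WindowCounts {
  total: number; // requests observed in the window
  bad: number;   // requests that violated the SLO (errors, or over the latency target)
}

// Burn rate = observed bad-request ratio / allowed bad-request ratio.
// 1.0 means the budget lasts exactly the SLO period; higher burns it faster.
function burnRate(counts: WindowCounts, sloTarget: number): number {
  const allowedBadRatio = 1 - sloTarget; // roughly 0.001 for a 99.9% SLO
  if (allowedBadRatio <= 0) {
    throw new Error("sloTarget must be below 1; a 100% SLO has no error budget");
  }
  if (counts.total === 0) return 0; // no traffic, nothing burned
  return counts.bad / counts.total / allowedBadRatio;
}

// Alert when the budget is burning faster than the chosen multiple.
function shouldAlert(counts: WindowCounts, sloTarget: number, maxBurnRate: number): boolean {
  return burnRate(counts, sloTarget) > maxBurnRate;
}
```

For a 99.9% SLO, 10 bad requests out of 1,000 is a burn rate of about 10: at that pace a 30-day budget is gone in roughly three days.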
MTTD before MTTR

You can't recover (MTTR) what you don't detect (MTTD). Most teams optimize MTTR with on-call training and runbooks before they've fixed MTTD. Detection is the harder problem; fix it first.
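
One way to put a number on detection is to replay per-minute error rates around a deploy and see when a simple rule would have fired. The sketch below is an assumption-heavy illustration: the 3x factor, the 1% floor, and the sample shape are all placeholders, and a real alerting rule would live in your monitoring tool.

```typescript
interface MinuteSample {
  minute: number;    // minutes relative to the deploy (negative = before)
  errorRate: number; // fraction of failed requests in that minute
}

// Returns how many minutes after the deploy the regression would be flagged,
// or null if there is no baseline or no post-deploy minute crosses the threshold.
function minutesToDetection(
  samples: MinuteSample[],
  factor = 3,  // assumed: flag at 3x the pre-deploy average
  floor = 0.01 // assumed: never flag below 1%, to avoid noise on tiny baselines
): number | null {
  const before = samples.filter((s) => s.minute < 0);
  const after = samples
    .filter((s) => s.minute >= 0)
    .sort((a, b) => a.minute - b.minute);
  if (before.length === 0) return null; // nothing to compare against

  const baseline = before.reduce((sum, s) => sum + s.errorRate, 0) / before.length;
  const threshold = Math.max(baseline * factor, floor);
  const firstBad = after.find((s) => s.errorRate > threshold);
  return firstBad ? firstBad.minute : null;
}
```

Running this over past incidents gives a rough MTTD for a given rule before you wire it to a pager.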

4. DORA metrics: the calibration tool

The two metrics that correlate most strongly with high-functioning teams: deploy frequency and mean time to recovery. The fast teams aren't fast because they're cowboys — they're fast because reversibility + observability let them deploy without fear.

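| Metric                | Elite                 | High                | Medium                | Low                  |
|-----------------------|-----------------------|---------------------|-----------------------|----------------------|
| Lead time for changes | < 1 hour              | 1 day – 1 week      | 1 week – 1 month      | 1 month +            |
| Deploy frequency      | On demand (multi/day) | Once/day–once/week  | Once/week–once/month  | Less than once/month |
| Mean time to recovery | < 1 hour              | < 1 day             | 1 day – 1 week        | 1 week +             |
| Change failure rate   | 0–15%                 | 16–30%              | 16–30%                | 16–30%               |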
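
To use the table for calibration, you can turn two of its rows into threshold checks. This is a sketch under stated assumptions: a month is treated as 30 days, range boundaries are read as inclusive upper bounds, and lead times between 1 hour and 1 day (which the table does not cover) are assigned to High.

```typescript
type Tier = "Elite" | "High" | "Medium" | "Low";

// All thresholds are in hours.
const HOUR = 1;
const DAY = 24 * HOUR;
const WEEK = 7 * DAY;
const MONTH = 30 * DAY; // assumed: a month is treated as 30 days

// Lead time for changes. The table has a gap between 1 hour and 1 day;
// this sketch assigns that gap to "High".
function leadTimeTier(hours: number): Tier {
  if (hours < HOUR) return "Elite";
  if (hours <= WEEK) return "High";
  if (hours <= MONTH) return "Medium";
  return "Low";
}

// Mean time to recovery, using the same upper-bound reading of the table.
function recoveryTier(hours: number): Tier {
  if (hours < HOUR) return "Elite";
  if (hours < DAY) return "High";
  if (hours <= WEEK) return "Medium";
  return "Low";
}
```

A team with a 2-day lead time and a 3-hour recovery time lands in High on both, which tells you more than either number alone.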

5. Feature flags as a velocity multiplier

Cost: ~$200–$2000/month for LaunchDarkly or Statsig at typical scale. ROI: deploy frequency goes from weekly to multi-daily. Lead time drops 5–10×. Change failure rate halves because the rollback strategy is one click.

  • Deploy code at any time. Code is dormant behind 'off' flags until the launch.
  • Marketing announces; you flip a flag. No 2am deploy windows.
  • Bad launch? Flip the flag back. No emergency rollback.
  • Per-customer flags. Beta features for specific accounts. Gradual rollouts to enterprise tiers first.
  • A/B testing emerges from the same infrastructure. Half-and-half assignment + metrics.
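
Gradual rollouts and half-and-half A/B assignment both come down to mapping a user to a stable bucket. Here is a minimal sketch of that mapping if you build your own; hosted flag services use their own bucketing schemes, and the function names plus the FNV-1a-style hash over UTF-16 code units are choices made for this example.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units: small, dependency-free,
// and stable across processes and restarts.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // 32-bit FNV prime
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Maps (flag, user) to a stable bucket in [0, 100). Including the flag key
// means a user's bucket differs from flag to flag.
function bucketFor(flagKey: string, userId: string): number {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

// A user is in the rollout if their bucket is below the rollout percentage.
function isInRollout(flagKey: string, userId: string, percent: number): boolean {
  return bucketFor(flagKey, userId) < percent;
}

// A/B assignment reuses the same bucket: lower half gets A, upper half gets B.
function abVariant(flagKey: string, userId: string): "A" | "B" {
  return bucketFor(flagKey, userId) < 50 ? "A" : "B";
}
```

The useful property is monotonicity: raising the percentage from 5 to 50 only adds users, so nobody who already has the feature loses it mid-rollout.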

6. When to slow down deliberately

Senior engineers know when fast is wrong. Four signals to slow down:

  • The change is one-way. Database deletions, third-party API contracts, customer-visible URL changes. Once shipped, undoing is expensive. Slow is the right speed.
  • The change is high-stakes. Payment processing, authentication, anything regulatory. Reversibility doesn't help if the bug already cost you a SOC 2 finding.
  • The change is poorly understood. If two engineers can't agree on what the change does, fast just means breaking faster. Spend the time to align first.
  • The team is tired. Velocity is a function of energy. If the team has been on-call for two weeks, going fast is going to break things even with all the right infrastructure.

7. The decision table

The table below is intentionally tactical. Most engineering decisions don't need a meeting; they need a heuristic that says 'flag this, don't flag that, observe this hard, observe that lightly.' The best engineering teams have internalized this table to the point that it's reflex.

| Change type              | Velocity priority      | Reversibility need          | Observability need           |
|--------------------------|------------------------|-----------------------------|------------------------------|
| UI tweak                 | High — ship it         | Low (CSS revert)            | Low                          |
| New feature, low-risk    | High — feature flag it | High (flag-gated)           | Medium                       |
| DB schema change         | Low — go slow          | Critical (backwards-compat) | High                         |
| Auth / payment           | Low — go slow          | Critical (canary 1%)        | Critical (instant alerting)  |
| Marketing copy           | High — ship it         | Low (revert via CMS)        | Low                          |
| 3rd-party integration    | Medium                 | High (circuit breaker)      | High (per-vendor SLO)        |
| Infra config             | Low — go slow          | Critical (state-tracked)    | Critical                     |
| Performance optimization | Medium                 | High (flag-gated)           | Critical (regression detect) |
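
If you want the table to be more than reflex, it can live in code, for example as a lookup that a PR template or CI check consults. The sketch below copies the table's values; the key names and the `needsRollbackPlan` helper are hypothetical additions for illustration.

```typescript
type Level = "Low" | "Medium" | "High" | "Critical";

type ChangeType =
  | "ui-tweak"
  | "new-feature-low-risk"
  | "db-schema"
  | "auth-payment"
  | "marketing-copy"
  | "third-party-integration"
  | "infra-config"
  | "performance-optimization";

interface Policy {
  velocity: Level;      // how hard to push for speed
  reversibility: Level; // how much undo machinery the change needs
  observability: Level; // how closely to watch it after deploy
  mechanism: string;    // the rollback mechanism named in the table
}

// Values copied from the decision table; the keys are hypothetical identifiers.
const DECISION_TABLE: Record<ChangeType, Policy> = {
  "ui-tweak": { velocity: "High", reversibility: "Low", observability: "Low", mechanism: "CSS revert" },
  "new-feature-low-risk": { velocity: "High", reversibility: "High", observability: "Medium", mechanism: "flag-gated" },
  "db-schema": { velocity: "Low", reversibility: "Critical", observability: "High", mechanism: "backwards-compat" },
  "auth-payment": { velocity: "Low", reversibility: "Critical", observability: "Critical", mechanism: "canary 1%" },
  "marketing-copy": { velocity: "High", reversibility: "Low", observability: "Low", mechanism: "revert via CMS" },
  "third-party-integration": { velocity: "Medium", reversibility: "High", observability: "High", mechanism: "circuit breaker" },
  "infra-config": { velocity: "Low", reversibility: "Critical", observability: "Critical", mechanism: "state-tracked" },
  "performance-optimization": { velocity: "Medium", reversibility: "High", observability: "Critical", mechanism: "flag-gated" },
};

// Hypothetical helper: anything rated High or Critical for reversibility
// should name its rollback mechanism before it merges.
function needsRollbackPlan(change: ChangeType): boolean {
  const need = DECISION_TABLE[change].reversibility;
  return need === "High" || need === "Critical";
}
```

A check like `needsRollbackPlan("db-schema")` can then prompt the author for the rollback mechanism, while a UI tweak passes straight through.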
Tags: DORA, Engineering, Velocity, DevOps, SRE