The two lines that break everything
Almost every service that emits events contains some version of this: persist a change, then tell the outside world about it. It looks innocent and it passes every test you write, because your tests never crash the process in the 8-millisecond window between the two calls.
But those two calls hit two different systems with two independent failure domains and no shared transaction. There is no `COMMIT` that spans both. The moment you write this, you have hand-rolled a distributed transaction — one with no coordinator and no rollback.
```typescript
async function placeOrder(order: Order) {
  await db.orders.insert(order);               // (1) commits to Postgres
  await broker.publish("order.placed", order); // (2) sends to Kafka
}
```

Enumerate the partial failures
Correctness in distributed systems is a case analysis, not a vibe. List every place the process can die and ask what state the world is left in.
- Crash after (1), before (2): the order exists in the database but no event was ever published. Downstream (billing, search index, email) never hears about it. Silent, permanent divergence — the worst kind, because nothing errors.
- (2) succeeds, then something around it fails: if the publish ran inside a database transaction that later rolls back, you have published an event for an order that does not exist; if both calls succeeded but the response was lost and the caller retries, you publish the event twice.
- (1) commits, (2) throws, you 'handle' it by retrying (2): now ordering and delivery guarantees depend on your retry loop surviving the same crash you're trying to defend against.
- Reverse the order — publish first, then write — and you get the symmetric bug: consumers act on an order the database never stored.
Whichever operation you put second can fail independently of the first. Two non-transactional writes to two systems cannot be made atomic by reordering them. The problem is structural, not a sequencing detail.
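The case analysis above can be replayed in a few lines. This is a minimal in-memory sketch, not a real database or broker: `World`, `CrashPoint`, and `scenario` are hypothetical names introduced for illustration, and a "crash" is modelled as an early return.

```typescript
type Order = { id: string; total: number };
type CrashPoint = "none" | "between-writes" | "before-response";
type World = { dbRows: Map<string, Order>; published: string[] };

// The dual write from the text; returns false where the process would have died.
function placeOrder(world: World, order: Order, crashAt: CrashPoint): boolean {
  world.dbRows.set(order.id, order);               // (1) commit to the database
  if (crashAt === "between-writes") return false;  // died after (1), before (2)
  world.published.push(order.id);                  // (2) publish to the broker
  if (crashAt === "before-response") return false; // died after (2), caller got no reply
  return true;
}

// Runs one scenario against fresh systems; optionally the caller retries once.
function scenario(crashAt: CrashPoint, callerRetries: boolean) {
  const world: World = { dbRows: new Map(), published: [] };
  const order: Order = { id: "o-1", total: 42 };
  const ok = placeOrder(world, order, crashAt);
  if (!ok && callerRetries) placeOrder(world, order, "none");
  return { rows: world.dbRows.size, events: world.published.length };
}

const happyPath = scenario("none", false);
const lostEvent = scenario("between-writes", false);
const duplicate = scenario("before-response", true);
console.log({ happyPath, lostEvent, duplicate });
```

In this sketch the happy path ends with one row and one event, the crash between the writes leaves one row and zero events, and the lost response followed by a retry leaves one row and two events.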
Why two-phase commit is not the answer
The textbook reflex is a distributed transaction: two-phase commit (2PC / XA) across the database and the broker. In practice this trades a data-integrity bug for an availability bug.
2PC introduces a transaction coordinator that must be up for any write to commit, and participants hold locks through the prepare phase while they await the coordinator's decision. If the coordinator dies after participants vote 'yes' but before it broadcasts 'commit', those participants are blocked holding locks until it recovers — 2PC is a blocking protocol. Add that most modern brokers (Kafka included) are not good XA participants, and you've coupled the availability of every write to the weakest node in the set.
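The blocking behaviour is easier to see in a toy model. The sketch below is an assumption-laden simplification (a single in-process "coordinator" function, participants that always vote yes, no timeouts or recovery log); `Participant` and `twoPhaseCommit` are hypothetical names, not an XA API.

```typescript
type Decision = "commit" | "abort";
type ParticipantState = "idle" | "prepared" | "committed" | "aborted";

class Participant {
  state: ParticipantState = "idle";
  lockHeld = false;

  // Phase 1: acquire locks and promise to commit if asked.
  prepare(): "yes" | "no" {
    this.lockHeld = true;
    this.state = "prepared";
    return "yes";
  }

  // Phase 2: apply the coordinator's decision and release locks.
  finish(decision: Decision): void {
    this.state = decision === "commit" ? "committed" : "aborted";
    this.lockHeld = false;
  }
}

function twoPhaseCommit(
  participants: Participant[],
  coordinatorDiesAfterVotes: boolean,
): "committed" | "aborted" | "in-doubt" {
  const votes = participants.map((p) => p.prepare());
  if (coordinatorDiesAfterVotes) return "in-doubt"; // no decision is ever broadcast
  const decision: Decision = votes.every((v) => v === "yes") ? "commit" : "abort";
  participants.forEach((p) => p.finish(decision));
  return decision === "commit" ? "committed" : "aborted";
}

const db = new Participant();
const broker = new Participant();
const outcome = twoPhaseCommit([db, broker], true);

const healthyDb = new Participant();
const healthy = twoPhaseCommit([healthyDb], false);
```

When the coordinator dies after the votes, `outcome` is `"in-doubt"` and both participants stay in `prepared` with `lockHeld` still true; nothing in the participants' own code can release them, which is the availability cost the paragraph describes.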
The lesson isn't 'never use transactions across systems' — it's that you already have one perfectly good transactional system in the loop: your database. The fix is to make the event part of the same local ACID transaction as the data.
"Exactly-once delivery" is impossible — aim for effectively-once
Before fixing it, kill the fantasy. Exactly-once delivery over an unreliable network is provably impossible: the sender can never distinguish a lost message from a lost acknowledgement, so it must either risk never delivering (at-most-once) or risk delivering again (at-least-once). There is no third option at the delivery layer.
What you can build is effectively-once processing: at-least-once delivery plus consumers that are idempotent, so a duplicate delivery produces no additional effect. Kafka's much-cited 'exactly-once semantics' is exactly this within Kafka's own boundary — an idempotent producer plus transactions over Kafka topics and offsets. It does not extend exactly-once to the arbitrary external system your consumer writes to. That last mile is always your idempotency to enforce.
at-least-once delivery + idempotent consumer = effectively-once. Every reliable event pipeline is built on this identity. If either half is missing, you have a bug you haven't hit yet.
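A small simulation makes the identity concrete. It is only a sketch: `sendAtLeastOnce` and the `ackLost` schedule are hypothetical, and the network is reduced to "the message always arrives, the acknowledgement sometimes does not".

```typescript
type Message = { id: string; amount: number };

// Resends until an acknowledgement is seen. `ackLost[i] === true` means the
// i-th attempt was delivered but its acknowledgement never reached the sender.
function sendAtLeastOnce(
  msg: Message,
  deliver: (m: Message) => void,
  ackLost: boolean[],
): number {
  let attempts = 0;
  for (;;) {
    deliver(msg); // the consumer receives the message
    const lost = ackLost[attempts] === true;
    attempts += 1;
    if (!lost) return attempts; // the sender stops only once it sees an ack
  }
}

// A consumer that applies every delivery.
let naiveBalance = 0;
const naiveConsumer = (m: Message): void => {
  naiveBalance += m.amount;
};

// A consumer that applies each message id at most once.
let safeBalance = 0;
const seen = new Set<string>();
const idempotentConsumer = (m: Message): void => {
  if (seen.has(m.id)) return; // duplicate delivery: no additional effect
  seen.add(m.id);
  safeBalance += m.amount;
};

const msg: Message = { id: "evt-1", amount: 100 };
const attempts = sendAtLeastOnce(msg, naiveConsumer, [true, true]);
sendAtLeastOnce(msg, idempotentConsumer, [true, true]);
console.log({ attempts, naiveBalance, safeBalance });
```

With two lost acknowledgements the sender makes three attempts. The naive consumer ends at 300 and the idempotent one at 100, from the same delivery behaviour.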
The transactional outbox
Here is the actual fix. Instead of writing to the broker inside your business logic, write the event into an `outbox` table in the same database, inside the same transaction as the business change. Because it's one local ACID transaction, the row and the event commit together or not at all — the dual write is gone.
A separate relay process then reads unsent rows from the outbox and publishes them to the broker, marking each as sent. If the relay crashes mid-publish, it simply re-publishes on restart — which is at-least-once, exactly what your idempotent consumers already expect.
- Ordering: publish partitioned by aggregate_id so all events for one entity land on one partition and stay ordered. Global ordering across entities is neither achievable cheaply nor usually needed.
- Throughput: a naive `WHERE sent_at IS NULL` poll is typically fine up to thousands of rows per second; index the unsent rows and use `FOR UPDATE SKIP LOCKED` so that multiple relay workers can run without fighting over the same rows.
- Cleanup: partition or periodically prune sent rows so the outbox doesn't grow unbounded.
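The relay's crash behaviour can be sketched in memory. Everything here is an illustrative assumption: the outbox is an array whose order stands in for `ORDER BY created_at`, the broker is a `Map` of per-key lists, and `relayOnce` and `crashAfterFirstPublish` are hypothetical names.

```typescript
type OutboxRow = {
  id: string;
  aggregateId: string;
  type: string;
  sentAt: number | null;
};

// Hypothetical broker: one ordered list of event ids per partition key.
const partitions = new Map<string, string[]>();

function publish(row: OutboxRow): void {
  const log = partitions.get(row.aggregateId) ?? [];
  log.push(row.id);
  partitions.set(row.aggregateId, log);
}

// One relay pass: publish unsent rows in order, marking each as sent.
// Returns how many rows were marked. `crashAfterFirstPublish` simulates the
// process dying after a publish but before sent_at is written.
function relayOnce(
  outbox: OutboxRow[],
  batchSize: number,
  crashAfterFirstPublish = false,
): number {
  const batch = outbox.filter((row) => row.sentAt === null).slice(0, batchSize);
  let marked = 0;
  for (const row of batch) {
    publish(row);
    if (crashAfterFirstPublish) return marked; // died before the UPDATE
    row.sentAt = Date.now();
    marked += 1;
  }
  return marked;
}

const outbox: OutboxRow[] = [
  { id: "e1", aggregateId: "order-1", type: "order.placed", sentAt: null },
  { id: "e2", aggregateId: "order-1", type: "order.paid", sentAt: null },
  { id: "e3", aggregateId: "order-2", type: "order.placed", sentAt: null },
];

const markedBeforeCrash = relayOnce(outbox, 100, true); // publishes e1, then "crashes"
const markedAfterRestart = relayOnce(outbox, 100);      // e1 is still unsent, so it goes out again
```

After the restart, the `order-1` partition holds `e1, e1, e2`: the duplicate is the at-least-once cost, and per-order ordering is preserved because both events share a partition key.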
```typescript
await db.transaction(async (tx) => {
  await tx.orders.insert(order);
  await tx.outbox.insert({
    id: crypto.randomUUID(), // becomes the idempotency key downstream
    aggregate_id: order.id,  // partition key -> preserves per-order ordering
    type: "order.placed",
    payload: order,
    created_at: new Date(),
  });
}); // both rows commit atomically, or neither does

// Separate relay (own process / cron / worker):
//   SELECT * FROM outbox WHERE sent_at IS NULL ORDER BY created_at LIMIT 100;
//   publish(row); UPDATE outbox SET sent_at = now() WHERE id = row.id;
```

Log-based CDC: the outbox without the poll
Polling the outbox works but adds latency and load. The more elegant relay reads the database's own write-ahead log — the durable, totally-ordered record of every committed change the database already maintains for crash recovery and replication. This is Change Data Capture (CDC); Debezium is the common implementation, tailing the Postgres WAL (logical replication) or MySQL binlog and emitting each committed change as an event.
Because logical decoding of the WAL emits only committed transactions, in commit order, CDC inherits atomicity and ordering for free: you cannot capture a change that was rolled back, and you see transactions in the order they committed. Pair it with the outbox table (the 'outbox + CDC' pattern) and you get clean, intentional event payloads rather than raw table diffs, with no polling and near-real-time latency.
Reading the replication log instead of polling is the same instinct behind Postgres logical replication and read replicas: the log is the source of truth, and everything downstream is a deterministic projection of it.
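The difference from polling is that the relay tracks a position in the log rather than a `sent_at` flag. The sketch below models only the decoded change stream, under loud assumptions: `TinyDb`, `tail`, and the integer `lsn` are hypothetical stand-ins, not the Debezium or Postgres replication API.

```typescript
type Row = Record<string, unknown>;
type Change = { lsn: number; table: string; row: Row };

// Stand-in for the database: changes reach the log only when the
// transaction commits, so rolled-back work never becomes visible.
class TinyDb {
  readonly log: Change[] = [];
  private nextLsn = 1;

  transaction(work: (write: (table: string, row: Row) => void) => void): void {
    const pending: Array<{ table: string; row: Row }> = [];
    try {
      work((table, row) => {
        pending.push({ table, row });
      });
    } catch {
      return; // rollback: discard pending changes
    }
    for (const p of pending) {
      this.log.push({ lsn: this.nextLsn++, table: p.table, row: p.row });
    }
  }
}

// Returns changes to one table that committed after the given log position.
function tail(db: TinyDb, afterLsn: number, table: string): Change[] {
  return db.log.filter((c) => c.lsn > afterLsn && c.table === table);
}

const db = new TinyDb();
db.transaction((write) => {
  write("orders", { id: "o-1" });
  write("outbox", { id: "e1", type: "order.placed" });
});
db.transaction((write) => {
  write("outbox", { id: "e2", type: "order.placed" });
  throw new Error("constraint violation"); // this transaction rolls back
});
db.transaction((write) => {
  write("outbox", { id: "e3", type: "order.shipped" });
});

const firstRead = tail(db, 0, "outbox");
const confirmedLsn = firstRead.length > 0 ? firstRead[firstRead.length - 1].lsn : 0;
const secondRead = tail(db, confirmedLsn, "outbox");
```

Here `firstRead` contains `e1` and `e3` in commit order and never `e2`, and `secondRead` is empty because the reader resumes from its confirmed position. If the reader crashed before persisting `confirmedLsn`, it would re-read from the older position, which is the same at-least-once behaviour as the polling relay.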
The other half nobody writes: idempotent consumers
The outbox guarantees at-least-once delivery. It says nothing about your consumer processing the same event twice — which it will, eventually, on a redelivery after a crash between 'do the work' and 'commit the offset'. If your handler is not idempotent, at-least-once delivery is just an at-least-once bug generator.
Make the effect idempotent, not the delivery. Two robust techniques: dedup on the event's unique id (the 'inbox' pattern — record processed ids in a table and skip duplicates, in the same transaction as the side effect), or design the write itself to be naturally idempotent (an upsert keyed on a business identity, or a state transition that is a no-op when already applied).
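The second technique, a naturally idempotent write, needs no dedup table at all. A minimal sketch, assuming order statuses only ever move forward (the `rank` ordering and all names here are hypothetical):

```typescript
type OrderStatus = "placed" | "paid" | "shipped";
type OrderEvent = { eventId: string; orderId: string; to: OrderStatus };

// Assumed monotonic progression: a status can only advance, never regress.
const rank: Record<OrderStatus, number> = { placed: 0, paid: 1, shipped: 2 };
const orders = new Map<string, OrderStatus>();

// Applies a status transition. Returns true only when state actually changed;
// a redelivered or stale event is a no-op. The event id is never consulted.
function applyTransition(e: OrderEvent): boolean {
  const current = orders.get(e.orderId);
  if (current !== undefined && rank[current] >= rank[e.to]) return false;
  orders.set(e.orderId, e.to);
  return true;
}

const first = applyTransition({ eventId: "e1", orderId: "o-1", to: "paid" });
const redelivered = applyTransition({ eventId: "e1", orderId: "o-1", to: "paid" });
const stale = applyTransition({ eventId: "e0", orderId: "o-1", to: "placed" });
console.log({ first, redelivered, stale, status: orders.get("o-1") });
```

The first delivery changes state; the redelivery and the stale event do not, and the order stays `paid`. This works only for effects that can be expressed as "set to at least X". Additive effects such as a balance increment are not naturally idempotent and need the dedup-table approach.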
```sql
BEGIN;
-- Record the event id. On a duplicate, ON CONFLICT DO NOTHING inserts
-- nothing and RETURNING yields no row.
WITH first_seen AS (
  INSERT INTO processed_events (event_id) VALUES ($1)
  ON CONFLICT (event_id) DO NOTHING
  RETURNING event_id
)
-- Apply the side effect only when the insert actually happened,
-- so a duplicate delivery is a clean no-op.
UPDATE accounts SET balance = balance + $2
WHERE id = $3 AND EXISTS (SELECT 1 FROM first_seen);
COMMIT;
```

The one-paragraph version to hand a junior
Never write to your database and a message broker in two separate steps and hope both happen. Write the event into an outbox table inside the same transaction as the data, relay it to the broker at-least-once (poll with SKIP LOCKED, or tail the WAL with CDC), partition by entity id to keep ordering, and make every consumer idempotent by deduping on the event id. That is the whole discipline — and it is the difference between an event-driven system that is eventually consistent and one that is eventually wrong.