
Idempotency patterns for Kinesis + Lambda ingestion


Streaming ingestion for financial data has one non-negotiable property: a retried event must never be counted twice. Kinesis and Lambda give you at-least-once delivery, which means duplicates aren't an edge case — they're the contract. Idempotency is your job.

Where duplicates come from

  • Lambda retrying the whole batch after a partial failure (some records written, then a timeout or error).
  • Kinesis consumer restarts re-reading from the last checkpoint.
  • Upstream producers re-sending on ambiguous acknowledgments.
  • Humans replaying a window after an incident — the most common source in practice.

Patterns that worked

Deterministic event keys

Every event gets an identity derived from its business content (source, entity, timestamp, sequence), not from a random UUID assigned at ingestion. Two deliveries of the same event must produce the same key, or nothing downstream can deduplicate.
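A minimal sketch of such a key follows. It assumes the four fields named above are available on every event and hashes a canonical JSON encoding of them with SHA-256. That is one reasonable construction, not necessarily the one this pipeline used, and the function name `event_key` is hypothetical.

```python
import hashlib
import json


def event_key(source: str, entity: str, timestamp: str, sequence: int) -> str:
    """Derive a stable identity from the event's business content.

    Assumption: (source, entity, timestamp, sequence) uniquely identify an
    event in this feed; adapt the field set to the real schema.
    """
    # Canonical JSON (sorted keys, fixed separators) so identical fields
    # always serialize to identical bytes, whatever order they arrive in.
    canonical = json.dumps(
        {"source": source, "entity": entity, "timestamp": timestamp, "sequence": sequence},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Normalize fields before hashing. For example, render timestamps as UTC ISO-8601. Otherwise the same instant formatted two ways yields two keys, and the deduplication silently stops working.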

Idempotent writes, not dedup-on-read

Deduplicating at query time pushes the problem onto every consumer forever. Instead, make the write path a no-op for keys it has already seen — merge/upsert semantics at the storage layer.
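Here is a minimal sketch of a write path that is a no-op for seen keys. It uses an in-memory SQLite table as a stand-in for the real storage layer, which is an assumption for illustration. The event key is the primary key, and `INSERT OR IGNORE` (SQLite syntax) skips rows whose key already exists. The table and function names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # In-memory SQLite stands in for the real store; the primary key on
    # event_key makes the database itself enforce one row per event.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_key TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    return conn


def write_event(conn: sqlite3.Connection, event_key: str, amount_cents: int) -> bool:
    """Insert the event once; repeated writes with the same key change nothing.

    Returns True if a row was inserted, False if the key was already present.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO events (event_key, amount_cents) VALUES (?, ?)",
            (event_key, amount_cents),
        )
    return cur.rowcount == 1
```

Other stores have equivalents. PostgreSQL has `INSERT ... ON CONFLICT DO NOTHING`, and DynamoDB has a conditional `PutItem` with `attribute_not_exists` on the key attribute. An upsert that rewrites the same values is idempotent too.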

Design for replay as a feature

If reprocessing a full day of events is safe, incidents become boring. We treated "replay the window" as the standard recovery action, which is only possible when every write in the path is idempotent.
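A minimal sketch of a replay-safe consumer follows, assuming a Python Lambda behind a Kinesis event source mapping. `Records[].kinesis.data` is where Lambda delivers the base64-encoded Kinesis payload. `STORE`, the payload field names, and `event_key` are hypothetical stand-ins. A module-level dict does not persist across Lambda instances, so a real deployment would write to a durable store instead.

```python
import base64
import hashlib
import json

# Hypothetical stand-in for the idempotent storage layer: setdefault never
# overwrites an existing key, so a second write of the same event is a no-op.
STORE: dict[str, dict] = {}


def event_key(payload: dict) -> str:
    # Assumed business fields; adapt to the real schema.
    fields = {k: payload[k] for k in ("source", "entity", "timestamp", "sequence")}
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handler(event, context=None):
    """Kinesis-triggered handler that is safe to re-run over any window."""
    for record in event["Records"]:
        # Lambda hands over the Kinesis record payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        STORE.setdefault(event_key(payload), payload)
```

With this shape, "replay the window" just means invoking the handler again with the same records. Retries, checkpoint re-reads, and manual replays all land on the same keys.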

The test that matters

The check I trust most is brutally simple: run the same input batch through the pipeline twice and diff the output. If anything changes on the second pass, the pipeline isn't idempotent — no matter what the design doc says.
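That check fits in a few lines as a test helper. This is a sketch that assumes the pipeline can be wrapped in a function that processes a batch against persistent state and returns a snapshot of its output. Both toy pipelines below are hypothetical and exist only to show what the check catches.

```python
import json
from typing import Callable

Pipeline = Callable[[list[dict]], dict]


def outputs_match_after_rerun(run_pipeline: Pipeline, batch: list[dict]) -> bool:
    """Run the same batch twice against the same state and diff the outputs."""
    first = json.dumps(run_pipeline(batch), sort_keys=True)
    second = json.dumps(run_pipeline(batch), sort_keys=True)
    return first == second


def make_keyed_pipeline() -> Pipeline:
    rows: dict[str, int] = {}

    def run(batch: list[dict]) -> dict:
        for e in batch:
            rows.setdefault(e["key"], e["amount"])  # no-op for a key already seen
        return {"total": sum(rows.values()), "rows": len(rows)}

    return run


def make_counter_pipeline() -> Pipeline:
    state = {"total": 0}

    def run(batch: list[dict]) -> dict:
        for e in batch:
            state["total"] += e["amount"]  # adds again on every rerun
        return dict(state)

    return run
```

The counter pipeline fails because `total += amount` double counts on the second pass. That is exactly the "counted twice" failure the pipeline must rule out. The keyed pipeline passes because its second pass writes nothing new.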