Idempotency patterns for Kinesis + Lambda ingestion
Streaming ingestion for financial data has one non-negotiable property: a retried event must never be counted twice. Kinesis and Lambda give you at-least-once delivery, which means duplicates aren't an edge case — they're the contract. Idempotency is your job.
Where duplicates come from
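- Lambda retries after a partial failure: some records are written, the invocation times out, and by default the whole batch is delivered again.
- Kinesis consumer restarts that re-read from the last checkpoint, redelivering every record processed since that checkpoint.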
- Upstream producers re-sending on ambiguous acknowledgments.
- Humans replaying a window after an incident — the most common source in practice.
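To make the partial-failure case concrete, here is a small simulation rather than real Lambda code. The names NaiveSink, process_batch, and deliver_with_retry are hypothetical, and the "timeout" is just an exception raised partway through the batch. It only illustrates how redelivering a whole batch to an append-only writer inflates totals.

```python
class NaiveSink:
    """Hypothetical sink that appends every record it is given (not idempotent)."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        self.rows.append(record)


def process_batch(batch, sink, fail_after=None):
    """Write records one by one; raise once `fail_after` records have been
    written, to mimic a timeout partway through the batch."""
    for i, record in enumerate(batch):
        if fail_after is not None and i == fail_after:
            raise TimeoutError("simulated timeout mid-batch")
        sink.write(record)


def deliver_with_retry(batch, sink, fail_after):
    """Mimic at-least-once delivery: after a failure the whole batch is redelivered."""
    try:
        process_batch(batch, sink, fail_after=fail_after)
    except TimeoutError:
        process_batch(batch, sink)  # the retry sees the full batch again


batch = [
    {"id": "a", "amount": 10},
    {"id": "b", "amount": 20},
    {"id": "c", "amount": 30},
]
sink = NaiveSink()
deliver_with_retry(batch, sink, fail_after=2)
total = sum(r["amount"] for r in sink.rows)
# Records "a" and "b" were written before the simulated timeout and again on
# the retry, so the sink holds 5 rows and the total is 90 instead of 60.
```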
Patterns that worked
Deterministic event keys
Every event gets an identity derived from its business content (source, entity, timestamp, sequence), not from a random UUID assigned at ingestion. Two deliveries of the same event must produce the same key, or nothing downstream can deduplicate.
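A minimal sketch of such a key, assuming the four fields named above arrive as plain values. The function name event_key, the parameter names, and the choice of SHA-256 over a JSON encoding are illustrative assumptions, not a prescribed scheme.

```python
import hashlib
import json


def event_key(source: str, entity_id: str, event_ts: str, sequence: int) -> str:
    """Derive a stable key from business content only.

    Nothing here depends on ingestion time, shard, or a random value,
    so a redelivered event hashes to the same key.
    """
    # Encode as a JSON array rather than concatenating strings, so that
    # ("ab", "c") and ("a", "bc") cannot collide.
    canonical = json.dumps([source, entity_id, event_ts, sequence], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_delivery = event_key("ledger-feed", "acct-42", "2024-01-01T00:00:00Z", 7)
second_delivery = event_key("ledger-feed", "acct-42", "2024-01-01T00:00:00Z", 7)
```

Both deliveries yield the same key. The timestamp must be the event's own timestamp as set by the producer; an arrival or processing timestamp would differ between deliveries and defeat the scheme.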
Idempotent writes, not dedup-on-read
Deduplicating at query time pushes the problem onto every consumer forever. Instead, make the write path a no-op for keys it has already seen — merge/upsert semantics at the storage layer.
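One way to sketch a no-op write path is with SQLite from the Python standard library standing in for the real storage layer. The ledger table, its columns, and the write_event helper are hypothetical; the same idea would map to a conditional put or a MERGE statement in whatever store is actually in use.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table whose primary key is the event key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ledger (event_key TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    return conn


def write_event(conn: sqlite3.Connection, event_key: str, amount: int) -> bool:
    """Insert the event unless its key already exists.

    Returns True if a row was written, False if the write was a no-op.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO ledger (event_key, amount) VALUES (?, ?)",
        (event_key, amount),
    )
    conn.commit()
    return cur.rowcount == 1


conn = open_store()
wrote_first = write_event(conn, "k1", 10)   # new key: row is inserted
wrote_second = write_event(conn, "k1", 10)  # same key: silently ignored
total = conn.execute("SELECT SUM(amount) FROM ledger").fetchone()[0]
```

The uniqueness constraint does the deduplication, so a duplicate delivery cannot change the total regardless of which consumer reads the table later.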
Design for replay as a feature
If reprocessing a full day of events is safe, incidents become boring. We treated "replay the window" as the standard recovery action, which is only possible when every write in the path is idempotent.
The test that matters
The check I trust most is brutally simple: run the same input batch through the pipeline twice and diff the output. If anything changes on the second pass, the pipeline isn't idempotent — no matter what the design doc says.
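The check can be sketched as a small harness. Here a dict stands in for the output store, and the two pipelines are hypothetical examples: one writes by key, the other accumulates. A real version would run the actual pipeline against a scratch environment and diff the resulting tables.

```python
def idempotent_pipeline(batch, store):
    """Write each event by key; a key that is already present is left untouched."""
    for event in batch:
        store.setdefault(event["key"], event["amount"])


def counting_pipeline(batch, store):
    """Accumulate amounts per key; reprocessing the batch changes the result."""
    for event in batch:
        store[event["key"]] = store.get(event["key"], 0) + event["amount"]


def is_idempotent(pipeline, batch) -> bool:
    """Run the same batch twice against one store and compare the snapshots."""
    store = {}
    pipeline(batch, store)
    first = sorted(store.items())
    pipeline(batch, store)
    second = sorted(store.items())
    return first == second


batch = [{"key": "k1", "amount": 10}, {"key": "k2", "amount": 20}]
safe = is_idempotent(idempotent_pipeline, batch)
unsafe = is_idempotent(counting_pipeline, batch)
```

With this batch, the first pipeline passes the check and the second fails it, because its second pass doubles every amount.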