Senior Data Engineer · Seattle, WA

I build data platforms that thousands of people — and AI agents — rely on.

14 years in data engineering, 11 at Amazon across big data, fraud analytics, and AWS Finance. I design lakehouse platforms, global batch systems, and the GenAI layers that sit on top of them.

career_summary.sql
SELECT * FROM engineers WHERE name = 'Biswajit Praharaj';
years_experience | platform_users | glue_clusters | countries_served | teams_on_cdk_framework | interviews_conducted
14               | 1,600+         | 25+           | 20+              | 12                     | 33+
-- 1 row returned. Details below.

Case studies

AMPS Platform · AWS Finance
Iceberg · Spark · Glue · Redshift Spectrum

A medallion lakehouse serving 8 organizations

Designed and built a medallion-architecture lakehouse on Apache Iceberg serving 1,600+ users across 8 organizations, with row-level security so each team sees exactly the data it should — and nothing else.
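As an illustration only (not the AMPS implementation), the deny-by-default idea behind row-level security can be sketched in a few lines of Python. The `ENTITLEMENTS` map, the user names, and the `Row` fields are all hypothetical; a real lakehouse would enforce this in the query engine or catalog, not in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    org: str
    account: str
    amount: float


# Hypothetical entitlement map: user -> set of orgs that user may read.
ENTITLEMENTS = {
    "alice": {"org_a"},
    "bob": {"org_a", "org_b"},
}

# Hypothetical sample data spanning three orgs.
ROWS = [
    Row("org_a", "4000", 10.0),
    Row("org_b", "4100", 20.0),
    Row("org_c", "4200", 30.0),
]


def visible_rows(user: str, rows: list[Row]) -> list[Row]:
    """Return only the rows whose org the user is entitled to.

    Users missing from the entitlement map get an empty set,
    so the default is to see nothing.
    """
    allowed = ENTITLEMENTS.get(user, set())
    return [row for row in rows if row.org in allowed]
```

The design choice worth noting is the default: an unknown user resolves to an empty entitlement set rather than to an error path that might leak rows.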

Layered an embedded AI agent (Jarvis) on top of the semantic layer, letting finance users query 2,500+ datasets in plain language instead of writing SQL.

1,600+ users · 8 orgs · 2,500+ datasets · row-level security · embedded AI agent

TCOL Month-End Close · AWS Finance
AWS Glue · Spark · Multi-region batch

Global month-end close across 20+ countries

Built the multi-cluster batch system behind month-end close: 25+ Glue clusters orchestrated across 20+ countries, where a missed SLA means a delayed financial close for the business.

The hard problems were operational, not computational — idempotent reruns, cross-region dependency ordering, and making failures diagnosable at 2 a.m. without paging the whole team.
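A minimal sketch of what "idempotent rerun" means in practice, assuming each batch owns one (country, period) partition and replaces it wholesale instead of appending. The in-memory `table`, the function name, and the row fields are hypothetical stand-ins, not the TCOL code.

```python
from collections import defaultdict

# In-memory stand-in for a partitioned table: (country, period) -> output rows.
table: dict[tuple[str, str], list[dict]] = {}


def run_close_batch(country: str, period: str, source_rows: list[dict]) -> None:
    """Recompute one partition from its source rows and overwrite it.

    Because the partition is replaced rather than appended to, rerunning
    after a failure leaves the same result as one successful run.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in source_rows:
        totals[row["account"]] += row["amount"]

    table[(country, period)] = [
        {"account": account, "total": total}
        for account, total in sorted(totals.items())
    ]
```

Calling `run_close_batch` twice with the same inputs leaves `table` unchanged after the second call, which is the property that makes a 2 a.m. rerun safe.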

25+ Glue clusters · 20+ countries · month-end close SLA

Ingestion & Tooling · AWS Finance
Kinesis · Lambda · CDK

Streaming ingestion and a CDK framework 12 teams adopted

Built exactly-once-style streaming ingestion with Kinesis and Lambda, designed around idempotency so replays and retries never double-count financial events.
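The core of that idempotency can be sketched as a dedup check keyed on a stable event identifier. This is an illustrative sketch under stated assumptions, not the production pipeline: it assumes every event carries a producer-assigned `event_id`, and it uses an in-memory set and list where a real system would need a durable store with an atomic check-and-write.

```python
# In-memory stand-ins (hypothetical) for a durable dedup store and a ledger table.
seen_event_ids: set[str] = set()
ledger: list[dict] = []


def ingest(batch: list[dict]) -> int:
    """Apply each event at most once and return how many were newly applied.

    Assumes every event has a stable, producer-assigned "event_id".
    Replayed or retried events are skipped, so totals never double-count.
    """
    applied = 0
    for event in batch:
        key = str(event["event_id"])
        if key in seen_event_ids:
            continue  # replay or retry: this event was already applied
        seen_event_ids.add(key)
        ledger.append(event)
        applied += 1
    return applied
```

Delivering the same batch twice applies it once: the first call returns the batch size and the second returns 0.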

Packaged the team's infrastructure patterns into an AWS CDK framework that 12 other teams adopted — turning one team's hard-won conventions into an organization-wide standard.

idempotent ingestion · 12 teams adopted · infra as code

StockGPT · Personal project
FastAPI · DuckDB · LLM agent loop

An LLM agent for equity research

A tool-using LLM agent over S&P 500 fundamentals: FastAPI backend, DuckDB store covering 503 tickers, an agent loop with 5 tools, and an eval harness to measure answer quality instead of guessing at it.

agent loop · eval harness · 503 tickers
View on GitHub →

Timeline

2021 — Present · [fix dates]

Senior Data Engineer, AWS Finance

Amazon · Seattle, WA

Own data platform architecture for AWS Finance: the AMPS Iceberg lakehouse (1,600+ users, 8 orgs), TCOL month-end close batch across 20+ countries, and an embedded AI agent over the semantic layer. Built the CDK framework adopted by 12 teams.

Iceberg · Spark · Glue · Redshift Spectrum · CDK · GenAI
2018 — 2021 · [fix dates]

Data Engineer, Seller Support / CTPS

Amazon

Built analytics and data pipelines supporting seller trust and support operations, turning operational event streams into decision-ready datasets for global teams.

Kinesis · Lambda · Airflow · Redshift
2015 — 2018 · [fix dates]

Data Engineer, Transaction Risk Management

Amazon

Fraud and abuse analytics: built the data foundations that risk models and investigators depended on to detect bad actors across marketplace transactions.

Fraud analytics · Spark · SQL
2015 · [fix dates]

Data Engineer, Big Data Technologies

Amazon

Started at Amazon in the Big Data Technologies org, building large-scale batch processing on Hadoop-era infrastructure as the company scaled its internal data platform.

Hadoop · Hive · ETL
2012 — 2015 · [fix dates]

[Title], [Company]

[Pre-Amazon role — fill in]

[Describe your first three years in data engineering before joining Amazon — company, what you built, and the stack.]

[tech]

Stack & strengths

Lakehouse & batch

Apache Iceberg, Spark, AWS Glue, Redshift Spectrum, medallion architecture, semantic layers.

Streaming

Kinesis, Lambda, idempotent ingestion patterns, exactly-once-style processing for financial data.

Orchestration & infra

Airflow / MWAA, AWS CDK, multi-region batch orchestration, infrastructure as code.

GenAI on data

LLM agents over governed data, tool use, eval harnesses, natural-language access to semantic layers.

Domain depth

Fintech and FP&A: month-end close, journal entries, accruals, plus fraud & abuse analytics.

People

33+ technical interviews conducted at Amazon; mentoring, design reviews, and cross-team standards.

Notes from production

Latest: “Designing row-level security for a 1,600-user lakehouse” — lessons from systems that had to work, written for engineers who build them.

Read the blog →