Jaygovind Sahu
Senior Data Engineer
Senior Data Engineer with 16 years of experience architecting enterprise data platforms across financial services, retail, and streaming media. At Amazon, spearheaded the development of a marketing analytics platform that cut campaign costs by 30% within the first year, with subsequent ML-driven enhancements projected to save an additional 25%. At Jornaya, optimized a data pipeline serving ~50 vendors, cutting AWS infrastructure costs by more than 55%, and built a separate pipeline that processed one billion records in under 30 minutes. Consistently translates complex data infrastructure into business value, helping leadership teams make faster, data-driven decisions.
Experience
Data Engineer
Netflix
Jul 2025 – Present
- Building something awesome.
Data Engineer
Amazon
Mar 2022 – Jul 2025
- Developed a data platform to measure and improve the efficiency of marketing campaigns across multiple channels, including offline (billboards, newspapers, gas station TVs), social media (Facebook, Instagram), and search engines (Google, Bing). The pipeline ingested and analyzed data from various vendors to generate key metrics such as cost per click, cost per application, impressions, and gross rating points (GRP), leading to a 30% reduction in marketing costs within one year of launch.
- Built a data pipeline using Airflow to standardize addresses for millions of records via an address standardization API, benefiting multiple pipelines across domains. The automated process replaced a manual one, reducing standardization time by ~90% and eliminating manual errors.
- Partnered with a third-party data provider to integrate hundreds of millions of sensitive records into the data lake. Designed and implemented a pipeline to transform and prepare the dataset for machine learning, and developed a reusable framework for running regression and classification models. This work is expected to reduce marketing costs by an additional 25%.
Data Engineer
Vanguard
Apr 2021 – Mar 2022
- Implemented and maintained data pipelines using AWS EMR, Glue, and other AWS products for big data analytics, powering ~15 Tableau dashboards that financial and business analysts used to support strategic decision-making by leadership teams.
Data Engineer
Jornaya (Verisk Marketing Solutions)
Dec 2019 – Apr 2021
- Led development and enhancement of a core product that processed billions of client records and built reports for users, running on ~70 EMR clusters, ~10 Glue jobs, and AWS Lambda functions in a serverless architecture.
- Optimized Apache Spark and AWS EMR configurations for a data pipeline serving ~50 vendors, saving more than 55% in AWS costs in one year while lowering the failure rate by ~90%, resulting in faster and more resilient delivery of results.
- Designed and implemented a data pipeline to asynchronously call an external API endpoint, transform the response, and write datasets to Amazon S3, using AWS SNS (queuing), Lambda (API calls with multiprocessing and transformation), and Step Functions (orchestration). The pipeline processed 1 billion records in ~25 minutes, enabling timely delivery of critical client reports.
Data Developer
Tata Consultancy Services
Feb 2010 – Dec 2019
- Built data solutions for two major clients in the banking and financial domain, contributing to more than 15 successful projects and numerous ad-hoc analyses for business users. Developed data pipelines across technologies ranging from legacy IBM mainframes (COBOL, JCL, DB2) to modern stacks using Python, Apache Spark, and AWS products.
Projects
The Data Domain Blog
A personal blog that curates news and articles in Data Engineering, Data Analytics, and AI/ML from various sources and RSS feeds. The backend uses Airflow to automate content discovery, retrieval, and summarization using LLMs from OpenAI and Anthropic. WordPress APIs are used to auto-publish articles with concise summaries and links to originals.
Data Engineer's Guide
Dataguide.dev is an open-source knowledge base providing high-level technical resources for the data engineering community. It features curated guides on system design, data modeling, and interview preparation. Built to eliminate paywalls, the platform offers deep dives into modern architectures like Lakehouse and CDC, supporting engineers in career advancement.
Education
Veer Surendra Sai University of Technology
Bachelor of Technology, Electrical Engineering
2005 – 2009
Skills
Certifications
- AWS Certified Data Analytics – Specialty (Jul 2023)