Online | On-site | Hybrid

Data Engineering on AWS

Gain a strong foundation in building modern data platforms on AWS, from data lake design to batch and streaming pipelines. Learn how to use S3, Glue, Athena, Redshift, and orchestration patterns to deliver secure, scalable, observable pipelines aligned with real enterprise data engineering practices.

Duration: 5 days
Rating: 4.8/5.0
Level: Intermediate
1500+ users onboarded

Who Will Benefit from This Training?

  • Data Engineers
  • Analytics Engineers
  • Cloud Engineers supporting data platforms
  • Data Platform Engineers
  • DevOps Engineers working with AWS data systems
  • BI Engineers who want deeper engineering capability

Training Objectives

  • Understand modern data engineering architectures on AWS for batch and streaming workloads.
  • Design end-to-end AWS data pipelines for ingestion, storage, transformation, orchestration, and serving.
  • Ingest data using batch and event-based patterns, and apply change data capture (CDC) concepts.
  • Build scalable storage layers using Amazon S3 with partitioning and columnar formats like Parquet.
  • Transform and query data using AWS Glue, Amazon Athena, and Amazon Redshift.
  • Orchestrate workflows using Amazon MWAA (Managed Workflows for Apache Airflow) and gain an overview of AWS Step Functions.
  • Build real-time streaming pipelines using Amazon Kinesis Data Streams and Firehose.
  • Implement governance and security practices using IAM, KMS encryption, and Lake Formation concepts.
  • Apply data quality and observability practices for reliable pipeline operations.
  • Deliver a complete end-to-end AWS data engineering capstone project.

Build a high-performing, job-ready tech team.

Personalise your team’s upskilling roadmap and design a tailored, hands-on training program with Uptut.

Key Training Modules

Comprehensive, hands-on modules designed to take you from basics to advanced concepts
  • Module 1: AWS Data Engineering Architecture (Batch and Streaming)
    1. Modern AWS data platform reference architectures (lake, warehouse, lakehouse concepts)
    2. Batch vs streaming workloads and how AWS services map to each
    3. End-to-end pipeline stages (ingestion, storage, transform, orchestration, serving)
    4. Choosing services (S3, Glue, Athena, Redshift, Kinesis, MWAA)
    5. Hands-on activity: Design an AWS reference architecture for an analytics and streaming use case
  • Module 2: Designing End-to-End AWS Data Pipelines (Ingest → Store → Transform → Serve)
    1. Pipeline design patterns (raw → cleansed → curated → marts)
    2. Data contracts, schema evolution concepts, and partition strategy planning
    3. Operational requirements (SLA, backfills, idempotency basics)
    4. Cost and performance drivers across storage and query layers
    5. Hands-on workshop: Create an end-to-end pipeline blueprint with data zones, partitions, and SLAs
  • Module 3: Ingestion Patterns on AWS (Batch, Event, CDC Concepts)
    1. Batch ingestion patterns to S3 (scheduled loads, file drops)
    2. Event-based ingestion concepts (S3 events, pub/sub patterns overview)
    3. CDC concepts (insert/update/delete capture and downstream processing)
    4. Idempotency patterns for ingestion pipelines
    5. Hands-on lab: Implement a batch ingestion workflow into the S3 raw zone with a partitioned folder layout (see the sketch below)
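
To make this lab concrete, here is a minimal Python sketch of batch ingestion into a date-partitioned S3 raw zone using boto3. The bucket and dataset names (example-data-lake, orders) are hypothetical placeholders, not part of the course materials.

```python
# Minimal batch-ingestion sketch: land a local extract in the S3 raw zone
# using a Hive-style, date-partitioned key layout. Bucket and dataset
# names below are hypothetical placeholders.
from datetime import date, datetime, timezone

import boto3

BUCKET = "example-data-lake"   # hypothetical bucket name
DATASET = "orders"             # hypothetical dataset name

def raw_zone_key(run_date: date, filename: str) -> str:
    """Build a partitioned key: raw/orders/year=2024/month=01/day=15/<file>."""
    return (
        f"raw/{DATASET}/"
        f"year={run_date:%Y}/month={run_date:%m}/day={run_date:%d}/{filename}"
    )

def ingest_file(local_path: str, run_date: date) -> str:
    s3 = boto3.client("s3")
    # A timestamp in the object name keeps re-runs from overwriting earlier loads.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = raw_zone_key(run_date, f"{DATASET}_{stamp}.csv")
    s3.upload_file(local_path, BUCKET, key)
    return key

if __name__ == "__main__":
    print(ingest_file("orders_extract.csv", date.today()))
```
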
  • Module 4: Scalable Storage on S3 (Partitioning + Parquet)
    1. S3 as a data lake foundation (buckets, prefixes, lifecycle basics)
    2. Partitioning strategies (by date, region, tenant) and common pitfalls
    3. Columnar formats (Parquet) and why they improve query performance
    4. File sizing and small files problem (compaction concept)
    5. Hands-on lab: Convert raw CSV/JSON into partitioned Parquet in S3 and validate the folder structure (see the sketch below)
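
A minimal sketch of the CSV-to-Parquet conversion with pandas and pyarrow. The file path and column names (orders_extract.csv, order_ts) are hypothetical, and writing straight to an s3:// path additionally assumes s3fs is installed; otherwise write locally and upload with boto3.

```python
# Minimal sketch: convert a raw CSV extract into date-partitioned Parquet.
# Paths and column names are hypothetical.
import pandas as pd

df = pd.read_csv("orders_extract.csv", parse_dates=["order_ts"])

# Derive partition columns so the lake layout becomes year=/month=/day=
df["year"] = df["order_ts"].dt.year
df["month"] = df["order_ts"].dt.month
df["day"] = df["order_ts"].dt.day

# One Parquet dataset with Hive-style partition folders, snappy-compressed
df.to_parquet(
    "curated/orders/",   # or "s3://example-data-lake/curated/orders/" with s3fs
    engine="pyarrow",
    partition_cols=["year", "month", "day"],
    compression="snappy",
    index=False,
)
```
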
  • Module 5: Transform and Query with AWS Glue (ETL Core)
    1. Glue jobs overview (Spark ETL, DynamicFrames, DataFrames)
    2. Glue Data Catalog usage (databases, tables, crawlers, partitions)
    3. ETL patterns (cleanse, dedup, join, aggregate) into curated zone
    4. Incremental processing concepts (partition loads, watermark ideas)
    5. Hands-on lab: Build a Glue ETL job that transforms raw data into a curated, Parquet-based layout (see the sketch below)
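
A minimal Glue job sketch in the spirit of this lab: read the raw table from the Data Catalog, cleanse and dedupe, and write curated partitioned Parquet. The database, table, key column, and S3 path are hypothetical, and the raw table is assumed to already carry year/month/day partition columns.

```python
# Minimal AWS Glue job sketch: Data Catalog read -> cleanse/dedupe ->
# curated Parquet write. Names are hypothetical; run as a Glue Spark job.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data registered by a crawler in the Glue Data Catalog
raw = glue_context.create_dynamic_frame.from_catalog(
    database="example_lake", table_name="raw_orders"
)

# Cleanse with plain Spark DataFrame operations, then dedupe on the key
df = raw.toDF().dropna(subset=["order_id"]).dropDuplicates(["order_id"])

# Write curated, partitioned Parquet (assumes year/month/day columns exist)
(df.write.mode("overwrite")
   .partitionBy("year", "month", "day")
   .parquet("s3://example-data-lake/curated/orders/"))

job.commit()
```
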
  • Module 6: Analytics with Athena (Querying the Lake)
    1. Athena fundamentals (serverless SQL on S3)
    2. Partition pruning and performance best practices
    3. Using Glue Catalog tables with Athena
    4. Building views for reporting and BI consumption
    5. Hands-on lab: Query curated S3 datasets using Athena and build analytics views (see the sketch below)
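
A minimal boto3 sketch of running a partition-pruned Athena query and polling for the result. The database, table, and results location are hypothetical placeholders.

```python
# Minimal sketch: run a partition-pruned Athena query with boto3 and poll
# until it finishes. Database, table, and output location are hypothetical.
import time

import boto3

athena = boto3.client("athena")

SQL = """
SELECT customer_id, SUM(amount) AS revenue
FROM curated_orders
WHERE year = 2024 AND month = 1   -- filters on partition columns => pruned scan
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT 10
"""

qid = athena.start_query_execution(
    QueryString=SQL,
    QueryExecutionContext={"Database": "example_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Poll the query state until it reaches a terminal status
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    # Skip the first row, which holds the column headers
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"][1:]:
        print([col.get("VarCharValue") for col in row["Data"]])
```
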
  • Module 7: Serving Layer with Amazon Redshift (Warehouse Overview + Integration)
    1. When to use Redshift vs Athena (workload and performance trade-offs)
    2. Redshift basics (schemas, distribution/sort keys concept)
    3. Loading curated data into Redshift (ELT patterns overview)
    4. Warehouse modeling basics (facts/dimensions for analytics)
    5. Hands-on lab: Load curated datasets into Redshift and run KPI queries such as revenue and top customers (see the sketch below)
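
A minimal sketch of the ELT load pattern via the Redshift Data API: COPY curated Parquet from S3, then run a top-customers KPI query. The cluster, database, user, IAM role, and table names are all hypothetical, and execute_statement is asynchronous, so results are fetched separately.

```python
# Minimal sketch: COPY curated Parquet into Redshift and run a KPI query
# through the Redshift Data API. All identifiers are hypothetical.
import boto3

rsd = boto3.client("redshift-data")

COMMON = dict(
    ClusterIdentifier="example-cluster",
    Database="analytics",
    DbUser="etl_user",
)

# ELT-style load: COPY reads Parquet directly from the curated zone
rsd.execute_statement(
    Sql="""
        COPY analytics.orders
        FROM 's3://example-data-lake/curated/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
        FORMAT AS PARQUET;
    """,
    **COMMON,
)

# KPI query: top customers by revenue
resp = rsd.execute_statement(
    Sql="""
        SELECT customer_id, SUM(amount) AS revenue
        FROM analytics.orders
        GROUP BY customer_id
        ORDER BY revenue DESC
        LIMIT 10;
    """,
    **COMMON,
)
# The Data API is asynchronous: fetch rows later with get_statement_result
print("statement id:", resp["Id"])
```
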
  • Module 8: Orchestration with Amazon MWAA (Managed Workflows for Apache Airflow) + Step Functions Overview
    1. MWAA fundamentals (DAGs, tasks, scheduling)
    2. Orchestrating ingestion → Glue → Athena/Redshift workflow
    3. Retries, alerts, and operational visibility in Airflow
    4. Step Functions overview (when to use vs Airflow)
    5. Hands-on lab: Build an MWAA DAG that runs an end-to-end batch pipeline with retries and monitoring (see the sketch below)
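
A minimal MWAA DAG sketch with retries, chaining a Glue transform to an Athena partition refresh. It assumes Airflow 2.x with the Amazon provider package available on the MWAA environment; the DAG, job, database, and S3 names are hypothetical.

```python
# Minimal MWAA DAG sketch: daily batch pipeline with retries, Glue
# transform followed by an Athena partition refresh. Names are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.amazon.aws.operators.athena import AthenaOperator
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="orders_batch_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    transform = GlueJobOperator(
        task_id="transform_raw_to_curated",
        job_name="orders-curation",   # hypothetical Glue job name
    )

    refresh_partitions = AthenaOperator(
        task_id="refresh_partitions",
        query="MSCK REPAIR TABLE curated_orders;",
        database="example_lake",
        output_location="s3://example-athena-results/",
    )

    # Run the Glue transform first, then make new partitions queryable
    transform >> refresh_partitions
```
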
  • Module 9: Streaming Pipelines with Kinesis Data Streams and Firehose
    1. Streaming fundamentals (events, partitions/shards, consumer groups concept)
    2. Kinesis Data Streams architecture (producers, shards, consumers)
    3. Kinesis Firehose delivery patterns (to S3/Redshift destinations concept)
    4. Streaming-to-lake pattern (real-time landing in S3 + batch transforms)
    5. Hands-on lab: Build a simple streaming pipeline using Kinesis → Firehose → S3 and query the results (see the sketch below)
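
A minimal producer sketch that publishes JSON events to a Kinesis data stream with boto3; a Firehose delivery stream attached to it would then batch records into the S3 landing zone. The stream name and event fields are hypothetical.

```python
# Minimal streaming sketch: publish JSON events to a Kinesis data stream.
# Stream name and event fields are hypothetical.
import json
import time
import uuid

import boto3

kinesis = boto3.client("kinesis")
STREAM = "example-orders-stream"   # hypothetical stream name

def publish(event: dict) -> None:
    kinesis.put_record(
        StreamName=STREAM,
        Data=json.dumps(event).encode("utf-8"),
        # The partition key controls shard assignment; keying by customer
        # keeps one customer's events ordered within a shard.
        PartitionKey=event["customer_id"],
    )

if __name__ == "__main__":
    for i in range(10):
        publish({
            "event_id": str(uuid.uuid4()),
            "customer_id": f"c-{i % 3}",
            "amount": 10.0 + i,
            "ts": time.time(),
        })
```
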
  • Module 10: Governance and Security (IAM, KMS, Lake Formation Concepts)
    1. IAM for data platforms (least privilege access to S3/Glue/Athena/Redshift)
    2. KMS encryption concepts (at rest and in transit awareness)
    3. Lake Formation concepts (centralized permissions, data access governance)
    4. Secure data sharing and access patterns for teams
    5. Hands-on lab: Apply least-privilege IAM policies and validate encrypted access paths (see the sketch below)
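
A minimal least-privilege sketch: an IAM policy, created with boto3, that lets a pipeline role list the lake bucket, read only the raw prefix, and write only the curated prefix. The bucket, prefixes, and policy name are hypothetical.

```python
# Minimal sketch: least-privilege S3 policy for a pipeline role.
# Bucket, prefixes, and policy name are hypothetical.
import json

import boto3

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # list only the lake bucket, scoped to the two prefixes
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-data-lake",
            "Condition": {"StringLike": {"s3:prefix": ["raw/*", "curated/*"]}},
        },
        {   # read raw objects only
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-data-lake/raw/*",
        },
        {   # write curated objects only
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::example-data-lake/curated/*",
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="example-orders-pipeline-s3",
    PolicyDocument=json.dumps(POLICY),
)
```
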
  • Module 11: Data Quality and Observability (Reliable Operations)
    1. Data quality checks (schema, nulls, duplicates, ranges)
    2. Pipeline observability (logs, metrics, run history, SLA/freshness checks)
    3. Alerting and failure handling (retries, DLQ concepts for streaming)
    4. Operational runbooks and incident readiness basics
    5. Hands-on lab: Add validation checks to the batch pipeline and build a simple quality report with alerts (see the sketch below)
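
A minimal data-quality sketch with pandas covering schema, null, duplicate, and range checks, emitting a simple report and failing the run on breach. The column names and thresholds are hypothetical.

```python
# Minimal data-quality sketch: schema, null, duplicate, and range checks on
# a curated batch, producing a pass/fail report. Names are hypothetical.
import pandas as pd

EXPECTED_COLUMNS = {"order_id", "customer_id", "amount", "order_ts"}

def quality_report(df: pd.DataFrame) -> dict:
    return {
        "schema_ok": EXPECTED_COLUMNS.issubset(df.columns),
        "null_keys": int(df["order_id"].isna().sum()),
        "duplicate_keys": int(df["order_id"].duplicated().sum()),
        "amount_out_of_range": int(((df["amount"] < 0) | (df["amount"] > 1e6)).sum()),
        "row_count": len(df),
    }

if __name__ == "__main__":
    df = pd.read_parquet("curated/orders/")
    report = quality_report(df)
    print(report)
    # Fail the run (and let the orchestrator's alerting fire) on a breach
    assert report["schema_ok"] and report["null_keys"] == 0, report
```
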
  • Module 12: Capstone Project (End-to-End AWS Data Engineering)
    1. Capstone goal: Build an AWS end-to-end data pipeline (batch + optional streaming)
    2. Ingest raw data to S3 (partitioned), transform with Glue, query with Athena
    3. Orchestrate with MWAA and publish curated datasets
    4. Apply governance/security (IAM/KMS) and add quality checks + observability
    5. Capstone lab: Deliver an architecture diagram, pipeline code, orchestration DAG, and KPI query outputs

Hands-on Experience with Tools

  • Amazon S3
  • AWS Glue (jobs, crawlers, Data Catalog)
  • Amazon Athena
  • Amazon Redshift
  • Amazon Kinesis Data Streams and Firehose
  • Amazon MWAA (Managed Workflows for Apache Airflow)
  • AWS IAM, AWS KMS, and Lake Formation

Training Delivery Format

Flexible, comprehensive training designed to fit your schedule and learning preferences
  • Opt-in certifications: AWS, Scrum.org, DASA & more
  • 100% live on-site/online training
  • Hands-on labs and capstone projects
  • Lifetime access to training material and sessions

How Does Personalised Training Work?

  1. Skill-Gap Assessment: Analysing the skill gap and assessing business requirements to craft a unique program
  2. Personalisation: Customising the curriculum and projects to prepare your team for challenges within your industry
  3. Implementation: Supplementing training with consulting support to ensure implementation in real projects

Why Data Engineering on AWS for your business?

  • Faster cloud adoption: Build scalable data platforms using managed AWS services.
  • Cost-effective scalability: Pay-as-you-go infrastructure supports growth without upfront investment.
  • Better security controls: Use IAM, encryption, and governance tools to protect data assets.
  • Improved reliability: Leverage AWS-native monitoring and resilient architectures.
  • Accelerated time-to-insight: Deliver analytics faster with integrated services like S3, Glue, Athena, and Redshift.

Lead the Digital Landscape with Cutting-Edge Tech and In-House "Techsperts"

Discover the power of digital transformation with train-to-deliver programs from Uptut's experts. Backed by 50,000+ professionals across the world's leading tech innovators.

Frequently Asked Questions

1. What are the pre-requisites for this training?

The training does not require prior skills or experience; the curriculum covers the basics and progresses towards advanced topics.

2. Will my team get any practical experience with this training?

With our focus on experiential learning, we have made the training as hands-on as possible, with assignments, quizzes, capstone projects, and live labs where trainees learn by doing.

3. What is your mode of delivery - online or on-site?

We conduct both online and on-site training sessions. You can choose either, according to your team's convenience.

4. Will trainees get certified?

Yes, all trainees will get certificates issued by Uptut under the guidance of industry experts.

5. What do we do if we need further support after the training?

Our experienced mentors are available for consultations whenever your team needs further assistance, ready to guide trainees and resolve queries so they can apply the training in the best possible way. Just book a consultation to get support.
