Online · On-site · Hybrid

Data Engineering Fundamentals

Build a strong foundation in Data Engineering fundamentals, from ingestion and transformation to orchestration and analytics-ready serving. Learn how to design reliable pipelines using SQL, Python, dbt, and Airflow with production practices like incremental loads, data validation, and operational workflows.

Duration: 3 days
Rating: 4.8/5.0
Level: Beginner
1500+ users onboarded

Who will Benefit from this Training?

  • Beginner Data Engineers
  • Software Engineers transitioning to Data Engineering
  • Data Analysts moving into engineering
  • DevOps engineers supporting data platforms
  • BI professionals exploring modern data workflows
  • Engineering students and fresh graduates

Training Objectives

  • Understand the role of a Data Engineer and the end-to-end data lifecycle (ingestion, storage, transformation, orchestration, serving).
  • Explain core data architecture concepts including OLTP vs OLAP and warehouse vs lake vs lakehouse.
  • Build foundational and intermediate SQL skills for analytics and pipeline logic (joins, aggregations, windows, incremental patterns).
  • Understand batch vs streaming pipelines and where each is used in real systems.
  • Learn data modeling fundamentals including star schema design and fact/dimension tables.
  • Implement basic ingestion workflows for CSV/JSON into staging tables.
  • Build a Python ETL script to clean, validate, and load data into PostgreSQL.
  • Create dbt models for transformations and add dbt tests for quality validation.
  • Orchestrate an end-to-end workflow using an Airflow DAG with retries and visibility.
  • Deliver a mini capstone project that demonstrates a working modern data pipeline with quality checks.

Build a high-performing, job-ready tech team.

Personalise your team’s upskilling roadmap and design a tailored, hands-on training program with Uptut.

Key training modules

Comprehensive, hands-on modules designed to take you from basics to advanced concepts
  • Module 1: Data Engineering Foundations (Role and Data Lifecycle)
    1. What a Data Engineer does in modern organizations
    2. End-to-end data lifecycle (ingestion, storage, transformation, orchestration, serving)
    3. Key pipeline requirements (reliability, scalability, data quality, governance basics)
    4. Common data platform components and how they connect
    5. Hands-on activity: Map a real business use case into the data lifecycle stages
  • Module 2: Core Data Architecture Concepts (OLTP vs OLAP, Warehouse vs Lake vs Lakehouse)
    1. OLTP vs OLAP and how workloads differ
    2. Warehouse vs data lake vs lakehouse (trade-offs and best-fit use cases)
    3. ETL vs ELT patterns in practice
    4. Common reference architectures for modern analytics platforms
    5. Hands-on activity: Choose the right architecture for three scenarios (BI reporting, ML feature store, event analytics)
  • Module 3: SQL Fundamentals for Analytics and Pipelines
    1. Core SQL querying (select, where, group by, order by)
    2. Joins (inner/left/right/full) and practical join patterns
    3. Aggregations and basic KPI computation
    4. CTEs for readable transformations and pipeline logic
    5. Hands-on lab: Write SQL queries to compute daily revenue and top customers from raw tables (sketch below)
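
    To give a flavour of this lab, here is a minimal sketch of the queries involved, run from Python with psycopg2. The raw_orders table and its columns (customer_id, order_ts, amount) are illustrative assumptions, not the course's actual dataset.

    ```python
    # Sketch of the Module 3 lab queries, executed from Python via psycopg2.
    # Table and column names are illustrative assumptions.
    import psycopg2

    DAILY_REVENUE_SQL = """
        SELECT order_ts::date AS order_date,
               SUM(amount)    AS daily_revenue
        FROM raw_orders
        GROUP BY order_ts::date
        ORDER BY order_date;
    """

    TOP_CUSTOMERS_SQL = """
        WITH customer_totals AS (  -- CTE keeps the transformation readable
            SELECT customer_id, SUM(amount) AS total_spent
            FROM raw_orders
            GROUP BY customer_id
        )
        SELECT customer_id, total_spent
        FROM customer_totals
        ORDER BY total_spent DESC
        LIMIT 10;
    """

    with psycopg2.connect("dbname=training user=postgres") as conn:
        with conn.cursor() as cur:
            for query in (DAILY_REVENUE_SQL, TOP_CUSTOMERS_SQL):
                cur.execute(query)
                for record in cur.fetchall():
                    print(record)
    ```
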
  • Module 4: Advanced SQL (Windows + Incremental Patterns)
    1. Window functions (row_number, rank, lag/lead)
    2. Deduplication patterns using windows
    3. Incremental logic patterns (max timestamp watermark, upsert concepts)
    4. Building repeatable analytical datasets using SQL
    5. Hands-on lab: Implement a deduped “latest record” table and an incremental load query using watermark logic (sketch below)
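
    A minimal sketch of the two patterns this lab covers: window-function dedup and watermark-based incremental loading. Table and column names (customers_raw, orders_curated, updated_at) are assumptions for illustration.

    ```python
    # Sketch of the Module 4 lab patterns. Names are illustrative assumptions.
    import psycopg2

    # Dedup pattern: keep only the newest version of each business key.
    LATEST_RECORD_SQL = """
        CREATE TABLE customers_latest AS
        SELECT *
        FROM (
            SELECT c.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id    -- one row per business key
                       ORDER BY updated_at DESC    -- newest version ranks first
                   ) AS rn
            FROM customers_raw c
        ) ranked
        WHERE rn = 1;
    """

    # Watermark pattern: only load rows newer than what the target already has.
    INCREMENTAL_LOAD_SQL = """
        INSERT INTO orders_curated
        SELECT r.*
        FROM orders_raw r
        WHERE r.updated_at > (
            SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01')
            FROM orders_curated
        );
    """

    with psycopg2.connect("dbname=training user=postgres") as conn:
        with conn.cursor() as cur:
            cur.execute(LATEST_RECORD_SQL)
            cur.execute(INCREMENTAL_LOAD_SQL)
    ```

    The COALESCE fallback makes the very first run behave like a full load, so the same query serves both initial and incremental runs.
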
  • Module 5: Batch vs Streaming Pipelines (Real-World Usage)
    1. Batch pipelines (scheduled processing, backfills)
    2. Streaming pipelines (near real-time analytics, event processing)
    3. Choosing between batch and streaming (SLA, cost, complexity)
    4. Hybrid patterns (micro-batch, lambda/kappa concepts)
    5. Hands-on activity: Classify pipeline requirements and decide batch vs streaming for each
  • Module 6: Data Modeling Fundamentals (Star Schema, Facts, Dimensions)
    1. Why data modeling matters for analytics performance and clarity
    2. Star schema basics (fact tables vs dimension tables)
    3. Grain definition and avoiding common modeling mistakes
    4. Designing facts/dims for e-commerce style analytics
    5. Hands-on workshop: Design a star schema with an orders fact table and customer/product dimensions (sketch below)
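
    A minimal sketch of what the workshop's star schema could look like as PostgreSQL DDL. Column choices are illustrative assumptions; the fact table's grain is declared up front, as the module recommends.

    ```python
    # Sketch of a minimal e-commerce star schema for the Module 6 workshop.
    import psycopg2

    STAR_SCHEMA_DDL = """
        CREATE TABLE dim_customer (
            customer_key SERIAL PRIMARY KEY,   -- surrogate key
            customer_id  TEXT NOT NULL,        -- natural (business) key
            full_name    TEXT,
            country      TEXT
        );

        CREATE TABLE dim_product (
            product_key SERIAL PRIMARY KEY,
            product_id  TEXT NOT NULL,
            category    TEXT,
            unit_price  NUMERIC(10, 2)
        );

        -- Grain: one row per order line.
        CREATE TABLE fact_orders (
            order_line_key BIGSERIAL PRIMARY KEY,
            order_id       TEXT NOT NULL,
            customer_key   INT REFERENCES dim_customer (customer_key),
            product_key    INT REFERENCES dim_product (product_key),
            order_date     DATE NOT NULL,
            quantity       INT,
            amount         NUMERIC(12, 2)
        );
    """

    with psycopg2.connect("dbname=training user=postgres") as conn:
        with conn.cursor() as cur:
            cur.execute(STAR_SCHEMA_DDL)  # psycopg2 accepts multiple statements
    ```
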
  • Module 7: Basic Ingestion Workflows (CSV/JSON to Staging)
    1. Staging layer purpose and raw-to-staging patterns
    2. Ingesting CSV and JSON (schema mapping and type casting basics)
    3. Handling bad records and corrupt rows patterns
    4. Loading data into PostgreSQL staging tables
    5. Hands-on lab: Ingest CSV and JSON files into staging tables with basic validation (sketch below)
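
    A minimal sketch of the lab's CSV-to-staging path with basic validation. File, table, and column names are assumptions; JSON ingestion follows the same shape with json.load in place of csv.DictReader.

    ```python
    # Sketch of the Module 7 lab: load a CSV into a PostgreSQL staging table,
    # quarantining rows that fail basic validation. Names are illustrative.
    import csv

    import psycopg2


    def load_orders_csv(path, conn):
        """Insert valid rows into stg_orders; return the rejected raw rows."""
        rejects = []
        with conn.cursor() as cur, open(path, newline="") as f:
            for row in csv.DictReader(f):
                try:
                    # Type cast and schema mapping; raises on bad data.
                    params = (row["order_id"], row["customer_id"], float(row["amount"]))
                except (KeyError, ValueError):
                    rejects.append(row)  # quarantine the bad record
                    continue
                cur.execute(
                    "INSERT INTO stg_orders (order_id, customer_id, amount) "
                    "VALUES (%s, %s, %s)",
                    params,
                )
        return rejects


    with psycopg2.connect("dbname=training user=postgres") as conn:
        bad_rows = load_orders_csv("orders.csv", conn)
        print(f"rejected {len(bad_rows)} rows")
    ```
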
  • Module 8: Python ETL Script (Clean, Validate, Load into PostgreSQL)
    1. Python ETL structure (extract, transform, load) with reusable functions
    2. Data cleaning patterns (null handling, type casting, standardizing dates)
    3. Validation checks (schema, nulls, duplicates, range checks)
    4. PostgreSQL loading patterns (bulk load, transactions, retry basics)
    5. Hands-on lab: Build a Python ETL script that cleans and loads data into PostgreSQL with validations (sketch below)
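
    A minimal sketch of the lab's extract/transform/load structure using pandas and SQLAlchemy. The source file, column names, and target table are illustrative assumptions.

    ```python
    # Sketch of the Module 8 ETL structure: reusable extract/transform/load
    # functions with explicit validation checks. Names are illustrative.
    import pandas as pd
    from sqlalchemy import create_engine


    def extract(path: str) -> pd.DataFrame:
        return pd.read_csv(path)


    def transform(df: pd.DataFrame) -> pd.DataFrame:
        df = df.dropna(subset=["order_id"])                               # null handling
        df["order_ts"] = pd.to_datetime(df["order_ts"], errors="coerce")  # standardize dates
        df["amount"] = pd.to_numeric(df["amount"], errors="coerce")       # type casting
        return df.drop_duplicates(subset=["order_id"])                    # dedupe on key


    def validate(df: pd.DataFrame) -> None:
        assert df["order_id"].notna().all(), "null order_id after cleaning"
        assert df["order_id"].is_unique, "duplicate order_id"
        assert (df["amount"].dropna() >= 0).all(), "negative amount"  # range check


    def load(df: pd.DataFrame, engine) -> None:
        df.to_sql("stg_orders_clean", engine, if_exists="append", index=False)


    if __name__ == "__main__":
        engine = create_engine("postgresql+psycopg2://postgres@localhost/training")
        frame = transform(extract("orders.csv"))
        validate(frame)
        load(frame, engine)
    ```
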
  • Module 9: Transformations with dbt (Models + Tests)
    1. dbt fundamentals (models, sources, refs)
    2. Building transformation layers (staging → intermediate → marts)
    3. dbt tests (not null, unique, relationships) for data quality
    4. Incremental dbt model concepts (overview)
    5. Hands-on lab: Create dbt models for curated tables and add dbt tests for quality validation (sketch below)
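
    One way this lab's artifacts can look: a dbt model plus schema tests, written into an existing dbt project and executed with the dbt CLI. The model name, columns, and the stg_orders source are assumptions.

    ```python
    # Sketch of Module 9 artifacts: a dbt model and its tests, written out as
    # files inside an existing dbt project and run with the dbt CLI.
    import pathlib
    import subprocess

    MODEL_SQL = """\
    select
        order_ts::date as order_date,
        sum(amount)    as daily_revenue
    from {{ ref('stg_orders') }}
    group by 1
    """

    SCHEMA_YML = """\
    version: 2
    models:
      - name: fct_daily_revenue
        columns:
          - name: order_date
            tests: [not_null, unique]  # dbt's built-in quality tests
    """

    marts = pathlib.Path("models/marts")
    marts.mkdir(parents=True, exist_ok=True)
    (marts / "fct_daily_revenue.sql").write_text(MODEL_SQL)
    (marts / "schema.yml").write_text(SCHEMA_YML)

    # `dbt build` runs the model and its tests in dependency order.
    subprocess.run(["dbt", "build", "--select", "fct_daily_revenue"], check=True)
    ```
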
  • Module 10: Orchestration with Airflow (DAGs + Retries + Visibility)
    1. Airflow concepts (DAGs, tasks, operators, scheduling)
    2. Retries, timeouts, and failure handling patterns
    3. Dependency chaining (ingestion → transform → validate → publish)
    4. Observability (task logs, retries, run history)
    5. Hands-on lab: Build an Airflow DAG that orchestrates ingestion, dbt transforms, and quality checks (sketch below)
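
    A minimal sketch of the lab DAG, assuming Airflow 2.4+ and a dbt project at a hypothetical /opt/dbt path; the task bodies are placeholders.

    ```python
    # Sketch of the Module 10 DAG: ingest -> dbt transform -> quality checks,
    # with retries and timeouts. dag_id and paths are illustrative assumptions.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def ingest_files():
        pass  # placeholder for the Module 7/8 ingestion logic


    default_args = {
        "retries": 2,                                # re-run transient failures
        "retry_delay": timedelta(minutes=5),
        "execution_timeout": timedelta(minutes=30),  # fail fast on hung tasks
    }

    with DAG(
        dag_id="orders_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        ingest = PythonOperator(task_id="ingest", python_callable=ingest_files)
        transform = BashOperator(
            task_id="dbt_run", bash_command="dbt run --project-dir /opt/dbt"
        )
        validate = BashOperator(
            task_id="dbt_test", bash_command="dbt test --project-dir /opt/dbt"
        )

        # Dependency chain: ingestion -> transform -> validate.
        ingest >> transform >> validate
    ```

    Setting retries in default_args applies them to every task, which is usually what you want for transient infrastructure failures; per-task overrides remain possible.
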
  • Module 11: Mini Capstone (Modern Data Pipeline with Quality Checks)
    1. Capstone goal: Deliver a working end-to-end pipeline
    2. Ingest CSV/JSON into PostgreSQL staging
    3. Transform using dbt models (facts/dims and reporting tables)
    4. Validate with dbt tests and additional Python checks (optional)
    5. Orchestrate using Airflow with retries and clear run visibility
    6. Capstone lab: Deliver the pipeline, demo DAG runs, and submit documentation and query outputs

Hands-on Experience with Tools

  • SQL (PostgreSQL)
  • Python
  • dbt
  • Airflow

Training Delivery Format

Flexible, comprehensive training designed to fit your schedule and learning preferences
  • Opt-in certifications: AWS, Scrum.org, DASA & more
  • 100% live on-site/online training
  • Hands-on labs and capstone projects
  • Lifetime access to training material and sessions

How Does Personalised Training Work?

1. Skill-Gap Assessment: Analysing skill gaps and assessing business requirements to craft a unique program
2. Personalisation: Customising curriculum and projects to prepare your team for challenges within your industry
3. Implementation: Supplementing training with consulting support to ensure implementation in real projects

Why Data Engineering Fundamentals for your business?

  • Reliable decision-making: Build trusted datasets that power reporting and analytics.
  • Faster data access: Standardize ingestion and transformation to reduce time-to-insight.
  • Improved data quality: Implement validation, governance, and consistency across pipelines.
  • Scalable data operations: Design data systems that grow with volume and business complexity.
  • Better AI readiness: Strong foundations enable ML and GenAI initiatives to succeed.

Lead the Digital Landscape with Cutting-Edge Tech and In-House “Techsperts”

Discover the power of digital transformation with train-to-deliver programs from Uptut's experts. Backed by 50,000+ professionals across the world's leading tech innovators.

Frequently Asked Questions

1. What are the prerequisites for this training?

The training does not require prior skills or experience. The curriculum starts with the basics and progresses to advanced topics.

2. Will my team get any practical experience with this training?

With our focus on experiential learning, the training is as hands-on as possible, with assignments, quizzes, capstone projects, and live labs where trainees learn by doing.

3. What is your mode of delivery - online or on-site?

We conduct both online and on-site training sessions. You can choose whichever suits your team best.

4. Will trainees get certified?

Yes, all trainees will receive certificates issued by Uptut under the guidance of industry experts.

5. What do we do if we need further support after the training?

We have an incredible team of mentors who are available for consultations if your team needs further assistance. Just book a consultation, and they will help your team resolve queries and apply the training in the best possible way.
