Online / On-site / Hybrid

Data Pipelines with Apache Airflow

Build a strong foundation in orchestrating production-grade data pipelines using Apache Airflow, from DAG fundamentals to operational best practices. Learn how to schedule reliable workflows, handle retries and backfills, debug failures, implement validations, and run pipelines with monitoring and alerting patterns.

Duration: 3 days
Rating: 4.8/5.0
Level: Intermediate
1500+ users onboarded

Who will Benefit from this Training?

  • Data Engineers
  • Analytics Engineers
  • Data Platform Engineers
  • Data Ops teams
  • Backend engineers working with data workflows
  • BI engineers supporting scheduled reporting pipelines

Training Objectives

  • Understand why orchestration is essential in modern Data Engineering.
  • Build, schedule, and monitor data pipelines using Apache Airflow.
  • Understand Airflow concepts including DAGs, tasks, operators, scheduling, retries, and SLAs.
  • Implement production-ready workflow best practices such as idempotency, retries, backfill, task dependencies, and timeouts.
  • Integrate Airflow with Python ETL scripts, SQL transformations, and dbt workflows (starter).
  • Manage Airflow connections, variables, and secrets safely.
  • Implement data quality checks and failure alerting strategies.
  • Handle operational workflows including reprocessing, partial reruns, and failure recovery.
  • Build an end-to-end orchestrated data pipeline as a capstone project.

Build a high-performing, job-ready tech team.

Personalise your team’s upskilling roadmap and design a tailored, hands-on training program with Uptut.

Key training modules

Comprehensive, hands-on modules designed to take you from basics to advanced concepts. A few short, illustrative code sketches for selected modules follow the list.
  • Module 1: Why Orchestration Matters in Modern Data Engineering
    1. What orchestration solves (reliability, sequencing, automation)
    2. Pipelines vs workflows (tasks, dependencies, retry behavior)
    3. Batch scheduling challenges (late data, failures, reprocessing)
    4. Airflow’s role in modern data stacks (ETL/ELT + observability)
    5. Hands-on: Activity: Break down a real data pipeline into tasks and dependencies
  • Module 2: Apache Airflow Fundamentals (Core Concepts)
    1. Airflow architecture overview (scheduler, webserver, workers, metadata DB)
    2. Core concepts (DAGs, tasks, operators, task instances)
    3. Scheduling basics (start_date, schedule_interval, catchup)
    4. Retries, timeouts, SLAs, and failure behavior
    5. Hands-on: Lab: Create your first DAG and validate it runs successfully
  • Module 3: Building Pipelines with Operators and Dependencies
    1. Operator types (PythonOperator, BashOperator, SQL operators overview)
    2. Task dependencies (linear, fan-out/fan-in, branching concepts)
    3. Trigger rules and common patterns (all_success vs all_done)
    4. Task grouping basics (TaskGroup intro)
    5. Hands-on: Lab: Build a DAG with multiple tasks, dependencies, and parallel branches
  • Module 4: Scheduling, Monitoring, and Operational Visibility
    1. DAG scheduling patterns (hourly/daily/cron)
    2. Monitoring DAG runs and task runs in the UI
    3. Logs, retries, and debugging failures
    4. SLA monitoring and detecting pipeline freshness issues
    5. Hands-on: Lab: Schedule a DAG, simulate failure, and validate retries + SLA behavior
  • Module 5: Production Workflow Best Practices (Idempotency, Backfill, Timeouts)
    1. Idempotency patterns (safe re-runs, overwrite vs append decisions)
    2. Backfill strategy and catchup behavior
    3. Timeouts, retries, and exponential backoff concepts
    4. Designing dependency chains that avoid cascading failures
    5. Hands-on: Lab: Implement an idempotent DAG with a backfill scenario and validate reprocessing safety
  • Module 6: Integrating Airflow with Python ETL, SQL, and dbt (Starter)
    1. Calling Python ETL scripts from Airflow
    2. Running SQL transformations and incremental patterns (starter)
    3. dbt integration concepts (dbt run, dbt test via operators)
    4. Passing parameters (run_date, env, paths) into tasks
    5. Hands-on: Lab: Build a DAG that runs Python ETL → SQL transforms → dbt models + tests
  • Module 7: Connections, Variables, and Secrets (Safe Operations)
    1. Airflow Connections (databases, APIs, cloud services)
    2. Variables for configuration and dynamic pipelines
    3. Secrets handling best practices (no hardcoding, secret backend concepts)
    4. Environment separation patterns (dev/stage/prod)
    5. Hands-on: Lab: Configure a DB connection + variables and run a DAG using secure values
  • Module 8: Data Quality Checks and Failure Alerting
    1. Data quality check patterns (null checks, duplicates, row counts, ranges)
    2. Fail-fast vs warn-only approach
    3. Alerting concepts (email/Slack/webhooks overview)
    4. Adding runbook-ready context to failure alerts
    5. Hands-on: Lab: Add data quality tasks and trigger alerts on validation failures
  • Module 9: Operational Workflows (Reprocessing, Partial Reruns, Recovery)
    1. Clearing tasks safely and rerunning only failed steps
    2. Reprocessing patterns (rebuild specific partition/date)
    3. Handling partial failures and dependencies
    4. Failure recovery checklist for production pipelines
    5. Hands-on: Lab: Perform partial rerun recovery and validate pipeline correctness after fix
  • Module 10: Capstone Project (End-to-End Orchestrated Data Pipeline)
    1. Capstone goal: Build a production-style orchestrated pipeline
    2. Ingest data (Python ETL) with parameters (run_date)
    3. Transform using SQL and dbt models
    4. Add data quality checks and alerts
    5. Support retries, backfill, and safe reruns
    6. Hands-on: Capstone Lab: Deliver the working Airflow DAG with evidence, logs, and a short runbook
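
The short sketches below illustrate a few of the modules above. They assume Apache Airflow 2.x, and every DAG, task, connection, and path name is a hypothetical placeholder; they show the general patterns the labs work through, not the official lab solutions.

A minimal first DAG for Module 2, covering start_date, schedule_interval, catchup, and retry settings:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def say_hello():
        # Placeholder task body; a real pipeline would extract or load data here.
        print("hello from Airflow")


    with DAG(
        dag_id="hello_pipeline",               # hypothetical DAG name
        start_date=datetime(2024, 1, 1),       # first logical date the scheduler considers
        schedule_interval="@daily",            # one run per day
        catchup=False,                         # do not create runs for past dates
        default_args={
            "retries": 2,                            # retry a failed task twice
            "retry_delay": timedelta(minutes=5),     # wait between retries
        },
    ) as dag:
        PythonOperator(task_id="say_hello", python_callable=say_hello)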
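
A sketch for Module 3: fan-out/fan-in dependencies plus a trigger rule that lets a reporting task run even when an upstream load fails:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
    from airflow.utils.trigger_rule import TriggerRule


    def summarize():
        print("summarizing load results")


    with DAG(
        dag_id="fan_out_fan_in",               # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,                # triggered manually for the lab
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        load_a = BashOperator(task_id="load_a", bash_command="echo loading table A")
        load_b = BashOperator(task_id="load_b", bash_command="echo loading table B")
        report = PythonOperator(
            task_id="report",
            python_callable=summarize,
            trigger_rule=TriggerRule.ALL_DONE,  # run even if a load failed (vs the default all_success)
        )

        extract >> [load_a, load_b] >> report   # fan out to the loads, fan in to the report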
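
A sketch for Module 4: a cron schedule plus a task-level SLA so runs that finish late are flagged in the UI:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator


    with DAG(
        dag_id="hourly_report",                 # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 * * * *",          # cron: at minute 0 of every hour
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="build_report",
            bash_command="echo building report",
            retries=2,
            sla=timedelta(minutes=30),          # flag the task if it has not succeeded within 30 minutes of the scheduled time
        )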
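
A sketch for Module 5: an idempotent task keyed on the run's logical date, so reruns and backfills overwrite one partition instead of appending duplicates:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def load_partition(ds: str):
        # "ds" is the run's logical date as YYYY-MM-DD.
        # Overwriting that partition makes reruns safe: the same date always
        # produces the same result, whether it runs once or five times.
        print(f"DELETE + INSERT rows for partition {ds} (overwrite, not append)")


    with DAG(
        dag_id="idempotent_daily_load",          # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=True,                            # allow the scheduler to create past runs
        default_args={
            "retries": 3,
            "retry_delay": timedelta(minutes=5),
            "retry_exponential_backoff": True,   # back off between retries
            "execution_timeout": timedelta(minutes=30),
        },
    ) as dag:
        PythonOperator(
            task_id="load_partition",
            python_callable=load_partition,
            op_kwargs={"ds": "{{ ds }}"},        # pass the logical date into the task
        )

    # Reprocessing a date range from the CLI (illustrative):
    #   airflow dags backfill -s 2024-01-01 -e 2024-01-07 idempotent_daily_load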
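
A sketch for Module 6: chaining a Python ETL step, a SQL transform, and dbt run/test commands, with the run date passed in as a parameter; the commands and project paths are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def extract_raw(run_date: str):
        # Hypothetical extraction step parameterised by the logical date.
        print(f"extracting source files for {run_date}")


    with DAG(
        dag_id="etl_sql_dbt_pipeline",           # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(
            task_id="extract_raw",
            python_callable=extract_raw,
            op_kwargs={"run_date": "{{ ds }}"},
        )
        transform = BashOperator(
            task_id="sql_transform",
            bash_command="psql -f /opt/pipeline/transform.sql",   # placeholder SQL step; a provider SQL operator could be used instead
        )
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command="cd /opt/dbt_project && dbt run --vars 'run_date: {{ ds }}'",
        )
        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command="cd /opt/dbt_project && dbt test",
        )

        extract >> transform >> dbt_run >> dbt_test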
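
A sketch for Module 7: reading a Variable and a Connection at runtime instead of hardcoding credentials; the connection id and variable name would be configured in the Airflow UI, environment variables, or a secrets backend:

    from datetime import datetime

    from airflow import DAG
    from airflow.hooks.base import BaseHook
    from airflow.models import Variable
    from airflow.operators.python import PythonOperator


    def load_to_warehouse():
        schema = Variable.get("target_schema", default_var="analytics")   # hypothetical variable
        conn = BaseHook.get_connection("analytics_db")                     # credentials live in Airflow, not in code
        print(f"loading into schema {schema} on {conn.host} as {conn.login}")


    with DAG(
        dag_id="secure_config_example",          # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)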
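
A sketch for Module 8: a fail-fast row-count check plus a failure callback that a real deployment would wire to email, Slack, or a webhook:

    from datetime import datetime

    from airflow import DAG
    from airflow.exceptions import AirflowFailException
    from airflow.operators.python import PythonOperator


    def notify_failure(context):
        # Called by Airflow when a task fails; a real version would post to
        # Slack or send an email with this runbook-ready context.
        ti = context["task_instance"]
        print(f"ALERT: {ti.dag_id}.{ti.task_id} failed for {context['ds']}, logs: {ti.log_url}")


    def check_row_count():
        row_count = 0   # a real check would query the freshly loaded table
        if row_count == 0:
            # Fail fast without retries: the data is wrong, so retrying will not help.
            raise AirflowFailException("row count check failed: 0 rows loaded")


    with DAG(
        dag_id="quality_checked_pipeline",       # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"on_failure_callback": notify_failure},
    ) as dag:
        PythonOperator(task_id="check_row_count", python_callable=check_row_count)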

Hands-on Experience with Tools

  • Apache Airflow
  • Python
  • SQL
  • dbt

Training Delivery Format

Flexible, comprehensive training designed to fit your schedule and learning preferences
  • Opt-in Certifications: AWS, Scrum.org, DASA & more
  • 100% Live on-site/online training
  • Hands-on labs and capstone projects
  • Lifetime Access to training material and sessions

How Does Personalised Training Work?

1. Skill-Gap Assessment: analysing skill gaps and assessing business requirements to craft a unique program
2. Personalisation: customising the curriculum and projects to prepare your team for challenges within your industry
3. Implementation: supplementing training with consulting support to ensure implementation in real projects

Why Data Pipelines with Apache Airflow for your business?

  • Reliable orchestration: Schedule and manage complex pipelines with clear dependencies.
  • Improved visibility: Track pipeline status, failures, retries, and SLA performance in one place.
  • Faster recovery: Built-in retries and alerting reduce downtime when jobs fail.
  • Scalable automation: Orchestrate batch workflows across warehouses, lakes, and multi-cloud systems.
  • Better governance: Standardize workflow execution with versioned DAGs and audit-friendly operations.

Lead the Digital Landscape with Cutting-Edge Tech and In-House "Techsperts"

Discover the power of digital transformation with train-to-deliver programs from Uptut's experts. Backed by 50,000+ professionals across the world's leading tech innovators.

Frequently Asked Questions

1. What are the pre-requisites for this training?

The training does not require you to have prior skills or experience. The curriculum covers basics and progresses towards advanced topics.

2. Will my team get any practical experience with this training?

With our focus on experiential learning, we have made the training as hands-on as possible, with assignments, quizzes, capstone projects, and labs where trainees learn by doing tasks live.

3. What is your mode of delivery - online or on-site?

We conduct both online and on-site training sessions. You can choose any according to the convenience of your team.

4. Will trainees get certified?

Yes, all trainees will get certificates issued by Uptut under the guidance of industry experts.

5. What do we do if we need further support after the training?

We have an experienced team of mentors available for consultations if your team needs further assistance after the training. They are ready to guide your team and resolve queries so trainees can utilize the training in the best possible way. Just book a consultation to get support.
