Online / On-site / Hybrid

Data Pipelines with Apache Airflow

Build a strong foundation in orchestrating production-grade data pipelines using Apache Airflow, from DAG fundamentals to operational best practices. Learn how to schedule reliable workflows, handle retries and backfills, debug failures, implement validations, and run pipelines with monitoring and alerting patterns.

Duration: 3 days
Rating: 4.8/5.0
Level: Intermediate
1500+ users onboarded

Who will Benefit from this Training?

  • Data Engineers
  • Analytics Engineers
  • Data Platform Engineers
  • Data Ops teams
  • Backend engineers working with data workflows
  • BI engineers supporting scheduled reporting pipelines

Training Objectives

  • Understand why orchestration is essential in modern Data Engineering.
  • Build, schedule, and monitor data pipelines using Apache Airflow.
  • Understand Airflow concepts including DAGs, tasks, operators, scheduling, retries, and SLAs.
  • Implement production-ready workflow best practices such as idempotency, retries, backfill, task dependencies, and timeouts.
  • Integrate Airflow with Python ETL scripts, SQL transformations, and dbt workflows (starter).
  • Manage Airflow connections, variables, and secrets safely.
  • Implement data quality checks and failure alerting strategies.
  • Handle operational workflows including reprocessing, partial reruns, and failure recovery.
  • Build an end-to-end orchestrated data pipeline as a capstone project.

Build a high-performing, job-ready tech team.

Personalise your team’s upskilling roadmap and design a tailored, hands-on training program with Uptut.

Key training modules

Comprehensive, hands-on modules designed to take you from basics to advanced concepts. A few short, illustrative code sketches for selected modules follow the list.
  • Module 1: Why Orchestration Matters in Modern Data Engineering
    1. What orchestration solves (reliability, sequencing, automation)
    2. Pipelines vs workflows (tasks, dependencies, retry behavior)
    3. Batch scheduling challenges (late data, failures, reprocessing)
    4. Airflow’s role in modern data stacks (ETL/ELT + observability)
    5. Hands-on: Activity: Break down a real data pipeline into tasks and dependencies
  • Module 2: Apache Airflow Fundamentals (Core Concepts)
    1. Airflow architecture overview (scheduler, webserver, workers, metadata DB)
    2. Core concepts (DAGs, tasks, operators, task instances)
    3. Scheduling basics (start_date, schedule_interval, catchup)
    4. Retries, timeouts, SLAs, and failure behavior
    5. Hands-on: Lab: Create your first DAG and validate it runs successfully
  • Module 3: Building Pipelines with Operators and Dependencies
    1. Operator types (PythonOperator, BashOperator, SQL operators overview)
    2. Task dependencies (linear, fan-out/fan-in, branching concepts)
    3. Trigger rules and common patterns (all_success vs all_done)
    4. Task grouping basics (TaskGroup intro)
    5. Hands-on: Lab: Build a DAG with multiple tasks, dependencies, and parallel branches
  • Module 4: Scheduling, Monitoring, and Operational Visibility
    1. DAG scheduling patterns (hourly/daily/cron)
    2. Monitoring DAG runs and task runs in the UI
    3. Logs, retries, and debugging failures
    4. SLA monitoring and detecting pipeline freshness issues
    5. Hands-on: Lab: Schedule a DAG, simulate failure, and validate retries + SLA behavior
  • Module 5: Production Workflow Best Practices (Idempotency, Backfill, Timeouts)
    1. Idempotency patterns (safe re-runs, overwrite vs append decisions)
    2. Backfill strategy and catchup behavior
    3. Timeouts, retries, and exponential backoff concepts
    4. Designing dependency chains that avoid cascading failures
    5. Hands-on: Lab: Implement an idempotent DAG with a backfill scenario and validate reprocessing safety
  • Module 6: Integrating Airflow with Python ETL, SQL, and dbt (Starter)
    1. Calling Python ETL scripts from Airflow
    2. Running SQL transformations and incremental patterns (starter)
    3. dbt integration concepts (dbt run, dbt test via operators)
    4. Passing parameters (run_date, env, paths) into tasks
    5. Hands-on: Lab: Build a DAG that runs Python ETL → SQL transforms → dbt models + tests
  • Module 7: Connections, Variables, and Secrets (Safe Operations)
    1. Airflow Connections (databases, APIs, cloud services)
    2. Variables for configuration and dynamic pipelines
    3. Secrets handling best practices (no hardcoding, secret backend concepts)
    4. Environment separation patterns (dev/stage/prod)
    5. Hands-on: Lab: Configure a DB connection + variables and run a DAG using secure values
  • Module 8: Data Quality Checks and Failure Alerting
    1. Data quality check patterns (null checks, duplicates, row counts, ranges)
    2. Fail-fast vs warn-only approach
    3. Alerting concepts (email/Slack/webhooks overview)
    4. Adding runbook-ready context to failure alerts
    5. Hands-on: Lab: Add data quality tasks and trigger alerts on validation failures
  • Module 9: Operational Workflows (Reprocessing, Partial Reruns, Recovery)
    1. Clearing tasks safely and rerunning only failed steps
    2. Reprocessing patterns (rebuild specific partition/date)
    3. Handling partial failures and dependencies
    4. Failure recovery checklist for production pipelines
    5. Hands-on: Lab: Perform partial rerun recovery and validate pipeline correctness after fix
  • Module 10: Capstone Project (End-to-End Orchestrated Data Pipeline)
    1. Capstone goal: Build a production-style orchestrated pipeline
    2. Ingest data (Python ETL) with parameters (run_date)
    3. Transform using SQL and dbt models
    4. Add data quality checks and alerts
    5. Support retries, backfill, and safe reruns
    6. Hands-on: Capstone Lab: Deliver the working Airflow DAG with evidence, logs, and a short runbook
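
The short sketches below illustrate a few of the modules above. They assume Apache Airflow 2.x, and every DAG, task, connection, and path name is a hypothetical placeholder; they show the general patterns the labs work through, not the official lab solutions.

A minimal first DAG for Module 2, covering start_date, schedule_interval, catchup, and retry settings:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def say_hello():
        # Placeholder task body; a real pipeline would extract or load data here.
        print("hello from Airflow")


    with DAG(
        dag_id="hello_pipeline",               # hypothetical DAG name
        start_date=datetime(2024, 1, 1),       # first logical date the scheduler considers
        schedule_interval="@daily",            # one run per day
        catchup=False,                         # do not create runs for past dates
        default_args={
            "retries": 2,                            # retry a failed task twice
            "retry_delay": timedelta(minutes=5),     # wait between retries
        },
    ) as dag:
        PythonOperator(task_id="say_hello", python_callable=say_hello)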
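
A sketch for Module 3: fan-out/fan-in dependencies plus a trigger rule that lets a reporting task run even when an upstream load fails:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
    from airflow.utils.trigger_rule import TriggerRule


    def summarize():
        print("summarizing load results")


    with DAG(
        dag_id="fan_out_fan_in",               # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,                # triggered manually for the lab
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        load_a = BashOperator(task_id="load_a", bash_command="echo loading table A")
        load_b = BashOperator(task_id="load_b", bash_command="echo loading table B")
        report = PythonOperator(
            task_id="report",
            python_callable=summarize,
            trigger_rule=TriggerRule.ALL_DONE,  # run even if a load failed (vs the default all_success)
        )

        extract >> [load_a, load_b] >> report   # fan out to the loads, fan in to the report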
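
A sketch for Module 4: a cron schedule plus a task-level SLA so runs that finish late are flagged in the UI:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator


    with DAG(
        dag_id="hourly_report",                 # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 * * * *",          # cron: at minute 0 of every hour
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="build_report",
            bash_command="echo building report",
            retries=2,
            sla=timedelta(minutes=30),          # flag the task if it has not succeeded within 30 minutes of the scheduled time
        )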
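
A sketch for Module 5: an idempotent task keyed on the run's logical date, so reruns and backfills overwrite one partition instead of appending duplicates:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def load_partition(ds: str):
        # "ds" is the run's logical date as YYYY-MM-DD.
        # Overwriting that partition makes reruns safe: the same date always
        # produces the same result, whether it runs once or five times.
        print(f"DELETE + INSERT rows for partition {ds} (overwrite, not append)")


    with DAG(
        dag_id="idempotent_daily_load",          # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=True,                            # allow the scheduler to create past runs
        default_args={
            "retries": 3,
            "retry_delay": timedelta(minutes=5),
            "retry_exponential_backoff": True,   # back off between retries
            "execution_timeout": timedelta(minutes=30),
        },
    ) as dag:
        PythonOperator(
            task_id="load_partition",
            python_callable=load_partition,
            op_kwargs={"ds": "{{ ds }}"},        # pass the logical date into the task
        )

    # Reprocessing a date range from the CLI (illustrative):
    #   airflow dags backfill -s 2024-01-01 -e 2024-01-07 idempotent_daily_load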
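
A sketch for Module 6: chaining a Python ETL step, a SQL transform, and dbt run/test commands, with the run date passed in as a parameter; the commands and project paths are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def extract_raw(run_date: str):
        # Hypothetical extraction step parameterised by the logical date.
        print(f"extracting source files for {run_date}")


    with DAG(
        dag_id="etl_sql_dbt_pipeline",           # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(
            task_id="extract_raw",
            python_callable=extract_raw,
            op_kwargs={"run_date": "{{ ds }}"},
        )
        transform = BashOperator(
            task_id="sql_transform",
            bash_command="psql -f /opt/pipeline/transform.sql",   # placeholder SQL step; a provider SQL operator could be used instead
        )
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command="cd /opt/dbt_project && dbt run --vars 'run_date: {{ ds }}'",
        )
        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command="cd /opt/dbt_project && dbt test",
        )

        extract >> transform >> dbt_run >> dbt_test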
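
A sketch for Module 7: reading a Variable and a Connection at runtime instead of hardcoding credentials; the connection id and variable name would be configured in the Airflow UI, environment variables, or a secrets backend:

    from datetime import datetime

    from airflow import DAG
    from airflow.hooks.base import BaseHook
    from airflow.models import Variable
    from airflow.operators.python import PythonOperator


    def load_to_warehouse():
        schema = Variable.get("target_schema", default_var="analytics")   # hypothetical variable
        conn = BaseHook.get_connection("analytics_db")                     # credentials live in Airflow, not in code
        print(f"loading into schema {schema} on {conn.host} as {conn.login}")


    with DAG(
        dag_id="secure_config_example",          # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)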
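
A sketch for Module 8: a fail-fast row-count check plus a failure callback that a real deployment would wire to email, Slack, or a webhook:

    from datetime import datetime

    from airflow import DAG
    from airflow.exceptions import AirflowFailException
    from airflow.operators.python import PythonOperator


    def notify_failure(context):
        # Called by Airflow when a task fails; a real version would post to
        # Slack or send an email with this runbook-ready context.
        ti = context["task_instance"]
        print(f"ALERT: {ti.dag_id}.{ti.task_id} failed for {context['ds']}, logs: {ti.log_url}")


    def check_row_count():
        row_count = 0   # a real check would query the freshly loaded table
        if row_count == 0:
            # Fail fast without retries: the data is wrong, so retrying will not help.
            raise AirflowFailException("row count check failed: 0 rows loaded")


    with DAG(
        dag_id="quality_checked_pipeline",       # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"on_failure_callback": notify_failure},
    ) as dag:
        PythonOperator(task_id="check_row_count", python_callable=check_row_count)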

Hands-on Experience with Tools

  • Apache Airflow
  • Python
  • SQL
  • dbt

Training Delivery Format

Flexible, comprehensive training designed to fit your schedule and learning preferences
  • Opt-in Certifications: AWS, Scrum.org, DASA & more
  • 100% Live on-site/online training
  • Hands-on labs and capstone projects
  • Lifetime Access to training material and sessions

How Does Personalised Training Work?

1. Skill-Gap Assessment: analysing skill gaps and assessing business requirements to craft a unique program
2. Personalisation: customising the curriculum and projects to prepare your team for challenges within your industry
3. Implementation: supplementing training with consulting support to ensure implementation in real projects

Why Data Pipelines with Apache Airflow for your business?

  • Reliable orchestration: Schedule and manage complex pipelines with clear dependencies.
  • Improved visibility: Track pipeline status, failures, retries, and SLA performance in one place.
  • Faster recovery: Built-in retries and alerting reduce downtime when jobs fail.
  • Scalable automation: Orchestrate batch workflows across warehouses, lakes, and multi-cloud systems.
  • Better governance: Standardize workflow execution with versioned DAGs and audit-friendly operations.

Lead the Digital Landscape with Cutting-Edge Tech and In-House "Techsperts"

Discover the power of digital transformation with train-to-deliver programs from Uptut's experts. Backed by 50,000+ professionals across the world's leading tech innovators.

Frequently Asked Questions

1. What are the pre-requisites for this training?

The training does not require you to have prior skills or experience. The curriculum covers basics and progresses towards advanced topics.

2. Will my team get any practical experience with this training?

With our focus on experiential learning, we have made the training as hands-on as possible, with assignments, quizzes, capstone projects, and labs where trainees learn by doing tasks live.

3. What is your mode of delivery - online or on-site?

We conduct both online and on-site training sessions. You can choose any according to the convenience of your team.

4. Will trainees get certified?

Yes, all trainees will get certificates issued by Uptut under the guidance of industry experts.

5. What do we do if we need further support after the training?

We have an experienced team of mentors available for consultations if your team needs further assistance after the training. They are ready to guide your team and resolve queries so trainees can utilize the training in the best possible way. Just book a consultation to get support.
