Online | On-site | Hybrid

Data Engineering on Azure

Build a strong foundation in Azure Data Engineering, from lakehouse architecture to enterprise ingestion and transformation workflows. Learn how to build end-to-end pipelines using ADLS, Data Factory, Databricks, and Synapse while applying security, monitoring, and production-ready operational practices.

Duration: 5 days
Rating: 4.8/5.0
Level: Intermediate
1500+ users onboarded

Who will Benefit from this Training?

  • Data Engineers
  • Analytics Engineers
  • Cloud Engineers supporting data platforms
  • Data Platform Engineers
  • DevOps engineers working with Azure data services
  • BI engineers moving into data engineering

Training Objectives

  • Understand modern Data Engineering architecture on Azure and service selection for workloads.
  • Build a complete data platform using Azure services across ingestion, storage, transformation, orchestration, and serving/analytics.
  • Design and implement a data lake foundation using Azure Data Lake Storage Gen2 (ADLS) with raw/cleansed/curated zones.
  • Build ingestion and orchestration pipelines using Azure Data Factory (ADF) with linked services, datasets, triggers, and monitoring.
  • Transform data using Azure Databricks with Spark fundamentals and Delta Lake patterns.
  • Implement batch and incremental pipelines including full load and watermark-based processing.
  • Build analytics datasets and data warehouse style tables and views for reporting.
  • Apply security and governance practices including RBAC, managed identities, Key Vault integration, and encryption.
  • Implement monitoring and operational practices including pipeline logs, failure handling, and alerting.
  • Deliver an end-to-end capstone project on Azure with validated outputs and documentation.

Build a high-performing, job-ready tech team.

Personalise your team’s upskilling roadmap and design a tailored, hands-on training program with Uptut.

Key training modules

Comprehensive, hands-on modules designed to take you from basics to advanced concepts
  • Module 1: Azure Data Engineering Architecture Overview
    1. Data engineering lifecycle (ingest → store → transform → serve)
    2. Modern platform patterns (Data Lake, Data Warehouse, Lakehouse)
    3. Batch vs streaming overview
    4. Choosing Azure services per workload
    5. Hands-on: Activity: Design an Azure reference architecture for an analytics use case
  • Module 2: Azure Data Lake Storage Gen2 (ADLS) Deep Dive
    1. Storage account fundamentals
    2. Containers and folder strategy
    3. Lake zones (raw, cleansed, curated)
    4. Naming and partitioning best practices
    5. Hands-on: Lab: Create storage account, containers, and lake zone folder structure + upload raw files (see the sketch below)
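    A minimal sketch of the Module 2 folder bootstrap, assuming the azure-storage-file-datalake and azure-identity packages; the account URL, container names, and folder path are placeholders:

      # Create one container per lake zone and a dated folder layout (ADLS Gen2).
      from azure.core.exceptions import ResourceExistsError
      from azure.identity import DefaultAzureCredential
      from azure.storage.filedatalake import DataLakeServiceClient

      ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder

      service = DataLakeServiceClient(account_url=ACCOUNT_URL,
                                      credential=DefaultAzureCredential())

      for zone in ("raw", "cleansed", "curated"):
          try:
              fs = service.create_file_system(file_system=zone)
          except ResourceExistsError:
              fs = service.get_file_system_client(zone)  # container already exists
          # Example source/date layout inside each zone container
          fs.create_directory("sales/orders/2024/01")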
  • Module 3: Data Formats and Lakehouse Fundamentals
    1. CSV vs JSON vs Parquet
    2. Why Parquet is preferred
    3. Delta Lake overview (ACID transactions, time travel concept)
    4. Partitioning strategy (by date, by region/customer)
    5. Hands-on: Lab/Exercise: Upload datasets + define partition folders and choose partition keys (see the sketch below)
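    A minimal sketch of Module 3's partitioning exercise in PySpark; the column names and lake paths are hypothetical:

      # Convert raw CSV to Parquet, partitioned by a date key.
      from pyspark.sql import SparkSession, functions as F

      spark = SparkSession.builder.getOrCreate()

      df = (spark.read.option("header", True).csv("/lake/raw/orders/")
            .withColumn("order_date", F.to_date("order_ts")))

      # One folder per value: .../orders/order_date=2024-01-15/part-*.parquet
      (df.write.mode("overwrite")
         .partitionBy("order_date")
         .parquet("/lake/cleansed/orders/"))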
  • Module 4: Azure Security Basics for Data Engineers
    1. RBAC basics for storage
    2. Managed Identity introduction
    3. Data access patterns (engineers vs analysts, least privilege)
    4. Key Vault overview
    5. Hands-on: Lab: Configure RBAC and validate secure access through role assignment
  • Module 5: Querying Data Lake (Starter Options)
    1. Query choices on Azure (Databricks queries, Synapse serverless SQL overview)
    2. When you need a warehouse vs lake queries
    3. Hands-on: Demo Lab: Query raw files using Databricks (preview)
  • Module 6: Azure Data Factory Fundamentals
    1. What ADF solves (ingestion automation, scheduling, dependency management)
    2. Key components (linked services, datasets, pipelines, triggers)
    3. Activity types (Copy Activity, Validation activity intro, ForEach intro)
    4. Hands-on: Lab: Create an ADF instance and explore the UI
  • Module 7: Copy Activity Deep Dive (Core Ingestion Skill)
    1. Copy from HTTP source
    2. Copy from Blob/ADLS source
    3. Copy from SQL database (concept)
    4. Copy into ADLS raw zone
    5. Handling file formats (CSV delimiter issues, schema mapping basics)
    6. Hands-on: Lab: Build ingestion pipeline (source → ADLS raw) + parameterize dataset path with dynamic folders (see the sketch below)
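    A minimal sketch of starting the Module 7 pipeline with runtime parameters via the azure-mgmt-datafactory SDK; the subscription, resource, pipeline, and parameter names are all placeholders:

      # Trigger a parameterized ADF pipeline run from Python.
      from azure.identity import DefaultAzureCredential
      from azure.mgmt.datafactory import DataFactoryManagementClient

      client = DataFactoryManagementClient(DefaultAzureCredential(),
                                           "<subscription-id>")

      # The pipeline uses these parameters to build the dynamic raw-zone
      # folder, e.g. raw/orders/2024-01-15/ (names are hypothetical).
      run = client.pipelines.create_run(
          "rg-data-platform", "adf-training", "pl_ingest_orders",
          parameters={"run_date": "2024-01-15", "source_system": "webshop"},
      )
      print("Started run:", run.run_id)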
  • Module 8: Incremental Loads and Watermark Strategy in ADF
    1. Full load vs incremental load
    2. Watermark concepts (updated_at, ingestion timestamp)
    3. Using pipeline parameters for incremental runs
    4. Hands-on: Lab: Build incremental ingestion pipeline using watermark parameter + validate across runs (see the sketch below)
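    A minimal sketch of Module 8's watermark pattern in PySpark with Delta; the control table, column names, and paths are assumptions:

      # Incremental load: pull only rows changed since the last watermark.
      from pyspark.sql import SparkSession, functions as F

      spark = SparkSession.builder.getOrCreate()

      # 1. Read the last successful watermark from a small control table.
      wm = spark.read.format("delta").load("/lake/control/watermarks")
      last_wm = (wm.filter(F.col("table_name") == "orders")
                   .agg(F.max("watermark_value")).first()[0])

      # 2. Filter the source down to the increment.
      src = spark.read.format("delta").load("/lake/raw/orders")
      increment = src.filter(F.col("updated_at") > F.lit(last_wm))

      # 3. Append the increment; persisting the new max(updated_at) back to
      #    the control table (omitted here) lets the next run resume from it.
      increment.write.format("delta").mode("append").save("/lake/cleansed/orders")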
  • Module 9: Pipeline Monitoring and Operations
    1. Monitor pipeline runs
    2. Activity logs and failure reasons
    3. Retry policies and timeout settings
    4. Alerts overview
    5. Hands-on: Lab: Simulate pipeline failure, troubleshoot, and add retry + failure handling strategy (see the sketch below)
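    A minimal sketch of Module 9's run monitoring via the azure-mgmt-datafactory SDK; resource names are placeholders:

      # List the last 24 hours of pipeline runs and surface failures.
      from datetime import datetime, timedelta, timezone
      from azure.identity import DefaultAzureCredential
      from azure.mgmt.datafactory import DataFactoryManagementClient
      from azure.mgmt.datafactory.models import RunFilterParameters

      client = DataFactoryManagementClient(DefaultAzureCredential(),
                                           "<subscription-id>")
      now = datetime.now(timezone.utc)
      runs = client.pipeline_runs.query_by_factory(
          "rg-data-platform", "adf-training",
          RunFilterParameters(last_updated_after=now - timedelta(days=1),
                              last_updated_before=now),
      )
      for r in runs.value:
          if r.status == "Failed":
              print(r.pipeline_name, r.run_id, r.message)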
  • Module 10: Integrating ADF with Key Vault (Secrets Handling)
    1. Why secrets should never be hardcoded
    2. Key Vault linked service
    3. Using secrets in linked services securely
    4. Hands-on: Lab: Connect ADF to Key Vault for secret management (see the sketch below)
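    A minimal sketch of Module 10's secret retrieval with azure-keyvault-secrets; the vault URL and secret name are placeholders:

      # Fetch a connection secret from Key Vault instead of hardcoding it.
      from azure.identity import DefaultAzureCredential
      from azure.keyvault.secrets import SecretClient

      client = SecretClient(vault_url="https://<vault-name>.vault.azure.net",
                            credential=DefaultAzureCredential())

      # The secret value stays out of source control and pipeline JSON.
      sql_conn = client.get_secret("sql-connection-string").value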
  • Module 11: Databricks Fundamentals for Data Engineering
    1. What is Databricks (Spark-based processing, notebooks, jobs)
    2. Clusters and compute basics
    3. Reading from ADLS securely
    4. Hands-on: Lab: Create Databricks workspace + cluster and read raw zone data into Spark DataFrame (see the sketch below)
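    A minimal sketch of Module 11's secure read using the documented ABFS OAuth settings inside a Databricks notebook (where spark and dbutils are predefined); account, tenant, app, and secret-scope names are placeholders:

      # Configure service-principal access to ADLS, then read the raw zone.
      acct = "<storage-account>.dfs.core.windows.net"
      secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")

      spark.conf.set(f"fs.azure.account.auth.type.{acct}", "OAuth")
      spark.conf.set(f"fs.azure.account.oauth.provider.type.{acct}",
                     "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
      spark.conf.set(f"fs.azure.account.oauth2.client.id.{acct}", "<app-client-id>")
      spark.conf.set(f"fs.azure.account.oauth2.client.secret.{acct}", secret)
      spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{acct}",
                     "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

      raw = spark.read.option("header", True).csv(f"abfss://raw@{acct}/orders/")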
  • Module 12: Transformations with Spark (Practical)
    1. Data cleaning (null handling, schema casting, deduplication)
    2. Joins and aggregations
    3. Writing outputs back to ADLS
    4. Hands-on: Lab: Transform raw orders to cleansed zone + build curated datasets (daily revenue, top customers) (see the sketch below)
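    A minimal sketch of Module 12's cleaning and aggregation steps; the schema and paths are hypothetical:

      # Raw -> cleansed -> curated: dedupe, fix types, aggregate.
      from pyspark.sql import SparkSession, functions as F

      spark = SparkSession.builder.getOrCreate()
      raw = spark.read.parquet("/lake/raw/orders/")

      cleansed = (raw.dropDuplicates(["order_id"])                 # dedupe
                     .na.drop(subset=["order_id", "customer_id"])  # drop broken rows
                     .withColumn("amount", F.col("amount").cast("decimal(12,2)")))

      daily_revenue = (cleansed.groupBy("order_date")
                               .agg(F.sum("amount").alias("revenue")))

      cleansed.write.mode("overwrite").parquet("/lake/cleansed/orders/")
      daily_revenue.write.mode("overwrite").parquet("/lake/curated/daily_revenue/")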
  • Module 13: Delta Lake Fundamentals
    1. Why Delta in a lakehouse (ACID transactions, schema enforcement, scalable merges)
    2. Delta operations (overwrite vs append)
    3. Merge/upsert concept
    4. Time travel introduction
    5. Hands-on: Lab: Write curated data as Delta + run merge/upsert workflow (see the sketch below)
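    A minimal sketch of Module 13's merge/upsert with the delta-spark package; the key column and paths are assumptions:

      # Upsert an increment into a curated Delta table.
      from delta.tables import DeltaTable
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      updates = spark.read.format("delta").load("/lake/cleansed/orders_increment")

      target = DeltaTable.forPath(spark, "/lake/curated/orders")
      (target.alias("t")
             .merge(updates.alias("s"), "t.order_id = s.order_id")
             .whenMatchedUpdateAll()      # existing keys: update in place
             .whenNotMatchedInsertAll()   # new keys: insert
             .execute())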
  • Module 14: Databricks Jobs and Scheduling
    1. Convert notebook into job
    2. Job parameters
    3. Failure and retry strategy
    4. Hands-on: Lab: Schedule a Databricks job and validate execution logs
  • Module 15: Data Quality Checks in Databricks
    1. Validation rules (null checks, row counts, duplicate checks)
    2. Fail fast strategy
    3. Hands-on: Lab: Add validation and stop pipeline on quality failure (see the sketch below)
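    A minimal sketch of Module 15's fail-fast gate; the rules and paths are illustrative:

      # Validate before publishing; raising stops the notebook/job run.
      from pyspark.sql import SparkSession, functions as F

      spark = SparkSession.builder.getOrCreate()
      df = spark.read.format("delta").load("/lake/cleansed/orders")

      errors = []
      if df.count() == 0:
          errors.append("row count is zero")
      if df.filter(F.col("order_id").isNull()).count() > 0:
          errors.append("null order_id values")
      if df.groupBy("order_id").count().filter("count > 1").count() > 0:
          errors.append("duplicate order_id values")

      if errors:
          raise ValueError("Data quality failure: " + "; ".join(errors))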
  • Module 16: Azure Synapse Fundamentals
    1. Synapse components overview (SQL pools, Spark pools concept, pipelines concept)
    2. Serverless SQL vs Dedicated SQL (when to use each)
    3. Hands-on: Lab: Access Synapse workspace and explore studio
  • Module 17: Querying Lake Data via Synapse Serverless SQL
    1. External tables concept
    2. Query Parquet/Delta datasets
    3. Creating views for BI reporting
    4. Hands-on: Lab: Query curated zone + build analytics views (revenue by region, top customers) (see the sketch below)
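    A minimal sketch of Module 17's serverless query issued from Python over pyodbc; the endpoint, auth mode, lake path, and columns are placeholders:

      # Query curated Parquet through Synapse serverless SQL.
      import pyodbc

      conn = pyodbc.connect(
          "Driver={ODBC Driver 18 for SQL Server};"
          "Server=<workspace>-ondemand.sql.azuresynapse.net;"
          "Database=master;Authentication=ActiveDirectoryInteractive;"
      )

      sql = """
      SELECT order_date, SUM(amount) AS revenue
      FROM OPENROWSET(
          BULK 'https://<storage-account>.dfs.core.windows.net/curated/orders/',
          FORMAT = 'PARQUET'
      ) AS rows
      GROUP BY order_date
      ORDER BY order_date;
      """
      for row in conn.execute(sql):
          print(row.order_date, row.revenue)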
  • Module 18: Data Warehouse Modeling Basics on Azure
    1. Star schema basics (fact + dimension)
    2. KPI reporting dataset design
    3. Transformations for warehouse tables
    4. Hands-on: Workshop: Design fact/dimension tables for e-commerce analytics
  • Module 19: ADF + Databricks + Synapse Integrated Flow
    1. End-to-end architecture (ingest with ADF, transform with Databricks, serve via Synapse)
    2. Automation and reliability approach
    3. Hands-on: Lab: Create orchestrated pipeline concept (ADF triggers Databricks job, curated data queryable in Synapse)
  • Module 20: Performance and Cost Awareness (Starter)
    1. Databricks cluster cost drivers
    2. Synapse query cost awareness
    3. Storage cost optimization basics
    4. Hands-on: Activity: Create cost checklist for running pipelines daily
  • Module 21: Security and Governance for Azure Data Platforms
    1. RBAC for storage and compute
    2. Managed identities in pipelines
    3. Key Vault best practices
    4. Purview overview (lineage, cataloging)
    5. Hands-on: Workshop: Define governance model for data lake access and ownership
  • Module 22: Observability and Reliability in Data Engineering
    1. Monitoring across services (ADF logs, Databricks job logs, storage metrics)
    2. SLA and freshness tracking
    3. Alerting and incident readiness
    4. Hands-on: Lab: Add monitoring plan, validation checkpoints, and create daily pipeline execution report
  • Module 23: Production Best Practices
    1. Idempotent pipeline design
    2. Backfill strategies
    3. Handling schema drift safely
    4. Environment separation (dev, stage, prod)
    5. Hands-on: Activity: Build production readiness checklist for Azure pipelines
  • Module 24: Capstone Project: End-to-End Azure Data Engineering Pipeline
    1. Build ADLS zones (raw/cleansed/curated)
    2. ADF pipeline for ingestion
    3. Databricks transformation pipeline (Delta output)
    4. Synapse analytics queries/views
    5. Data quality checks + incremental load logic
    6. Monitoring strategy and documentation
    7. Hands-on: Capstone deliverables (architecture diagram, notebooks, curated datasets, KPI queries, documentation)

Hands-on Experience with Tools

  • Azure Data Lake Storage Gen2 (ADLS)
  • Azure Data Factory (ADF)
  • Azure Databricks
  • Azure Synapse Analytics
  • Azure Key Vault
  • Microsoft Purview

Training Delivery Format

Flexible, comprehensive training designed to fit your schedule and learning preferences
  • Opt-in Certifications: AWS, Scrum.org, DASA & more
  • 100% Live: on-site/online training
  • Hands-on: Labs and capstone projects
  • Lifetime Access: to training material and sessions

How Does Personalised Training Work?

1. Skill-Gap Assessment: Analysing skill gaps and assessing business requirements to craft a unique program.

2. Personalisation: Customising curriculum and projects to prepare your team for challenges within your industry.

3. Implementation: Supplementing training with consulting support to ensure implementation in real projects.

Why Data Engineering on Azure for your business?

  • Enterprise-ready data platforms: Build governed pipelines using Azure-native security and identity controls.
  • Faster integration with Microsoft ecosystem: Strong alignment with Power BI, Synapse, and ADLS.
  • Improved operational resilience: Use managed services for stable scaling and reliability.
  • Better compliance readiness: Azure governance tools support regulated and enterprise workloads.
  • Accelerated analytics delivery: Ship dashboards and insights faster with integrated cloud services.

Lead the Digital Landscape with Cutting-Edge Tech and In-House "Techsperts"

Discover the power of digital transformation with train-to-deliver programs from Uptut's experts. Backed by 50,000+ professionals across the world's leading tech innovators.

Frequently Asked Questions

1. What are the pre-requisites for this training?

The training does not require prior skills or experience; the curriculum covers the basics and progresses to advanced topics.

2. Will my team get any practical experience with this training?

With our focus on experiential learning, the training is as hands-on as possible, with assignments, quizzes, capstone projects, and live labs where trainees learn by doing.

3. What is your mode of delivery - online or on-site?

We conduct both online and on-site training sessions. You can choose whichever suits your team's convenience.

4. Will trainees get certified?

Yes, all trainees will get certificates issued by Uptut under the guidance of industry experts.

5. What do we do if we need further support after the training?

We have an experienced team of mentors who are available for consultations whenever your team needs further assistance. They will guide your team and resolve queries so the training is applied in the best possible way. Just book a consultation to get support.
