Online | On-site | Hybrid

Data Engineering on Google Cloud (GCP)

Build a strong foundation in Google Cloud data engineering, from data lake storage to batch and streaming delivery at scale. Learn how to build secure, observable pipelines with BigQuery, Cloud Composer, Pub/Sub, and Dataflow, and apply governance practices aligned with real enterprise workloads.

Duration: 5 days
Rating: 4.8/5.0
Level: Intermediate
1500+ users onboarded

Who Will Benefit from This Training?

  • Data Engineers
  • Analytics Engineers
  • Cloud Engineers supporting data platforms
  • Data Platform Engineers
  • DevOps Engineers supporting GCP data services
  • BI Engineers transitioning into data engineering

Training Objectives

  • Understand modern data engineering architectures on GCP and map services to batch and streaming workloads.
  • Build batch and streaming data pipelines using GCP-native services.
  • Design a scalable data lake on Cloud Storage (GCS) with raw/cleansed/curated zones.
  • Build analytics-ready warehouse datasets using BigQuery with partitioning and clustering.
  • Implement orchestration using Cloud Composer (Managed Airflow) with scheduling, monitoring, retries, and backfills.
  • Build streaming pipelines using Pub/Sub and Dataflow (Apache Beam concepts) landing into BigQuery.
  • Apply reliability best practices including idempotency, retries, backfills, and schema evolution handling.
  • Implement data governance and security using IAM, service accounts, encryption, and access controls.
  • Implement monitoring and observability practices for BigQuery, Dataflow, and Airflow pipelines.
  • Deliver an end-to-end capstone pipeline that combines batch and streaming with validation and monitoring.

Build a high-performing, job-ready tech team.

Personalise your team’s upskilling roadmap and design a tailored, hands-on training program with Uptut.

Key Training Modules

Comprehensive, hands-on modules designed to take you from basics to advanced concepts
  • Module 1: GCP Data Engineering Architecture (Batch and Streaming Mapping)
    1. Modern GCP data platform reference architectures (lake + warehouse + streaming)
    2. Batch vs streaming workloads and selection criteria
    3. Service mapping overview (GCS, BigQuery, Pub/Sub, Dataflow, Composer)
    4. End-to-end lifecycle (ingest, store, transform, orchestrate, serve)
    5. Hands-on: Activity: Design a GCP reference architecture for analytics + real-time events
  • Module 2: Data Lake Design on GCS (Raw, Cleansed, Curated Zones)
    1. GCS fundamentals for data lakes (buckets, prefixes, lifecycle basics)
    2. Zone-based storage layout (raw, cleansed, curated)
    3. Partition folder strategy (date, region, source) and naming conventions
    4. Columnar formats (Parquet) and small files considerations
    5. Hands-on: Lab: Create GCS lake zones and upload datasets with partitioned folder structure (code sketch after the module list)
  • Module 3: BigQuery Fundamentals (Analytics Warehouse Foundations)
    1. BigQuery concepts (datasets, tables, views, costs model basics)
    2. Partitioning strategies (ingestion time vs column-based partitions)
    3. Clustering for performance improvements
    4. Warehouse modeling basics (facts/dimensions, analytics views)
    5. Hands-on: Lab: Create BigQuery datasets and build partitioned + clustered tables for analytics (code sketch after the module list)
  • Module 4: Batch Pipelines on GCP (Ingestion + Transform into BigQuery)
    1. Batch ingestion patterns (file drops to GCS, scheduled loads)
    2. Loading data into BigQuery (load jobs, schema mapping concepts)
    3. Transformations using BigQuery SQL (CTEs, windows, incremental patterns overview)
    4. Publishing curated datasets and marts
    5. Hands-on: Lab: Build a batch pipeline from GCS raw → BigQuery staging → curated tables (code sketch after the module list)
  • Module 5: Orchestration with Cloud Composer (Managed Airflow)
    1. Composer basics (DAGs, tasks, scheduling)
    2. Retries, timeouts, and failure handling patterns
    3. Backfills and catchup strategy for batch pipelines
    4. Monitoring and logs for Airflow operations
    5. Hands-on: Lab: Build a Composer DAG to orchestrate batch ingestion and BigQuery transforms with retries (code sketch after the module list)
  • Module 6: Pub/Sub Fundamentals (Event-Driven Streaming Backbone)
    1. Pub/Sub concepts (topics, subscriptions, ack, delivery semantics)
    2. Designing event schemas and versioning strategy
    3. Ordering keys and throughput considerations (concept)
    4. Dead-letter patterns and failure isolation concepts
    5. Hands-on: Lab: Publish events to Pub/Sub and validate subscription consumption behavior (code sketch after the module list)
  • Module 7: Streaming Pipelines with Dataflow (Apache Beam Concepts)
    1. Dataflow and Beam concepts (pipelines, transforms, windows concept)
    2. Streaming pipeline pattern (Pub/Sub → Dataflow → BigQuery)
    3. Handling late data and watermark awareness (concept)
    4. Streaming-to-warehouse design (deduplication, idempotent writes)
    5. Hands-on: Lab: Build a streaming Dataflow pipeline from Pub/Sub to partitioned BigQuery tables (code sketch after the module list)
  • Module 8: Reliability Best Practices (Idempotency, Retries, Backfills, Schema Evolution)
    1. Idempotency patterns for batch loads and streaming writes
    2. Retries and backoff design (what to retry vs what not to retry)
    3. Backfill strategy for missed days or reprocessing
    4. Schema evolution handling (new columns, type changes, compatibility)
    5. Hands-on: Lab: Implement dedup/idempotency logic and simulate a schema change safely (code sketch after the module list)
  • Module 9: Governance and Security (IAM, Service Accounts, Encryption, Access Controls)
    1. IAM fundamentals for data platforms (least privilege access)
    2. Service accounts and workload identity patterns (concept)
    3. Encryption concepts (at rest and in transit) and key management awareness
    4. Access controls for BigQuery datasets and GCS buckets
    5. Hands-on: Lab: Configure least-privilege access for pipelines and validate restricted permissions (code sketch after the module list)
  • Module 10: Monitoring and Observability (BigQuery, Dataflow, Airflow)
    1. What to monitor (latency, errors, throughput, freshness)
    2. Airflow observability (DAG run status, retries, logs)
    3. Dataflow monitoring concepts (job health, backlogs, worker behavior)
    4. BigQuery monitoring concepts (slot usage, query costs, load errors)
    5. Hands-on: Lab: Create monitoring checklist + alerts for pipeline failures and data freshness (code sketch after the module list)
  • Module 11: Capstone Project (End-to-End Batch + Streaming Pipeline)
    1. Capstone goal: Deliver an end-to-end GCP pipeline combining batch and streaming
    2. Batch: GCS zones → BigQuery staging → curated tables with validation
    3. Streaming: Pub/Sub → Dataflow → BigQuery with dedup/idempotency
    4. Orchestrate: Cloud Composer DAG with retries, monitoring, backfills
    5. Security + observability: IAM/service accounts + monitoring and runbook notes
    6. Hands-on: Capstone Lab: Deliver architecture diagram, pipeline code, DAG, validation results, and monitoring evidence
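
For the Module 2 lab, a minimal sketch of landing a file in the raw zone of a GCS data lake using a zone/source/date prefix. The bucket name, source name, and file paths are illustrative placeholders, not the official lab solution.

```python
# Minimal sketch: upload a file into the raw zone of a GCS data lake using a
# zone/source/date partition prefix. Bucket, source, and file names are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-data-lake")  # assumed lake bucket

# raw/<source>/dt=<YYYY-MM-DD>/<file> keeps zones and date partitions predictable and queryable.
blob = bucket.blob("raw/sales/dt=2024-06-01/orders_000.parquet")
blob.upload_from_filename("orders_000.parquet")  # local file assumed to exist
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```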
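
For the Module 3 lab, a minimal sketch of creating a date-partitioned, clustered BigQuery table with the Python client. The project, dataset, and column names are assumptions for illustration.

```python
# Minimal sketch: create a date-partitioned, clustered BigQuery table.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # assumed project id

schema = [
    bigquery.SchemaField("event_id", "STRING"),
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("country", "STRING"),
    bigquery.SchemaField("revenue", "NUMERIC"),
]

table = bigquery.Table("my-analytics-project.curated.sales_events", schema=schema)
# Column-based partitioning on the event timestamp, with daily granularity.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts"
)
# Cluster on a frequently filtered column to reduce scanned data.
table.clustering_fields = ["country"]

table = client.create_table(table, exists_ok=True)
print(f"Created {table.full_table_id}")
```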
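
For the Module 4 lab, a minimal sketch of a batch load from the GCS raw zone into a BigQuery staging table, followed by a SQL transform into a curated table. The URIs, table names, and transform query are illustrative assumptions.

```python
# Minimal sketch: load Parquet files from a GCS raw zone into a BigQuery staging
# table, then build a curated table with a SQL transform. Names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # full refresh of staging
)

load_job = client.load_table_from_uri(
    "gs://my-data-lake/raw/sales/dt=2024-06-01/*.parquet",  # assumed partition folder layout
    "my-analytics-project.staging.sales_raw",
    job_config=job_config,
)
load_job.result()  # wait for completion and surface any load errors

# Publish a curated dataset using BigQuery SQL.
client.query(
    """
    CREATE OR REPLACE TABLE curated.daily_sales AS
    SELECT DATE(event_ts) AS sales_date, country, SUM(revenue) AS total_revenue
    FROM staging.sales_raw
    GROUP BY sales_date, country
    """
).result()
```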
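
For the Module 5 lab, a minimal sketch of a Cloud Composer (Airflow) DAG that orchestrates the batch load and a BigQuery transform, with retries configured and catchup enabled so missed days can be backfilled. The DAG id, bucket, tables, and SQL are placeholders.

```python
# Minimal sketch: a Composer (Airflow) DAG with retries, timeouts, and catchup.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

default_args = {
    "retries": 2,                          # retry transient failures
    "retry_delay": timedelta(minutes=5),   # delay between attempts
    "execution_timeout": timedelta(hours=1),
}

with DAG(
    dag_id="sales_batch_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=True,                          # enables backfills for missed days
    default_args=default_args,
) as dag:
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_to_staging",
        bucket="my-data-lake",
        source_objects=["raw/sales/dt={{ ds }}/*.parquet"],  # templated run date
        destination_project_dataset_table="my-analytics-project.staging.sales_raw",
        source_format="PARQUET",
        write_disposition="WRITE_TRUNCATE",
    )

    transform = BigQueryInsertJobOperator(
        task_id="build_curated_daily_sales",
        configuration={
            "query": {
                "query": (
                    "CREATE OR REPLACE TABLE curated.daily_sales AS "
                    "SELECT DATE(event_ts) AS sales_date, country, SUM(revenue) AS total_revenue "
                    "FROM staging.sales_raw GROUP BY sales_date, country"
                ),
                "useLegacySql": False,
            }
        },
    )

    load_raw >> transform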
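
For the Module 6 lab, a minimal sketch of publishing a versioned JSON event to a Pub/Sub topic. The project, topic, event fields, and message attributes are illustrative.

```python
# Minimal sketch: publish a JSON event to a Pub/Sub topic. Names are placeholders.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-analytics-project", "sales-events")

event = {
    "event_id": "evt-123",
    "event_ts": "2024-06-01T12:00:00Z",
    "country": "DE",
    "revenue": "19.99",
    "schema_version": 1,  # explicit version to support schema evolution
}

future = publisher.publish(
    topic_path,
    data=json.dumps(event).encode("utf-8"),
    origin="checkout-service",  # message attributes can carry routing metadata
)
print(f"Published message id: {future.result()}")
```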
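
For the Module 7 lab, a minimal sketch of a streaming Apache Beam pipeline reading from a Pub/Sub subscription and appending to BigQuery. It assumes the destination table was pre-created as a day-partitioned table (as in the Module 3 sketch); all resource names are placeholders.

```python
# Minimal sketch: streaming pipeline Pub/Sub -> Dataflow/Beam -> BigQuery.
# Run locally with the DirectRunner, or submit to Dataflow with --runner=DataflowRunner.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # unbounded source => streaming mode

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-analytics-project/subscriptions/sales-events-sub"
        )
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            # Assumed to be a pre-created, day-partitioned table (see the Module 3 sketch).
            "my-analytics-project:streaming.sales_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```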
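
For the Module 8 lab, a minimal sketch of an idempotent upsert using a BigQuery MERGE with in-query deduplication, so reruns and backfills do not create duplicate rows. Table names, columns, and the run-date parameter are assumptions.

```python
# Minimal sketch: idempotent upsert from staging into a curated table via MERGE.
from google.cloud import bigquery

client = bigquery.Client()

merge_sql = """
MERGE curated.sales_events AS target
USING (
  -- Deduplicate the staging slice first: keep the latest row per event_id.
  SELECT * EXCEPT(row_num) FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_ts DESC) AS row_num
    FROM staging.sales_raw
    WHERE DATE(event_ts) = @run_date
  )
  WHERE row_num = 1
) AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN
  UPDATE SET event_ts = source.event_ts, country = source.country, revenue = source.revenue
WHEN NOT MATCHED THEN
  INSERT (event_id, event_ts, country, revenue)
  VALUES (source.event_id, source.event_ts, source.country, source.revenue)
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("run_date", "DATE", "2024-06-01")]
)
client.query(merge_sql, job_config=job_config).result()
```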
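
For the Module 9 lab, a minimal sketch of granting a pipeline service account least-privilege access: read-only on a single bucket and writer on a single dataset, rather than project-wide roles. The service account, bucket, and dataset names are placeholders.

```python
# Minimal sketch: scope a pipeline service account to one bucket and one dataset.
from google.cloud import bigquery, storage

SA_EMAIL = "batch-pipeline@my-analytics-project.iam.gserviceaccount.com"  # assumed SA

# Bucket-level: read-only access to the lake bucket.
storage_client = storage.Client()
bucket = storage_client.bucket("my-data-lake")
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {"role": "roles/storage.objectViewer", "members": {f"serviceAccount:{SA_EMAIL}"}}
)
bucket.set_iam_policy(policy)

# Dataset-level: write access only to the staging dataset.
bq_client = bigquery.Client()
dataset = bq_client.get_dataset("my-analytics-project.staging")
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(role="WRITER", entity_type="userByEmail", entity_id=SA_EMAIL)
)
dataset.access_entries = entries
bq_client.update_dataset(dataset, ["access_entries"])
```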
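
For the Module 10 lab, a minimal sketch of one freshness signal a monitoring checklist or alert could be built on: a query that measures how stale the newest row in a curated table is, checked against an assumed two-hour SLO. Table and column names are placeholders.

```python
# Minimal sketch: a data-freshness check against a curated BigQuery table.
from google.cloud import bigquery

client = bigquery.Client()
row = next(iter(client.query(
    "SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(event_ts), MINUTE) AS minutes_stale "
    "FROM curated.sales_events"
).result()))

# Assumed 2-hour freshness SLO; a scheduler or alerting hook would act on this failure.
if row.minutes_stale is None or row.minutes_stale > 120:
    raise RuntimeError(f"Data freshness breached: {row.minutes_stale} minutes stale")
```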

Hands-on Experience with Tools

  • Google Cloud Storage (GCS)
  • BigQuery
  • Cloud Composer (Apache Airflow)
  • Pub/Sub
  • Dataflow (Apache Beam)
  • Cloud IAM and service accounts

Training Delivery Format

Flexible, comprehensive training designed to fit your schedule and learning preferences
  • Opt-in Certifications: AWS, Scrum.org, DASA & more
  • 100% Live: on-site/online training
  • Hands-on: Labs and capstone projects
  • Lifetime Access: to training material and sessions

How Does Personalised Training Work?

1. Skill-Gap Assessment: Analysing skill gaps and assessing business requirements to craft a unique program
2. Personalisation: Customising curriculum and projects to prepare your team for challenges within your industry
3. Implementation: Supplementing training with consulting support to ensure implementation in real projects

Why Data Engineering on GCP for your business?

  • Faster analytics at scale: BigQuery enables high-performance querying without heavy infrastructure management.
  • Improved data streaming capability: Use Pub/Sub and Dataflow for real-time pipelines and event processing.
  • Lower operational complexity: Managed services simplify cluster and pipeline maintenance.
  • Better AI integration: GCP aligns strongly with ML workflows using Vertex AI and data services.
  • Scalable business intelligence: Enable self-service analytics with secure and centralized datasets.

Lead the Digital Landscape with Cutting-Edge Tech and In-House "Techsperts"

Discover the power of digital transformation with train-to-deliver programs from Uptut's experts. Backed by 50,000+ professionals across the world's leading tech innovators.

Frequently Asked Questions

1. What are the pre-requisites for this training?

The training does not require you to have prior skills or experience. The curriculum covers basics and progresses towards advanced topics.

2. Will my team get any practical experience with this training?

With our focus on experiential learning, we have made the training as hands-on as possible, with assignments, quizzes, capstone projects, and live labs where trainees learn by doing.

3. What is your mode of delivery - online or on-site?

We conduct both online and on-site training sessions. You can choose any according to the convenience of your team.

4. Will trainees get certified?

Yes, all trainees will get certificates issued by Uptut under the guidance of industry experts.

5. What do we do if we need further support after the training?

We have an incredible team of mentors available for consultations in case your team needs further assistance. They will guide your team and resolve queries so the training is put to the best possible use. Just book a consultation to get support.
