Delivery: Online · On-site · Hybrid

Production-Grade Computer Vision & Deep Learning

Build a strong foundation in production-grade Computer Vision systems, from model development to scalable deployment and lifecycle operations. Learn how to deliver high-performance CV pipelines with monitoring, observability, MLOps workflows, and reliability practices used in real-world enterprise environments.

Duration: 5 days
Rating: 4.8/5.0
Level: Advanced
1500+ users onboarded

Who will Benefit from this Training?

  • AI/ML Engineers building production CV systems
  • Data Scientists moving into real-time vision and deployment
  • MLOps Engineers supporting vision workloads
  • Computer Vision Engineers working on detection, tracking, and segmentation
  • Platform and DevOps Engineers supporting GPU inference and serving

Training Objectives

  • Engineer datasets using data-centric AI practices (annotation strategy, active learning, synthetic data) to improve model outcomes.
  • Build robust training baselines using transfer learning with modern architectures and reproducible pipelines.
  • Train production-ready object detection models using YOLOv8 and handle hard cases like small-object detection.
  • Implement real-time video analytics with object tracking and streaming pipeline reliability patterns.
  • Use foundation models (SAM, CLIP) for segmentation and zero-shot workflows with minimal training data.
  • Build OCR and document intelligence solutions using transformer-based OCR approaches.
  • Optimize models using quantization and acceleration techniques (ONNX, TensorRT) and measure latency/FPS improvements.
  • Deploy scalable inference systems using Triton Inference Server with gRPC and dynamic batching.
  • Apply CV MLOps practices including dataset versioning, experiment tracking, and production monitoring for drift/outliers.
  • Deliver an end-to-end production-style capstone pipeline from data ingestion to deployment and observability.

Build a high-performing, job-ready tech team.

Personalise your team’s upskilling roadmap and design a tailored, hands-on training program with Uptut.

Key training modules

Comprehensive, hands-on modules designed to take you from basics to advanced concepts. Short, illustrative code sketches for selected labs follow the module list.
  • Module 1: The Production Vision Landscape
    1. Architecture evolution: ResNet to EfficientNet to Transformers (ViT)
    2. Data-centric shift: why data quality drives model performance
    3. Lab: Rapid Transfer Learning with TIMM to build a baseline in under 15 lines of code
  • Module 2: Advanced Data Engineering
    1. Annotation strategy using CVAT or Label Studio
    2. Active learning and uncertainty sampling for labeling efficiency
    3. Synthetic data generation using Stable Diffusion or Copy-Paste augmentation
    4. Lab: Semi-automated labeling workflow with pre-trained model label suggestions in CVAT
  • Module 3: Production Augmentation
    1. Albumentations deep dive for robust augmentation
    2. Domain-specific simulation: rain/fog, ISO noise, motion blur
    3. Lab: Build an augmentation pipeline that mimics real-world camera degradation
  • Module 4: CNN Deep Dive and Debugging
    1. Visual debugging with Grad-CAM and activation interpretation
    2. Failure analysis: background bias and dataset leakage patterns
    3. Lab: Train ResNet50 for retail classification and debug model errors using Grad-CAM
  • Module 5: Modern Object Detection (YOLOv8)
    1. YOLOv8 architecture: anchor-free detection, C2f modules, loss functions
    2. YOLOv8 vs Faster R-CNN vs RetinaNet tradeoffs
    3. Lab: Train YOLOv8 on a custom manufacturing defect dataset
  • Module 6: The Small Object Problem
    1. Detecting tiny defects in high-resolution (4K) imagery
    2. SAHI (Slicing Aided Hyper Inference) and tiling strategies
    3. Lab: Implement inference tiling to detect defects missed by standard YOLO inference
  • Module 7: Video Analytics and Object Tracking
    1. Detection vs tracking, occlusions, re-identification concepts
    2. Tracking algorithms: SORT, DeepSORT, ByteTrack
    3. Lab: Build a people counting system using YOLOv8 + ByteTrack
  • Module 8: Handling Video Streams
    1. RTSP streams, buffering, lag, dropped frame handling
    2. Geofencing and counting lines using shapely and OpenCV
    3. Lab: Build a real-time intrusion detection pipeline on a mock video feed
  • Module 9: Segmentation and SAM
    1. Semantic vs instance segmentation (U-Net vs Mask R-CNN)
    2. Segment Anything Model (SAM) for zero-shot segmentation
    3. Lab: Use SAM to auto-generate segmentation masks for a dataset
  • Module 10: Zero-Shot Learning with CLIP
    1. Concept of contrastive language-image pre-training
    2. Natural language search use cases (catalog search without training)
    3. Lab: Build a natural language image search engine using CLIP
  • Module 11: Vision Transformers (ViT and DETR)
    1. ViT architecture: patch embeddings and attention
    2. DETR: transformer-based end-to-end object detection
    3. Lab: Fine-tune a ViT using HuggingFace transformers for classification
  • Module 12: OCR and Document Intelligence
    1. Transformer OCR intro: TrOCR, PaddleOCR
    2. Layout parsing for receipts/invoices using Donut concepts
    3. Lab: Extract structured data from shipping labels using PaddleOCR
  • Module 13: Model Optimization
    1. Quantization: FP32 to FP16 to INT8 tradeoffs
    2. Pruning fundamentals
    3. Lab: Post-training quantization of YOLO and FPS benchmarking
  • Module 14: Hardware Acceleration (TensorRT and ONNX)
    1. Understanding ONNX computational graphs
    2. TensorRT optimizations: fusion and kernel tuning
    3. Lab: Convert PyTorch model to ONNX and build a TensorRT engine
  • Module 15: Enterprise Serving (Triton)
    1. Limitations of simple API serving for production concurrency
    2. Triton server setup, dynamic batching, multi-model execution
    3. Lab: Deploy TensorRT model on Triton and query via gRPC
  • Module 16: Edge Deployment (Jetson and DeepStream)
    1. Edge constraints: thermal, memory, resource limits
    2. DeepStream SDK intro and GStreamer pipeline concepts
    3. Lab: Simulate an edge deployment workflow using remote container deployment
  • Module 17: CV MLOps
    1. Data versioning with DVC for large image datasets
    2. Experiment tracking with MLflow
    3. Lab: Create a reproducible DVC + MLflow training workflow
  • Module 18: Monitoring and Observability
    1. Embedding drift vs pixel drift concepts
    2. Outlier detection using Deepchecks or Alibi Detect
    3. Lab: Implement an anomaly alert pipeline for out-of-distribution images
  • Module 19: Capstone Project (Choose One)
    1. End-to-end pipeline: data ingestion to training to optimization to deployment
    2. Option A: Manufacturing visual inspection (YOLOv8 + SAHI + TensorRT + drift detection)
    3. Option B: Retail smart shelf (detection + CLIP + Triton serving)
    4. Option C: Security perimeter breach (YOLOv8 + ByteTrack + geofencing + video streams)
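
For Module 1's lab, a minimal transfer-learning baseline with TIMM might look like the sketch below; the backbone name and class count are placeholders.

```python
import timm
import torch

# Pretrained ImageNet backbone with a fresh 5-class head (class count is a placeholder)
model = timm.create_model("resnet50", pretrained=True, num_classes=5)

# Resolve the preprocessing (resize/normalisation) that matches this backbone
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

# A standard optimiser and loss are enough for a first baseline
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()
```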
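Module 3's camera-degradation pipeline can be sketched with Albumentations; exact argument names vary slightly across library versions, so treat the parameters below as indicative.

```python
import cv2
import albumentations as A

# Simulate real-world degradation: motion blur, sensor noise, fog, and compression artefacts
degrade = A.Compose([
    A.MotionBlur(blur_limit=9, p=0.3),
    A.ISONoise(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), p=0.3),
    A.RandomFog(p=0.2),
    A.ImageCompression(quality_lower=40, quality_upper=90, p=0.3),
])

image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path
augmented = degrade(image=image)["image"]
```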
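For Module 5, training YOLOv8 on a custom defect dataset with the Ultralytics API is only a few lines; `defects.yaml` is a placeholder dataset config.

```python
from ultralytics import YOLO

model = YOLO("yolov8s.pt")                        # COCO-pretrained weights
model.train(data="defects.yaml", epochs=100, imgsz=640, batch=16)
metrics = model.val()                             # mAP on the validation split
results = model.predict("sample.jpg", conf=0.25)  # quick sanity check on one image
```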
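Module 6's tiling lab can be approached with the SAHI library; the sketch assumes a trained YOLOv8 checkpoint and a 4K input image, and the slice sizes are illustrative.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="runs/detect/train/weights/best.pt",  # placeholder checkpoint
    confidence_threshold=0.3,
    device="cuda:0",
)

# Slice the high-resolution image into overlapping tiles, detect per tile, then merge boxes
result = get_sliced_prediction(
    "board_4k.jpg",
    detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
for pred in result.object_prediction_list:
    print(pred.category.name, round(pred.score.value, 2), pred.bbox.to_xyxy())
```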
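Module 7's people-counting lab builds on Ultralytics' built-in ByteTrack integration; the counting line below is a hypothetical horizontal pixel threshold.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
counted_ids = set()
LINE_Y = 400  # hypothetical counting line (pixel row)

# Track persons (class 0) across frames; each track carries a stable ID
for result in model.track(source="entrance.mp4", tracker="bytetrack.yaml",
                          classes=[0], persist=True, stream=True):
    if result.boxes.id is None:
        continue
    for box, track_id in zip(result.boxes.xyxy, result.boxes.id.int().tolist()):
        cy = float((box[1] + box[3]) / 2)           # vertical centre of the box
        if cy > LINE_Y and track_id not in counted_ids:
            counted_ids.add(track_id)               # count each ID once after crossing

print("people counted:", len(counted_ids))
```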
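For Module 8's intrusion pipeline, geofencing reduces to a point-in-polygon test with shapely; the polygon coordinates below are placeholders in image pixel space.

```python
from shapely.geometry import Point, Polygon

# Restricted zone defined in pixel coordinates (placeholder values)
restricted_zone = Polygon([(100, 200), (500, 200), (500, 600), (100, 600)])

def is_intrusion(cx: float, cy: float) -> bool:
    """True when a detection centroid lies inside the restricted polygon."""
    return restricted_zone.contains(Point(cx, cy))
```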
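Module 9's auto-labelling lab can use SAM's automatic mask generator; the checkpoint file name matches the published ViT-B weights, and the image path is a placeholder.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
sam.to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', 'bbox', ...
print(f"generated {len(masks)} candidate masks")
```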
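Module 10's natural-language search engine embeds catalog images and a text query into CLIP's shared space; the file names and query are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["shoe.jpg", "lamp.jpg", "jacket.jpg"]      # placeholder catalog
images = [Image.open(p) for p in paths]

with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
    txt_emb = model.get_text_features(**processor(text=["a red leather jacket"],
                                                  return_tensors="pt", padding=True))

# Cosine similarity between the query and every catalog image
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).squeeze(1)
print("best match:", paths[int(scores.argmax())])
```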
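Module 12's shipping-label lab can start from PaddleOCR's high-level API; the result structure shown matches the 2.x releases and may differ slightly in newer versions.

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")      # downloads detection + recognition models
result = ocr.ocr("shipping_label.jpg", cls=True)    # placeholder image path

for box, (text, confidence) in result[0]:           # one result list per input image
    print(f"{text}  ({confidence:.2f})")
```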
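Modules 13 and 14 share one workflow: export the trained detector, then benchmark it. Below is a sketch using Ultralytics ONNX export and ONNX Runtime; paths and repetition counts are illustrative, and a TensorRT engine can be built from the same ONNX file with `trtexec`.

```python
import time
import numpy as np
import onnxruntime as ort
from ultralytics import YOLO

# Export the trained detector to ONNX (a TensorRT engine is a follow-on step, e.g.
# trtexec --onnx=best.onnx --saveEngine=best.engine --fp16)
YOLO("runs/detect/train/weights/best.pt").export(format="onnx", imgsz=640)

sess = ort.InferenceSession("runs/detect/train/weights/best.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)

for _ in range(10):                                  # warm-up runs
    sess.run(None, {name: dummy})
start = time.perf_counter()
for _ in range(100):                                 # timed runs
    sess.run(None, {name: dummy})
latency = (time.perf_counter() - start) / 100
print(f"latency {latency * 1000:.1f} ms, ~{1 / latency:.0f} FPS")
```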
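Module 15's lab queries a Triton-served model over gRPC; the model name and tensor names below are placeholders that must match the model's `config.pbtxt`, where dynamic batching is also enabled.

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")       # Triton gRPC port

batch = np.random.rand(1, 3, 640, 640).astype(np.float32)
infer_input = grpcclient.InferInput("images", list(batch.shape), "FP32")  # name from config.pbtxt
infer_input.set_data_from_numpy(batch)

response = client.infer(model_name="defect_detector_trt", inputs=[infer_input])
output = response.as_numpy("output0")                                  # output name is model-specific
print(output.shape)
```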
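Module 17 combines dataset versioning and experiment tracking. Below is a sketch of the MLflow side; the dataset is pinned separately with DVC (`dvc add data/images`, `dvc push`), and the parameter and metric values are illustrative.

```python
import mlflow

mlflow.set_experiment("defect-detector")

with mlflow.start_run():
    # Parameters of the run (values are placeholders)
    mlflow.log_params({"model": "yolov8s", "epochs": 100, "imgsz": 640})

    # ... training happens here ...

    mlflow.log_metric("mAP50", 0.91)                          # illustrative value
    mlflow.log_artifact("runs/detect/train/weights/best.pt")  # trained weights
```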
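Module 18's monitoring lab can be prototyped with Alibi Detect's MMD drift detector on image embeddings; the embedding files are placeholders produced by any fixed feature extractor.

```python
import numpy as np
from alibi_detect.cd import MMDDrift

ref_embeddings = np.load("ref_embeddings.npy")    # embeddings of the training distribution
detector = MMDDrift(ref_embeddings, backend="pytorch", p_val=0.05)

prod_embeddings = np.load("prod_embeddings.npy")  # embeddings from recent production images
result = detector.predict(prod_embeddings)
print("drift detected:", bool(result["data"]["is_drift"]))
```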

Hands-on Experience with Tools

CVAT / Label Studio, Albumentations, PyTorch and TIMM, Ultralytics YOLOv8, SAHI, ByteTrack, SAM, CLIP, HuggingFace Transformers, PaddleOCR, OpenCV, shapely, ONNX, TensorRT, Triton Inference Server, DeepStream, DVC, MLflow, Deepchecks / Alibi Detect

Training Delivery Format

Flexible, comprehensive training designed to fit your schedule and learning preferences
  • Opt-in Certifications: AWS, Scrum.org, DASA & more
  • 100% Live: on-site/online training
  • Hands-on: labs and capstone projects
  • Lifetime Access: to training material and sessions

How Does Personalised Training Work?

1. Skill-Gap Assessment: Analysing skill gaps and assessing business requirements to craft a unique program
2. Personalisation: Customising curriculum and projects to prepare your team for challenges within your industry
3. Implementation: Supplementing training with consulting support to ensure implementation in real projects

Why Computer Vision for your business?

  • Reduce operational costs: Automate inspection, monitoring, and compliance workflows with reliable CV pipelines.
  • Improve quality and safety: Detect defects, anomalies, and security breaches faster than manual methods.
  • Faster AI adoption: Use transfer learning and foundation models (SAM, CLIP) to deliver results with less data.
  • Production readiness: Deploy optimized, observable CV systems with real-time performance at scale.
  • Future-proof capability: Build internal expertise across modern CV stacks, Transformers, and deployment acceleration.

Lead the Digital Landscape with Cutting-Edge Tech and In-House "Techsperts"

Discover the power of digital transformation with train-to-deliver programs from Uptut's experts. Backed by 50,000+ professionals across the world's leading tech innovators.

Frequently Asked Questions

1. What are the pre-requisites for this training?

The training does not require you to have prior skills or experience. The curriculum covers basics and progresses towards advanced topics.

2. Will my team get any practical experience with this training?

With our focus on experiential learning, the training is as hands-on as possible, with assignments, quizzes, capstone projects, and live labs where trainees learn by doing.

3. What is your mode of delivery - online or on-site?

We conduct both online and on-site training sessions. You can choose whichever is more convenient for your team.

4. Will trainees get certified?

Yes, all trainees will get certificates issued by Uptut under the guidance of industry experts.

5. What do we do if we need further support after the training?

Our mentors are available for consultations whenever your team needs further assistance, guiding trainees and resolving queries so the training is applied in the best possible way. Just book a consultation to get support.
