Force7 Training
FRCAWS-12AWS

Data Engineering on AWS

Duration · 3 days · Virtual + In-Person · Instructor-Led

Course Description

This intensive 3-day instructor-led course teaches students how to design, build, secure, automate, and optimize modern data engineering solutions using Amazon Web Services (AWS). Participants learn how to ingest, transform, store, process, analyze, and operationalize data pipelines using AWS-native services for batch, streaming, and real-time analytics workloads.

The course combines instructor presentations, demonstrations, architecture discussions, and hands-on labs to provide practical experience building scalable cloud-based data platforms.

Audience Profile

This course is intended for:

  • Data Engineers
  • Data Analysts
  • Cloud Engineers
  • Solutions Architects
  • Database Administrators
  • DevOps Engineers
  • Software Developers
  • Technical Managers

Prerequisites

Before enrolling, you should have:

  • Basic understanding of databases and SQL
  • Familiarity with cloud computing concepts
  • Basic scripting or programming experience
  • General knowledge of data analytics concepts
  • Foundational AWS knowledge (recommended)

— What You'll Learn —

Learning Objectives

In this course, you will learn to:

  • Design modern data architectures on AWS
  • Build scalable batch and streaming data pipelines
  • Implement ETL and ELT workflows
  • Store and manage structured and unstructured data
  • Perform real-time and big data analytics
  • Automate and orchestrate data workflows
  • Secure and govern enterprise data environments
  • Monitor and optimize data engineering workloads
  • Implement data lake and data warehouse solutions

— Day-by-Day —

Course Outline

Day 1 — Data Engineering Foundations and Data Storage

Module 1

Introduction to Data Engineering on AWS

Topics

  • Overview of modern data engineering
  • Data lifecycle and pipeline concepts
  • Batch vs streaming architectures
  • AWS data engineering ecosystem overview
  • Designing cloud-native data platforms

AWS Services Covered

  • Amazon S3
  • AWS IAM
  • Amazon EC2
  • AWS Lambda

Lab

  • Configure AWS data engineering environment
  • Create secure storage and access policies
  • Explore AWS analytics services

Module 2

Data Storage and Data Lakes

Topics

  • Data lake architecture principles
  • Structured vs semi-structured vs unstructured data
  • Storage optimization strategies
  • Partitioning and compression
  • Metadata management

AWS Services Covered

  • Amazon S3
  • AWS Lake Formation
  • AWS Glue Data Catalog

Lab

  • Build a centralized data lake
  • Configure metadata cataloging
  • Implement data partitioning
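
As an illustration of the partitioning topic in this module, the following minimal Python sketch builds Hive-style partitioned object keys (year=/month=/day=), a layout commonly used for data lakes on Amazon S3. The helper name `partitioned_key`, the `sales` prefix, and the file name are hypothetical and are not taken from the course materials.

```python
from datetime import date


def partitioned_key(prefix: str, event_date: date, filename: str) -> str:
    """Return a Hive-style partitioned object key (hypothetical helper)."""
    return (
        f"{prefix}/year={event_date.year:04d}"
        f"/month={event_date.month:02d}"
        f"/day={event_date.day:02d}/{filename}"
    )


if __name__ == "__main__":
    # Hypothetical prefix and file name, for illustration only.
    print(partitioned_key("sales", date(2024, 3, 7), "part-0000.parquet"))
```

Query engines that understand this layout can skip partitions that do not match a filter on the partition columns, which is one reason partitioning appears alongside storage optimization in this module.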

Module 3

Relational and Analytical Databases

Topics

  • Operational databases vs analytical databases
  • Data warehouse concepts
  • OLTP vs OLAP workloads
  • Database scalability and performance
  • Query optimization fundamentals

AWS Services Covered

  • Amazon RDS
  • Amazon Aurora
  • Amazon Redshift

Lab

  • Deploy relational databases
  • Load data into Amazon Redshift
  • Run analytical queries

Module 4

Data Ingestion and ETL Fundamentals

Topics

  • Data ingestion architectures
  • ETL vs ELT processing
  • Schema evolution
  • Data transformation best practices
  • Building resilient ingestion pipelines

AWS Services Covered

  • AWS Glue
  • AWS DataSync
  • AWS Database Migration Service (DMS)

Lab

  • Build ETL workflows with AWS Glue
  • Migrate data into AWS
  • Automate ingestion pipelines

Day 2 — Big Data Processing and Streaming Analytics

Module 5

Big Data Processing on AWS

Topics

  • Distributed data processing concepts
  • Hadoop and Spark fundamentals
  • Cluster management
  • Scalable data transformation workflows
  • Performance tuning strategies

AWS Services Covered

  • Amazon EMR
  • Apache Spark on EMR
  • Amazon EC2

Lab

  • Launch EMR clusters
  • Process large datasets using Spark
  • Optimize distributed processing jobs

Module 6

Streaming Data Architectures

Topics

  • Real-time data ingestion concepts
  • Event-driven architectures
  • Streaming analytics patterns
  • Low-latency processing
  • Designing fault-tolerant streams

AWS Services Covered

  • Amazon Kinesis Data Streams
  • Amazon Kinesis Data Firehose
  • Amazon Managed Service for Apache Flink

Lab

  • Build real-time streaming pipelines
  • Process live event streams
  • Deliver streaming data into analytics platforms

Module 7

Data Querying and Analytics

Topics

  • Serverless analytics
  • Interactive querying
  • Data federation concepts
  • Business intelligence integrations
  • Data visualization strategies

AWS Services Covered

  • Amazon Athena
  • Amazon QuickSight
  • Amazon OpenSearch Service

Lab

  • Query data lake datasets
  • Build dashboards and visualizations
  • Analyze operational metrics

Module 8

Workflow Automation and Orchestration

Topics

  • Data pipeline orchestration
  • Event-driven automation
  • Scheduling and dependency management
  • Error handling and retries
  • Infrastructure as Code fundamentals

AWS Services Covered

  • AWS Step Functions
  • Amazon EventBridge
  • AWS CloudFormation

Lab

  • Automate end-to-end data workflows
  • Build event-driven processing pipelines
  • Deploy infrastructure templates
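
The error handling and retries topic can be illustrated independently of any AWS service. Below is a minimal plain-Python sketch of retrying a failing step with exponential backoff. The names `run_with_retries` and `step`, and the default delay values, are hypothetical. This is not how AWS Step Functions is configured: there, retry policies are declared in the state machine definition rather than written as application code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.01,
    backoff_rate: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call step(); on failure wait base_delay * backoff_rate**n, then retry."""
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            sleep(base_delay * backoff_rate**attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A production pipeline would typically also restrict which exception types are retried.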

Day 3 — Security, Governance, Optimization, and Capstone

Module 9

Data Security and Governance

Topics

  • Data protection strategies
  • Encryption at rest and in transit
  • Identity and access management
  • Data governance frameworks
  • Compliance and auditing

AWS Services Covered

  • AWS IAM
  • AWS KMS
  • AWS CloudTrail
  • AWS Lake Formation

Lab

  • Configure fine-grained data permissions
  • Implement encryption policies
  • Audit access activity

Module 10

Monitoring and Performance Optimization

Topics

  • Monitoring data pipelines
  • Logging and observability
  • Cost optimization strategies
  • Performance tuning for analytics workloads
  • High availability and disaster recovery

AWS Services Covered

  • Amazon CloudWatch
  • AWS Trusted Advisor
  • AWS Cost Explorer

Lab

  • Monitor pipeline performance
  • Configure alerts and dashboards
  • Optimize analytics costs

Module 11

Modern Data Architecture Best Practices

Topics

  • Lakehouse architectures
  • Data mesh concepts
  • Multi-account data environments
  • Hybrid and multi-cloud integration
  • Designing enterprise-scale data platforms

Group Workshop

  • Review enterprise architecture case studies
  • Evaluate scalability patterns
  • Discuss operational best practices

Module 12

Capstone Project

Student Project

Students design and implement a complete end-to-end data engineering solution on AWS.

Capstone Activities

  • Build a cloud-native data lake
  • Create ETL and streaming pipelines
  • Process and analyze large datasets
  • Secure and monitor the environment
  • Present architecture and operational strategy

Included Hands-On Labs

Students complete guided labs covering:

  • Data lake creation
  • ETL pipeline development
  • Database migration
  • Streaming data ingestion
  • Spark processing on EMR
  • Real-time analytics
  • Query optimization
  • Workflow automation
  • Security and governance
  • Monitoring and cost optimization

— Additional Details —

What else is included

Suggested Course Materials

  • Student guide
  • Instructor presentation slides
  • Hands-on lab manual
  • AWS architecture diagrams
  • Sample datasets
  • Capstone workbook

Note: Course outlines are provided as a general guide. Content, pacing, labs, and instructional emphasis may vary based on instructor expertise, student experience levels, and customer-specific learning objectives.
