Detailed training (Level 350) on AWS Data Engineering Services Redshift, S3, Athena, Hive, Glue Catalog, Lakeformation
This is Volume 1 of Data Engineering course on AWS. This course will give you detailed explanations on AWS Data Engineering Services like S3 (Simple Storage Service), Redshift, Athena, Hive, Glue Data Catalog, Lake Formation. This course delves into the data warehouse or consumption and storage layer of Data Engineering pipeline. In Volume 2, I will showcase Data Processing (Batch and Streaming) Services.
You will get opportunities to do hands-on using large datasets (100 GB – 300 GB or more of data). Moreover, this course will provide you hands-on exercises that match with real-time scenarios like Redshift query performance tuning, streaming ingestion, Window functions, ACID transactions, COPY command, Distributed & Sort key, WLM, Row level and column level security, Athena partitioning, Athena WLM etc.
Some other highlights:
-
Contains training of data modelling – Normalization & ER Diagram for OLTP systems. Dimensional modelling for OLAP/DWH systems.
-
Data modelling hands-on.
-
Other technologies covered – EC2, EBS, VPC and IAM.
This is Part 1 (Volume 1) of the full data engineering course. In Part 2 (Volume 2), I will be covering the following Topics.
-
Spark (Batch and Stream processing using AWS EMR, AWS Glue ETL, GCP Dataproc)
-
Kafka (on AWS & GCP)
-
Flink
-
Apache Airflow
-
Apache Pinot
-
AWS Kinesis and more.
What Will You Learn?
- Understand Data Engineering (Volume 1) on AWS using S3, Redshift, Athena and Hive
- Know Redshift, S3 and Athena up to Level 350+ with HANDS-ON
- Production level projects and hands-on to help candidates provide on-job-like training
- Get access to datasets of size 100 GB - 200 GB and practice using the same
Requirements
- Good to have AWS and SQL knowledge
Audience
- Data Engineers, Data Scientists, Data Analysts
- Python developers, Application Developers, Big Data Developers
- Database Administrators (DBA), Big Data Administrators
- Solutions Architect, Cloud Architect, Big Data Architect
- Technical Managers, Engineering Managers, Project Managers
Course Content
Introduction – Data Engineering Volume 1 on AWS
(Optional) AWS Pre-requisites – EC2 & EBS
-
AWS Cloud and EC2 Introduction
19:13 -
EC2 Components & HandsOn 1
21:07 -
EC2 Handson 2
11:26 -
EBS Theory
19:33 -
EBS HandsOn
13:25
(Optional) AWS Pre-requisites – VPC
-
VPC Introduction & Components
18:36 -
VPC Components Hands On
23:04 -
Bastion Host
05:29 -
Security Groups
15:09 -
NAT Gateway & VPC Endpoint
20:13 -
VPC Peering
02:33
(Optional) AWS Pre-requisites – IAM
-
IAM Introduction & Hands On
26:01 -
IAM Service Roles
19:38
(Optional) AWS Pre-requisites – SQL Basics
-
SQL Introduction
30:35 -
SQL Client & Server Setup
12:18 -
SQL Database Objects Theory
28:13 -
Database Objects Hands On
29:19 -
CRUD Operations
21:18 -
SELECT Operators
24:41 -
CASE COALESCE Functions
13:19 -
DATE Functions
05:46 -
CTAS Cast Concat
14:11 -
Update Delete Truncate
12:31 -
HAVING Clause
07:43 -
Inner Join, Left Join, Right Join, Outer Join
19:27 -
Union Intersect View
17:35 -
Materialized View
08:19 -
Common Table Expression (CTE)
10:48 -
SQL Window Functions
22:40 -
MERGE statement & Summary
10:52
(Optional) AWS Pre-requisites – Python Basics
-
Python Intro – Architecture, PyCharm, Virtual Env
39:33 -
PyCharm & CLI Walkthrough
08:51 -
Compiled vs Interpreted
07:42 -
Everything is Python is Object
12:39 -
String Data Type
10:20 -
Number Data Type
04:02 -
List Data Type
11:36 -
Tuple Data Type
06:16 -
Set & Dict Data Type, Type Conversion
16:03 -
Python Operators & Memory
10:27 -
Set up Python interpreter in PyCharm
11:23 -
Print & Input Functions
16:04 -
IF Statement
14:36 -
For & While loops
15:57 -
Functions Intro
09:54 -
Function Scoping
15:20 -
Functions RETURN
07:53 -
Function Arguments
09:38 -
Modify Arguments
09:07 -
Positional & Keyword Arguments
09:42 -
args & kwargs
16:39 -
Class Object Self
32:49 -
Class-Instance Variables, __init__
17:01 -
Class Object Exercise 1
14:48 -
Class Object Exercise 2
13:48 -
Inheritance
07:50 -
Python Memory Management
08:05 -
Modules & Packages
33:27 -
HandsOn Exercise
01:31 -
Module Pre-compilation
03:28 -
Namespace & __name__
09:56 -
Error Handling in Python
14:00 -
File Handling
17:10 -
CSV & JSON module
13:37 -
Python Multi-threading concept
17:37 -
Multi-threading hands-on and exercise
22:01 -
Debugging & Profiling
18:26
Data Engineering Introduction
-
Data Engineering introduction, OLTP & OLAP
33:10 -
Data Mart & Data Mesh
05:31 -
22:05
AWS Distributed Storage – S3 (Simple, Storage, Service) for Data Engineers
-
Introduction 1
14:38 -
Introduction 2
22:49 -
Basics
05:43 -
Basics Hands-on
18:53 -
Versioning
13:06 -
Encryption
05:51 -
Storage Class
20:18 -
Multipart Upload
12:51 -
Lifecycle Policies
15:04 -
Cross Region Replication
10:13 -
Mountpoint
09:21 -
Security – S3 Identity Based Policy
19:03 -
Security – S3 Bucket Policy
08:29 -
Bucket Policy with VPC, IP address, VPCE
03:50 -
Access Point
16:27 -
Object Lambda
18:55 -
Pre-signed URL
04:33 -
Performance Considerations
05:31 -
Pricings
13:00 -
Architectural Patterns using S3
07:25
Data Modelling – Normalization, ER Diagram, Dimensional Modelling
-
Highlights
13:09 -
Data Modelling Introduction
17:15 -
Normal Forms 1NF 2NF 3NF
28:01 -
Relations: one-to-one, one-to-many, many-to-one, many-to-many
08:51 -
Dimensional modelling – Facts, Dimensions & Grains
24:39 -
Grains Exercise
09:19 -
Dimensional Modelling Technique
15:00 -
Types of Fact & Dimension Tables
10:10 -
Data Virt Semantic Presentation Layers
13:24
Data Warehouse on AWS – Redshift Infra
-
Redshift Infra
19:36 -
Redshift Infra HandsOn
21:42 -
Redshift Architecture – Zone Map, Columnar Storage
15:02 -
Cluster Resize – Elastic & Classic
08:32 -
Cluster Resize – HandsOn
05:03 -
Cluster Pause & Rename
04:31 -
Snapshot & Backup
07:43 -
Redsfhit Infra Conclusion
03:04
Redshifts Objects
-
Querying, Connection, RSQL, QEV2
16:10 -
Query Editor & RSQL setup
17:54 -
Object Hierarcy, tables hands-on
19:14 -
Data Types Hands-on
14:20 -
Table operations Hands-on
12:27 -
Redshift ACID, Locks, Isolation Level
12:05 -
Implement Transactions
10:01 -
AccessShareLock & ShareRowExclusiveLock HandsOn
08:50 -
Redshift SUPER datatype
14:29 -
Section Summary
05:02
Redshift Deep Dive
-
Distribution Key & Style, Sort Key
29:35 -
Column Compression
06:42 -
Modify Dist Sort Key, Compression HandsOn
19:18 -
COPY Command Theory
06:33 -
COPY Command HandsOn
13:36 -
UNLOAD Command
07:16 -
AWS DMS – Move from OLTP to DWH
08:10 -
DMS – Setup Source OLTP & Python application
19:14 -
Setup DMS Instance, Endpoint, Task
15:28 -
DMS Task – OLTP to DWH
28:44 -
Table Maintenance – VACUUM & ANALYZE
09:46 -
Vacuum & Analyze HandsOn
11:25
Resdhift Features
-
Materialized View (MV)
10:27 -
MV HandsOn
13:59 -
Query Federation
11:19 -
Redshift Spectrum
14:18 -
Streaming Ingestion
24:05 -
Redshift Feature Use Cases
28:36
Redshift Query Tuning
-
Query Execution
22:32 -
EXPLAIN Plan & System Joins
23:32 -
System Joins HandsOn
05:29 -
Data RE-distribution
09:01 -
EXPLAIN & RE-distribution HandsOn
16:18 -
Query Tuning Exercise – Part 1
31:17 -
Query Tuning Exercise – Part 2
24:46 -
Query Tuning Exercise – Part 3
24:27
Redshift Workload Management (WLM)
-
WLM Intro & Query Queue
09:02 -
Concurrency Scaling, Short Query Acceleration
07:19 -
Configure WLM HandsOn
16:44 -
Create Query Queue HandsOn
15:50 -
07:10
-
Concurrency Scaling In Action
15:33
Redshift Security- RBAC, CLS, RLS, Dynamic Data Masking (DDM)
-
Users, Roles, RBAC
11:29 -
Users, Roles, RBAC HandsOn
12:14 -
Row & Column Level Security (RLS & CLS)
11:25 -
Multiple RLS Policies HandsOn
13:02 -
CLS HandsOn
05:00 -
Combine RLS & CLS HandsOn
04:44 -
Dynamic Data Masking
07:00 -
Track Users, Roles, CLS and RLS
10:48 -
Audit Logging
11:32
Monitoring in Redshift
-
Monitor Redshift using Console
18:00 -
System Views for Monitoring Queries, Redshift Objects, Configuration Parms
20:50
Reshift Serverless
-
Introduction to Redshift Serverless
12:12 -
Create & Delete Redshift Serverless Resources
06:39 -
COPY & UNLOAD in Serverless
10:17 -
ZeroETL Integration Setup
24:12 -
ZeroETL in Action
12:59 -
Query Tuning Similarities
04:28 -
Migrate from Provisioned to Serverless
03:55
Detailed Redshift Pricing
-
Redshift Pricing Components
11:14 -
Pricing Example – Provisioned, Serverless, Concurrency Scaling, Spectrum
11:15 -
AWS Pricing Calculator
05:17
Redshift Additional Information
-
Redshift Integration with AWS Services
13:01 -
Redshift & Snowflake Comparison
19:46 -
Redshift Best Practices
16:10 -
Redshift Limitations and Challenges
16:28
AWS Metadata Repository – Glue Data Catalog
-
AWS Glue Catalog – Theory
08:36 -
Glue Catalog – Setup Data Stores & IAM Roles HandsOn
08:29 -
Store Aurora metadata in Glue Catalog
17:55 -
Store S3 and Redshift metadata in Glue Catalog
10:50
Data Governance using AWS Lake Formation
-
Lake Formation Introduction
17:56 -
Permission Flow HandsOn 1
19:41 -
Permission Flow HandsOn 2
09:09 -
Lake Formation – Tag Based Access Control (LF-TBAC)
06:16 -
LF-TBAC HandsOn
19:10 -
LF – Data Filtering
11:18 -
LF Clean Up (Please complete this)
03:29
Data Lakehouse on AWS – Athena
-
Athena Introduction
16:41 -
Athena Intro Hands On
10:44 -
Athena SerDe, File & Row format
12:51 -
SerDe, Format, CTAS Hands On
12:52 -
UNLOAD, Prepare & Execute, Query JSON
08:40 -
UNLOAD, Prepare & Execute, Query JSON Hands On
17:27 -
Schema Evolution, JSON_EXTRACT
13:54 -
Iceberg, ACID
26:19 -
Athena Partitioning & Bucketing
19:28 -
More DDL Commands
06:12 -
Athena WLM Theory
08:03 -
Workgroup HandsOn
11:32 -
Capacity Reservation HandsOn
04:06 -
Performance Tuning Theory
13:29 -
Athena Pricing & Performance Tuning
15:46 -
Architectural Patterns using Athena
08:30
Big Data Warehouse – HIVE
-
Hadoop Theory
21:05 -
File Formats
08:58 -
Hive Architecture & Components
16:35 -
Hive CLI
03:21 -
Data Types, databases, tables, File & Row Format, Hive SerDe
12:57 -
Hive Databases hands-on
17:07 -
Hive Tables hands-on
16:40 -
Partitioning & Bucketing
16:55 -
Partitioning & Bucketing hands-on
19:30 -
Load, insert, ACID, Materialized Views etc
15:08 -
JOINs, Locks, Configuration Parameters
14:26
Projects – Redshift, Athena, DataModelling, Pythin (Total Dataset Size – 150 GB)
-
Project 1 & 2 Data Modelling, Python Coding & Redshift SQL (Dataset – 10.4 GB)
07:07 -
Project 3 & 4 – Administration & Move from OLTP to OLAP (Dataset – 1.3 GB)
06:51 -
Project 5 – Redshift Performance Tuning (Dataset – 135 GB)
07:45 -
Project 1 & 2 – PDF
-
Project 3 & 4
-
Project 5
New Features – Redshift, S3 & Athena
-
New Features – Introduction
01:07






