Detailed training (Level 350) on AWS Data Engineering Services Redshift, S3, Athena, Hive, Glue Catalog, Lakeformation
This is Volume 1 of Data Engineering course on AWS. This course will give you detailed explanations on AWS Data Engineering Services like S3 (Simple Storage Service), Redshift, Athena, Hive, Glue Data Catalog, Lake Formation. This course delves into the data warehouse or consumption and storage layer of Data Engineering pipeline. In Volume 2, I will showcase Data Processing (Batch and Streaming) Services.
You will get opportunities to do hands-on using large datasets (100 GB – 300 GB or more of data). Moreover, this course will provide you hands-on exercises that match with real-time scenarios like Redshift query performance tuning, streaming ingestion, Window functions, ACID transactions, COPY command, Distributed & Sort key, WLM, Row level and column level security, Athena partitioning, Athena WLM etc.
Some other highlights:
-
Contains training of data modelling – Normalization & ER Diagram for OLTP systems. Dimensional modelling for OLAP/DWH systems.
-
Data modelling hands-on.
-
Other technologies covered – EC2, EBS, VPC and IAM.
This is Part 1 (Volume 1) of the full data engineering course. In Part 2 (Volume 2), I will be covering the following Topics.
-
Spark (Batch and Stream processing using AWS EMR, AWS Glue ETL, GCP Dataproc)
-
Kafka (on AWS & GCP)
-
Flink
-
Apache Airflow
-
Apache Pinot
-
AWS Kinesis and more.
What Will You Learn?
- Understand Data Engineering (Volume 1) on AWS using S3, Redshift, Athena and Hive
- Know Redshift, S3 and Athena up to Level 350+ with HANDS-ON
- Production level projects and hands-on to help candidates provide on-job-like training
- Get access to datasets of size 100 GB - 200 GB and practice using the same
Requirements
- Good to have AWS and SQL knowledge
Audience
- Data Engineers, Data Scientists, Data Analysts
- Python developers, Application Developers, Big Data Developers
- Database Administrators (DBA), Big Data Administrators
- Solutions Architect, Cloud Architect, Big Data Architect
- Technical Managers, Engineering Managers, Project Managers
Course Content
Introduction – Data Engineering Volume 1 on AWS
Data Engineering Introduction
-
Data Engineering introduction, OLTP & OLAP
33:10 -
Data Mart & Data Mesh
05:31 -
Data Lake, Data Lakehouse, DWH
00:00
AWS Distributed Storage – S3 (Simple, Storage, Service) for Data Engineers
-
Introduction 1
14:38 -
Introduction 2
22:49 -
Basics
05:43 -
Basics Hands-on
18:53 -
Versioning
13:06 -
Encryption
05:51 -
Storage Class
20:18 -
Multipart Upload
12:51 -
Lifecycle Policies
15:04 -
Cross Region Replication
10:13 -
Mountpoint
09:21 -
Security – S3 Identity Based Policy
19:03 -
Security – S3 Bucket Policy
08:29 -
Bucket Policy with VPC, IP address, VPCE
03:50 -
Access Point
16:27 -
Object Lambda
18:55 -
Pre-signed URL
04:33 -
Performance Considerations
05:31 -
Pricings
13:00 -
Architectural Patterns using S3
07:25
Data Modelling – Normalization, ER Diagram, Dimensional Modelling
-
Highlights
13:09 -
Data Modelling Introduction
17:15 -
Normal Forms 1NF 2NF 3NF
28:01 -
Relations: one-to-one, one-to-many, many-to-one, many-to-many
08:51 -
Dimensional modelling – Facts, Dimensions & Grains
24:39 -
Grains Exercise
09:19 -
Dimensional Modelling Technique
15:00 -
Types of Fact & Dimension Tables
10:10 -
Data Virt Semantic Presentation Layers
13:24
Data Warehouse on AWS – Redshift Infra
-
Redshift Infra
19:36 -
Redshift Infra HandsOn
21:42 -
Redshift Architecture – Zone Map, Columnar Storage
15:02 -
Cluster Resize – Elastic & Classic
08:32 -
Cluster Resize – HandsOn
05:03 -
Cluster Pause & Rename
04:31 -
Snapshot & Backup
07:43 -
Redsfhit Infra Conclusion
03:04
Redshifts Objects
-
Querying, Connection, RSQL, QEV2
16:10 -
Query Editor & RSQL setup
17:54 -
Object Hierarcy, tables hands-on
19:14 -
Data Types Hands-on
14:20 -
Table operations Hands-on
12:27 -
Redshift ACID, Locks, Isolation Level
12:05 -
Implement Transactions
10:01 -
AccessShareLock & ShareRowExclusiveLock HandsOn
08:50 -
Redshift SUPER datatype
14:29 -
Section Summary
05:02
Redshift Deep Dive
-
Distribution Key & Style, Sort Key
29:35 -
Column Compression
06:42 -
Modify Dist Sort Key, Compression HandsOn
19:18 -
COPY Command Theory
06:33 -
COPY Command HandsOn
13:36 -
UNLOAD Command
07:16 -
AWS DMS – Move from OLTP to DWH
08:10 -
DMS – Setup Source OLTP & Python application
19:14 -
Setup DMS Instance, Endpoint, Task
15:28 -
DMS Task – OLTP to DWH
28:44 -
Table Maintenance – VACUUM & ANALYZE
09:46 -
Vacuum & Analyze HandsOn
11:25
Resdhift Features
-
Materialized View (MV)
10:27 -
MV HandsOn
13:59 -
Query Federation
11:19 -
Redshift Spectrum
14:18 -
Streaming Ingestion
24:05 -
Redshift Feature Use Cases
28:36
Redshift Query Tuning
-
Query Execution
22:32 -
EXPLAIN Plan & System Joins
23:32 -
System Joins HandsOn
05:29 -
Data RE-distribution
09:01 -
EXPLAIN & RE-distribution HandsOn
16:18 -
Query Tuning Exercise – Part 1
31:17 -
Query Tuning Exercise – Part 2
24:46 -
Query Tuning Exercise – Part 3
24:27
Redshift Workload Management (WLM)
-
WLM Intro & Query Queue
09:02 -
Concurrency Scaling, Short Query Acceleration
07:19 -
Configure WLM HandsOn
16:44 -
Create Query Queue HandsOn
15:50 -
Query Queue In Action
00:00 -
Concurrency Scaling In Action
15:33
Redshift Security- RBAC, CLS, RLS, Dynamic Data Masking (DDM)
-
Users, Roles, RBAC
11:29 -
Users, Roles, RBAC HandsOn
12:14 -
Row & Column Level Security (RLS & CLS)
11:25 -
Multiple RLS Policies HandsOn
13:02 -
CLS HandsOn
05:00 -
Combine RLS & CLS HandsOn
04:44 -
Dynamic Data Masking
07:00 -
Track Users, Roles, CLS and RLS
10:48 -
Audit Logging
11:32
Monitoring in Redshift
-
Monitor Redshift using Console
18:00 -
System Views for Monitoring Queries, Redshift Objects, Configuration Parms
20:50
Reshift Serverless
-
Introduction to Redshift Serverless
12:12 -
Create & Delete Redshift Serverless Resources
06:39 -
COPY & UNLOAD in Serverless
-
ZeroETL Integration Setup
24:12 -
ZeroETL in Action
12:59 -
Query Tuning Similarities
-
Migrate from Provisioned to Serverless
Detailed Redshift Pricing
-
Redshift Pricing Components
-
Pricing Example – Provisioned, Serverless, Concurrency Scaling, Spectrum
-
AWS Pricing Calculator
Redshift Additional Information
-
Redshift Integration with AWS Services
-
Redshift & Snowflake Comparison
-
Redshift Best Practices
-
Redshift Limitations and Challenges
AWS Metadata Repository – Glue Data Catalog
-
AWS Glue Catalog – Theory
-
Glue Catalog – Setup Data Stores & IAM Roles HandsOn
-
Store Aurora metadata in Glue Catalog
-
Store S3 and Redshift metadata in Glue Catalog
Data Governance using AWS Lake Formation
-
Lake Formation Introduction
-
Permission Flow HandsOn 1
-
Permission Flow HandsOn 2
-
Lake Formation – Tag Based Access Control (LF-TBAC)
-
LF-TBAC HandsOn
-
LF – Data Filtering
-
LF Clean Up (Please complete this)
Data Lakehouse on AWS – Athena
-
Athena Introduction
16:41 -
Athena Intro Hands On
10:44 -
Athena SerDe, File & Row format
12:51 -
SerDe, Format, CTAS Hands On
12:52 -
UNLOAD, Prepare & Execute, Query JSON
08:40 -
UNLOAD, Prepare & Execute, Query JSON Hands On
17:27 -
Schema Evolution, JSON_EXTRACT
13:54 -
Iceberg, ACID
26:19 -
Athena Partitioning & Bucketing
19:28 -
More DDL Commands
06:12 -
Athena WLM Theory
08:03 -
Workgroup HandsOn
11:32 -
Capacity Reservation HandsOn
04:06 -
Performance Tuning Theory
13:29 -
Athena Pricing & Performance Tuning
15:46 -
Architectural Patterns using Athena
08:30
Big Data Warehouse – HIVE
-
Hadoop Theory
-
File Formats
-
Hive Architecture & Components
-
Hive CLI
-
Data Types, databases, tables, File & Row Format, Hive SerDe
-
Hive Databases hands-on
-
Hive Tables hands-on
-
Partitioning & Bucketing
-
Partitioning & Bucketing hands-on
-
Load, insert, ACID, Materialized Views etc
-
JOINs, Locks, Configuration Parameters
Projects – Redshift, Athena, DataModelling, Pythin (Total Dataset Size – 150 GB)
-
Project 1 & 2 Data Modelling, Python Coding & Redshift SQL (Dataset – 10.4 GB)
-
Project 3 & 4 – Administration & Move from OLTP to OLAP (Dataset – 1.3 GB)
-
Project 5 – Redshift Performance Tuning (Dataset – 135 GB)
New Features – Redshift, S3 & Athena
-
New Features – Introduction





