Data Engineering on AWS Vol 1 – OLAP & Data Warehouse

Detailed training (Level 350) on AWS data engineering services: Redshift, S3, Athena, Hive, Glue Data Catalog, and Lake Formation

This is Volume 1 of the Data Engineering on AWS course. It gives detailed explanations of AWS data engineering services such as S3 (Simple Storage Service), Redshift, Athena, Hive, Glue Data Catalog, and Lake Formation. The course covers the storage and consumption (data warehouse) layers of a data engineering pipeline. In Volume 2, I will showcase data processing services (batch and streaming).

You will get hands-on practice with large datasets (100 GB – 300 GB or more of data). The hands-on exercises mirror real-world scenarios such as Redshift query performance tuning, streaming ingestion, window functions, ACID transactions, the COPY command, distribution and sort keys, WLM, row-level and column-level security, Athena partitioning, and Athena WLM.

Some other highlights:

Training in data modelling: normalization and ER diagrams for OLTP systems, and dimensional modelling for OLAP/DWH systems.
Data modelling hands-on.
Other technologies covered – EC2, EBS, VPC and IAM.

This is Part 1 (Volume 1) of the full data engineering course. In Part 2 (Volume 2), I will cover the following topics:

Spark (Batch and Stream processing using AWS EMR, AWS Glue ETL, GCP Dataproc)
Kafka (on AWS & GCP)
Flink
Apache Airflow
Apache Pinot
AWS Kinesis and more.

Course Content

Introduction – Data Engineering Volume 1 on AWS

Course Introduction and Resources

24:28
Course Introduction and Course Contents

12:11

(Optional) AWS Pre-requisites – EC2 & EBS

(Optional) AWS Pre-requisites – VPC

(Optional) AWS Pre-requisites – IAM

(Optional) AWS Pre-requisites – SQL Basics

SQL Introduction

30:35
SQL Client & Server Setup

12:18
SQL Database Objects Theory

28:13
Database Objects Hands On

29:19
CRUD Operations

21:18
SELECT Operators

24:41
CASE COALESCE Functions

13:19
DATE Functions

05:46
CTAS Cast Concat

14:11
Update Delete Truncate

12:31
HAVING Clause

07:43
Inner Join, Left Join, Right Join, Outer Join

19:27
Union Intersect View

17:35
Materialized View

08:19
Common Table Expression (CTE)

10:48
SQL Window Functions

22:40
MERGE statement & Summary

10:52

(Optional) AWS Pre-requisites – Python Basics

Python Intro – Architecture, PyCharm, Virtual Env

39:33
PyCharm & CLI Walkthrough

08:51
Compiled vs Interpreted

07:42
Everything in Python is an Object

12:39
String Data Type

10:20
Number Data Type

04:02
List Data Type

11:36
Tuple Data Type

06:16
Set & Dict Data Type, Type Conversion

16:03
Python Operators & Memory

10:27
Set up Python interpreter in PyCharm

11:23
Print & Input Functions

16:04
IF Statement

14:36
For & While loops

15:57
Functions Intro

09:54
Function Scoping

15:20
Functions RETURN

07:53
Function Arguments

09:38
Modify Arguments

09:07
Positional & Keyword Arguments

09:42
args & kwargs

16:39
Class Object Self

32:49
Class-Instance Variables, __init__

17:01
Class Object Exercise 1

14:48
Class Object Exercise 2

13:48
Inheritance

07:50
Python Memory Management

08:05
Modules & Packages

33:27
HandsOn Exercise

01:31
Module Pre-compilation

03:28
Namespace & __name__

09:56
Error Handling in Python

14:00
File Handling

17:10
CSV & JSON module

13:37
Python Multi-threading concept

17:37
Multi-threading hands-on and exercise

22:01
Debugging & Profiling

18:26

Data Engineering Introduction

AWS Distributed Storage – S3 (Simple Storage Service) for Data Engineers

Introduction 1

14:38
Introduction 2

22:49
Basics

05:43
Basics Hands-on

18:53
Versioning

13:06
Encryption

05:51
Storage Class

20:18
Multipart Upload

12:51
Lifecycle Policies

15:04
Cross Region Replication

10:13
Mountpoint

09:21
Security – S3 Identity Based Policy

19:03
Security – S3 Bucket Policy

08:29
Bucket Policy with VPC, IP address, VPCE

03:50
Access Point

16:27
Object Lambda

18:55
Pre-signed URL

04:33
Performance Considerations

05:31
Pricing

13:00
Architectural Patterns using S3

07:25

Data Modelling – Normalization, ER Diagram, Dimensional Modelling

Highlights

13:09
Data Modelling Introduction

17:15
Normal Forms 1NF 2NF 3NF

28:01
Relations: one-to-one, one-to-many, many-to-one, many-to-many

08:51
Dimensional modelling – Facts, Dimensions & Grains

24:39
Grains Exercise

09:19
Dimensional Modelling Technique

15:00
Types of Fact & Dimension Tables

10:10
Data Virtualization, Semantic & Presentation Layers

13:24

Data Warehouse on AWS – Redshift Infra

Redshift Objects

Querying, Connection, RSQL, QEV2

16:10
Query Editor & RSQL setup

17:54
Object Hierarchy, tables hands-on

19:14
Data Types Hands-on

14:20
Table operations Hands-on

12:27
Redshift ACID, Locks, Isolation Level

12:05
Implement Transactions

10:01
AccessShareLock & ShareRowExclusiveLock HandsOn

08:50
Redshift SUPER datatype

14:29
Section Summary

05:02

Redshift Deep Dive

Distribution Key & Style, Sort Key

29:35
Column Compression

06:42
Modify Dist Sort Key, Compression HandsOn

19:18
COPY Command Theory

06:33
COPY Command HandsOn

13:36
UNLOAD Command

07:16
AWS DMS – Move from OLTP to DWH

08:10
DMS – Setup Source OLTP & Python application

19:14
Setup DMS Instance, Endpoint, Task

15:28
DMS Task – OLTP to DWH

28:44
Table Maintenance – VACUUM & ANALYZE

09:46
Vacuum & Analyze HandsOn

11:25

Redshift Features

Redshift Query Tuning

Redshift Workload Management (WLM)

Redshift Security – RBAC, CLS, RLS, Dynamic Data Masking (DDM)

Monitoring in Redshift

Redshift Serverless

Detailed Redshift Pricing

Redshift Additional Information

AWS Metadata Repository – Glue Data Catalog

Data Governance using AWS Lake Formation

Data Lakehouse on AWS – Athena

Athena Introduction

16:41
Athena Intro Hands On

10:44
Athena SerDe, File & Row format

12:51
SerDe, Format, CTAS Hands On

12:52
UNLOAD, Prepare & Execute, Query JSON

08:40
UNLOAD, Prepare & Execute, Query JSON Hands On

17:27
Schema Evolution, JSON_EXTRACT

13:54
Iceberg, ACID

26:19
Athena Partitioning & Bucketing

19:28
More DDL Commands

06:12
Athena WLM Theory

08:03
Workgroup HandsOn

11:32
Capacity Reservation HandsOn

04:06
Performance Tuning Theory

13:29
Athena Pricing & Performance Tuning

15:46
Architectural Patterns using Athena

08:30

Big Data Warehouse – HIVE

Hadoop Theory

21:05
File Formats

08:58
Hive Architecture & Components

16:35
Hive CLI

03:21
Data Types, databases, tables, File & Row Format, Hive SerDe

12:57
Hive Databases hands-on

17:07
Hive Tables hands-on

16:40
Partitioning & Bucketing

16:55
Partitioning & Bucketing hands-on

19:30
Load, insert, ACID, Materialized Views etc

15:08
JOINs, Locks, Configuration Parameters

14:26

Projects – Redshift, Athena, Data Modelling, Python (Total Dataset Size – 150 GB)

New Features – Redshift, S3 & Athena

A course by

Soumyadeep Dey


(Optional) AWS Pre-requisites – EC2 & EBS

AWS Cloud and EC2 Introduction

EC2 Components & HandsOn 1

EC2 HandsOn 2

EBS Theory

EBS HandsOn

(Optional) AWS Pre-requisites – VPC

VPC Introduction & Components

VPC Components Hands On

Bastion Host

Security Groups

NAT Gateway & VPC Endpoint

VPC Peering

(Optional) AWS Pre-requisites – IAM

IAM Introduction & Hands On

IAM Service Roles
