Architect massive storage solutions. Design Data Lakes on S3 and ADLS using Parquet, partitioning strategies, and query engines like Athena.
Data Lakes store vast amounts of structured and unstructured data. This course teaches the architectural patterns for building successful lakes on AWS S3 and Azure Data Lake Storage (ADLS). You will learn about file formats (Parquet, Avro), optimal partitioning strategies to improve query speed, and data lifecycle management. We cover cataloging data with AWS Glue and querying it directly using serverless SQL engines like Athena. Essential for big data engineers.
Estimated completion time: 21 lessons • Self-paced learning • Lifetime access
Lakes hold raw/unstructured data; Warehouses are curated.
We cover the concepts of ACID transactions on lakes (the Lakehouse pattern).
Object storage (S3) behaves differently from disks: objects are addressed by key, and "folders" are only key prefixes.
Storage is very cheap, and serverless engines like Athena charge per query, based on the amount of data scanned.
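As a rough illustration of why partitioning matters on object storage, the sketch below builds Hive-style keys (year=/month=/day= segments) and filters them the way a query engine can skip non-matching prefixes. The table name, file names, and the helper functions partition_key and prune are hypothetical and exist only for this example; it uses plain Python and no AWS or Azure API.

```python
from datetime import date


def partition_key(table: str, day: date, filename: str) -> str:
    """Build a Hive-style object key: table/year=YYYY/month=MM/day=DD/file."""
    return (
        f"{table}/year={day.year:04d}/month={day.month:02d}"
        f"/day={day.day:02d}/{filename}"
    )


def prune(keys: list[str], **filters: str) -> list[str]:
    """Keep only the keys whose partition segments match every filter.

    This imitates partition pruning: objects under non-matching
    prefixes are never read, so they add nothing to the data scanned.
    """
    wanted = {f"{name}={value}" for name, value in filters.items()}
    return [key for key in keys if wanted <= set(key.split("/"))]


# Hypothetical objects in a lake, one Parquet file per day.
keys = [
    partition_key("events", date(2024, 1, 31), "part-0000.parquet"),
    partition_key("events", date(2024, 2, 1), "part-0000.parquet"),
    partition_key("events", date(2024, 2, 2), "part-0000.parquet"),
]

# A query restricted to February 2024 only needs two of the three objects.
february = prune(keys, year="2024", month="02")
print(february)
```

Because the engine reads fewer objects, a filter on the partition columns tends to reduce both query time and the amount of data billed.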