Build the pipelines that feed the analysts.

- The data-engineering lifecycle (generate → ingest → store → transform → serve), batch vs streaming, ETL vs ELT
- The modern data stack (Fivetran, Airbyte, dbt, Snowflake, Looker / Hex / Mode)
- Orchestration (Apache Airflow, Dagster, Prefect, AWS Step Functions, GitHub Actions), idempotency & retries, DAG design
- dbt fundamentals (models, sources, refs, tests, snapshots, macros, packages)
- Data warehousing concepts (star vs snowflake schema, slowly changing dimensions, Kimball vs Inmon)
- Data lakes & the lakehouse; file formats (CSV, JSON, Avro, Parquet, ORC, Delta, Iceberg, Hudi)
- CDC (change data capture: Debezium, log-based vs trigger-based)
- Event-driven architectures, Kafka in data pipelines, message queues (SQS, RabbitMQ, Pub/Sub)
- Data quality & observability (Great Expectations, Monte Carlo, Soda, dbt tests)
- Data contracts & schema evolution; lineage (OpenLineage, Marquez, DataHub); data catalogs (Atlan, Collibra, Unity Catalog)
- Governance, privacy & PII (tokenization, masking, GDPR/CCPA)
- Real-time pipelines (Flink, Materialize, Tinybird)
- Cloud data platforms (AWS, GCP, Azure); serverless data (Athena, BigQuery, Snowpark); cost optimization
- Monitoring & on-call for data, SLAs & SLOs
- A capstone end-to-end pipeline

30 units, 450 lessons.
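A minimal sketch of the idempotency-and-retries idea the orchestration topics cover: retries make failures survivable, and a keyed upsert makes replays safe, so the two belong together. The `retry` decorator, `upsert` helper, and the in-memory `warehouse` dict are illustrative stand-ins, not part of any real orchestrator's API.

```python
import time

def retry(times=3, base_delay=0.01):
    """Re-run a flaky task with exponential backoff between attempts."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == times - 1:
                        raise  # out of attempts: surface the error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator

# Idempotent load: rows are keyed, so replaying a batch after a
# retry overwrites the same keys instead of duplicating rows.
warehouse = {}

def upsert(rows, key):
    for row in rows:
        warehouse[row[key]] = row

upsert([{"id": 1, "amount": 10}], key="id")
upsert([{"id": 1, "amount": 10}], key="id")  # safe to replay
```

The design point is that retries alone are not enough: a task that appends rows will double-load data when replayed, while a keyed upsert converges to the same state however many times it runs.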
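The DAG-design topic reduces to one core guarantee: every task runs only after its upstream dependencies. A small sketch using Python's standard-library `graphlib`; the task names (`ingest_orders`, `stg_orders`, `fct_revenue`, etc.) are hypothetical, chosen to mimic a typical staging-to-fact pipeline.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
dag = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "stg_orders": {"ingest_orders"},
    "stg_customers": {"ingest_customers"},
    "fct_revenue": {"stg_orders", "stg_customers"},
}

# static_order() yields a valid execution order:
# every task appears after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
```

Orchestrators such as Airflow and Dagster perform this same topological scheduling, plus parallel execution of independent branches (`ingest_orders` and `ingest_customers` here could run concurrently).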
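Among the warehousing concepts, slowly changing dimensions (Type 2) can be sketched in a few lines: when a tracked attribute changes, close the current row and append a new version, so history is preserved. The `scd2_merge` helper and the record shape below are an illustrative toy model, not the dbt snapshot implementation.

```python
from datetime import date

def scd2_merge(dim, updates, key, today):
    """Type 2 SCD: expire the current version on change and
    append a new row, keeping the full attribute history."""
    current = {r[key]: r for r in dim if r["valid_to"] is None}
    for u in updates:
        old = current.get(u[key])
        if old and old["attrs"] == u["attrs"]:
            continue  # nothing changed: keep the current row open
        if old:
            old["valid_to"] = today  # close the superseded version
        dim.append({key: u[key], "attrs": u["attrs"],
                    "valid_from": today, "valid_to": None})
    return dim

# A customer moves city: the NYC row is closed, an LA row is opened.
dim = [{"id": "c1", "attrs": {"city": "NYC"},
        "valid_from": date(2024, 1, 1), "valid_to": None}]
dim = scd2_merge(dim, [{"id": "c1", "attrs": {"city": "LA"}}],
                 key="id", today=date(2024, 6, 1))
```

dbt snapshots automate exactly this pattern against warehouse tables, using `valid_from`/`valid_to`-style columns to mark which version of each row was current at any point in time.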