Lesson 1: Data Lake and Object Storage Choices: S3, GCS, Azure Blob — Partitioning Strategies, File Formats (Parquet/ORC/Avro), and Compression

Explore data lake design on the major clouds, comparing S3, GCS, and Azure Blob. Learn partitioning strategies, file layout, and how Parquet, ORC, Avro, and compression choices affect performance, cost, and downstream processing for African data centres.
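As a minimal sketch of the Hive-style `key=value` partition layout that engines such as Spark and Athena can prune on (the bucket and dataset names below are hypothetical, chosen only for illustration):

```python
from datetime import datetime, timezone

def partition_path(bucket: str, dataset: str, ts: datetime, part: int) -> str:
    """Build a Hive-style partition key (year=/month=/day=) for an object store.

    Query engines can skip whole partitions when object keys follow this
    key=value convention, which is what makes time-based partitioning pay off.
    """
    return (
        f"s3://{bucket}/{dataset}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
        f"part-{part:05d}.parquet"
    )

# Example: an event captured on 12 May 2024 lands in that day's partition.
ts = datetime(2024, 5, 12, 8, 30, tzinfo=timezone.utc)
path = partition_path("lake-raw", "orders", ts, 0)
print(path)
# s3://lake-raw/orders/year=2024/month=05/day=12/part-00000.parquet
```

Zero-padding the month and day keeps lexicographic and chronological order aligned, which matters for range scans over prefixes.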
- Comparing S3, GCS, and Azure Blob capabilities
- Planning buckets, folder structures, and naming conventions
- Partitioning by time, entity, and lifecycle stage
- Choosing Parquet, ORC, or Avro for workloads
- Compression codecs and performance trade-offs
- Small-file compaction and tuning

Lesson 2: Batch Ingestion and Integration: Sqoop/CDC Tools, AWS Glue, Google Dataflow Batch, Airbyte Connectors, Nightly Export Scheduling

Learn batch ingestion options from databases and SaaS systems using Sqoop, CDC tools, AWS Glue, Google Dataflow batch, and Airbyte. Plan nightly and intraday loads, schema handling, and integration across mixed sources for local efficiency.
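The incremental, high-watermark pull behind most nightly batch syncs can be sketched in plain Python (the rows and timestamps below are invented for illustration):

```python
def incremental_extract(rows, last_watermark):
    """One nightly batch pull: take only rows changed since the previous run,
    then advance the watermark. This is the pattern behind Sqoop's
    --incremental mode and Airbyte's incremental syncs."""
    batch = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in batch), default=last_watermark)
    return batch, new_watermark

source = [
    {"id": 1, "updated_at": "2024-05-10T01:00"},
    {"id": 2, "updated_at": "2024-05-11T02:00"},
    {"id": 3, "updated_at": "2024-05-12T03:00"},
]
batch, wm = incremental_extract(source, last_watermark="2024-05-10T23:59")
print([r["id"] for r in batch], wm)
# [2, 3] 2024-05-12T03:00
```

ISO-8601 timestamps compare correctly as strings, which keeps the sketch simple; a real job would persist the watermark between runs.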
- Sqoop and JDBC bulk extraction
- Change Data Capture tools and patterns
- AWS Glue jobs for batch ingestion
- Google Dataflow batch pipeline design
- Airbyte connectors and configuration
- Scheduling nightly and intraday loads

Lesson 3: Stream Processing Frameworks: Apache Flink, Kafka Streams, Spark Structured Streaming — Exactly-Once Semantics, State Management, Windowing, Watermarking

Dive into stream processing with Apache Flink, Kafka Streams, and Spark Structured Streaming. Learn to design stateful operators, implement exactly-once semantics, and configure windows and watermarks for robust real-time analytics on Zimbabwean data streams.
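A toy illustration of tumbling windows with an allowed-lateness watermark, in the spirit of Flink and Spark Structured Streaming but not their actual APIs (event times and keys are made up):

```python
from collections import defaultdict

def tumbling_counts(events, window_s, allowed_lateness_s):
    """Count events per (window, key). The watermark is the maximum event
    time seen so far minus the allowed lateness; events older than the
    watermark are dropped, as stream engines do with late data."""
    counts = defaultdict(int)
    max_ts = 0
    dropped = []
    for ts, key in events:
        max_ts = max(max_ts, ts)
        if ts < max_ts - allowed_lateness_s:
            dropped.append((ts, key))  # too late: behind the watermark
            continue
        window_start = ts - ts % window_s  # tumbling window assignment
        counts[(window_start, key)] += 1
    return dict(counts), dropped

# Event (2, "a") arrives after time has advanced to 61, far past lateness.
events = [(1, "a"), (12, "a"), (3, "a"), (61, "b"), (2, "a")]
counts, dropped = tumbling_counts(events, window_s=60, allowed_lateness_s=10)
print(counts, dropped)
# {(0, 'a'): 3, (60, 'b'): 1} [(2, 'a')]
```

The sketch shows why watermarks exist: without a bound on lateness, windows could never be finalised and emitted.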
- Flink setup and deployment options
- Kafka Streams topology and state stores
- Spark Structured Streaming micro-batch model
- Exactly-once semantics and idempotent sinks
- State management, checkpointing, and recovery
- Windowing, watermarking, and late events

Lesson 4: Integration and API Layers: GraphQL/REST APIs, Materialized Views for Product Feeds, Data Access Patterns for Consumers

Explore the integration and API layers that expose analytical and operational data. Learn GraphQL and REST patterns, using materialized views for product feeds, and designing secure, governed data access for varied consumers on local networks.
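Cursor-based pagination, one common access pattern for data-serving REST endpoints, sketched as a plain function (the product feed here is invented; a real endpoint would use an opaque cursor rather than a raw offset):

```python
def paginate(items, cursor, limit):
    """Return one page of results plus the cursor for the next page,
    or next_cursor=None when the result set is exhausted."""
    page = items[cursor:cursor + limit]
    next_cursor = cursor + limit if cursor + limit < len(items) else None
    return {"data": page, "next_cursor": next_cursor}

feed = [f"product-{i}" for i in range(5)]
first = paginate(feed, cursor=0, limit=2)
second = paginate(feed, cursor=first["next_cursor"], limit=2)
last = paginate(feed, cursor=4, limit=2)
print(first["data"], second["data"], last)
# ['product-0', 'product-1'] ['product-2', 'product-3']
# {'data': ['product-4'], 'next_cursor': None}
```

Returning the cursor in the response body keeps the contract versionable and lets clients resume without re-reading earlier pages.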
- REST API design for data access
- GraphQL schemas and resolvers for analytics
- Using materialized views for product feeds
- Caching and pagination strategies for APIs
- Row-level security and authorization
- Versioning and backward-compatible contracts

Lesson 5: Streaming Ingestion Choices and Patterns: Kafka, Confluent Platform, AWS Kinesis, Google Pub/Sub — Producers, Partitioning, Schema Evolution Considerations

Understand streaming ingestion platforms including Kafka, Confluent, Kinesis, and Pub/Sub. Learn producer design, partitioning strategies, schema evolution, and patterns for durable, scalable event collection across regions of Zimbabwe.
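Keyed partitioning, which routes all events for one entity to the same partition and so preserves per-key ordering, can be sketched as follows. Note this is an illustration only: real Kafka clients hash keys with murmur2, and md5 is used here just to show the hash-and-modulo idea.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Choose a partition from the record key. Deterministic hashing means
    every event for the same key lands on the same partition, which is what
    gives Kafka its per-key ordering guarantee."""
    digest = hashlib.md5(key).digest()  # stand-in for murmur2
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for(b"customer-42", 6)
p2 = partition_for(b"customer-42", 6)
assert p1 == p2  # same key, same partition: per-key ordering holds
```

The trade-off to cover alongside this is key skew: a hot key pins all of its traffic to one partition, which is one of the hotspot problems picked up again in the serving-store lesson.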
- Kafka topics, partitions, and replicas
- Confluent Platform components
- AWS Kinesis Data Streams and Firehose usage
- Google Pub/Sub design and quotas
- Producer design, batching, and backpressure
- Schema evolution with Avro and a schema registry

Lesson 6: Real-Time Serving Stores: Redis, RocksDB-Backed Stores, Cassandra, Druid for OLAP Streaming Queries

Study real-time serving stores such as Redis, RocksDB-backed engines, Cassandra, and Druid. Learn access patterns, data modelling, and how to support low-latency lookups and OLAP-style queries on fresh streaming data for local applications.
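A minimal cache-aside sketch, the most common Redis access pattern: try the cache first, and on a miss read the store of record and populate the cache. A dict stands in for both Redis and the backing database here.

```python
class CacheAside:
    """Cache-aside lookup: check the cache, fall back to the backing store
    on a miss, then write the value into the cache for next time."""

    def __init__(self, backing):
        self.backing = backing   # stand-in for the primary database
        self.cache = {}          # stand-in for Redis
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing[key]    # read-through on miss
        self.cache[key] = value      # populate for subsequent reads
        return value

db = {"sku-1": {"price": 10}}
store = CacheAside(db)
first = store.get("sku-1")   # miss: fetched from backing store
second = store.get("sku-1")  # hit: served from cache
print(store.hits, store.misses)
# 1 1
```

A production version would add a TTL and an invalidation strategy; stale reads are the price of this pattern, which ties into the consistency-versus-latency trade-offs listed below.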
- Redis as a cache and primary data store
- RocksDB-backed stateful services
- Cassandra data modelling for time series
- Druid setup for streaming OLAP
- Balancing consistency, latency, and cost
- Capacity planning and hotspot mitigation

Lesson 7: Data Warehouse Choices for Analytics: Snowflake, BigQuery, Redshift — CTAS, Materialized Views, Cost/Performance Trade-Offs

Compare data warehouse options such as Snowflake, BigQuery, and Redshift. Learn CTAS patterns, materialized views, clustering, and how to balance cost, query performance, and data freshness for analytics workloads in Zimbabwean cloud environments.
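The CTAS pattern can be demonstrated with SQLite, which supports the same `CREATE TABLE ... AS SELECT` shape as Snowflake, BigQuery, and Redshift (the table and column names are invented for the example):

```python
import sqlite3

# CTAS: materialise a derived table from a query in one statement,
# rather than creating the table and inserting separately.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'north', 10.0),
        (2, 'north', 5.0),
        (3, 'south', 7.5);
    CREATE TABLE region_totals AS
        SELECT region, SUM(amount) AS total
        FROM orders
        GROUP BY region;
""")
rows = conn.execute(
    "SELECT region, total FROM region_totals ORDER BY region"
).fetchall()
print(rows)
# [('north', 15.0), ('south', 7.5)]
```

In a warehouse, the same statement is typically wrapped in a scheduled job and swapped atomically, which is the hand-rolled alternative to a materialized view with a refresh policy.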
- Snowflake virtual warehouses and scaling
- BigQuery storage and query tuning
- Redshift distribution and sort keys
- CTAS patterns for derived tables
- Materialized views and refresh policies
- Cost versus performance trade-offs and tuning

Lesson 8: Batch Processing and Orchestration: Apache Spark, Spark on EMR/Dataproc, dbt for Transformations, Airflow/Cloud Composer/Managed Workflows for Scheduling

Understand batch processing with Spark on EMR and Dataproc, and SQL-centric transformations with dbt. Learn orchestration patterns using Airflow, Cloud Composer, and managed workflow services to build reliable, observable batch pipelines for local operations.
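Retries with exponential backoff, roughly the policy behind an Airflow operator's `retries` and `retry_delay` settings, as a standalone sketch (the flaky task is simulated; `base_delay_s` is zero here only so the example runs instantly):

```python
import time

def run_with_retries(task, max_retries=3, base_delay_s=0.0):
    """Run a batch task, retrying on failure with exponential backoff.
    Raises the last exception once retries are exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay_s * (2 ** attempt))  # back off and retry

calls = {"n": 0}

def flaky_load():
    """Simulated task that fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "loaded"

result = run_with_retries(flaky_load)
print(result, calls["n"])
# loaded 3
```

Retries like this only make a pipeline reliable if the task is idempotent, which is why idempotent sink design appears alongside scheduling in this lesson.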
- Spark cluster modes and resource sizing
- Spark job design for ETL and ELT
- dbt models, tests, and documentation
- Airflow DAG design and dependency management
- Scheduling, retries, and SLAs for batch jobs
- Monitoring, logging, and alerting for pipelines

Lesson 9: Feature Stores and ML Data Platforms: Feast, Tecton, or Custom Feature Pipelines on Delta Lake/BigQuery; Online vs Offline Feature Serving

Examine feature stores and ML data platforms using Feast, Tecton, or custom pipelines on Delta Lake and BigQuery. Learn feature definitions, pipeline design, and how to manage online versus offline serving for consistent model behaviour in Zimbabwe.
- Core concepts of feature stores and entities
- Feast setup and deployment patterns
- Tecton capabilities and integration options
- Building custom feature pipelines on Delta Lake
- Offline feature computation in BigQuery
- Online versus offline feature serving design
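A toy sketch of the offline/online split described above: features are computed in batch (as you would in BigQuery or on Delta Lake), then materialised into an online key-value store so serving reads the same values training saw. A dict stands in for Redis or DynamoDB, and the events and feature names are invented.

```python
def compute_features(events):
    """Offline batch job: aggregate per-entity features from raw events.
    Each event is a (user_id, order_amount) pair."""
    totals = {}
    for user, amount in events:
        count, spend = totals.get(user, (0, 0.0))
        totals[user] = (count + 1, spend + amount)
    return {
        user: {"order_count": count, "total_spend": spend}
        for user, (count, spend) in totals.items()
    }

# Offline computation over historical events.
offline = compute_features([("u1", 10.0), ("u1", 5.0), ("u2", 7.0)])

# Materialisation step: copy offline features into the online store so that
# low-latency serving and model training agree on feature values.
online_store = dict(offline)

print(online_store["u1"])
# {'order_count': 2, 'total_spend': 15.0}
```

Keeping one feature definition feeding both paths is the core promise of a feature store; drift between the two is exactly the training/serving skew Feast and Tecton are built to prevent.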