Lesson 1Designin' dimensional model fi analytics: facts an' dimensions mappin'Explore dimensional modelin' fi analytics. Yuh wi map business processes to fact tables, define dimensions an' grain, handle many-to-many relationships, an' design schemas dat support self-service BI an' performant queries. Map it out like plannin' a route to Negril.
Identifyin' business processes an' factsChoosin' di grain of fact tablesDesignin' conformed dimensionsHandlin' many-to-many relationshipsStar vs snowflake schema trade-offsLesson 2fact_user_activity: schema fields (event_id, user_id, course_id, event_type, timestamp, duration, device, session_id, ingestion_metadata)Model di fact_user_activity table fi behavioral analytics. We define each field, choose di event grain, capture sessions an' devices, store ingestion metadata, an' support funnels, engagement, an' retention reportin'. Track activity like watchin' di dancers at Carnival.
Definin' di event-level grainModelin' event_type an' taxonomiesSession an' device attributionUsin' ingestion_metadata fieldsSupportin' funnels an' retentionLesson 3Derived metrics an' aggregations: daily active users (DAU), active users per course, course completion rate by country, revenue by course an' cohortLearn fi build derived metrics an' aggregations from fact tables. We compute DAU, per-course activity, completion rates, an' cohort revenue, emphasizin' reproducible definitions an' efficient materialization patterns. Aggregate like countin' di harvest.
Definin' DAU an' MAU consistentlyActive users per course metricsCompletion rate by segment an' countryRevenue by course an' cohortMaterializin' aggregates in warehousesLesson 4Handlin' slowly changin' dimensions (SCD Type 1/2) an' time-travel/versionin'Learn how fi manage slowly changin' dimensions an' time-travel. We compare SCD Type 1 an' Type 2, design effective date ranges, store version metadata, an' use warehouse features fi query historical dimension states. Handle changes like seasons in Jamaica.
When fi use SCD Type 1 vs Type 2Effective dates an' current flagsImplementin' SCD in SQL an' dbtUsin' warehouse time-travel featuresTestin' historical dimension correctnessLesson 5Data freshness an' windowin' strategies fi near-real-time KPIsDesign freshness strategies fi near-real-time KPIs. We cover latency budgets, watermarkin', windowin' fi streamin' metrics, late-arrivin' data handlin', an' monitorin' freshness fi keep dashboards reliable an' actionable. Keep it fresh like mangoes from di tree.
Definin' freshness an' latency targetsWatermarks an' late data handlin'Tumblin', slidin', an' session windowsNear-real-time KPI computationMonitorin' an' alertin' on freshnessLesson 6Data cleansin' steps: parsin', type coercion, timestamp normalization, deduplication, enrichmentLearn how fi systematically clean raw data before modelin'. We cover parsin' semi-structured fields, enforcin' data types, normalizin' timestamps across time zones, deduplicatin' records, an' enrichin' datasets wid reference an' lookup data. Cleanse it proper like washin' before church.
Parsin' JSON, CSV, an' nested fieldsType coercion an' schema validationTimezone normalization an' standardsRecord deduplication strategiesReference data joins fi enrichmentLesson 7dim_course: schema fields (course_id, title, author_id, category, price, published_date)Design di dim_course dimension fi support course analytics. We define each field, discuss surrogate keys, handle pricin' an' category changes, an' model publication dates fi enable accurate historical an' catalog reportin'. Structure courses like organizin' di school roster.
Business meanin' of each dim_course fieldNatural keys vs surrogate course_idHandlin' course price an' category changesModelin' published an' unpublished statesIndexin' dim_course fi BI queriesLesson 8dim_user: schema fields (user_id, email_hash, signup_date, country, subscription_status, cohort)Design di dim_user dimension fi user analytics an' segmentation. We define each field, hash sensitive data, track signup an' cohorts, an' model subscription status fi support growth, retention, an' monetization reportin'. Profile users like knowin' yuh neighbors.
User identifiers an' surrogate keysEmail hashin' an' privacy controlsModelin' signup_date an' cohortsSubscription_status lifecycleCountry an' localization attributesLesson 9Transformation patterns: ELT vs ETL, incremental transformations, batch vs stream transformsUnderstand core transformation patterns an' when fi apply dem. We compare ETL an' ELT, design incremental pipelines, an' contrast batch an' streamin' transforms, focusin' on cost, latency, observability, an' operational trade-offs. Pattern it like dancin' to di riddim.
Comparin' ETL an' ELT architecturesDesignin' incremental transformationsBatch processin' pros an' consStreamin' an' micro-batch patternsChoosin' patterns by SLA an' costLesson 10Tools fi transformations: dbt, Spark/Databricks, Beam/Flink, SQL-based transformation frameworksSurvey key transformation tools an' how fi choose between dem. We compare dbt, Spark, Databricks, Beam, Flink, an' SQL frameworks, focusin' on scalability, orchestration, testin', an' integration wid modern data stacks. Tool up like a carpenter wid di right hammer.
dbt fi SQL-centric transformationsSpark an' Databricks fi big dataBeam an' Flink fi streamin' jobsSQL-based transformation frameworksTool selection criteria an' trade-offsLesson 11fact_payments: schema fields (payment_id, user_id, course_id, amount, currency, payment_method, status, timestamp, invoice_id)Model di fact_payments table fi revenue analytics. We define each field, discuss idempotent ingestion, currency handlin', payment status, refunds, an' how fi link payments to users, courses, an' invoices fi downstream reportin'. Track payments like countin' di dollars at market.
Grain an' primary key of fact_paymentsModelin' payment status an' lifecycleHandlin' multi-currency amountsLinkin' payments to users an' coursesCapturin' refunds an' chargebacks