Lesson 1Data validation rules: duplicates, referential integrity (customers/products), out-of-range values, negative prices/quantitiesDefine robust validation rules to keep sales CSVs trustworthy. You will detect duplicates, enforce referential integrity, and flag out-of-range or negative values before they corrupt dashboards and downstream models.
Detecting duplicate orders and order linesChecking referential integrity keysValidating numeric ranges and thresholdsHandling negative prices and quantitiesBuilding reusable validation checklistsLesson 2Understanding column semantics: order_id, order_date, customer_id, customer_region, product_id, product_category, product_subcategory, quantity, unit_price, discount, revenue, cost, channelClarify the meaning and role of core sales columns used in dashboards. You will map identifiers, dates, product attributes, quantities, and monetary fields, ensuring consistent semantics across models and visualisations.
Order-level identifiers and grain selectionCustomer and region identification fieldsProduct, category, and subcategory rolesQuantity, unit_price, discount, and revenueCost, channel, and margin-related fieldsLesson 3Handling discounts and price calculations: recomputing revenue from unit_price, quantity, and discount and reconciling with reported revenueUnderstand how to recompute and validate revenue and price metrics. You will calculate line revenue from unit_price, quantity, and discount, reconcile with reported totals, and flag inconsistencies for review.
Revenue formulas from unit_price and quantityApplying percentage and absolute discountsReconciling computed and reported revenueDetecting inconsistent discount patternsDocumenting pricing and discount logicLesson 4Time-based transformations: extracting year, quarter, month, week, weekday, rolling windows, and fiscal calendarsUnderstand how to transform order dates into rich time features for analysis. You will derive calendar and fiscal attributes, build rolling windows, and prepare consistent time fields for dashboards and time-series models.
Extracting year, quarter, month, and weekDeriving weekday and weekend indicatorsBuilding rolling and moving window metricsImplementing fiscal calendars and offsetsAligning time grains for dashboardsLesson 5Data cleaning transformations: trimming, case normalisation, standardising region and channel labelsExplore practical cleaning steps to make raw sales CSVs consistent and analysis-ready. You will trim whitespace, normalise case, and standardise region and channel labels to avoid duplicates and broken dashboard filters.
Trimming whitespace and invisible charactersCase normalisation for text dimensionsStandardising region and channel taxonomiesMerging near-duplicate label variantsDocumenting cleaning rules for reuseLesson 6Derived metrics and transformations: profit = revenue - cost, profit_margin = profit / revenue, gross_margin, AOV = revenue / order_count, unit_total = quantity * unit_priceLearn to derive key sales metrics from raw CSV fields. You will compute profit, margins, AOV, and unit totals, ensuring formulas are consistent, well documented, and aligned with business definitions.
Computing profit and gross marginCalculating profit_margin safelyDeriving AOV from revenue and ordersUnit totals from quantity and unit_priceAligning metrics with business definitionsLesson 7Techniques for reproducible ETL: documented steps, scripts, notebooks, and checksums for CSV import integrityLearn how to design reproducible ETL pipelines for sales CSVs. You will script transformations, track versions, use notebooks for exploration, and apply checksums and validation steps to guarantee import integrity over time.
Scripting repeatable CSV transformationsUsing notebooks for exploratory ETLVersioning ETL code and configurationChecksums and file integrity validationAutomated ETL runs and loggingLesson 8Missing values and null patterns: detection methods, imputation strategies, and when to drop rowsMaster techniques to detect and treat missing or null values in sales CSVs. You will profile null patterns, choose imputation strategies, decide when to drop rows, and document assumptions to protect downstream metrics.
Profiling missingness across key columnsVisualising null patterns and correlationsImputation strategies for numeric fieldsImputation strategies for categorical fieldsRules for safely dropping rows or columnsLesson 9Data types and parsing: date formats, numeric types, categorical encoding, handling string vs numeric valuesLearn how to correctly parse dates, numbers, and categories in sales CSVs. You will distinguish text from numeric fields, apply locale-aware parsing, and design robust categorical encodings that remain stable across refreshes.
Detecting column data types in CSV importsParsing dates with multiple locale formatsHandling numeric separators and currency symbolsDesigning stable categorical encodingsConverting mixed-type columns safelyLesson 10Dealing with multi-line orders and aggregation at order vs order-line levelLearn how to handle orders that span multiple lines in sales CSVs. You will distinguish order and order-line grain, aggregate correctly, and avoid double counting revenue, quantity, and discounts in dashboards.
Identifying order vs order-line grainAggregating revenue at order levelSummarising discounts across linesAvoiding double counting in rollupsChoosing grain for dashboard metrics