Lesson 1. Schema validation: required fields, data types, date parsing, and timezone handling
Understand how to define and enforce robust schemas for order-level data, validating required fields, data types, and date formats while correctly handling time zones, late-arriving data, and schema evolution across multiple source systems relevant to Singapore businesses.
- Defining required order-level fields
- Validating numeric and string data types
- Parsing dates and timestamps safely
- Standardising time zones and offsets
- Catching schema drift and evolution
- Automated schema checks in pipelines

Lesson 2. Documenting data lineage and assumptions for reproducibility and auditability
Understand how to document data lineage, business rules, and modelling assumptions for retail order pipelines, enabling reproducibility, governance, and auditability across teams, tools, and evolving source systems in a Singapore regulatory environment.
- Capturing source-to-target mappings
- Recording business transformation rules
- Tracking metric definitions over time
- Maintaining data dictionaries
- Versioning pipelines and schemas
- Audit trails for regulatory reviews

Lesson 3. Loading CSVs into analytical tools and environment setup (Excel, SQL, Python, R, BI tools)
Gain practical skills for loading CSV order files into Excel, SQL databases, Python, R, and BI tools, configuring encodings, delimiters, data types, and project environments to ensure reproducible, scalable analytical workflows for Singapore analysts.
- Configuring CSV import options
- Managing encodings and delimiters
- Bulk loading into SQL warehouses
- Python and R data ingestion scripts
- Connecting BI tools to raw tables
- Versioning and environment management

Lesson 4. Temporal derivations: extracting date parts, rolling windows, fiscal calendars, week/month boundaries
Explore techniques to derive temporal features from order timestamps, including calendar attributes, fiscal periods, rolling windows, and custom week or month boundaries that align with retail trading patterns and reporting requirements in Singapore.
- Extracting standard date parts
- Building fiscal calendars and periods
- Custom retail week and month boundaries
- Rolling windows for KPIs
- Lag and lead features for orders
- Seasonality and holiday flags

Lesson 5. Data partitioning and sampling for efficient exploration and reproducible analysis
Learn how to partition and sample large retail order datasets for efficient exploration, model development, and testing, while preserving temporal structure, seasonality, and key business segments for reproducible analytical experiments in Singapore retail.
- Partitioning by date and store
- Train, validation, and test splits
- Stratified sampling by segment
- Downsampling and upsampling tactics
- Creating reproducible random samples
- Managing partitions in data warehouses

Lesson 6. Detecting and handling missing values: strategies and imputation specific to transactional data
Learn systematic methods to detect, profile, and treat missing values in transactional retail data, choosing appropriate imputation or exclusion strategies that preserve revenue, quantity, and customer behaviour signals without biasing analyses for Singapore markets.
- Profiling missingness patterns
- MCAR, MAR, and MNAR in retail data
- Imputing prices, discounts, and costs
- Handling missing customer identifiers
- Dealing with incomplete order lines
- Documenting imputation decisions

Lesson 7. Outlier detection and treatment for price, quantity, discount, and revenue fields
Learn to detect, diagnose, and treat outliers in price, quantity, discount, and revenue fields, distinguishing data errors from genuine extreme behaviour to protect model stability and business reporting accuracy in Singapore retail analytics.
- Profiling distributions and extremes
- Rule-based outlier thresholds
- Statistical and robust detection methods
- Separating errors from rare events
- Capping, trimming, and winsorising
- Monitoring outliers over time

Lesson 8. Standardising categorical fields: region, product_category, product_subcategory, marketing_channel, device_type
Learn how to standardise key categorical attributes in retail orders so regions, product hierarchies, marketing channels, and device types are consistent, analysable, and ready for segmentation, attribution, and performance reporting in Singapore.
- Designing canonical code lists
- Normalising region and market labels
- Standardising product category hierarchies
- Cleaning marketing_channel values
- Harmonising device_type and platform
- Handling legacy and deprecated values

Lesson 9. Creating derived fields: gross_margin, margin_rate, average_order_value, unit_cost, order_value components
Master the creation of core financial and behavioural derived metrics from order data, including gross margin, margin rate, average order value, unit costs, and decomposed order value components that support profitability and pricing analysis in Singapore.
- Calculating gross_margin and net_revenue
- Computing margin_rate and markups
- Average_order_value and basket size
- Unit_cost and unit_price derivations
- Decomposing order_value components
- Validating derived metric consistency
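
To illustrate the derived fields named in Lesson 9, here is a minimal Python sketch. The formulas are common retail definitions and are assumptions, not the course's own definitions. The OrderLine class, its field names, and the treatment of discount as an absolute amount per line are hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class OrderLine:
    """Hypothetical order-line record; field names are illustrative only."""
    order_id: str
    quantity: int
    unit_price: float
    unit_cost: float
    discount: float = 0.0  # assumed: absolute discount applied to the line


def net_revenue(line: OrderLine) -> float:
    # Assumed definition: list value of the line minus its discount.
    return line.quantity * line.unit_price - line.discount


def gross_margin(line: OrderLine) -> float:
    # Assumed definition: net revenue minus cost of the units sold.
    return net_revenue(line) - line.quantity * line.unit_cost


def margin_rate(line: OrderLine) -> Optional[float]:
    # Gross margin as a share of net revenue; None when revenue is zero
    # so that callers do not hit a division by zero.
    revenue = net_revenue(line)
    return gross_margin(line) / revenue if revenue else None


def average_order_value(lines: Iterable[OrderLine]) -> Optional[float]:
    # Sum net revenue per order_id, then average across distinct orders.
    totals = defaultdict(float)
    for line in lines:
        totals[line.order_id] += net_revenue(line)
    return sum(totals.values()) / len(totals) if totals else None
```

A simple consistency check in the spirit of the last topic is to confirm that gross_margin never exceeds net_revenue when unit_cost is non-negative, and that margin_rate multiplied by net_revenue reproduces gross_margin within a small tolerance.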