Lesson 1. Feature scaling and transformation: log transforms for skewed revenue/quantity, robust scaling
Use scaling and transformations to stabilize variance and reduce skewness in revenue and quantity, applying log transforms, robust scaling, and power transforms while preserving interpretability where needed.
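The log transform and robust scaling named in the Lesson 1 description can be sketched with the standard library alone. This is a minimal illustration, not the course's own implementation: the function names and the sample revenue values are hypothetical, and robust scaling is taken here to mean centering on the median and dividing by the interquartile range.

```python
import math
import statistics


def log_robust_scale(values):
    """Apply log1p, then center on the median and divide by the IQR."""
    logged = [math.log1p(v) for v in values]  # log1p is defined at zero
    median = statistics.median(logged)
    q1, _, q3 = statistics.quantiles(logged, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero spread
    scaled = [(x - median) / iqr for x in logged]
    return scaled, median, iqr


def inverse_log_robust_scale(scaled, median, iqr):
    """Undo the scaling and the log so values read in original units."""
    return [math.expm1(s * iqr + median) for s in scaled]


# Hypothetical skewed revenue values with one very large order.
revenue = [12.0, 15.5, 9.9, 20.0, 18.25, 4200.0]
scaled, median, iqr = log_robust_scale(revenue)
restored = inverse_log_robust_scale(scaled, median, iqr)
```

Keeping the fitted median and IQR makes the inverse transform possible, which is what allows model outputs to be reported back in revenue units.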
- Diagnosing skewness and heavy tails
- Log and power transformations
- Standard, min-max, and robust scaling
- Scaling pipelines with sklearn
- Inverse transforms for interpretation

Lesson 2. Datetime feature engineering: weekday, hour, seasonality, recency and tenure features from order_date and customer history
Create time features from order dates and customer history, such as weekday, hour, seasonality, recency, and tenure, while respecting temporal order to avoid leakage in forecasting and classification tasks.
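As a rough sketch of the Lesson 2 ideas, the function below derives calendar, cyclic, recency, and tenure features for a single order. The function name and the sample dates are hypothetical. To respect temporal order, only orders placed strictly before the current one contribute to recency and tenure.

```python
import math
from datetime import datetime


def order_time_features(order_date, customer_order_dates):
    """Features for one order, using only the customer's earlier orders."""
    past = sorted(d for d in customer_order_dates if d < order_date)
    hour_angle = 2 * math.pi * order_date.hour / 24
    return {
        "weekday": order_date.weekday(),  # Monday = 0
        "hour": order_date.hour,
        "hour_sin": math.sin(hour_angle),  # cyclic encoding of the hour
        "hour_cos": math.cos(hour_angle),
        "is_weekend": order_date.weekday() >= 5,
        # Days since the most recent earlier order (None for a first order).
        "recency_days": (order_date - past[-1]).days if past else None,
        # Days since the customer's first earlier order.
        "tenure_days": (order_date - past[0]).days if past else 0,
    }


# Hypothetical history; the last entry is in the future and must be ignored.
history = [
    datetime(2024, 1, 5, 9, 0),
    datetime(2024, 2, 20, 14, 30),
    datetime(2024, 3, 15, 8, 0),
]
features = order_time_features(datetime(2024, 3, 1, 18, 0), history)
```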
- Extracting calendar-based features
- Cyclic encoding of time variables
- Seasonality and holiday indicators
- Recency and tenure feature design
- Time-aware leakage prevention

Lesson 3. Imputation strategies for numeric (median, KNN, model-based) and categorical fields (mode, 'unknown')
Compare numeric and categorical imputation strategies, including median, KNN, model-based, mode, and 'unknown' categories, with checks for bias, variance, and robustness of the imputed data.
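The simplest of the Lesson 3 strategies, median filling with a missingness flag and an 'unknown' bin for categories, might look like the sketch below. The helper names and sample values are hypothetical. The median is fitted on training values only, which is an assumption about how the fill value would be reused on new data.

```python
import statistics


def fit_median(train_values):
    """Median of the observed (non-missing) training values."""
    observed = [v for v in train_values if v is not None]
    return statistics.median(observed)


def impute_numeric(values, fill_value):
    """Fill missing values and return a 0/1 missingness indicator."""
    imputed = [fill_value if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return imputed, was_missing


def impute_categorical(values, placeholder="unknown"):
    """Map missing or empty labels to an explicit placeholder category."""
    return [placeholder if v is None or v == "" else v for v in values]


# Hypothetical training column with gaps and one extreme value.
train_delivery_days = [2, 3, None, 5, 4, None, 30]
fill = fit_median(train_delivery_days)
imputed, missing_flag = impute_numeric([None, 7, 1], fill)
channels = impute_categorical(["email", None, "", "paid_search"])
```

The median is unaffected by the extreme value of 30, which is one reason it is often preferred to the mean for skewed fields.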
- Missingness mechanisms and patterns
- Simple numeric imputation methods
- KNN and model-based imputation
- Categorical mode and "unknown" bins
- Using missingness indicator flags

Lesson 4. Creating target variable for chosen prediction (binary returned, continuous revenue, late delivery label)
Define and build target variables for key prediction tasks, including binary return flags, continuous revenue, and late delivery labels, with clear definitions aligned to evaluation metrics.
- Choosing the prediction objective
- Defining return and churn labels
- Revenue and margin regression targets
- Late delivery and SLA breach labels
- Aligning targets with metrics

Lesson 5. Encoding techniques: one-hot, target encoding, frequency encoding, embeddings for high-cardinality features
Review encoding techniques for categorical features, from one-hot to target, frequency, and embedding-based encodings, with guidance on avoiding leakage, applying regularization, and handling high-cardinality features.
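One common way to regularize target encoding, mentioned in the Lesson 5 description, is to shrink each category's mean toward the global mean. The sketch below assumes that additive smoothing form. The function names, the `smoothing` parameter, and the sample data are all hypothetical. The encoding is fitted on training rows and then applied to unseen rows, with unseen categories falling back to the global mean.

```python
from collections import Counter, defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Smoothed per-category target mean, fitted on training rows only."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), Counter()
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        # Rare categories are pulled more strongly toward the global mean.
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_target_encoding(categories, encoding, global_mean):
    """Encode rows; categories unseen in training get the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


# Hypothetical training data: product category and a binary "returned" label.
train_category = ["shoes", "shoes", "shoes", "toys", "toys", "books"]
train_returned = [1, 1, 0, 0, 0, 1]
encoding, prior = fit_target_encoding(train_category, train_returned, smoothing=2.0)
encoded_test = apply_target_encoding(["shoes", "garden"], encoding, prior)
```

Encoding the training rows themselves with this mapping would still leak their own labels. An out-of-fold variant, where each row is encoded from the other folds, is the usual remedy.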
- When to use one-hot encoding
- Target encoding with leakage control
- Frequency and count encodings
- Hashing and rare category handling
- Learned embeddings for categories

Lesson 6. Outlier detection and handling for price, quantity, delivery_time_days, and revenue
Learn to detect, validate, and treat outliers in price, quantity, delivery time, and revenue using statistical rules and business judgment, minimizing information loss and protecting models from instability.
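A univariate rule of the kind Lesson 6 refers to is the interquartile-range fence, followed by capping (winsorization) rather than deletion. The sketch below assumes the conventional multiplier of 1.5. The function names and the sample delivery times are hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Lower and upper fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_to_fences(values, lower, upper):
    """Clip values to the fences and flag which rows were clipped."""
    capped = [min(max(v, lower), upper) for v in values]
    flags = [int(v < lower or v > upper) for v in values]
    return capped, flags


# Hypothetical delivery times with one implausibly long delivery.
delivery_time_days = [2, 3, 3, 4, 4, 5, 5, 6, 45]
lower, upper = iqr_fences(delivery_time_days)
capped, outlier_flag = cap_to_fences(delivery_time_days, lower, upper)
```

Keeping the flag alongside the capped value preserves the information that a row was extreme, so a business rule or a model can still use it.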
- Univariate outlier detection rules
- Multivariate and contextual outliers
- Capping, trimming, and winsorization
- Business-rule based outlier flags
- Impact of outliers on model training

Lesson 7. Aggregations and customer-level features: historical return rate, avg order value, frequency, time since last order
Build customer-level aggregates such as historical return rate, average order value, purchase frequency, and recency to capture lifetime behavior and improve segmentation and prediction.
- Customer-level aggregation design
- Historical return and complaint rates
- Average order value and basket size
- Purchase frequency and recency
- Customer lifetime value proxies

Lesson 8. Promotion and pricing features: effective_unit_price, discount_pct, discount_applied flag
Create promotion and pricing features such as effective unit price, discount percentage, and discount flags to capture promotion intensity, margin impact, and customer price sensitivity over time.
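The three features named in the Lesson 8 title can be derived from an order line as in the sketch below. The input names `list_price`, `quantity`, and `line_total` are assumptions about the available columns, and the discount is assumed to be measured relative to the list price.

```python
def pricing_features(list_price, quantity, line_total):
    """Derive effective_unit_price, discount_pct, and discount_applied."""
    if quantity <= 0 or list_price <= 0:
        raise ValueError("quantity and list_price must be positive")
    effective_unit_price = line_total / quantity
    # Share of the list price given away; never negative.
    discount_pct = max(0.0, 1.0 - effective_unit_price / list_price)
    return {
        "effective_unit_price": effective_unit_price,
        "discount_pct": discount_pct,
        # Small tolerance so floating-point noise does not set the flag.
        "discount_applied": int(discount_pct > 1e-9),
    }


# Hypothetical order lines: one discounted, one at full price.
promo_line = pricing_features(list_price=50.0, quantity=4, line_total=160.0)
full_price_line = pricing_features(list_price=50.0, quantity=2, line_total=100.0)
```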
- Computing effective unit price
- Discount percentage and depth
- Binary and multi-level promo flags
- Stacked and overlapping promotions
- Price elasticity proxy features

Lesson 9. Train/test split strategies for time-series/order data (time-based split, stratified by target, customer holdout)
Plan train/test splits for time-ordered transaction data, using time-based splits, target stratification, and customer holdouts to obtain realistic, unbiased performance estimates.
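Two of the Lesson 9 strategies, a time-based split and a customer holdout, are sketched below on plain dictionaries. The record layout (`customer_id`, `order_date`), the function names, and the sample orders are hypothetical.

```python
from datetime import date


def time_based_split(orders, cutoff):
    """Train on orders before the cutoff date, test on the rest."""
    train = [o for o in orders if o["order_date"] < cutoff]
    test = [o for o in orders if o["order_date"] >= cutoff]
    return train, test


def customer_holdout_split(orders, holdout_customers):
    """Keep every order of the held-out customers out of training."""
    train = [o for o in orders if o["customer_id"] not in holdout_customers]
    test = [o for o in orders if o["customer_id"] in holdout_customers]
    return train, test


# Hypothetical order records.
orders = [
    {"customer_id": "a", "order_date": date(2024, 1, 10)},
    {"customer_id": "b", "order_date": date(2024, 2, 3)},
    {"customer_id": "a", "order_date": date(2024, 3, 21)},
    {"customer_id": "c", "order_date": date(2024, 4, 2)},
]
time_train, time_test = time_based_split(orders, cutoff=date(2024, 3, 1))
cust_train, cust_test = customer_holdout_split(orders, holdout_customers={"a"})
```

The time-based split checks how a model performs on future orders, while the customer holdout checks how it performs on customers it has never seen. The two answer different questions and can be combined.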
- Pitfalls of random splits in time data
- Time-based and rolling window splits
- Stratified splits for imbalanced targets
- Customer and store level holdouts
- Cross-validation for temporal data

Lesson 10. Geographic and logistics features: country-level metrics, shipping zones, typical delivery_time distribution
Create geographic and logistics features using country-level metrics, shipping zones, and delivery time distributions to capture operational constraints, regional behavior, and service variability in models.
- Country and region level aggregations
- Defining shipping zones and lanes
- Delivery time distribution features
- Distance and cross-border indicators
- Service level and SLA features

Lesson 11. Standardizing and cleaning categorical variables: product_category, country, marketing_channel, device_type
Standardize and clean categorical variables such as product category, country, marketing channel, and device type by normalizing labels, merging rare values, and enforcing consistent taxonomies.
- Detecting inconsistent category labels
- String normalization and mapping
- Merging rare and noisy categories
- Maintaining category taxonomies
- Documenting categorical cleaning