Lesson 1. Feature scaling and transformation: log transforms for skewed revenue/quantity, robust scaling
Use scaling and transformations to stabilize variance and reduce skewness in revenue and quantity, applying log transforms, robust scaling, and power transforms while preserving interpretability where required.
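
As a rough illustration of the log transform and robust scaling named in Lesson 1, the sketch below uses only the Python standard library. The revenue values are invented, and the median/IQR scaling is written by hand as an assumption about what "robust scaling" means here; a course using sklearn would more likely rely on its scalers.

```python
import math
import statistics

# Hypothetical, right-skewed revenue values (illustrative only).
revenue = [12.0, 15.5, 18.0, 22.0, 30.0, 45.0, 80.0, 950.0]

# log1p compresses the long right tail and is defined at zero.
log_revenue = [math.log1p(x) for x in revenue]

# Robust scaling: centre on the median and divide by the interquartile range,
# so the single extreme value has little influence on the scale.
q1, median, q3 = statistics.quantiles(log_revenue, n=4)
iqr = q3 - q1
scaled = [(x - median) / iqr for x in log_revenue]

# Inverse transform back to currency units for interpretation.
recovered = [math.expm1(s * iqr + median) for s in scaled]
```

Keeping `median` and `iqr` from the training data and reusing them at prediction time is what makes the inverse transform possible.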
Diagnosing skewness and heavy tails
Log and power transformations
Standard, min-max, and robust scaling
Scaling pipelines with sklearn
Inverse transforms for interpretation

Lesson 2. Datetime feature engineering: weekday, hour, seasonality, recency and tenure features from order_date and customer history
Build time features from order dates and customer history, such as day of week, hour, seasonality, recency, and tenure, preserving temporal order to prevent leakage in predictions.
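
The datetime features listed for Lesson 2 could be sketched as follows. The order timestamps and the function name `time_features` are hypothetical. Recency and tenure use only orders strictly before the current one, which is one simple way to respect temporal order.

```python
import math
from datetime import datetime

# Hypothetical order history for one customer (illustrative values).
orders = [
    datetime(2023, 1, 9, 8, 30),
    datetime(2023, 3, 4, 19, 5),
    datetime(2023, 6, 18, 23, 45),
]

def time_features(order_date, previous_orders):
    """Features for one order using only orders strictly before it."""
    past = [d for d in previous_orders if d < order_date]
    angle = 2 * math.pi * order_date.hour / 24
    return {
        "weekday": order_date.weekday(),       # Monday = 0
        "is_weekend": order_date.weekday() >= 5,
        "hour_sin": math.sin(angle),           # cyclic encoding: 23h sits next to 0h
        "hour_cos": math.cos(angle),
        "month": order_date.month,
        # Recency: days since the previous order; None for a first order.
        "recency_days": (order_date - max(past)).days if past else None,
        # Tenure: days since the customer's first order.
        "tenure_days": (order_date - min(past)).days if past else 0,
    }

features = [time_features(d, orders) for d in orders]
```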
Extracting calendar-based features
Cyclic encoding of time variables
Seasonality and holiday indicators
Recency and tenure feature design
Time-aware leakage prevention

Lesson 3. Imputation strategies for numeric (median, KNN, model-based) and categorical fields (mode, 'unknown')
Review imputation strategies for numeric and categorical fields, such as median, k-nearest neighbours, model-based imputation, mode, and 'unknown' labels, with checks for bias, variance, and robustness.
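
A minimal sketch of median imputation with a missingness flag, the simplest of the strategies named in Lesson 3. The values and the helper `impute` are hypothetical. The fill value is learned from the training data only and then reused on new data.

```python
import statistics

# Hypothetical training and scoring values with gaps marked as None.
train_quantity = [1, 2, None, 4, 50, None, 3]
new_quantity = [None, 7]

# Learn the fill value from training data only, then reuse it.
fill_value = statistics.median(x for x in train_quantity if x is not None)

def impute(values, fill):
    """Return (imputed values, missingness flags)."""
    flags = [x is None for x in values]
    imputed = [fill if x is None else x for x in values]
    return imputed, flags

train_imputed, train_missing = impute(train_quantity, fill_value)
new_imputed, new_missing = impute(new_quantity, fill_value)
```

The median is used here because the extreme value 50 would pull a mean-based fill upwards.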
Missingness mechanisms and patterns
Simple numeric imputation methods
KNN and model-based imputation
Categorical mode and "unknown" bins
Using missingness indicator flags

Lesson 4. Creating target variable for chosen prediction (binary returned, continuous revenue, late delivery label)
Define target variables for the main business predictions, such as a binary returned flag, continuous revenue, and a late delivery label, with clear definitions tied to evaluation metrics.
Choosing the prediction objective
Defining return and churn labels
Revenue and margin regression targets
Late delivery and SLA breach labels
Aligning targets with metrics

Lesson 5. Encoding techniques: one-hot, target encoding, frequency encoding, embeddings for high-cardinality features
Study ways to encode categorical variables, from basic one-hot to target, frequency, and embedding encodings, with guidance on leakage prevention, smoothing, and high-cardinality handling.
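
Target encoding with smoothing, mentioned in Lesson 5, might look roughly like this. The data, the `smoothing` value, and the function names are assumptions for illustration. The mapping is fitted on training rows only; a fuller treatment would also compute it out-of-fold to limit leakage.

```python
from collections import defaultdict

# Hypothetical training rows: (product_category, returned) pairs.
train = [("shoes", 1), ("shoes", 0), ("shoes", 1), ("shoes", 1),
         ("books", 0), ("books", 0), ("toys", 1)]

def fit_target_encoding(rows, smoothing=5.0):
    """Map each category to a smoothed mean of the target.

    encoded = (category_sum + smoothing * global_mean) / (n + smoothing)
    Rare categories are pulled towards the global mean.
    """
    global_mean = sum(y for _, y in rows) / len(rows)
    sums, counts = defaultdict(float), defaultdict(int)
    for category, y in rows:
        sums[category] += y
        counts[category] += 1
    mapping = {c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
               for c in counts}
    return mapping, global_mean

mapping, global_mean = fit_target_encoding(train)

def encode(category):
    # Unseen categories fall back to the global mean.
    return mapping.get(category, global_mean)
```

Here "toys" has a raw return rate of 1.0 from a single row, but its encoded value stays close to the global mean because of the smoothing term.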
When to use one-hot encoding
Target encoding with leakage control
Frequency and count encodings
Hashing and rare category handling
Learned embeddings for categories

Lesson 6. Outlier detection and handling for price, quantity, delivery_time_days, and revenue
Detect, validate, and handle outliers in price, quantity, delivery_time_days, and revenue using statistical rules and business knowledge, preserving informative data while protecting models from distortion.
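
One common univariate rule for the outlier handling in Lesson 6 is the 1.5 × IQR fence. The sketch below applies it to invented delivery times. The 1.5 multiplier and the choice to cap rather than drop are assumptions, not something the course description specifies.

```python
import statistics

# Hypothetical delivery times in days; 60 is a suspicious value.
delivery_time_days = [2, 3, 3, 4, 4, 5, 5, 6, 7, 60]

q1, _, q3 = statistics.quantiles(delivery_time_days, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag rather than drop, so business rules can review the rows later.
is_outlier = [not (lower <= x <= upper) for x in delivery_time_days]

# Capping at the fences keeps the row but limits its influence on training.
capped = [min(max(x, lower), upper) for x in delivery_time_days]
```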
Univariate outlier detection rules
Multivariate and contextual outliers
Capping, trimming, and winsorization
Business-rule based outlier flags
Impact of outliers on model training

Lesson 7. Aggregations and customer-level features: historical return rate, avg order value, frequency, time since last order
Build customer-level aggregates such as historical return rate, average order value, purchase frequency, and time since last order to capture lifetime behaviour and improve segmentation and predictions.
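
The customer-level aggregates named in Lesson 7 can be sketched as below. The order log layout and the `customer_features` helper are hypothetical. Only orders strictly before the `as_of` date are aggregated, so the features reflect what was known at prediction time.

```python
from datetime import date

# Hypothetical order log: (customer_id, order_date, revenue, returned).
orders = [
    ("c1", date(2023, 1, 5), 40.0, 0),
    ("c1", date(2023, 2, 10), 60.0, 1),
    ("c1", date(2023, 4, 1), 20.0, 0),
    ("c2", date(2023, 3, 15), 100.0, 0),
]

def customer_features(customer_id, as_of):
    """Aggregate only orders placed strictly before `as_of` to avoid leakage."""
    past = [o for o in orders if o[0] == customer_id and o[1] < as_of]
    if not past:
        return {"order_count": 0, "avg_order_value": None,
                "return_rate": None, "days_since_last_order": None}
    return {
        "order_count": len(past),
        "avg_order_value": sum(o[2] for o in past) / len(past),
        "return_rate": sum(o[3] for o in past) / len(past),
        "days_since_last_order": (as_of - max(o[1] for o in past)).days,
    }

c1_features = customer_features("c1", as_of=date(2023, 4, 1))
```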
Customer-level aggregation design
Historical return and complaint rates
Average order value and basket size
Purchase frequency and recency
Customer lifetime value proxies

Lesson 8. Promotion and pricing features: effective_unit_price, discount_pct, discount_applied flag
Build promotion and pricing features such as effective unit price, discount percentage, and discount-applied flags to capture promotion depth, margin impact, and customer price sensitivity over time.
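
The three features in the Lesson 8 title could be derived as in this sketch. The input names `list_unit_price`, `quantity`, and `line_total_paid` are hypothetical, and the code assumes both price and quantity are positive.

```python
def pricing_features(list_unit_price, quantity, line_total_paid):
    """Derive promotion features for one order line.

    All argument names are hypothetical; adapt them to the actual columns.
    Assumes list_unit_price > 0 and quantity > 0.
    """
    effective_unit_price = line_total_paid / quantity
    discount_pct = 1.0 - effective_unit_price / list_unit_price
    return {
        "effective_unit_price": effective_unit_price,
        "discount_pct": round(discount_pct, 4),
        # Small tolerance so rounding noise is not counted as a promotion.
        "discount_applied": discount_pct > 0.001,
    }

line = pricing_features(list_unit_price=20.0, quantity=3, line_total_paid=48.0)
```

For this line the customer paid 16.0 per unit against a list price of 20.0, which is a 20% discount.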
Computing effective unit price
Discount percentage and depth
Binary and multi-level promo flags
Stacked and overlapping promotions
Price elasticity proxy features

Lesson 9. Train/test split strategies for time-series/order data (time-based split, stratified by target, customer holdout)
Design train and test splits for time-ordered order data, using time-based cutoffs, target-stratified splits, and customer holdouts to obtain realistic, unbiased performance estimates.
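
A time-based split with an optional customer holdout, as described for Lesson 9, might be sketched like this. The order tuples, the cutoff date, and the function name are assumptions for illustration.

```python
from datetime import date

# Hypothetical orders: (order_id, customer_id, order_date).
orders = [
    (1, "c1", date(2023, 1, 10)),
    (2, "c2", date(2023, 2, 3)),
    (3, "c1", date(2023, 3, 21)),
    (4, "c3", date(2023, 5, 2)),
    (5, "c2", date(2023, 6, 14)),
]

def time_based_split(rows, cutoff):
    """Train on orders before `cutoff`, test on orders on or after it."""
    train = [r for r in rows if r[2] < cutoff]
    test = [r for r in rows if r[2] >= cutoff]
    return train, test

train, test = time_based_split(orders, cutoff=date(2023, 4, 1))

# Optional customer holdout: keep only test customers never seen in training.
seen = {r[1] for r in train}
unseen_customer_test = [r for r in test if r[1] not in seen]
```

Unlike a random split, every training order precedes every test order, so the model is never evaluated on the past using information from the future.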
Pitfalls of random splits in time data
Time-based and rolling window splits
Stratified splits for imbalanced targets
Customer and store level holdouts
Cross-validation for temporal data

Lesson 10. Geographic and logistics features: country-level metrics, shipping zones, typical delivery_time distribution
Build geographic and logistics features from country-level metrics, shipping zones, and delivery time distributions to capture operational constraints, regional patterns, and service variability in predictions.
Country and region level aggregations
Defining shipping zones and lanes
Delivery time distribution features
Distance and cross-border indicators
Service level and SLA features

Lesson 11. Standardizing and cleaning categorical variables: product_category, country, marketing_channel, device_type
Clean and standardize categorical variables such as product_category, country, marketing_channel, and device_type by normalizing labels, grouping rare categories, and aligning category lists across datasets.
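
Label normalization and rare-category merging from Lesson 11 could look like the following sketch. The raw labels, the `min_count` threshold, and the "other" bucket name are all hypothetical choices.

```python
from collections import Counter

# Hypothetical raw labels for marketing_channel.
raw = ["Email", " email", "EMAIL ", "Paid Search", "paid-search",
       "paid search", "Affiliate", "TikTok"]

def normalize(label):
    """Lower-case, trim, and unify separators and repeated spaces."""
    unified = label.strip().lower().replace("-", " ").replace("_", " ")
    return " ".join(unified.split())

normalized = [normalize(x) for x in raw]

# Merge categories seen fewer than `min_count` times into "other".
min_count = 2
counts = Counter(normalized)
cleaned = [x if counts[x] >= min_count else "other" for x in normalized]
```

In practice the counts would be taken from the training data and the resulting mapping stored, so the same categories are merged at prediction time.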
Detecting inconsistent category labels
String normalization and mapping
Merging rare and noisy categories
Maintaining category taxonomies
Documenting categorical cleaning