Lesson 1. Feature scaling and transformation: log transforms for skewed revenue/quantity, robust scaling
Use scaling and transformations to stabilize numeric features and reduce skew in revenue and quantity, applying log transforms, robust scaling, and power transforms while preserving interpretability where needed.
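
As a rough illustration of the log-then-robust-scale idea in Lesson 1, here is a minimal sketch using only the Python standard library. The revenue values and the helper name `robust_scale` are made up for illustration; a real pipeline would more likely use a library scaler fitted on training data only.

```python
import math
import statistics

# Hypothetical, right-skewed order revenues (illustrative values only).
revenue = [12.0, 15.0, 18.0, 20.0, 25.0, 30.0, 45.0, 2500.0]

# log1p compresses the long right tail and is defined at zero.
log_revenue = [math.log1p(x) for x in revenue]

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    return [(v - med) / iqr for v in values], med, iqr

scaled, med, iqr = robust_scale(log_revenue)

# Inverse transform: undo the scaling, then undo the log,
# so results can be reported in the original currency units.
recovered = [math.expm1(s * iqr + med) for s in scaled]
```

Because the median and IQR barely move when the single 2500.0 order is added or removed, the scaled values of the ordinary orders stay stable, which is the point of robust scaling.
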
Topics: Diagnosing skewness and heavy tails; Log and power transformations; Standard, min-max, and robust scaling; Scaling pipelines with sklearn; Inverse transforms for interpretation

Lesson 2. Datetime feature engineering: weekday, hour, seasonality, recency and tenure features from order_date and customer history
Build time-based features from order dates and customer history, such as day of week, hour, seasonality, recency, and tenure, while respecting temporal order to avoid leakage in forecasting and segmentation.
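
The calendar, cyclic, recency, and tenure features named in Lesson 2 could be sketched as follows. The function and argument names (`datetime_features`, `prior_order_ts`, `first_order_ts`) are hypothetical; the key assumption is that only timestamps known at order time are passed in.

```python
import math
from datetime import datetime

def cyclic(value, period):
    """Map a periodic value onto the unit circle so period-1 and 0 are neighbors."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def datetime_features(order_ts, prior_order_ts, first_order_ts):
    """Features for one order, built only from information available at order time.

    prior_order_ts is the customer's previous order (None for a first order).
    """
    hour_sin, hour_cos = cyclic(order_ts.hour, 24)
    dow_sin, dow_cos = cyclic(order_ts.weekday(), 7)  # Monday = 0
    recency = None
    if prior_order_ts is not None:
        recency = (order_ts - prior_order_ts).days
    return {
        "weekday": order_ts.weekday(),
        "hour": order_ts.hour,
        "month": order_ts.month,
        "is_weekend": order_ts.weekday() >= 5,
        "hour_sin": hour_sin, "hour_cos": hour_cos,
        "dow_sin": dow_sin, "dow_cos": dow_cos,
        "recency_days": recency,
        "tenure_days": (order_ts - first_order_ts).days,
    }

feats = datetime_features(
    order_ts=datetime(2024, 3, 16, 23, 30),
    prior_order_ts=datetime(2024, 3, 1, 9, 0),
    first_order_ts=datetime(2023, 3, 16, 23, 30),
)
```

The sine/cosine pair keeps 23:00 close to 00:00, which a raw hour column would place at opposite ends of the scale.
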
Topics: Extracting calendar-based features; Cyclic encoding of time variables; Seasonality and holiday indicators; Recency and tenure feature design; Time-aware leakage prevention

Lesson 3. Imputation strategies for numeric (median, KNN, model-based) and categorical fields (mode, 'unknown')
Compare imputation methods for numeric and categorical fields, including median, k-nearest neighbors, model-based imputation, mode, and an explicit 'unknown' category, and check the imputed data for bias, variance, and robustness.
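
A minimal sketch of the simplest strategies in Lesson 3 (median for numeric fields, mode or 'unknown' for categorical fields, plus missingness flags) might look like this. The helper names, the `_was_missing` suffix, and the sample rows are illustrative assumptions; KNN and model-based imputation are not shown.

```python
import statistics
from collections import Counter

def fit_imputer(train_rows, numeric_cols, categorical_cols,
                categorical_strategy="unknown"):
    """Learn fill values from training rows only, so test statistics never leak."""
    fills = {}
    for col in numeric_cols:
        observed = [r[col] for r in train_rows if r[col] is not None]
        fills[col] = statistics.median(observed)
    for col in categorical_cols:
        observed = [r[col] for r in train_rows if r[col] is not None]
        if categorical_strategy == "mode":
            fills[col] = Counter(observed).most_common(1)[0][0]
        else:
            fills[col] = "unknown"  # explicit bin keeps missingness visible
    return fills

def apply_imputer(rows, fills):
    """Return copies of rows with gaps filled and a flag per imputed column."""
    out = []
    for r in rows:
        new = dict(r)
        for col, fill in fills.items():
            new[f"{col}_was_missing"] = r[col] is None
            if r[col] is None:
                new[col] = fill
        out.append(new)
    return out

train = [
    {"price": 10.0, "device_type": "mobile"},
    {"price": None, "device_type": "desktop"},
    {"price": 30.0, "device_type": None},
    {"price": 50.0, "device_type": "mobile"},
]
fills = fit_imputer(train, ["price"], ["device_type"])
imputed = apply_imputer(train, fills)
```

Keeping the indicator flag lets a model learn from the fact that a value was missing, which matters when missingness is not random.
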
Topics: Missingness mechanisms and patterns; Simple numeric imputation methods; KNN and model-based imputation; Categorical mode and "unknown" bins; Using missingness indicator flags

Lesson 4. Creating the target variable for the chosen prediction (binary returned, continuous revenue, late delivery label)
Define and construct target variables for key business predictions, such as a binary return flag, continuous revenue, and late-delivery labels, with clear definitions that align with the evaluation metrics.
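
To make the three target types in Lesson 4 concrete, here is a small sketch. The field names `returned`, `quantity`, and `unit_price`, the revenue definition, and the 5-day `sla_days` threshold are all assumptions chosen for illustration; the actual definitions depend on the business rules agreed for the project.

```python
def make_targets(order, sla_days=5):
    """Derive one target per prediction task from a single order record.

    Assumes revenue = quantity * unit_price and that an order is late when
    delivery_time_days exceeds sla_days; both are illustrative definitions.
    """
    delivered = order["delivery_time_days"]
    return {
        # Binary classification target: was the order returned?
        "y_returned": int(bool(order["returned"])),
        # Regression target: order revenue.
        "y_revenue": float(order["quantity"] * order["unit_price"]),
        # Binary label; undefined (None) while the order is still undelivered.
        "y_late": None if delivered is None else int(delivered > sla_days),
    }

targets = make_targets(
    {"returned": True, "quantity": 3, "unit_price": 19.5, "delivery_time_days": 7}
)
```

Leaving `y_late` as None for undelivered orders, rather than 0, avoids silently labeling unresolved cases as on time.
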
Topics: Choosing the prediction objective; Defining return and churn labels; Revenue and margin regression targets; Late delivery and SLA breach labels; Aligning targets with metrics

Lesson 5. Encoding techniques: one-hot, target encoding, frequency encoding, embeddings for high-cardinality features
Survey methods for encoding categorical variables, from basic one-hot encoding to target, frequency, and embedding encodings, with guidance on preventing leakage, smoothing, and handling high-cardinality features.
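
Target encoding with leakage control, mentioned in Lesson 5, is the easiest of these to get wrong, so a sketch may help. This version uses additive smoothing toward the global mean and simple interleaved folds; the function names, the smoothing formula, and the fold assignment are one common choice among several, not a prescribed method.

```python
from collections import defaultdict

def fit_target_encoding(categories, targets, smoothing=10.0):
    """Smoothed mean target per category; small categories shrink to the global mean."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, global_mean

def out_of_fold_encode(categories, targets, n_folds=5, smoothing=10.0):
    """Encode each row from the other folds only, so its own target never leaks in.

    Folds are interleaved by row index; for time-ordered data a time-aware
    fold assignment would be more appropriate.
    """
    n = len(categories)
    if n_folds < 2 or n < 2:
        raise ValueError("need at least 2 folds and 2 rows")
    encoded = [0.0] * n
    for fold in range(n_folds):
        fit_idx = [i for i in range(n) if i % n_folds != fold]
        mapping, global_mean = fit_target_encoding(
            [categories[i] for i in fit_idx],
            [targets[i] for i in fit_idx],
            smoothing,
        )
        for i in range(fold, n, n_folds):  # rows held out in this fold
            # Categories unseen in the fitting folds fall back to the prior.
            encoded[i] = mapping.get(categories[i], global_mean)
    return encoded

cats = ["A", "A", "A", "B", "B", "C"]
y = [1, 0, 1, 0, 0, 1]
mapping, global_mean = fit_target_encoding(cats, y, smoothing=2.0)
oof = out_of_fold_encode(cats, y, n_folds=3, smoothing=2.0)
```

The out-of-fold values are used for training rows, while the mapping fitted on all training data is applied to validation and test rows.
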
Topics: When to use one-hot encoding; Target encoding with leakage control; Frequency and count encodings; Hashing and rare category handling; Learned embeddings for categories

Lesson 6. Outlier detection and handling for price, quantity, delivery_time_days, and revenue
Detect, assess, and treat outliers in price, quantity, delivery time, and revenue using statistical rules and business judgment, preserving information while protecting models from instability.
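
One of the univariate rules in Lesson 6, the IQR fence followed by capping, can be sketched as below. The multiplier k=1.5 is the conventional default rather than a requirement, the delivery times are invented, and in practice the fences would be computed on training data and reused on new data.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Lower and upper fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap(values, lower, upper):
    """Clip values to the fences instead of dropping rows."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical delivery times in days, with one extreme shipment.
delivery_time_days = [2, 3, 3, 4, 4, 5, 5, 6, 7, 60]

lower, upper = iqr_fences(delivery_time_days)
is_outlier = [v < lower or v > upper for v in delivery_time_days]
capped = cap(delivery_time_days, lower, upper)
```

Keeping `is_outlier` as a separate flag preserves the information that the value was extreme even after capping.
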
Topics: Univariate outlier detection rules; Multivariate and contextual outliers; Capping, trimming, and winsorization; Business-rule based outlier flags; Impact of outliers on model training

Lesson 7. Aggregations and customer-level features: historical return rate, avg order value, frequency, time since last order
Build customer-level aggregates such as historical return rate, average order value, purchase frequency, and time since last order to capture customer lifecycle behavior and improve segmentation and prediction.
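
The customer-level aggregates in Lesson 7 are only safe when each order sees its customer's earlier orders and nothing later. A minimal sketch of that pattern follows; the field names and sample orders are hypothetical, and it simplifies by treating a prior order's return status as already known at the time of the next order.

```python
from collections import defaultdict
from datetime import date

def add_customer_history(orders):
    """Attach aggregates over the same customer's earlier orders to each order."""
    state = defaultdict(lambda: {"n": 0, "returns": 0, "revenue": 0.0, "last": None})
    out = []
    for o in sorted(orders, key=lambda o: o["order_date"]):
        s = state[o["customer_id"]]
        n = s["n"]
        days_since = None
        if s["last"] is not None:
            days_since = (o["order_date"] - s["last"]).days
        out.append({
            **o,
            "prior_orders": n,
            "hist_return_rate": s["returns"] / n if n else None,
            "hist_avg_order_value": s["revenue"] / n if n else None,
            "days_since_last_order": days_since,
        })
        # Update state only after emitting features, so an order never sees itself.
        s["n"] += 1
        s["returns"] += int(o["returned"])
        s["revenue"] += o["revenue"]
        s["last"] = o["order_date"]
    return out

orders = [
    {"customer_id": "c1", "order_date": date(2024, 1, 5), "revenue": 40.0, "returned": False},
    {"customer_id": "c1", "order_date": date(2024, 1, 20), "revenue": 60.0, "returned": True},
    {"customer_id": "c2", "order_date": date(2024, 1, 10), "revenue": 25.0, "returned": False},
    {"customer_id": "c1", "order_date": date(2024, 2, 4), "revenue": 20.0, "returned": False},
]
enriched = add_customer_history(orders)
```

First-time customers get None rather than 0 for the historical rates, so "no history" is not confused with "never returned anything".
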
Topics: Customer-level aggregation design; Historical return and complaint rates; Average order value and basket size; Purchase frequency and recency; Customer lifetime value proxies

Lesson 8. Promotion and pricing features: effective_unit_price, discount_pct, discount_applied flag
Create promotion and pricing features such as effective unit price, discount percentage, and discount flags to capture promotion intensity, margin impact, and customer price sensitivity over time.
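
The three features named in the Lesson 8 title could be derived roughly as follows. The input names `list_unit_price`, `quantity`, and `amount_paid` are assumptions about what the order data contains, and clamping negative discounts to zero is one possible convention for surcharges.

```python
def pricing_features(list_unit_price, quantity, amount_paid, tol=1e-9):
    """Derive effective_unit_price, discount_pct, and discount_applied for one line."""
    if quantity <= 0 or list_unit_price <= 0:
        raise ValueError("quantity and list_unit_price must be positive")
    effective_unit_price = amount_paid / quantity
    # Share of the list price that was discounted away; clamp surcharges to 0.
    discount_pct = max(1.0 - effective_unit_price / list_unit_price, 0.0)
    return {
        "effective_unit_price": effective_unit_price,
        "discount_pct": discount_pct,
        # Tolerance guards against floating-point noise being read as a discount.
        "discount_applied": discount_pct > tol,
    }

promo_line = pricing_features(list_unit_price=20.0, quantity=3, amount_paid=48.0)
full_price_line = pricing_features(list_unit_price=20.0, quantity=2, amount_paid=40.0)
```

Here the promo line pays 16.0 per unit against a list price of 20.0, a discount of about 20 percent.
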
Topics: Computing effective unit price; Discount percentage and depth; Binary and multi-level promo flags; Stacked and overlapping promotions; Price elasticity proxy features

Lesson 9. Train/test split strategies for time-series/order data (time-based split, stratified by target, customer holdout)
Design train and test splits for time-ordered transaction data, using time-based splits, stratification by target, and customer holdouts to obtain realistic, unbiased performance estimates.
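
Two of the split strategies in Lesson 9 can be sketched without any libraries. The function names, the hashing scheme, and the synthetic rows are illustrative assumptions; stratified and rolling-window splits are not shown.

```python
import hashlib
from datetime import date

def time_based_split(rows, cutoff):
    """Train on orders strictly before the cutoff date, test on the rest."""
    train = [r for r in rows if r["order_date"] < cutoff]
    test = [r for r in rows if r["order_date"] >= cutoff]
    return train, test

def customer_holdout_split(rows, test_fraction=0.2):
    """Assign each customer wholly to train or test via a stable hash of the ID."""
    def in_test(customer_id):
        digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
        # Map the first 32 bits of the hash to a number in [0, 1).
        return int(digest[:8], 16) / 0x100000000 < test_fraction
    train = [r for r in rows if not in_test(r["customer_id"])]
    test = [r for r in rows if in_test(r["customer_id"])]
    return train, test

# Synthetic orders: 7 customers, dates spread over January to June 2024.
rows = [
    {"customer_id": f"c{i % 7}", "order_date": date(2024, 1 + i % 6, 1 + i)}
    for i in range(20)
]
train_t, test_t = time_based_split(rows, cutoff=date(2024, 4, 1))
train_c, test_c = customer_holdout_split(rows, test_fraction=0.3)
```

Hashing the customer ID, rather than sampling at random, keeps the assignment reproducible across runs and stable as new orders arrive; the realized test share only approximates `test_fraction`.
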
Topics: Pitfalls of random splits in time data; Time-based and rolling window splits; Stratified splits for imbalanced targets; Customer and store level holdouts; Cross-validation for temporal data

Lesson 10. Geographic and logistics features: country-level metrics, shipping zones, typical delivery_time distribution
Build geographic and logistics features from country-level metrics, shipping zones, and delivery-time distributions to capture operational constraints, regional patterns, and service variability in predictive models.
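
As one possible reading of "typical delivery_time distribution" in Lesson 10, the sketch below summarizes each country's delivery times by median and 90th percentile. The function name, the `min_orders` threshold, and the sample data are assumptions; the profile is meant to be computed on training orders and then joined onto every order by country.

```python
import statistics
from collections import defaultdict

def country_delivery_profile(train_orders, min_orders=3):
    """Median and 90th-percentile delivery time per country, from training data."""
    by_country = defaultdict(list)
    for o in train_orders:
        by_country[o["country"]].append(o["delivery_time_days"])
    profile = {}
    for country, days in by_country.items():
        if len(days) < min_orders:
            continue  # too few orders for a stable estimate
        deciles = statistics.quantiles(days, n=10, method="inclusive")
        profile[country] = {
            "median_days": statistics.median(days),
            "p90_days": deciles[8],  # ninth cut point = 90th percentile
            "n_orders": len(days),
        }
    return profile

# Hypothetical training orders.
train_orders = (
    [{"country": "DE", "delivery_time_days": d} for d in [2, 3, 3, 4, 8]]
    + [{"country": "BR", "delivery_time_days": d} for d in [10, 12, 15, 20, 30]]
    + [{"country": "NZ", "delivery_time_days": 14}]
)
profile = country_delivery_profile(train_orders)
```

Countries below the threshold are left out of the profile, so downstream code needs a fallback such as a regional or global value.
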
Topics: Country and region level aggregations; Defining shipping zones and lanes; Delivery time distribution features; Distance and cross-border indicators; Service level and SLA features

Lesson 11. Standardizing and cleaning categorical variables: product_category, country, marketing_channel, device_type
Standardize and clean categorical variables such as product category, country, marketing channel, and device type by normalizing names, merging rare categories, and keeping categories consistent across datasets.
Topics: Detecting inconsistent category labels; String normalization and mapping; Merging rare and noisy categories; Maintaining category taxonomies; Documenting categorical cleaning
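
The string normalization, mapping, and rare-category merging covered in Lesson 11 could be sketched as follows for the country field. The alias table, the helper names, and the "other" bucket are hypothetical; a real taxonomy would be maintained and documented separately.

```python
import unicodedata
from collections import Counter

# Hypothetical alias table mapping normalized spellings to canonical codes.
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "u.s.": "US", "united states": "US",
    "gb": "GB", "uk": "GB", "united kingdom": "GB",
}

def normalize_label(raw):
    """Unicode-normalize, trim, lowercase, and collapse internal whitespace."""
    text = unicodedata.normalize("NFKC", raw).strip().lower()
    return " ".join(text.split())

def clean_country(raw):
    """Map known aliases to a canonical code; keep unknown labels for review."""
    key = normalize_label(raw)
    return COUNTRY_ALIASES.get(key, key)

def fit_frequent(labels, min_count):
    """Categories seen at least min_count times in the training data."""
    return {label for label, n in Counter(labels).items() if n >= min_count}

def merge_rare(labels, frequent, other="other"):
    """Replace any label outside the fitted frequent set with a shared bucket."""
    return [label if label in frequent else other for label in labels]

raw = ["USA", " united   states", "UK", "U.S.", "Atlantis"]
cleaned = [clean_country(r) for r in raw]
frequent = fit_frequent(cleaned, min_count=2)
merged = merge_rare(cleaned, frequent)
```

Fitting the frequent set once on training data and reusing it keeps the category set identical across training, validation, and production data.
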