Lesson 1
Feature scaling and transformation: log transforms for skewed revenue/quantity, robust scaling
Apply scaling and transformations to stabilise variance and reduce skewness in revenue and quantity, using log transforms, robust scaling, and power transforms while preserving interpretability when needed.
Diagnosing skewness and heavy tails
Log and power transformations
Standard, min-max, and robust scaling
Scaling pipelines with sklearn
Inverse transforms for interpretation

Lesson 2
Datetime feature engineering: weekday, hour, seasonality, recency and tenure features from order_date and customer history
Engineer time-based features from order dates and customer history, including weekday, hour, seasonality, recency, and tenure, while respecting temporal order to avoid leakage in forecasting and classification tasks.
Extracting calendar-based features
Cyclic encoding of time variables
Seasonality and holiday indicators
Recency and tenure feature design
Time-aware leakage prevention

Lesson 3
Imputation strategies for numeric (median, KNN, model-based) and categorical fields (mode, 'unknown')
Compare numeric and categorical imputation strategies, including median, KNN, model-based, mode, and explicit "unknown" categories, with diagnostics to assess bias, variance, and robustness of the completed dataset.
Missingness mechanisms and patterns
Simple numeric imputation methods
KNN and model-based imputation
Categorical mode and "unknown" bins
Using missingness indicator flags

Lesson 4
Creating the target variable for the chosen prediction (binary returned, continuous revenue, late delivery label)
Define and construct target variables for key business predictions, including binary return flags, continuous revenue, and late delivery labels, ensuring clear definitions and alignment with evaluation metrics.
Choosing the prediction objective
Defining return and churn labels
Revenue and margin regression targets
Late delivery and SLA breach labels
Aligning targets with metrics

Lesson 5
Encoding techniques: one-hot, target encoding, frequency encoding, embeddings for high-cardinality features
Explore encoding methods for categorical variables, from simple one-hot to target, frequency, and embedding-based encodings, with guidance on leakage prevention, regularization, and handling high-cardinality features.
When to use one-hot encoding
Target encoding with leakage control
Frequency and count encodings
Hashing and rare category handling
Learned embeddings for categories

Lesson 6
Outlier detection and handling for price, quantity, delivery_time_days, and revenue
Learn to detect, diagnose, and treat outliers in price, quantity, delivery time, and revenue using statistical rules and business logic, minimising information loss while protecting downstream models from instability.
Univariate outlier detection rules
Multivariate and contextual outliers
Capping, trimming, and winsorization
Business-rule based outlier flags
Impact of outliers on model training

Lesson 7
Aggregations and customer-level features: historical return rate, avg order value, frequency, time since last order
Build customer-level aggregations such as historical return rate, average order value, purchase frequency, and recency to capture customer lifetime behaviour and improve segmentation and predictive performance.
Customer-level aggregation design
Historical return and complaint rates
Average order value and basket size
Purchase frequency and recency
Customer lifetime value proxies

Lesson 8
Promotion and pricing features: effective_unit_price, discount_pct, discount_applied flag
Create promotion and pricing features such as effective unit price, discount percentage, and discount flags to capture promotional intensity, margin impact, and customer sensitivity to price changes over time.
Computing effective unit price
Discount percentage and depth
Binary and multi-level promo flags
Stacked and overlapping promotions
Price elasticity proxy features

Lesson 9
Train/test split strategies for time-series/order data (time-based split, stratified by target, customer holdout)
Design train and test split strategies for time-ordered transactional data, using time-based splits, stratification by target, and customer holdout schemes to obtain realistic and unbiased performance estimates.
Pitfalls of random splits in time data
Time-based and rolling window splits
Stratified splits for imbalanced targets
Customer and store level holdouts
Cross-validation for temporal data

Lesson 10
Geographic and logistics features: country-level metrics, shipping zones, typical delivery_time distribution
Design geographic and logistics features using country-level metrics, shipping zones, and delivery time distributions to capture operational constraints, regional behaviour, and service-level variability in predictive models.
Country and region level aggregations
Defining shipping zones and lanes
Delivery time distribution features
Distance and cross-border indicators
Service level and SLA features

Lesson 11
Standardizing and cleaning categorical variables: product_category, country, marketing_channel, device_type
Standardize and clean categorical variables such as product category, country, marketing channel, and device type by normalising labels, merging rare levels, and enforcing consistent taxonomies across datasets.
Detecting inconsistent category labels
String normalization and mapping
Merging rare and noisy categories
Maintaining category taxonomies
Documenting categorical cleaning
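
As a rough illustration of the label normalization and rare-level merging named in this lesson, the sketch below uses only the Python standard library. The alias map, the function names normalize_label and merge_rare_levels, the min_count threshold, the "other" and "unknown" buckets, and the sample country values are all hypothetical choices made for this example, not part of the course material.

```python
from collections import Counter

# Hypothetical alias map: spellings and canonical names are illustrative only.
ALIASES = {
    "uk": "united kingdom",
    "u.k.": "united kingdom",
    "great britain": "united kingdom",
}


def normalize_label(raw):
    """Trim, lowercase, and collapse whitespace, then apply the alias map."""
    if raw is None:
        return "unknown"
    cleaned = " ".join(str(raw).strip().lower().split())
    if not cleaned:
        return "unknown"
    return ALIASES.get(cleaned, cleaned)


def merge_rare_levels(labels, min_count=2, other="other", keep=("unknown",)):
    """Replace levels seen fewer than min_count times with a shared bucket.

    Levels listed in `keep` are never merged, so an explicit "unknown"
    category survives even when it is rare.
    """
    counts = Counter(labels)
    return [
        lab if (counts[lab] >= min_count or lab in keep) else other
        for lab in labels
    ]


# Assumed sample data with inconsistent spellings and one missing value.
raw_countries = [" UK", "United  Kingdom", "u.k.", "France", "france ", "Peru", None]
normalized = [normalize_label(c) for c in raw_countries]
cleaned = merge_rare_levels(normalized, min_count=2)
print(cleaned)
```

In practice the threshold would be tuned on the training data only, and the alias map would be kept under version control so the same taxonomy is applied to every dataset.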