Lesson 1. Feature scaling and transformation: log transforms for skewed revenue/quantity, robust scaling
Apply scaling and transformations to stabilize variance and reduce skewness in revenue and quantity, using log transforms, robust scaling, and power transforms while preserving interpretability where needed.
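As a rough illustration of a log transform followed by robust scaling, the sketch below uses only the Python standard library. The revenue values and the `robust_scale` helper are made up for this example; a real pipeline would more likely use sklearn transformers fitted on training data only.

```python
import math
import statistics

# Hypothetical right-skewed revenue values (illustrative data only).
revenue = [12.0, 15.5, 18.0, 22.0, 30.0, 45.0, 1200.0]

# log1p handles zeros safely and compresses the long right tail.
log_revenue = [math.log1p(x) for x in revenue]

def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]

scaled = robust_scale(log_revenue)

# expm1 inverts log1p, so values can be reported in original units.
recovered = [math.expm1(v) for v in log_revenue]
```

Because the median and IQR are barely affected by the single extreme value, the scaled feature stays well-behaved, and the inverse transform keeps results interpretable.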
Diagnosing skewness and heavy tails
Log and power transformations
Standard, min-max, and robust scaling
Scaling pipelines with sklearn
Inverse transforms for interpretation

Lesson 2. Datetime feature engineering: weekday, hour, seasonality, recency, and tenure features from order_date and customer history
Engineer time-based features from order dates and customer history, including weekday, hour, seasonality, recency, and tenure, while respecting temporal order to avoid leakage in forecasting and classification tasks.
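A minimal sketch of such features for a single order might look like the following. The function name and its arguments are hypothetical; the key assumption is that `first_order_ts` and `previous_order_ts` come from orders placed before the current one, so nothing from the future leaks in.

```python
import math
from datetime import datetime

def datetime_features(order_ts, first_order_ts, previous_order_ts):
    """Features for one order, using only information known at order time."""
    hour_angle = 2 * math.pi * order_ts.hour / 24
    return {
        "weekday": order_ts.weekday(),      # Monday = 0 ... Sunday = 6
        "hour": order_ts.hour,
        "month": order_ts.month,            # coarse seasonality signal
        "hour_sin": math.sin(hour_angle),   # cyclic encoding: 23h sits next to 0h
        "hour_cos": math.cos(hour_angle),
        "recency_days": (order_ts - previous_order_ts).days,
        "tenure_days": (order_ts - first_order_ts).days,
    }

features = datetime_features(
    order_ts=datetime(2024, 3, 15, 18, 30),
    first_order_ts=datetime(2023, 3, 15, 9, 0),
    previous_order_ts=datetime(2024, 3, 1, 12, 0),
)
```

The sine/cosine pair keeps late-night and early-morning hours close together, which a raw 0 to 23 integer would not.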
Extracting calendar-based features
Cyclic encoding of time variables
Seasonality and holiday indicators
Recency and tenure feature design
Time-aware leakage prevention

Lesson 3. Imputation strategies for numeric (median, KNN, model-based) and categorical fields (mode, 'unknown')
Compare numeric and categorical imputation strategies, including median, KNN, model-based, mode, and explicit "unknown" categories, with diagnostics to assess the bias, variance, and robustness of the completed dataset.
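The two simplest strategies can be sketched as below, with missing values represented as `None`. The helper names are illustrative, and KNN or model-based imputation is not shown. In practice the fill values would be computed on the training split and reused on the test split.

```python
import statistics

def impute_numeric_median(values):
    """Fill None with the median of observed values; also return a missing flag."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return imputed, was_missing

def impute_categorical(values, strategy="unknown"):
    """Fill None with the mode or with an explicit 'unknown' level."""
    observed = [v for v in values if v is not None]
    fill = statistics.mode(observed) if strategy == "mode" else "unknown"
    return [fill if v is None else v for v in values]

# Hypothetical columns (illustrative data only).
quantity, quantity_missing = impute_numeric_median([1, None, 3, 10, None])
device_mode = impute_categorical(["mobile", None, "desktop", "mobile"], "mode")
device_unknown = impute_categorical(["mobile", None, "desktop", "mobile"])
```

Keeping the `was_missing` flag lets a model learn from the fact that a value was absent, which matters when missingness is not random.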
Missingness mechanisms and patterns
Simple numeric imputation methods
KNN and model-based imputation
Categorical mode and "unknown" bins
Using missingness indicator flags

Lesson 4. Creating the target variable for the chosen prediction (binary returned, continuous revenue, late delivery label)
Define and build target variables for key business predictions, including binary return flags, continuous revenue, and late delivery labels, ensuring clear definitions and alignment with evaluation metrics.
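One possible way to write these three definitions down explicitly is sketched here. The field names (`return_date`, `promised_date`, `delivered_date`, and so on) are assumptions for illustration, not a known schema, and the "late" rule shown is only one of several reasonable definitions.

```python
from datetime import date

# Hypothetical order record (illustrative data only).
order = {
    "quantity": 2,
    "unit_price": 25.0,
    "promised_date": date(2024, 5, 10),
    "delivered_date": date(2024, 5, 13),
    "return_date": None,
}

def build_targets(order):
    """Derive one candidate target per prediction task from a single order."""
    return {
        # Binary classification target: was the order returned at all?
        "returned": int(order["return_date"] is not None),
        # Regression target: order revenue before any refunds.
        "revenue": order["quantity"] * order["unit_price"],
        # Binary label: delivered after the promised date.
        "late_delivery": int(order["delivered_date"] > order["promised_date"]),
    }

targets = build_targets(order)
```

Writing the rule as code forces decisions that prose leaves open, such as whether delivery on the promised date counts as late.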
Choosing the prediction objective
Designing return and churn labels
Revenue and margin regression targets
Late delivery and SLA breach labels
Aligning targets with metrics

Lesson 5. Encoding techniques: one-hot, target encoding, frequency encoding, embeddings for high-cardinality features
Explore encoding methods for categorical variables, from simple one-hot to target, frequency, and embedding-based encodings, with guidance on leakage prevention, regularization, and handling high-cardinality features.
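Smoothed target encoding and frequency encoding can be sketched as follows. The additive smoothing formula is one common choice among several, the function names are illustrative, and the mapping is assumed to be fitted on training rows only; out-of-fold fitting would control leakage more strictly.

```python
from collections import Counter, defaultdict

def fit_target_encoding(categories, targets, smoothing=10.0):
    """Per-category target mean, shrunk toward the global mean."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), Counter()
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, global_mean

def fit_frequency_encoding(categories):
    """Share of rows that fall in each category."""
    counts = Counter(categories)
    return {c: k / len(categories) for c, k in counts.items()}

# Hypothetical training data (illustrative only).
train_cat = ["A", "A", "A", "B"]
train_y = [1, 1, 0, 0]

target_map, fallback = fit_target_encoding(train_cat, train_y, smoothing=2.0)
freq_map = fit_frequency_encoding(train_cat)

# Unseen categories fall back to the global mean.
encoded_unseen = target_map.get("Z", fallback)
```

Larger `smoothing` values pull rare categories harder toward the global mean, which acts as regularization for high-cardinality features.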
When to use one-hot encoding
Target encoding with leakage control
Frequency and count encodings
Hashing and rare category handling
Learned embeddings for categories

Lesson 6. Outlier detection and handling for price, quantity, delivery_time_days, and revenue
Learn to detect, diagnose, and treat outliers in price, quantity, delivery time, and revenue using statistical rules and business logic, minimizing information loss while protecting downstream models from instability.
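A simple univariate rule can be sketched with the standard library: flag values outside the IQR fences, then cap them rather than drop the rows. The data, the helper names, and the multiplier `k=1.5` are conventional choices used here for illustration, not a prescription.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Lower and upper fences at k times the IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap(values, lower, upper):
    """Clip values to [lower, upper] instead of removing rows."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical delivery times in days (illustrative data only).
delivery_time_days = [2, 3, 3, 4, 4, 5, 5, 6, 40]

lower, upper = iqr_bounds(delivery_time_days)
is_outlier = [v < lower or v > upper for v in delivery_time_days]
capped = cap(delivery_time_days, lower, upper)
```

Capping keeps the order in the dataset, so less information is lost than with trimming, while the `is_outlier` flag can still be passed to the model as a feature.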
Univariate outlier detection rules
Multivariate and contextual outliers
Capping, trimming, and winsorization
Business-rule-based outlier flags
Impact of outliers on model training

Lesson 7. Aggregations and customer-level features: historical return rate, avg order value, frequency, time since last order
Build customer-level aggregations such as historical return rate, average order value, purchase frequency, and recency to capture customer lifetime behavior and improve segmentation and predictive performance.
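The sketch below aggregates each customer's history up to a cutoff date, so that later orders cannot influence the features. The records and field names are hypothetical, and it assumes the `returned` flag was already known at the cutoff, which may not hold for recent orders.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order history (illustrative data only).
orders = [
    {"customer_id": "c1", "order_date": date(2024, 1, 5), "revenue": 40.0, "returned": 0},
    {"customer_id": "c1", "order_date": date(2024, 2, 10), "revenue": 60.0, "returned": 1},
    {"customer_id": "c1", "order_date": date(2024, 4, 1), "revenue": 500.0, "returned": 0},
    {"customer_id": "c2", "order_date": date(2024, 3, 1), "revenue": 20.0, "returned": 0},
]

def customer_features(orders, cutoff):
    """Aggregate each customer's orders placed strictly before `cutoff`."""
    history = defaultdict(list)
    for o in orders:
        if o["order_date"] < cutoff:
            history[o["customer_id"]].append(o)
    features = {}
    for cid, past in history.items():
        last = max(o["order_date"] for o in past)
        features[cid] = {
            "order_count": len(past),
            "avg_order_value": sum(o["revenue"] for o in past) / len(past),
            "historical_return_rate": sum(o["returned"] for o in past) / len(past),
            "days_since_last_order": (cutoff - last).days,
        }
    return features

features = customer_features(orders, cutoff=date(2024, 3, 15))
```

The April order for `c1` is ignored because it falls after the cutoff, which is what keeps these aggregates safe to use when predicting that order.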
Customer-level aggregation design
Historical return and complaint rates
Average order value and basket size
Purchase frequency and recency
Customer lifetime value proxies

Lesson 8. Promotion and pricing features: effective_unit_price, discount_pct, discount_applied flag
Create promotion and pricing features such as effective unit price, discount percentage, and discount flags to capture promotional intensity, margin impact, and customer sensitivity to price changes over time.
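Under the assumption that each order line carries a list price, a quantity, and the amount actually paid, the three features named in the lesson title could be derived roughly like this. The input names and the small tolerance are illustrative choices.

```python
def pricing_features(list_price, quantity, amount_paid, tol=1e-9):
    """Derive per-line pricing features from list price and amount paid."""
    if quantity <= 0 or list_price <= 0:
        raise ValueError("quantity and list_price must be positive")
    effective_unit_price = amount_paid / quantity
    discount_pct = 1.0 - effective_unit_price / list_price
    return {
        "effective_unit_price": effective_unit_price,
        "discount_pct": discount_pct,
        # Tolerance avoids flagging floating-point noise as a discount.
        "discount_applied": int(discount_pct > tol),
    }

# Hypothetical order lines (illustrative data only).
discounted = pricing_features(list_price=20.0, quantity=3, amount_paid=48.0)
full_price = pricing_features(list_price=20.0, quantity=3, amount_paid=60.0)
```

Expressing the discount as a fraction of list price makes promotions comparable across products with very different price levels.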
Deriving the effective unit price
Discount percentage and depth
Binary and multi-level promo flags
Stacked and overlapping promotions
Price elasticity proxy features

Lesson 9. Train/test split strategies for time-series/order data (time-based split, stratified by target, customer holdout)
Design train and test split strategies for time-ordered transactional data, using time-based splits, stratification by target, and customer holdout schemes to obtain realistic and unbiased performance estimates.
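Two of these schemes, a time-based split and a customer holdout, can be sketched in a few lines. The rows, field names, and helper functions are hypothetical; stratified and rolling-window variants are not shown.

```python
from datetime import date

# Hypothetical order rows (illustrative data only).
rows = [
    {"order_id": 1, "customer_id": "c1", "order_date": date(2024, 1, 10)},
    {"order_id": 2, "customer_id": "c2", "order_date": date(2024, 2, 20)},
    {"order_id": 3, "customer_id": "c1", "order_date": date(2024, 3, 5)},
    {"order_id": 4, "customer_id": "c3", "order_date": date(2024, 3, 25)},
]

def time_based_split(rows, cutoff):
    """Train on orders before the cutoff, test on orders from the cutoff on."""
    train = [r for r in rows if r["order_date"] < cutoff]
    test = [r for r in rows if r["order_date"] >= cutoff]
    return train, test

def customer_holdout_split(rows, holdout_customers):
    """Keep every order of a held-out customer out of the training set."""
    train = [r for r in rows if r["customer_id"] not in holdout_customers]
    test = [r for r in rows if r["customer_id"] in holdout_customers]
    return train, test

train_time, test_time = time_based_split(rows, cutoff=date(2024, 3, 1))
train_cust, test_cust = customer_holdout_split(rows, holdout_customers={"c1"})
```

The time-based split estimates performance on future orders, while the customer holdout estimates performance on customers the model has never seen; the two answer different questions.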
Pitfalls of random splits on time data
Time-based and rolling window splits
Stratified splits for imbalanced targets
Customer and store level holdouts
Cross-validation for temporal data

Lesson 10. Geographic and logistics features: country-level metrics, shipping zones, typical delivery_time distribution
Design geographic and logistics features using country-level metrics, shipping zones, and delivery time distributions to capture operational constraints, regional behavior, and service-level variability in predictive models.
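As one hedged example, country-level delivery statistics can be computed on training rows and then attached to new orders together with a cross-border flag. The `warehouse_country` field and both helper functions are assumptions made for this sketch; shipping zones and distance features are not shown.

```python
import statistics
from collections import defaultdict

# Hypothetical training rows (illustrative data only).
train_rows = [
    {"country": "DE", "warehouse_country": "DE", "delivery_time_days": 2},
    {"country": "DE", "warehouse_country": "DE", "delivery_time_days": 4},
    {"country": "BR", "warehouse_country": "DE", "delivery_time_days": 12},
    {"country": "BR", "warehouse_country": "DE", "delivery_time_days": 20},
    {"country": "BR", "warehouse_country": "DE", "delivery_time_days": 15},
]

def country_delivery_stats(rows):
    """Summarize the delivery time distribution per destination country."""
    by_country = defaultdict(list)
    for r in rows:
        by_country[r["country"]].append(r["delivery_time_days"])
    return {
        c: {
            "median_days": statistics.median(d),
            "spread_days": max(d) - min(d),
            "n_orders": len(d),
        }
        for c, d in by_country.items()
    }

def add_geo_features(row, stats):
    """Attach country-level stats and a cross-border flag to one order."""
    s = stats.get(row["country"])
    return {
        "cross_border": int(row["country"] != row["warehouse_country"]),
        "country_median_days": s["median_days"] if s else None,
    }

stats = country_delivery_stats(train_rows)
new_order = add_geo_features({"country": "BR", "warehouse_country": "DE"}, stats)
```

Countries absent from the training data get `None` here, which would then need its own imputation rule.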
Country and region level aggregations
Designing shipping zones and lanes
Delivery time distribution features
Distance and cross-border indicators
Service level and SLA features

Lesson 11. Standardizing and cleaning categorical variables: product_category, country, marketing_channel, device_type
Standardize and clean categorical variables such as product category, country, marketing channel, and device type by normalizing labels, merging rare levels, and enforcing consistent taxonomies across datasets.
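Label normalization and rare-level pooling might be sketched as follows. The `ALIASES` table, the `min_count` threshold, and the `"other"` bucket name are hypothetical; a real taxonomy would be maintained and documented separately.

```python
from collections import Counter

# Hypothetical alias table mapping variant spellings to a canonical label.
ALIASES = {
    "usa": "united states",
    "u.s.": "united states",
    "us": "united states",
    "uk": "united kingdom",
}

def normalize_label(raw):
    """Trim, lowercase, collapse internal whitespace, then apply aliases."""
    key = " ".join(raw.strip().lower().split())
    return ALIASES.get(key, key)

def pool_rare(labels, min_count=2, other="other"):
    """Replace labels seen fewer than `min_count` times with a shared bucket."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else other for label in labels]

# Hypothetical raw country values (illustrative data only).
raw = [" USA", "usa ", "U.S.", "United  States", "UK", "Fiji"]
clean = [normalize_label(x) for x in raw]
pooled = pool_rare(clean, min_count=2)
```

Normalizing before counting matters: without it, the four spellings of the same country would each look rare and be pooled away.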
Detecting inconsistent category labels
String normalization and mapping
Merging rare and noisy categories
Maintaining category taxonomies
Documenting categorical cleaning