Lesson 1
Design of transactional tables: orders, order_items, returns, lifetime_value signals and field choices
Learn how to design core transactional tables that capture orders, line items, returns, and lifetime value signals. We discuss key fields, normalization choices, and how to support downstream analytics and recommendation workloads.
Order header vs line item schema design
Modeling returns, refunds, and cancellations
Capturing discounts, coupons, and taxes
Storing lifetime value and margin signals
Keys, indexes, and partitioning choices

Lesson 2
Handling noisy and sparse behavioral data: sessionization, bot filtering, deduplication, event weighting
Explore techniques to clean noisy behavioral logs and make sparse data usable. You will learn sessionization rules, bot and scraper filtering, deduplication logic, and event weighting strategies tailored to recommendation training.
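As a rough illustration of the sessionization rules covered in Lesson 2, the sketch below splits each user's events into sessions whenever the gap between consecutive events exceeds an inactivity timeout. The 30-minute timeout, the `sessionize` function, and the `user_id-N` session id format are assumptions chosen for illustration, not the course's method.

```python
from datetime import datetime, timedelta

# Assumption: a 30-minute inactivity timeout; tune it for your traffic.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Assign session ids to (user_id, timestamp) events.

    A new session starts on a user's first event, or when the gap since
    that user's previous event is longer than `timeout`.
    Returns a list of (user_id, timestamp, session_id) tuples.
    """
    result = []
    last_seen = {}      # user_id -> timestamp of that user's previous event
    session_count = {}  # user_id -> number of sessions started so far
    # Sort by user, then time, so gaps are measured between consecutive events.
    for user_id, ts in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(user_id)
        if prev is None or ts - prev > timeout:
            session_count[user_id] = session_count.get(user_id, 0) + 1
        last_seen[user_id] = ts
        result.append((user_id, ts, f"{user_id}-{session_count[user_id]}"))
    return result
```

For example, events for `u1` at 12:00, 12:10 and 12:50 produce sessions `u1-1`, `u1-1` and `u1-2`, because the 40-minute gap before the last event exceeds the timeout.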
Sessionization rules and timeouts
Detecting and filtering bots and scrapers
Click, view, and purchase deduplication
Event weighting for model training
Handling sparse users and cold starts

Lesson 3
Design of product catalog table: product_id, title, category hierarchy, attributes, price, brand, stock, images, canonical_text, embeddings
Learn how to structure a product catalog table that supports fast retrieval and rich recommendations. We cover identifiers, attributes, pricing, stock, media, canonical text, and embeddings, plus strategies for updates and denormalization.
Stable product and variant identifiers
Category hierarchy and attributes
Price, stock, and availability fields
Images, media, and canonical text
Storing and updating item embeddings

Lesson 4
Feature engineering principles for recommendations: recency, frequency, monetary, item popularity, category affinity, user embeddings
Discover core feature engineering principles for recommender systems. We detail recency, frequency, monetary value, popularity, category affinity, and user embeddings, including aggregation windows and leakage-safe computation patterns.
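A minimal sketch of a leakage-safe recency, frequency, and monetary computation, assuming orders arrive as `(user_id, order_ts, amount)` tuples. Every feature is computed only from orders strictly before a cutoff, so features built for a label observed at the cutoff never see later purchases. The `rfm_features` function, its input shape, and the choice of total spend as the monetary value are hypothetical.

```python
from datetime import datetime


def rfm_features(orders, cutoff):
    """Compute recency/frequency/monetary per user from orders before `cutoff`.

    orders: iterable of (user_id, order_ts, amount) tuples.
    Returns {user_id: {"recency_days", "frequency", "monetary"}}.
    Amounts are floats for brevity; use Decimal for real currency values.
    """
    acc = {}
    for user_id, ts, amount in orders:
        if ts >= cutoff:
            continue  # leakage guard: drop anything at or after the cutoff
        f = acc.setdefault(user_id, {"last_ts": ts, "frequency": 0, "monetary": 0.0})
        f["last_ts"] = max(f["last_ts"], ts)
        f["frequency"] += 1
        f["monetary"] += amount  # total spend; average order value is another option
    return {
        user_id: {
            "recency_days": (cutoff - f["last_ts"]).days,
            "frequency": f["frequency"],
            "monetary": f["monetary"],
        }
        for user_id, f in acc.items()
    }
```

Passing a different `cutoff` for each training example turns the same function into a point-in-time feature builder, which is the property that keeps training features consistent with what would have been known at serving time.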
Recency, frequency, and monetary features
Item and category popularity signals
User–category and brand affinity scores
Sequence-based and session features
User and item embedding generation

Lesson 5
Auxiliary datasets: item metadata, taxonomy, promotions, content (descriptions), supplier data
Understand how auxiliary datasets enrich recommendations beyond raw clicks and orders. We cover item metadata, taxonomy, promotions, content, and supplier feeds, plus how to keep them consistent, versioned, and joinable at scale.
Designing item metadata schemas
Maintaining product taxonomy hierarchies
Modeling promotions and price rules
Storing rich content and descriptions
Integrating supplier and feed data

Lesson 6
Data cleaning and imputation strategies: missing attributes, price anomalies, invalid timestamps
Learn practical data cleaning and imputation methods for e-commerce. We address missing attributes, anomalous prices, invalid timestamps, and inconsistent currencies, focusing on rules, heuristics, and impact on recommendation quality.
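The rules-and-heuristics approach described in Lesson 6 can be sketched as two checks: one that flags zero, negative, or outlying prices, and one that rejects invalid timestamps. The median-ratio rule, the 10x threshold, the 2000-01-01 floor, and both function names are hypothetical choices for illustration.

```python
from datetime import datetime, timezone
from statistics import median

# Hypothetical thresholds for illustration; tune them on your own data.
PRICE_RATIO_LIMIT = 10.0  # flag prices more than 10x above or below the median
MIN_VALID_TS = datetime(2000, 1, 1, tzinfo=timezone.utc)


def flag_price_anomalies(prices):
    """Return indexes of suspicious prices observed for one item or category.

    Zero and negative prices are always flagged. Positive prices are flagged
    when they differ from the median positive price by more than
    PRICE_RATIO_LIMIT in either direction.
    """
    positive = [p for p in prices if p > 0]
    med = median(positive) if positive else None
    flagged = []
    for i, p in enumerate(prices):
        if p <= 0:
            flagged.append(i)
        elif med is not None and (p > med * PRICE_RATIO_LIMIT or p < med / PRICE_RATIO_LIMIT):
            flagged.append(i)
    return flagged


def is_valid_timestamp(ts, now=None):
    """Reject naive, implausibly old, or future timestamps."""
    now = now or datetime.now(timezone.utc)
    if ts.tzinfo is None:
        return False  # naive timestamps are ambiguous across time zones
    return MIN_VALID_TS <= ts <= now
```

A median is used as the reference point because a single extreme price would drag a mean-based threshold toward itself. Flagged rows can then be routed to imputation or exclusion, and each rule documented alongside its effect on recommendation quality.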
Detecting and fixing missing attributes
Handling outlier and zero prices
Correcting invalid or noisy timestamps
Currency, tax, and unit normalization
Documenting cleaning rules and impacts

Lesson 7
Design of event stream and interaction table: event_id, user_id/session_id, event_type, product_id, timestamp, context (referrer, page_type), device, event_value
Design a unified interaction table and event stream that captures user behavior across channels. Learn event schemas, identifiers, context fields, and how to support both real-time streaming and offline batch recommendation pipelines.
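One way to sketch the Lesson 7 interaction record in Python, using the fields listed in the lesson title. Making `session_id` required and `user_id` optional (so anonymous traffic stays linkable), the `EventType` vocabulary, flattening context into `referrer` and `page_type`, and the validation rules in `__post_init__` are all assumptions, not a prescribed schema.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class EventType(str, Enum):
    # Hypothetical event vocabulary; extend it to match your tracking plan.
    VIEW = "view"
    CLICK = "click"
    ADD_TO_CART = "add_to_cart"
    PURCHASE = "purchase"


@dataclass(frozen=True)
class InteractionEvent:
    user_id: Optional[str]       # None for anonymous visitors
    session_id: str              # always present, so anonymous events can be grouped
    event_type: EventType
    timestamp: datetime          # event time; must be timezone-aware
    product_id: Optional[str] = None     # None for non-product pages
    referrer: Optional[str] = None       # context: where the user came from
    page_type: Optional[str] = None      # context: e.g. "home", "pdp", "search"
    device: Optional[str] = None
    event_value: Optional[float] = None  # e.g. purchase amount; None for views
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self):
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        if self.event_type is EventType.PURCHASE and self.event_value is None:
            raise ValueError("purchase events need an event_value")
```

Generating `event_id` at the source gives downstream streaming and batch jobs a stable key for deduplication; a server-side ingestion time could be added as a separate field to support the ordering questions covered in this lesson.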
Choosing event and user identifiers
Modeling event types and properties
Capturing context, device, and referrer
Event time, ingestion time, and ordering
Streaming vs batch storage patterns

Lesson 8
Design of user profiles table: essential fields (user_id, signup_ts, email_hash, demographics, lifecycle stage, segments, opt-in flags) and rationale
Design a user profiles table that balances personalization power with privacy and compliance. We cover essential fields, lifecycle and segments, opt-in flags, hashing sensitive data, and how profiles feed recommendation models.
Core identifiers and signup metadata
Demographics and lifecycle stages
Behavioral and marketing segments
Consent, opt-in, and preference flags
Privacy, hashing, and retention rules
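For the hashing topic in Lesson 8, a minimal sketch of how an `email_hash` field might be computed: normalize the address, then apply a keyed hash (HMAC-SHA256) instead of a bare SHA-256, because unkeyed hashes of email addresses can be reversed by hashing lists of known addresses. The normalization rule, the choice of HMAC, and the function names are assumptions; where the secret key is stored is left to your infrastructure.

```python
import hashlib
import hmac


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so one address always hashes the same way.

    Simplification: RFC 5321 allows case-sensitive local parts, but most
    providers treat them case-insensitively, so lowercasing is assumed here.
    """
    return email.strip().lower()


def email_hash(email: str, secret_key: bytes) -> str:
    """Keyed HMAC-SHA256 of a normalized email, returned as 64 hex characters."""
    message = normalize_email(email).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

Keeping the key outside the warehouse means a leaked profiles table alone does not let anyone map hashes back to addresses, while teams holding the key can still join profiles on the same `email_hash`.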