Lesson 1: Handling channel metadata (channel-specific token patterns, metadata encoding)
Learn how to handle channel metadata for chat, email, and phone records. We cover channel-specific token patterns, ways to encode channel metadata, and how to combine metadata with text for better modeling.
Listing support channels and fields
Channel-specific token patterns
One-hot and embedding encoding
Combining text and metadata features
Dealing with missing channel metadata

Lesson 2: Emoji, emoticon and non-standard token handling and mapping to sentiment signals
Learn how to normalize emojis, emoticons, and non-standard tokens while preserving their sentiment. We discuss mapping strategies, lexicons, and how to feed these signals into sentiment and intent models.
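As a rough illustration of the Lesson 2 idea, the sketch below replaces emojis and emoticons with placeholder tokens and returns an average sentiment score. The lexicon, its scores, and the names `EMOJI_SENTIMENT` and `extract_emoji_signals` are hypothetical and chosen only for this example.

```python
# Hypothetical lexicon: symbols and scores are illustrative, not a published resource.
EMOJI_SENTIMENT = {
    "😀": 1.0,
    "👍": 0.5,
    "😡": -1.0,
    ":)": 0.5,
    ":(": -0.5,
}


def extract_emoji_signals(text):
    """Return (text with placeholder tokens, mean sentiment score of matched symbols)."""
    scores = []
    cleaned = text
    # Longest symbols first so multi-character emoticons are matched before shorter ones.
    for symbol in sorted(EMOJI_SENTIMENT, key=len, reverse=True):
        score = EMOJI_SENTIMENT[symbol]
        count = cleaned.count(symbol)
        if count == 0:
            continue
        scores.extend([score] * count)
        placeholder = "<EMO_POS>" if score > 0 else "<EMO_NEG>"
        cleaned = cleaned.replace(symbol, f" {placeholder} ")
    cleaned = " ".join(cleaned.split())  # collapse the extra spaces added above
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return cleaned, mean_score


print(extract_emoji_signals("Thanks 👍 :)"))
```

Plain substring matching can produce false positives (for example ":(" inside "note:(see below)"), so a real pipeline would likely need stricter token boundaries.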
Cataloguing emoji and emoticon usage
Unicode handling and normalization
Mapping tokens to sentiment scores
Building custom emoji lexicons
Integrating signals into models

Lesson 3: Punctuation, contractions, and tokenization strategies for English support text
Examine punctuation, contractions, and tokenization for English support text. We compare rule-based and library tokenizers, handle edge cases, and match tokenization to model requirements.
Role of punctuation in support tickets
Expanding and normalizing contractions
Rule-based vs statistical tokenizers
Handling URLs and emojis in tokens
Tokenization for transformer models

Lesson 4: Stemming vs lemmatization: algorithms, libraries, and when to apply each
Compare stemming and lemmatization, including their algorithms and libraries. You will learn when to use each in ticket processing and how they affect vocabulary and model behavior.
Rule-based and algorithmic stemmers
Dictionary-based lemmatizers
Library choices and performance
Effect on vocabulary and sparsity
Task-based method selection

Lesson 5: Handling spelling mistakes, abbreviations, and domain-specific shorthand (spell correction, lookup dictionaries)
Explore ways to correct spelling, expand abbreviations, and normalize domain-specific shorthand in tickets. You will combine spell correction, lookup dictionaries, and custom rules without damaging key names and codes.
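A minimal sketch of the Lesson 5 combination, assuming a small hypothetical abbreviation dictionary and vocabulary: protected tokens are skipped, abbreviations are expanded by lookup, and everything else is corrected by similarity matching with the standard library's `difflib.get_close_matches` (a sequence-similarity ratio, used here as a stand-in for edit distance).

```python
import difflib
import re

# Hypothetical domain dictionaries, for illustration only.
ABBREVIATIONS = {"pls": "please", "acct": "account", "pw": "password"}
VOCABULARY = ["password", "account", "invoice", "refund", "login"]

# Assumption: tokens containing a digit (order numbers, codes) must never be altered.
PROTECTED_RE = re.compile(r"\d")


def correct_token(token):
    if PROTECTED_RE.search(token):
        return token  # leave names and codes such as ORD-4471 untouched
    lower = token.lower()
    if lower in ABBREVIATIONS:
        return ABBREVIATIONS[lower]
    if lower in VOCABULARY:
        return token
    # A high cutoff keeps unrelated words from being "corrected".
    matches = difflib.get_close_matches(lower, VOCABULARY, n=1, cutoff=0.85)
    return matches[0] if matches else token


def correct_text(text):
    return " ".join(correct_token(t) for t in text.split())


print(correct_text("pls reset pasword for acct ORD-4471"))
```

Checking the protection rule first is a deliberate ordering choice: once a code has been rewritten, the original identifier cannot be recovered.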
Common error types in support text
Dictionary and edit-distance correction
Custom domain abbreviation dictionaries
Context-aware correction strategies
Protecting names and codes

Lesson 6: Stopword removal tradeoffs and configurable stopword lists for support ticket domains
Examine the tradeoffs of removing stopwords in support domains. You will build configurable stopword lists, evaluate their effect on models, and handle domain-specific words that signal hidden intent.
Standard vs domain stopword lists
Effect on bag-of-words features
Effect on embeddings and transformers
Configurable and layered stopword sets
Validating removal with tests

Lesson 7: Text normalization fundamentals: lowercasing, Unicode normalization, whitespace and linebreak handling
Cover the core text normalization steps: lowercasing, Unicode normalization, and whitespace cleaning. We discuss the order of steps, language-specific considerations, and preserving important formatting cues.
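The Lesson 7 steps can be sketched with the standard library as follows. The function name `normalize_text`, the choice of NFKC, and the specific ordering are assumptions made for this example; other forms or orderings may suit a given dataset better.

```python
import re
import unicodedata


def normalize_text(text, lowercase=True):
    # 1. Unicode normalization first, so later rules see canonical characters.
    #    NFKC also folds compatibility characters (full-width letters, ligatures).
    text = unicodedata.normalize("NFKC", text)
    # 2. Unify line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 3. Collapse runs of spaces and tabs, then trim each line.
    text = re.sub(r"[ \t]+", " ", text)
    text = "\n".join(line.strip() for line in text.split("\n"))
    # 4. Keep paragraph breaks but collapse three or more newlines to two.
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    # 5. Lowercase last, and only if the downstream model does not rely on case.
    if lowercase:
        text = text.lower()
    return text


print(repr(normalize_text("Ｈｅｌｌｏ\u00a0 World\r\n\r\n\r\n\r\nBye  ")))
```

Keeping single and double newlines preserves paragraph structure, which is one of the formatting cues the lesson mentions.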
Lowercasing and case-preservation rules
Unicode normalization forms
Handling accents and special symbols
Whitespace and linebreak cleaning
Ordering normalization operations

Lesson 8: Data splitting strategies: time-based splits, stratified sampling by topic/sentiment, and nested cross-validation considerations
Study data splitting strategies designed for time-ordered, labeled ticket data. We compare time-based splits, stratified sampling by topic or sentiment, and nested cross-validation for robust model evaluation.
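For Lesson 8, a time-based split can be sketched as below. It assumes each ticket is a dict whose `created_at` values sort chronologically (for example ISO 8601 strings in a single format, or `datetime` objects); the function name and the default test fraction are illustrative.

```python
def time_based_split(rows, test_fraction=0.2, time_key="created_at"):
    """Put the most recent tickets in the test set so training never sees the future."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda r: r[time_key])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


tickets = [
    {"ticket_id": "T3", "created_at": "2024-03-05T09:00:00"},
    {"ticket_id": "T1", "created_at": "2024-01-10T14:00:00"},
    {"ticket_id": "T5", "created_at": "2024-05-20T08:30:00"},
    {"ticket_id": "T2", "created_at": "2024-02-01T11:15:00"},
    {"ticket_id": "T4", "created_at": "2024-04-12T16:45:00"},
]
train, test = time_based_split(tickets)
print([t["ticket_id"] for t in train], [t["ticket_id"] for t in test])
```

Unlike a random split, every training ticket here predates every test ticket, which is the property that guards against temporal leakage.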
Holdout, k-fold, and time-based splits
Stratifying by topic and sentiment
Preventing temporal data leakage
Nested cross-validation workflows
Aligning splits with business goals

Lesson 9: Handling URLs, email addresses, code snippets, and identifiers in text (masking vs preserving)
Learn ways to handle URLs, email addresses, code snippets, and identifiers in text. We compare masking, normalizing, and preserving, with a focus on privacy, deduplication, and effects on model performance.
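The masking option from Lesson 9 might look like the sketch below. The regular expressions are deliberately simple, and the ticket ID format (`ABC-1234`) is a hypothetical convention; real data will need patterns tuned to its own identifiers.

```python
import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Simplified URL pattern; it will also swallow punctuation directly attached to a URL.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Hypothetical ticket ID format: 2-5 capital letters, a hyphen, then digits.
TICKET_RE = re.compile(r"\b[A-Z]{2,5}-\d+\b")


def mask_entities(text):
    """Replace emails, URLs, and ticket IDs with fixed placeholder tokens."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = URL_RE.sub("<URL>", text)
    text = TICKET_RE.sub("<TICKET_ID>", text)
    return text


print(mask_entities(
    "Contact jane.doe@example.com or see https://example.com/help?id=5 about ABC-1234."
))
```

Masking removes personal data and lets near-duplicate tickets collapse to the same string, at the cost of discarding any signal carried by the specific domain or identifier.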
Detecting URL and email patterns
Masking versus normalization rules
Representing code snippets safely
Handling ticket and user identifiers
Privacy and leakage considerations

Lesson 10: Understanding CSV schema and data types (ticket_id, created_at, customer_id, text, channel, resolved, resolution_time_hours, manual_topic, manual_sentiment)
Learn to read CSV schemas for ticket datasets and assign correct data types. We cover parsing identifiers, timestamps, booleans, and text fields, plus validation checks that prevent subtle downstream errors.
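One way to apply the Lesson 10 schema with the standard library is sketched below. The column names come from the lesson title; the type chosen for each column, the accepted boolean spellings, and the helper names are assumptions for illustration.

```python
import csv
import io
from datetime import datetime


def parse_bool(value):
    value = value.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_optional_float(value):
    value = value.strip()
    return float(value) if value else None  # empty cell means missing


# Assumed types: IDs stay strings so leading zeros and prefixes survive.
SCHEMA = {
    "ticket_id": str,
    "created_at": datetime.fromisoformat,
    "customer_id": str,
    "text": str,
    "channel": str,
    "resolved": parse_bool,
    "resolution_time_hours": parse_optional_float,
    "manual_topic": str,
    "manual_sentiment": str,
}


def load_tickets(handle):
    """Return (typed rows, list of (line number, error message))."""
    reader = csv.DictReader(handle)
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows, errors = [], []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({col: conv(raw[col]) for col, conv in SCHEMA.items()})
        except (ValueError, TypeError, AttributeError) as exc:
            errors.append((line_no, str(exc)))
    return rows, errors


sample = io.StringIO(
    "ticket_id,created_at,customer_id,text,channel,resolved,"
    "resolution_time_hours,manual_topic,manual_sentiment\n"
    "T1,2024-03-01T10:30:00,C9,Cannot log in,chat,true,2.5,login,negative\n"
    "T2,not-a-date,C3,Thanks,email,false,,billing,positive\n"
)
rows, errors = load_tickets(sample)
print(len(rows), errors)
```

Collecting errors with their line numbers, rather than failing on the first bad row, makes it easier to see whether problems are isolated or systematic.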
Inspecting headers and sample rows
Assigning robust column data types
Validating timestamps and IDs
Detecting invalid or mixed types
Schema validation in pipelines

Lesson 11: Techniques to detect and quantify missing values and label noise (missingness patterns, label consistency checks, inter-annotator metrics)
Learn to detect missing values and noisy labels in support ticket data. We cover missingness patterns, label consistency checks, and inter-annotator agreement metrics to measure label quality and guide cleaning decisions.
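Two of the Lesson 11 measurements can be sketched in a few lines: per-column missing rates and Cohen's kappa, used here as one common inter-annotator agreement metric for two annotators. Treating `None` and the empty string as missing is an assumption of this example.

```python
from collections import Counter


def missing_rates(rows, columns):
    """Fraction of rows where each column is None or an empty string."""
    n = len(rows)
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) / n for c in columns}


def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(annotator_1, annotator_2))
```

In the example the annotators agree on 3 of 4 tickets (0.75), chance agreement is 0.5, and kappa is therefore (0.75 - 0.5) / (1 - 0.5) = 0.5.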
Types of missingness in ticket data
Visualizing missingness patterns
Detecting inconsistent labels
Inter-annotator agreement metrics
Heuristics to flag label noise

Lesson 12: Creating reproducible pipelines and versioning cleaned datasets (data contracts, hashing)
Learn to build reproducible preprocessing pipelines and versioned cleaned datasets. We cover modular pipeline design, configuration management, hashing, and data contracts that keep models, code, and data in sync over time.
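The hashing part of Lesson 12 can be illustrated with a content fingerprint: a SHA-256 digest over the rows in a fixed column order, so that any change to the cleaned data produces a new version identifier. The function name and the JSON serialization are assumptions of this sketch.

```python
import hashlib
import json


def dataset_fingerprint(rows, columns):
    """SHA-256 hex digest over rows, serialized in a fixed column order."""
    digest = hashlib.sha256()
    for row in rows:
        record = [row.get(c) for c in columns]  # column order is part of the contract
        line = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


columns = ["ticket_id", "text"]
v1 = [{"ticket_id": "T1", "text": "cannot log in"}]
v2 = [{"ticket_id": "T1", "text": "cannot login"}]
print(dataset_fingerprint(v1, columns) == dataset_fingerprint(v2, columns))
```

The digest depends on row order, so a pipeline using this approach would need to sort rows deterministically (for example by `ticket_id`) before hashing.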
Designing modular preprocessing steps
Configuration and parameter tracking
Hashing raw and processed data
Data contracts and schema guarantees
Logging and audit trails for changes

Lesson 13: Date/time parsing and timezone handling, deriving temporal features (daypart, weekday, recency)
Understand how to parse mixed date and time fields, handle timezones, and derive temporal features. We focus on robust parsing, normalization to a reference timezone, and derived features such as recency and seasonality.
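A sketch of the Lesson 13 flow: try a list of known formats, normalize to UTC, then derive weekday, daypart, and recency. The format list, the choice of UTC as the reference timezone, the assumption that naive timestamps are already in UTC, and the daypart boundaries are all illustrative.

```python
from datetime import datetime, timezone

# Assumed formats; the third one is day-first, which must be confirmed per data source.
FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")


def parse_to_utc(raw, assume_tz=timezone.utc):
    """Return a timezone-aware UTC datetime, or None if no format matches."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=assume_tz)  # naive values get the assumed zone
        return parsed.astimezone(timezone.utc)
    return None


def temporal_features(created_at, now):
    hour = created_at.hour
    if 6 <= hour < 12:
        daypart = "morning"
    elif 12 <= hour < 18:
        daypart = "afternoon"
    elif 18 <= hour < 24:
        daypart = "evening"
    else:
        daypart = "night"
    return {
        "weekday": created_at.weekday(),  # Monday is 0
        "daypart": daypart,
        "recency_days": (now - created_at).total_seconds() / 86400,
    }


created = parse_to_utc("2024-03-01T10:30:00+02:00")
print(temporal_features(created, now=datetime(2024, 3, 3, 8, 30, tzinfo=timezone.utc)))
```

Daypart is computed in UTC here; if it should reflect the customer's local time, the timestamp would need converting to the customer's timezone before the features are derived.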
Parsing mixed date formats
Timezone normalization strategies
Handling missing or invalid timestamps
Deriving recency and age features
Daypart, weekday, and seasonality

Lesson 14: Imputation and treatment of non-text columns (resolved, resolution_time_hours, channel) for modeling
Explore imputation and preprocessing for non-text columns such as resolution status, resolution time, and channel. We discuss encoding strategies, leakage risks, and how to combine these features with text for modeling.
Profiling non-text ticket columns
Imputation for numeric durations
Encoding categorical status fields
Avoiding target leakage in features
Joint modeling with text signals
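
The Lesson 14 topics above can be sketched as a fit/transform pair: the median resolution time is learned from training rows only, missing values get a companion indicator feature, and channel is one-hot encoded with an explicit "unknown" bucket. The function names and the channel list are hypothetical.

```python
from statistics import median

CHANNELS = ("chat", "email", "phone")  # assumed channel values


def fit_imputer(train_rows):
    """Learn imputation parameters from the training split only."""
    observed = [
        r["resolution_time_hours"]
        for r in train_rows
        if r.get("resolution_time_hours") is not None
    ]
    return {"median_hours": median(observed)}


def transform(row, params):
    hours = row.get("resolution_time_hours")
    is_missing = hours is None
    channel = row.get("channel")
    if channel not in CHANNELS:
        channel = "unknown"  # covers missing and unseen channel values
    features = {
        "resolution_time_hours": params["median_hours"] if is_missing else hours,
        "resolution_time_missing": int(is_missing),
    }
    for name in CHANNELS + ("unknown",):
        features[f"channel_{name}"] = int(channel == name)
    return features


train_rows = [
    {"resolution_time_hours": 2.0, "channel": "chat"},
    {"resolution_time_hours": None, "channel": "email"},
    {"resolution_time_hours": 10.0, "channel": "phone"},
    {"resolution_time_hours": 4.0, "channel": "chat"},
]
params = fit_imputer(train_rows)
print(transform({"resolution_time_hours": None, "channel": None}, params))
```

Fitting on the training split alone avoids leaking test statistics into the features. A separate leakage concern is that resolution time is only known after a ticket closes, so it may be unsuitable as an input when predictions are made at ticket creation.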