Lesson 1
Memory checking: free -m, /proc/meminfo, slabtop, smem - understanding used vs available memory and swap behaviour
Here you will examine memory behaviour using free, /proc/meminfo, slabtop, and smem. The section explains Linux caching, buffers, and reclaim, how to interpret swap use, and how to spot memory leaks, fragmentation, and misconfigured limits.
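As a rough illustration of the used-versus-available distinction, the sketch below parses /proc/meminfo-style text and compares MemFree with MemAvailable. The sample values and the helper names parse_meminfo and summarize are made up for illustration; on a real system the text would come from reading /proc/meminfo rather than the embedded sample.

```python
# Illustrative sample only; real values come from /proc/meminfo.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    9000000 kB
Buffers:          500000 kB
Cached:          7000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Compare truly free memory with memory the kernel estimates it can make available."""
    total = fields["MemTotal"]
    return {
        "free_pct": 100 * fields["MemFree"] / total,
        "available_pct": 100 * fields["MemAvailable"] / total,
        "swap_used_kb": fields["SwapTotal"] - fields["SwapFree"],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_meminfo(SAMPLE_MEMINFO)).items():
        print(key, value)
```

In this sample only a small share of memory is free, yet more than half is available, because most of the page cache can be reclaimed when applications need it.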
Reading free -m and understanding cached memory
Key fields in /proc/meminfo for diagnosis
Using slabtop to check kernel slab use
Using smem to attribute memory per process
Recognizing swap thrashing and OOM risks

Lesson 2
Network utilization and bottlenecks: iftop, nload, ss, netstat, ip -s link, tc, tcpdump - finding network overload and problematic connections
This section covers finding network utilization problems and bottlenecks using iftop, nload, ss, ip, tc, and tcpdump. You will learn to spot overload, noisy neighbours, connection states, and packet problems that make applications slow.
Monitoring live bandwidth with iftop and nload
Checking sockets and states with ss
Using ip -s link to see interface errors
Basics of tc for shaping and rate limiting
Targeted packet capture with tcpdump

Lesson 3
Storage latency and deeper I/O: blktrace, bpftrace (simple scripts), fio for tests - how to measure and interpret latency and throughput
This section covers storage latency and deeper I/O analysis using blktrace, simple bpftrace scripts, and fio tests. You will learn how to measure latency and throughput, understand queue depth, and tell device limits from workload problems.
Understanding latency, IOPS, and throughput
Using blktrace to check block I/O patterns
Simple bpftrace scripts for disk latency
Designing fio workloads that resemble production
Reading fio reports and spotting bottlenecks

Lesson 4
Process checking: ps, top/htop filters, pgrep, pidstat, nice/renice - how to find CPU- and memory-heavy processes
You will learn to inspect processes with ps, top or htop filters, pgrep, pidstat, and nice or renice. The section shows how to spot CPU- and memory-heavy tasks, track per-process I/O, and adjust priorities to reduce contention.
Listing and filtering processes with ps
Using pgrep and pkill safely and precisely
Using pidstat for per-process CPU and I/O
Filtering top and htop by user or resource
Adjusting priorities with nice and renice

Lesson 5
System resource overview: top, htop, vmstat, mpstat, dstat - what each shows and expected output patterns
Here you will learn to read system-wide resource snapshots using tools like top, htop, vmstat, mpstat, and dstat. The section focuses on understanding CPU, memory, and load metrics, and recognizing normal versus abnormal usage patterns.
Key CPU, load, and memory fields in top
Using htop for interactive process analysis
vmstat for run queue, swap, and I/O insight
mpstat for per-CPU utilization and steal time
dstat for combined multi-resource timelines

Lesson 6
Disk I/O and filesystem checks: iostat, iotop, sar -d, lsblk, df -h, du -sh, tune2fs, xfs_info - detecting I/O bottlenecks and low space
This section focuses on disk I/O and filesystem health using iostat, iotop, sar -d, lsblk, df, du, tune2fs, and xfs_info. You will learn to spot overload, queue buildup, filesystem errors, and low space that degrade performance.
Using iostat to spot busy and slow devices
Using iotop to find I/O-heavy processes
sar -d for historical disk utilization trends
Checking layout and types with lsblk and df
Finding space hogs with du and inode checks

Lesson 7
System logs and journaling: journalctl (systemd), /var/log/messages, /var/log/syslog, auth logs - what to search for and why
This section explains how to use systemd journalctl and traditional log files like /var/log/messages, /var/log/syslog, and auth logs. You will learn what patterns to search for, how to filter noise, and how logs help find root causes.
journalctl basics and useful filtering options
Reading /var/log/messages and /var/log/syslog
Finding errors, warnings, and rate-limited events
Analysing auth and sudo related logs
Correlating log timestamps with incidents

Lesson 8
Time-based and historical monitoring: sar, sysstat, collectl - collecting and reading historical metrics to correlate events
You will learn how to collect and interpret historical metrics using sar, sysstat, and collectl. The section explains how to schedule data collection, read time series reports, and correlate performance problems with config changes or deployments.
Enabling and configuring sysstat collection
Using sar for CPU, memory, and I/O history
Reading sar network and load average trends
Using collectl for multi-resource timelines
Correlating metrics with change windows

Lesson 9
Kernel and scheduler insights: dmesg, sysctl -a, /proc/sys/vm parameters - what kernel messages and tunables show
Here you will explore kernel and scheduler insights using dmesg, sysctl, and /proc/sys/vm parameters. The section explains how kernel messages, tunables, and scheduler behaviour reveal hardware problems, misconfigurations, and tuning options.
Reading dmesg for hardware and driver issues
Listing and querying sysctl tunable values
Key /proc/sys/vm parameters for memory
Scheduler related kernel parameters overview
Safely persisting kernel tuning changes

Lesson 10
Approach to root cause finding: step-by-step decision tree to classify problems as CPU, RAM, disk I/O, or network
This section presents a practical decision tree for root cause analysis. You will learn how to classify incidents as CPU, memory, disk I/O, or network bound, which tools to run in each branch, and how to refine hypotheses using collected evidence.
Initial triage and problem statement
Classifying CPU versus I/O bound symptoms
Distinguishing memory pressure from leaks
Identifying network versus local bottlenecks
Iterative hypothesis testing with metrics
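
A decision tree of this kind can be sketched in a few lines of Python. The version below is only an illustration: the metric names, the thresholds, and the order of the checks are assumptions chosen for the example, not values taken from the course, and real triage would tune them per system.

```python
def classify(m):
    """Return a first guess at the bottleneck class from a dict of metrics.

    All keys and thresholds here are hypothetical, for illustration only.
    """
    # Check memory first: heavy swapping inflates I/O wait and can look like a disk problem.
    if m["available_pct"] < 10 and m["swap_io_per_s"] > 0:
        return "memory"
    # High I/O wait without memory pressure points at storage.
    if m["iowait_pct"] > 20:
        return "disk I/O"
    # Little idle time and no I/O wait suggests the CPUs are saturated.
    if m["cpu_idle_pct"] < 10:
        return "CPU"
    # Retransmits or interface errors suggest a network problem.
    if m["net_retrans_pct"] > 1 or m["nic_errors"] > 0:
        return "network"
    return "unclear: collect more evidence"


if __name__ == "__main__":
    sample = {
        "available_pct": 45,
        "swap_io_per_s": 0,
        "iowait_pct": 35,
        "cpu_idle_pct": 50,
        "net_retrans_pct": 0.1,
        "nic_errors": 0,
    }
    print(classify(sample))
```

The first branch that matches is only a starting hypothesis; the result should be confirmed with the tools for that branch before any tuning is attempted.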