Lesson 1
Memory analysis (free -m, /proc/meminfo, slabtop, smem): interpreting used vs available memory and swap behaviour
Here you will examine memory usage with free, /proc/meminfo, slabtop, and smem. The section covers Linux page caching, buffers, and memory reclaim, how to read swap usage, and how to spot memory leaks, fragmentation, and misconfigured limits.
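This Python sketch shows the arithmetic behind the used vs available distinction. It parses text in /proc/meminfo format and reports:

- MemFree and MemAvailable as shares of MemTotal;
- page cache plus buffers;
- swap in use.

The sample numbers are made up for illustration. On a real host you would pass in the contents of /proc/meminfo (MemAvailable exists on kernels from 3.14 onward).

```python
from pathlib import Path

# Made-up sample in /proc/meminfo format (values in kB), for illustration only.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    8192000 kB
Buffers:          204800 kB
Cached:          6144000 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field to its first number (kB for most fields)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def summarize(info: dict[str, int]) -> dict[str, float]:
    """Summarize the figures that matter when judging memory pressure."""
    total = info["MemTotal"]
    return {
        # MemAvailable is the kernel's estimate of memory usable without swapping.
        # It counts reclaimable cache, so it is a better pressure signal than MemFree.
        "available_pct": 100.0 * info["MemAvailable"] / total,
        "free_pct": 100.0 * info["MemFree"] / total,
        "cache_mib": (info.get("Cached", 0) + info.get("Buffers", 0)) / 1024,
        "swap_used_mib": (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) / 1024,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    text = meminfo.read_text() if meminfo.exists() else SAMPLE
    for name, value in summarize(parse_meminfo(text)).items():
        print(f"{name:>15}: {value:,.1f}")
```

In the sample, only 6.25% of RAM is free but 50% is available, because most of the "used" memory is reclaimable page cache. The 1 GiB of swap in use is not thrashing by itself. Check for sustained swap-in and swap-out activity (vmstat si/so) before drawing conclusions.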
- Reading free -m and understanding cached memory
- Key fields in /proc/meminfo for diagnosis
- Using slabtop to inspect kernel slab usage
- Using smem to attribute memory per process
- Recognizing swap thrashing and OOM risks

Lesson 2
Network usage and bottlenecks (iftop, nload, ss, netstat, ip -s link, tc, tcpdump): identifying network saturation and problematic connections
This section deals with diagnosing network usage and bottlenecks using iftop, nload, ss, netstat, ip, tc, and tcpdump. You will learn to spot link saturation, noisy neighbours, problematic connection states, and packet-level issues that slow applications down.
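Live bandwidth views such as iftop and nload come down to differences in byte counters over time. This rough Python equivalent samples rx_bytes and tx_bytes twice and divides by the interval. It assumes a Linux host with sysfs mounted, and the interface name eth0 is a hypothetical placeholder.

```python
import time
from pathlib import Path


def read_counter(iface: str, name: str) -> int:
    """Read a per-interface counter such as rx_bytes or tx_bytes from sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/{name}").read_text())


def rate_mbit(bytes_before: int, bytes_after: int, interval_s: float) -> float:
    """Average throughput in Mbit/s between two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (bytes_after - bytes_before) * 8 / interval_s / 1_000_000


def utilisation_pct(rate: float, link_mbit: float) -> float:
    """Share of the nominal link speed in use."""
    return 100.0 * rate / link_mbit


def sample(iface: str, interval_s: float = 1.0) -> tuple[float, float]:
    """Return (rx, tx) in Mbit/s measured over one interval (Linux only)."""
    rx0, tx0 = read_counter(iface, "rx_bytes"), read_counter(iface, "tx_bytes")
    time.sleep(interval_s)
    rx1, tx1 = read_counter(iface, "rx_bytes"), read_counter(iface, "tx_bytes")
    return rate_mbit(rx0, rx1, interval_s), rate_mbit(tx0, tx1, interval_s)


if __name__ == "__main__":
    iface = "eth0"  # hypothetical name; list real interfaces with `ip -s link`
    if Path(f"/sys/class/net/{iface}").exists():
        rx, tx = sample(iface)
        print(f"{iface}: rx {rx:.2f} Mbit/s, tx {tx:.2f} Mbit/s")
```

Comparing the rate with the link speed shows how close the interface is to saturation. ethtool reports the link speed, and many drivers also expose it in /sys/class/net/<iface>/speed. The same statistics directory holds counters such as rx_errors and rx_dropped, which `ip -s link` summarizes.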
- Monitoring live bandwidth with iftop and nload
- Inspecting sockets and their states with ss
- Using ip -s link to view interface errors
- Basics of tc for traffic shaping and rate limiting
- Targeted packet capture with tcpdump

Lesson 3
Storage latency and deeper I/O (blktrace, basic bpftrace scripts, fio for testing): how to measure and interpret latency and throughput
This section covers in-depth storage latency and I/O analysis with blktrace, simple bpftrace scripts, and fio benchmarks. You will learn to measure latency and throughput, interpret queue depth, and distinguish device limits from workload problems.
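Latency, IOPS, queue depth, and throughput are tied together by Little's law: average I/Os in flight equal completion rate times average latency. Throughput is IOPS times I/O size. The sketch below encodes both relations. The workload numbers in the main block are hypothetical, not a measured fio result.

```python
def expected_queue_depth(iops: float, latency_ms: float) -> float:
    """Little's law: average I/Os in flight = completion rate x average latency."""
    return iops * latency_ms / 1000.0


def implied_latency_ms(iops: float, queue_depth: float) -> float:
    """Little's law rearranged: average latency implied by queue depth and IOPS."""
    return queue_depth * 1000.0 / iops


def throughput_mib_s(iops: float, block_size_kib: float) -> float:
    """Throughput is IOPS multiplied by the I/O size."""
    return iops * block_size_kib / 1024.0


if __name__ == "__main__":
    # Hypothetical result: 20,000 IOPS of 4 KiB random reads at 0.5 ms average latency.
    iops, lat_ms, bs_kib = 20_000, 0.5, 4
    print(f"in flight:  {expected_queue_depth(iops, lat_ms):.1f}")
    print(f"throughput: {throughput_mib_s(iops, bs_kib):.1f} MiB/s")
```

Because latency equals queue depth divided by IOPS, a deeper queue helps only while IOPS keeps rising. Once IOPS plateaus, extra outstanding I/Os just add latency. This helps separate a device limit from a workload that issues more concurrent I/O than the device can serve. iostat -x reports the observed average queue as aqu-sz (avgqu-sz in older sysstat releases).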
- Understanding latency, IOPS, and throughput
- Using blktrace to inspect block I/O patterns
- Introductory bpftrace scripts for disk latency
- Designing fio workloads that mimic production
- Reading fio reports and spotting bottlenecks

Lesson 4
Process investigation (ps, top/htop filters, pgrep, pidstat, nice/renice): how to find CPU- and memory-heavy processes
You will learn to investigate processes using ps, top or htop filters, pgrep, and pidstat, and to adjust their priority with nice and renice. The section shows how to identify CPU- and memory-intensive tasks, track per-process I/O, and tune priorities to reduce contention.
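This sketch ranks processes by CPU or memory from the output of `ps -eo pid,%cpu,%mem,comm` (procps syntax). The comm column is placed last because command names can contain spaces. The sample output below is invented.

Note that ps reports %CPU averaged over the whole process lifetime. pidstat or top are better for current usage.

```python
from typing import NamedTuple

# Invented output of: ps -eo pid,%cpu,%mem,comm
SAMPLE = """\
    PID %CPU %MEM COMMAND
      1  0.0  0.1 systemd
    812 97.3  2.4 java
   1204  1.2 35.0 postgres
   2210  0.5  0.3 Web Content
"""


class Proc(NamedTuple):
    pid: int
    cpu: float
    mem: float
    comm: str


def parse_ps(output: str) -> list[Proc]:
    """Parse ps output, skipping the header row."""
    rows = []
    for line in output.splitlines()[1:]:
        parts = line.split(None, 3)  # at most 4 fields, so comm keeps its spaces
        if len(parts) == 4:
            pid, cpu, mem, comm = parts
            rows.append(Proc(int(pid), float(cpu), float(mem), comm))
    return rows


def top_by(rows: list[Proc], field: str, n: int = 3) -> list[Proc]:
    """Return the n processes with the highest value of field ('cpu' or 'mem')."""
    return sorted(rows, key=lambda p: getattr(p, field), reverse=True)[:n]


if __name__ == "__main__":
    rows = parse_ps(SAMPLE)
    for proc in top_by(rows, "cpu"):
        print(f"{proc.pid:>7} {proc.cpu:5.1f}% cpu  {proc.comm}")
```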
- Listing and filtering processes with ps
- Using pgrep and pkill safely and precisely
- Using pidstat for per-process CPU and I/O
- Filtering top and htop by user or resource
- Adjusting priorities with nice and renice

Lesson 5
System resource overview (top, htop, vmstat, mpstat, dstat): what each shows and expected output patterns
Here you will learn to read system-wide resource views using tools such as top, htop, vmstat, mpstat, and dstat. The section stresses understanding CPU, memory, and load metrics, and telling normal from abnormal usage patterns.
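The CPU percentages shown by top, vmstat, and mpstat come from tick counters in /proc/stat, diffed over an interval. This Python sketch performs the same calculation for the aggregate "cpu" line, including steal time. The two sample lines in the test are invented.

```python
import time
from pathlib import Path

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse a 'cpu' or 'cpuN' line from /proc/stat into cumulative tick counters."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError(f"not a cpu line: {line!r}")
    # Keep only the first eight counters. The guest fields that follow are
    # already included in user and nice, so adding them would double count.
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels report fewer fields
    return dict(zip(FIELDS, values))


def cpu_breakdown(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


if __name__ == "__main__":
    stat = Path("/proc/stat")
    if stat.exists():
        first = parse_cpu_line(stat.read_text().splitlines()[0])
        time.sleep(1)
        second = parse_cpu_line(stat.read_text().splitlines()[0])
        for state, pct in cpu_breakdown(first, second).items():
            print(f"{state:>8}: {pct:5.1f}%")
```

A persistently high steal share means the hypervisor is giving this guest's CPU time to other tenants; mpstat reports it as %steal. In that case, tuning inside the VM will not recover that time.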
- Key CPU, load, and memory fields in top
- Using htop for interactive process analysis
- vmstat for run queue, swap, and I/O insight
- mpstat for per-CPU utilization and steal time
- dstat for combined multi-resource timelines

Lesson 6
Disk I/O and filesystem checks (iostat, iotop, sar -d, lsblk, df -h, du -sh, tune2fs, xfs_info): detecting I/O bottlenecks and low space
This section focuses on disk I/O and filesystem health using iostat, iotop, sar -d, lsblk, df, du, tune2fs, and xfs_info. You will learn to detect device saturation, growing I/O queues, filesystem errors, and low free space that harm performance.
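Space and inode usage can be computed from os.statvfs. The sketch below follows df's general approach, under an assumption worth stating plainly: df's exact rounding is not reproduced, so the figures match df and df -i only approximately. The arithmetic lives in a pure function so it can be checked without a real filesystem.

```python
import os


def fs_usage_pct(blocks: int, bfree: int, bavail: int,
                 files: int, ffree: int) -> tuple[float, float]:
    """Return (space %, inode %) used, computed roughly the way df and df -i do."""
    used = blocks - bfree
    # df divides by used + space available to unprivileged users. Blocks reserved
    # for root therefore make a filesystem show 100% before bfree reaches 0.
    denom = used + bavail
    space_pct = 100.0 * used / denom if denom else 0.0
    # Some filesystems report 0 total inodes; treat that as "not applicable".
    inode_pct = 100.0 * (files - ffree) / files if files else 0.0
    return space_pct, inode_pct


def usage(path: str = "/") -> tuple[float, float]:
    """Space and inode usage for the filesystem holding path (Unix only)."""
    st = os.statvfs(path)
    return fs_usage_pct(st.f_blocks, st.f_bfree, st.f_bavail, st.f_files, st.f_ffree)


if __name__ == "__main__" and hasattr(os, "statvfs"):
    space, inodes = usage("/")
    print(f"/: {space:.1f}% space used, {inodes:.1f}% inodes used")
```

A filesystem can refuse new files while plenty of space remains if its inodes run out. That is why space checks with df and du are worth pairing with an inode check (df -i).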
- Using iostat to spot busy and slow devices
- Using iotop to find I/O-heavy processes
- sar -d for historical disk utilization trends
- Checking layout and filesystem types with lsblk and df
- Finding space hogs with du and inode checks

Lesson 7
System logs and journaling (journalctl on systemd, /var/log/messages, /var/log/syslog, auth logs): what to search for and why
This section explains how to use systemd's journalctl and traditional log files such as /var/log/messages, /var/log/syslog, and the authentication logs. You will learn which patterns to search for, how to filter out noise, and how logs support root cause analysis.
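sshd records failed password logins with a line of the form "Failed password for ... from <ip> port <n>". The sketch below counts those lines per source address. The exact wording varies with OpenSSH version and configuration, so treat the regular expression as an assumption to check against your own /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL family). The sample lines are invented and use documentation IP ranges.

```python
import re
from collections import Counter

# Invented sample lines in traditional syslog format.
SAMPLE = """\
Jan 12 03:14:07 web1 sshd[2211]: Failed password for invalid user admin from 203.0.113.5 port 52144 ssh2
Jan 12 03:14:09 web1 sshd[2211]: Failed password for invalid user admin from 203.0.113.5 port 52144 ssh2
Jan 12 03:15:41 web1 sshd[2260]: Failed password for root from 198.51.100.7 port 40022 ssh2
Jan 12 03:16:02 web1 sshd[2301]: Accepted publickey for deploy from 192.0.2.10 port 51000 ssh2
Jan 12 03:17:30 web1 sudo:   deploy : TTY=pts/0 ; PWD=/home/deploy ; USER=root ; COMMAND=/usr/bin/systemctl restart nginx
"""

# Assumed message format; "invalid user" appears when the account does not exist.
FAILED = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def failed_logins_by_ip(lines) -> Counter:
    """Count failed password attempts per source IP, ignoring all other lines."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    for ip, n in failed_logins_by_ip(SAMPLE.splitlines()).most_common():
        print(f"{ip:>15}: {n}")
```

Accepted logins and sudo entries pass through untouched here. They are still worth reviewing separately when reconstructing who did what around an incident.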
- journalctl basics and useful filtering options
- Reading /var/log/messages and /var/log/syslog
- Finding errors, warnings, and rate-limited events
- Analyzing authentication and sudo-related logs
- Correlating log timestamps with incidents

Lesson 8
Time-based and historical monitoring (sar, sysstat, collectl): collecting and reading historical metrics to correlate events
You will learn to collect and interpret historical metrics using sysstat (which provides sar) and collectl. The section covers scheduling data collection, reading time-series reports, and linking performance issues to configuration changes or deployments.
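Correlating a metric with a change window often starts with a simple before/after comparison over equal windows around the change. The sketch assumes the samples have already been extracted as (timestamp, value) pairs. sar's human-readable output depends on locale and version, so exporting with sadf (also part of sysstat, for example sadf -d) is usually easier to parse. The deployment time and values below are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def before_after(samples, change_at, window):
    """Mean over [change_at - window, change_at) and [change_at, change_at + window).

    Returns None for a side that has no samples.
    """
    before = [v for t, v in samples if change_at - window <= t < change_at]
    after = [v for t, v in samples if change_at <= t < change_at + window]
    return (mean(before) if before else None, mean(after) if after else None)


if __name__ == "__main__":
    start = datetime(2024, 5, 1, 10, 0)
    # Hypothetical disk %util samples at 10-minute intervals.
    values = [20, 22, 24, 60, 62, 64]
    samples = [(start + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]
    deploy = datetime(2024, 5, 1, 10, 30)  # hypothetical change time
    print(before_after(samples, deploy, timedelta(minutes=30)))
```

A jump that lines up with a change is a lead, not proof. Load that grows on its own at the same time of day would look the same, so comparing against the same window on a previous day is a cheap control.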
- Enabling and configuring sysstat collection
- Using sar for CPU, memory, and I/O history
- Reading sar network and load average trends
- Using collectl for multi-resource timelines
- Correlating metrics with change windows

Lesson 9
Kernel and scheduler insights (dmesg, sysctl -a, /proc/sys/vm parameters): what kernel messages and tunables reveal
Here you will explore kernel and scheduler details using dmesg, sysctl, and /proc/sys/vm parameters. The section explains how kernel messages, tunable values, and scheduler behaviour reveal hardware faults, misconfigurations, and the effects of tuning choices.
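sysctl reads and writes the files under /proc/sys. A dotted name such as vm.swappiness corresponds to the file /proc/sys/vm/swappiness. This small sketch performs that mapping and reads a few memory-related keys. It returns None where a key does not exist on the running kernel.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its /proc/sys file.

    Caveat: a component that itself contains a dot (such as a VLAN interface
    name like eth0.100) cannot be split this way; sysctl also accepts
    slash-separated names for that case.
    """
    return PROC_SYS.joinpath(*name.split("."))


def read_sysctl(name: str):
    """Return the current value as text, or None if it cannot be read here."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in ("vm.swappiness", "vm.dirty_ratio",
                "vm.dirty_background_ratio", "vm.overcommit_memory"):
        print(f"{key} = {read_sysctl(key)}")
```

Values written directly to /proc/sys or with sysctl -w are lost at reboot. Persisting a change means placing it in a file under /etc/sysctl.d/, which is applied at boot and can be reloaded with sysctl --system.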
- Reading dmesg for hardware and driver issues
- Listing and querying sysctl tunable values
- Key /proc/sys/vm parameters for memory
- Overview of scheduler-related kernel parameters
- Safely persisting kernel tuning changes

Lesson 10
Approach to root cause determination: a step-by-step decision tree to classify issues as CPU, RAM, disk I/O, or network
This section offers a practical decision tree for root cause analysis. You will learn to classify problems as CPU-, memory-, disk I/O-, or network-bound, which tools to use at each step, and how to refine hypotheses iteratively as evidence accumulates.
- Initial triage and problem statement
- Classifying CPU-bound versus I/O-bound symptoms
- Distinguishing memory pressure from leaks
- Identifying network versus local bottlenecks
- Iterative hypothesis testing with metrics
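A decision tree like this can be written as a first-pass classifier that turns a metrics snapshot into a starting hypothesis. Everything here is a sketch: the Snapshot fields, the tools suggested as their sources, and especially the thresholds are hypothetical and need tuning for real systems.

Memory is checked first because heavy swapping also appears as I/O wait and CPU time, so it can masquerade as a disk or CPU problem.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    cpu_busy_pct: float       # 100 - idle - iowait (e.g. from mpstat or vmstat)
    iowait_pct: float         # vmstat "wa" or mpstat "%iowait"
    mem_available_pct: float  # MemAvailable / MemTotal from /proc/meminfo
    swap_in_per_s: float      # vmstat "si"
    disk_util_pct: float      # iostat -x %util of the busiest device
    nic_util_pct: float       # interface throughput / link speed


# Hypothetical thresholds, for illustration only.
CPU_BUSY, IOWAIT, MEM_LOW, DISK_BUSY, NIC_BUSY = 90.0, 20.0, 10.0, 90.0, 80.0


def classify(s: Snapshot) -> str:
    """Return a first hypothesis: 'memory', 'cpu', 'disk', 'network' or 'inconclusive'."""
    # Memory first: swapping inflates both I/O wait and CPU time.
    if s.mem_available_pct < MEM_LOW and s.swap_in_per_s > 0:
        return "memory"
    if s.cpu_busy_pct > CPU_BUSY:
        return "cpu"
    if s.iowait_pct > IOWAIT or s.disk_util_pct > DISK_BUSY:
        return "disk"
    if s.nic_util_pct > NIC_BUSY:
        return "network"
    return "inconclusive"


if __name__ == "__main__":
    print(classify(Snapshot(95, 1, 50, 0, 10, 5)))
```

An "inconclusive" result is a prompt to gather more evidence, for example lock contention or a slow external dependency, rather than a verdict. iostat's %util is also a weak saturation signal for devices that serve requests in parallel, such as SSD, NVMe, or RAID volumes. For those, latency (await) is the more reliable metric to test the disk hypothesis against.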