Lesson 1
Memory checking with free -m, /proc/meminfo, slabtop, and smem: understanding used vs available memory and swap behaviour
In this lesson you will analyse memory behaviour using free, /proc/meminfo, slabtop, and smem. It explains Linux caching, buffers, and reclaim, how to read swap usage, and how to spot memory leaks, fragmentation, and misconfigured limits.
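The difference between free and available memory can be illustrated with a small parser for /proc/meminfo-style text. This is a minimal sketch, not part of the course material: the sample values are invented for illustration, the helper names (`parse_meminfo`, `summarize`, `read_meminfo`) are hypothetical, and the cache figure (Buffers + Cached + SReclaimable) is only an approximation of what free reports as buff/cache, which can vary by procps version.

```python
# Illustrative sample in /proc/meminfo format (values in kB); not real measurements.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    9000000 kB
Buffers:          500000 kB
Cached:          7000000 kB
SwapTotal:       4000000 kB
SwapFree:        3000000 kB
SReclaimable:     500000 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(rest.split()[0])
    return fields


def summarize(fields):
    """Derive a few figures that help separate 'free' from 'available'."""
    total = fields["MemTotal"]
    # Assumption: approximates the buff/cache column shown by free.
    cache_kb = fields["Buffers"] + fields["Cached"] + fields["SReclaimable"]
    return {
        "free_pct": 100.0 * fields["MemFree"] / total,
        "available_pct": 100.0 * fields["MemAvailable"] / total,
        "cache_kb": cache_kb,
        "swap_used_kb": fields["SwapTotal"] - fields["SwapFree"],
    }


def read_meminfo(path="/proc/meminfo"):
    """Read the live file on a Linux host (not called in this sketch)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


summary = summarize(parse_meminfo(SAMPLE_MEMINFO))
print(summary)
```

In the sample, only a small share of memory is strictly free, yet more than half is available because most of the cache can be reclaimed. Low free memory on its own is therefore not a sign of pressure; low available memory together with growing swap use is the more useful signal.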
Reading free -m and interpreting cached memory
Key /proc/meminfo fields to check
Using slabtop to inspect kernel slab usage
Using smem to attribute shared memory per process
Spotting swap thrashing and OOM risks

Lesson 2
Network usage and bottlenecks with iftop, nload, ss, netstat, ip -s link, tc, and tcpdump: finding network saturation and problem connections
This lesson covers finding network usage and bottlenecks using iftop, nload, ss, ip, tc, and tcpdump. You will learn to spot saturated links, noisy neighbours, connection states, and packet problems that slow applications down.
Watching live bandwidth with iftop and nload
Checking sockets and states with ss
Using ip -s link to see interface errors
Basics of tc for shaping and rate limiting
Targeted packet capture with tcpdump

Lesson 3
Storage latency and deep I/O analysis with blktrace, bpftrace (simple scripts), and fio for tests: how to measure and read latency and throughput
This lesson covers storage latency and deep I/O analysis using blktrace, simple bpftrace scripts, and fio tests. You will learn how to measure latency and throughput, read queue depth, and tell device limits from workload problems.
Understanding latency, IOPS, and throughput
Using blktrace to check block I/O patterns
Introductory bpftrace scripts for disk latency
Building fio workloads that resemble production
Reading fio reports and spotting bottlenecks

Lesson 4
Process checking with ps, top/htop filters, pgrep, pidstat, and nice/renice: how to find CPU- and memory-heavy processes
You will learn to check processes with ps, top or htop filters, pgrep, pidstat, and nice or renice. This lesson shows how to spot CPU- and memory-heavy tasks, track per-process I/O, and adjust priorities to reduce contention.
Listing and filtering processes with ps
Using pgrep and pkill safely and precisely
Using pidstat for per-process CPU and I/O
Filtering top and htop by user or resource
Adjusting priorities with nice and renice

Lesson 5
System resource overview with top, htop, vmstat, mpstat, and dstat: what each shows and normal output patterns
You will learn to read system-wide resource snapshots using tools like top, htop, vmstat, mpstat, and dstat. This lesson focuses on understanding CPU, memory, and load metrics, and on telling normal usage patterns from bad ones.
Main CPU, load, and memory fields in top
Using htop for interactive process analysis
vmstat for run queue, swap, and I/O insight
mpstat for per-CPU use and steal time
dstat for combined multi-resource timelines

Lesson 6
Disk I/O and filesystem checks with iostat, iotop, sar -d, lsblk, df -h, du -sh, tune2fs, and xfs_info: spotting I/O bottlenecks and low space
This lesson focuses on disk I/O and filesystem health using iostat, iotop, sar -d, lsblk, df, du, tune2fs, and xfs_info. You will learn to spot saturation, queue build-up, filesystem errors, and low space that degrade performance.
Using iostat to spot busy and slow devices
Using iotop to find I/O-heavy processes
sar -d for past disk use trends
Checking layout and types with lsblk and df
Finding space hogs with du and inode checks

Lesson 7
System logs and journaling with journalctl (systemd), /var/log/messages, /var/log/syslog, and auth logs: what to search for and why
This lesson explains how to use systemd journalctl and traditional log files like /var/log/messages, /var/log/syslog, and auth logs. You will learn what patterns to search for, how to filter noise, and how logs help find root causes.
journalctl basics and useful filtering options
Reading /var/log/messages and /var/log/syslog
Finding errors, warnings, and rate-limited events
Analysing auth and sudo-related logs
Correlating log timestamps with incidents

Lesson 8
Time-based and historical monitoring with sar, sysstat, and collectl: collecting and reading past metrics to correlate events
You will learn how to collect and read past metrics using sar, sysstat, and collectl. This lesson explains how to set up data collection, read time series reports, and correlate performance problems with config changes or deployments.
Setting up and configuring sysstat collection
Using sar for CPU, memory, and I/O history
Reading sar network and load average trends
Using collectl for multi-resource timelines
Correlating metrics with change windows

Lesson 9
Kernel and scheduler insights with dmesg, sysctl -a, and /proc/sys/vm parameters: what kernel messages and tunables show
In this lesson you will explore kernel and scheduler insights using dmesg, sysctl, and /proc/sys/vm parameters. It explains how kernel messages, tunables, and scheduler behaviour reveal hardware problems, misconfigurations, and tuning options.
Reading dmesg for hardware and driver issues
Listing and querying sysctl tunable values
Main /proc/sys/vm parameters for memory
Overview of scheduler-related kernel parameters
Safely persisting kernel tuning changes

Lesson 10
Approach to finding the root cause: a step-by-step decision tree to classify issues as CPU, RAM, disk I/O, or network
This lesson shows a practical decision tree for finding root causes. You will learn how to classify incidents as CPU, memory, disk I/O, or network bound, which tools to run in each branch, and how to refine hypotheses using collected evidence.
Initial triage and problem statement
Classifying CPU-bound vs I/O-bound symptoms
Telling memory pressure from leaks
Spotting network vs local bottlenecks
Step-by-step hypothesis testing with metrics
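
The decision-tree idea from Lesson 10 can be sketched as a small classifier over a metrics snapshot. This is a rough illustration under stated assumptions, not the course's own method: the `Snapshot` fields, the `classify` function, the order of the checks, and every threshold are hypothetical and would need tuning for a real system.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One set of readings; the comments name a typical source for each."""
    cpu_busy_pct: float       # 100 minus %idle, e.g. from mpstat
    iowait_pct: float         # %iowait, e.g. from iostat or top
    mem_available_pct: float  # MemAvailable / MemTotal * 100
    swap_rate_per_s: float    # si + so, e.g. from vmstat
    disk_util_pct: float      # %util, e.g. from iostat -x
    net_util_pct: float       # throughput relative to link speed
    net_errors_per_s: float   # errors and drops, e.g. from ip -s link


def classify(s: Snapshot) -> str:
    """Return a first-guess bottleneck class; all thresholds are illustrative."""
    # Memory is checked first because heavy swapping also raises iowait
    # and disk utilisation, which would otherwise look like a disk problem.
    if s.mem_available_pct < 10 and s.swap_rate_per_s > 0:
        return "memory"
    if s.iowait_pct > 20 or s.disk_util_pct > 90:
        return "disk I/O"
    if s.cpu_busy_pct > 90:
        return "CPU"
    if s.net_util_pct > 80 or s.net_errors_per_s > 0:
        return "network"
    return "unclassified"


swapping = Snapshot(40, 35, 4, 1200, 95, 5, 0)
busy_cpu = Snapshot(97, 1, 60, 0, 10, 5, 0)
print(classify(swapping))  # memory
print(classify(busy_cpu))  # CPU
```

The result is only a starting hypothesis. Each branch still needs to be confirmed with the detailed tools from the earlier lessons before a root cause is declared.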