Lesson 1
When to Kill, Restart, or Slow a Process: Safe Kill Methods, systemctl Restart, and Using cgroups and nice/renice
Understand when to kill, restart, or slow a process and how to do it without causing harm. Learn the signal types, safe kill methods, systemctl restart actions, and how to use cgroups and nice or renice to limit a process's impact.
- Choosing SIGTERM, SIGKILL, and others
- Using kill and pkill with safeguards
- Restarting services with systemctl
- Throttling CPU with nice and renice
- Limiting resources using cgroups
- Documenting and automating remedies

Lesson 2
Investigating Swap Use and OOM Events: dmesg, Kernel OOM Killer Logs, and /var/log/kern.log
Investigate swap use and out-of-memory events using free, dmesg, kernel OOM logs, and /var/log/kern.log. Learn to spot thrashing, tune swappiness, and decide when to add RAM or change limits.
- Checking swap usage with free and /proc
- Recognizing swap thrashing symptoms
- Reading dmesg for OOM killer entries
- Parsing /var/log/kern.log details
- Tuning swappiness and vm overcommit
- Deciding when to add RAM or adjust limits

Lesson 3
Finding Hot Processes: ps, ps aux --sort, pgrep, pidstat and Linking PIDs to Services
Learn to quickly find hot or misbehaving processes using ps, pgrep, pidstat, and sort options. Map PIDs back to services, units, and containers to connect resource use with its cause.
- Sorting ps output by CPU and memory
- Using pgrep and pkill name filters
- Monitoring per-process stats with pidstat
- Mapping PIDs to systemd units
- Relating PIDs to containers or cgroups
- Tracking short-lived bursty processes

Lesson 4
Finding Recurring Resource Spikes: Checking cron, systemd Timers, at Jobs, and App Schedulers
Explore ways to spot recurring CPU, memory, and I/O spikes by correlating metrics with scheduled tasks. Check cron, systemd timers, at jobs, and app schedulers to find and fix noisy or overlapping jobs.
- Listing and reading user and system crontabs
- Inspecting systemd timers and calendar units
- Reviewing at jobs and one-off schedules
- Tracing app-level schedulers and workers
- Correlating spikes with job execution times
- Refining or staggering noisy recurring jobs

Lesson 5
Memory Troubleshooting: free, /proc/meminfo, smem, pmap and Checking for Memory Leaks
Build skills for diagnosing memory problems using free, /proc/meminfo, smem, and pmap. Learn to tell cache from real memory pressure, find per-process usage, and spot patterns that indicate memory leaks or failures.
- Interpreting free and available memory
- Reading /proc/meminfo key fields
- Using smem for per-process breakdowns
- Inspecting process maps with pmap
- Spotting memory leak growth patterns
- Differentiating cache from real pressure

Lesson 6
Correlating with Monitoring Data (Prometheus, Grafana) and Using Historical Metrics to See Trends
Learn to combine local troubleshooting with Prometheus and Grafana data. Use historical metrics, dashboards, and alerts to spot trends, regressions, and gradual drift, and verify the effect of performance fixes.
- Reviewing key CPU and load dashboards
- Inspecting memory, cache, and swap panels
- Analyzing disk and network latency graphs
- Using PromQL to slice historical metrics
- Correlating deploys with metric changes
- Validating fixes with before and after views

Lesson 7
Load vs. CPU Saturation: uptime, What Load Average Means, and How It Relates to CPU Cores
Clarify what system load averages mean and how they relate to CPU cores and run queues. Learn to tell healthy high load from CPU saturation, and relate load to I/O wait, context switches, and latency.
- Reading uptime and load averages
- Relating load to CPU core counts
- Separating runnable and blocked tasks
- Identifying CPU-bound saturation cases
- Recognizing I/O wait driven load
- Using vmstat and mpstat to confirm

Lesson 8
Collecting Live System Metrics: top, htop, vmstat, mpstat, iostat and How to Interpret Their Output
Learn to collect and interpret live Linux performance metrics using top, htop, vmstat, mpstat, and iostat. Cover CPU, memory, and I/O views, key fields, refresh intervals, and how to spot bottlenecks in real time.
- Reading CPU usage in top and htop
- Monitoring memory and swap in top
- Using vmstat for system-wide snapshots
- Analyzing CPU stats with mpstat
- Checking disk I/O patterns with iostat
- Choosing sampling intervals and filters

Lesson 9
Using perf, strace, and ltrace for Deep Process Inspection and When to Use Each
Understand when and how to use perf, strace, and ltrace for deep process inspection. Learn to profile CPU hot spots, trace system calls, inspect library calls, and reduce overhead while still gathering useful information.
- Profiling CPU hotspots with perf record
- Viewing perf reports and call graphs
- Tracing syscalls with strace safely
- Filtering noisy strace output
- Inspecting library calls using ltrace
- Choosing the right tool for each symptom

Lesson 10
Using Lightweight Profiling and Tracing Tools (py-spy, gdb, flamegraphs) for Python Apps
Focus on lightweight profiling and tracing for Python apps using py-spy, gdb, and flamegraphs. Capture stack samples from live processes, find hot code paths, and read flamegraphs without stopping services.
- Sampling Python stacks with py-spy
- Generating and reading flamegraphs
- Attaching gdb safely to live Python
- Handling stripped or optimized builds
- Profiling async and multithreaded code
- Reducing profiler overhead in production