Lesson 1: When to Kill, Restart, or Throttle a Process: Safe Kill Practices, systemctl Restart, and Using cgroups and nice/renice
Understand when to kill, restart, or throttle a process and how to do so safely. Learn signal types, safe kill patterns, systemctl restart behavior, and how to apply cgroups and nice or renice to limit impact.
Choosing SIGTERM, SIGKILL, and other signals
Using kill and pkill with safeguards
Restarting services with systemctl
Throttling CPU with nice and renice
Limiting resources using cgroups
Documenting and automating remedies

Lesson 2: Analyzing Swap Usage and OOM Events: dmesg, Kernel OOM Killer Logs, and /var/log/kern.log
Investigate swap usage and out-of-memory events using free, dmesg, kernel OOM logs, and /var/log/kern.log. Learn to recognize thrashing, tune swappiness, and decide when to add RAM or adjust limits.
Checking swap usage with free and /proc
Recognizing swap thrashing symptoms
Reading dmesg for OOM killer entries
Parsing /var/log/kern.log details
Tuning swappiness and vm overcommit
Deciding when to add RAM or adjust limits

Lesson 3: Identifying Hot Processes: ps, ps aux --sort, pgrep, pidstat and Mapping PIDs to Services
Learn to quickly identify hot or misbehaving processes using ps, pgrep, pidstat, and sorting options. Map PIDs back to services, units, and containers to connect resource usage with the responsible components.
Sorting ps output by CPU and memory
Using pgrep and pkill name filters
Monitoring per-process stats with pidstat
Mapping PIDs to systemd units
Relating PIDs to containers or cgroups
Tracking short-lived bursty processes

Lesson 4: Identifying Recurring Resource Spikes: Inspecting cron, systemd Timers, at Jobs, and Application Schedulers
Explore methods to detect recurring CPU, memory, and I/O spikes by correlating metrics with scheduled tasks. Inspect cron, systemd timers, at jobs, and in-app schedulers to find and fix noisy or overlapping jobs.
Listing and reading user and system crontabs
Inspecting systemd timers and calendar units
Reviewing at jobs and one-off schedules
Tracing app-level schedulers and workers
Correlating spikes with job execution times
Refining or staggering noisy recurring jobs

Lesson 5: Memory Troubleshooting: free, /proc/meminfo, smem, pmap and Checking for Memory Leaks
Gain skills to troubleshoot memory issues using free, /proc/meminfo, smem, and pmap. Learn to distinguish cache from real pressure, find per-process usage, and recognize patterns that indicate memory leaks or fragmentation.
Interpreting free and available memory
Reading /proc/meminfo key fields
Using smem for per-process breakdowns
Inspecting process maps with pmap
Spotting memory leak growth patterns
Differentiating cache from real pressure

Lesson 6: Integrating with Monitoring Data (Prometheus, Grafana) and Using Historical Metrics to Determine Trends
Learn to combine local troubleshooting with Prometheus and Grafana data. Use historical metrics, dashboards, and alerts to identify trends, regressions, and slow drifts, and to validate the impact of performance fixes.
Reviewing key CPU and load dashboards
Inspecting memory, cache, and swap panels
Analyzing disk and network latency graphs
Using PromQL to slice historical metrics
Correlating deploys with metric changes
Validating fixes with before and after views

Lesson 7: Load vs CPU Saturation: uptime, Load Average Interpretation and Relation to CPU Cores
Clarify the meaning of system load averages and their relation to CPU cores and run queues. Learn to distinguish healthy high load from CPU saturation, and correlate load with I/O wait, context switches, and latency.
Reading uptime and load averages
Relating load to CPU core counts
Separating runnable and blocked tasks
Identifying CPU-bound saturation cases
Recognizing I/O wait driven load
Using vmstat and mpstat to confirm

Lesson 8: Collecting Live System Metrics: top, htop, vmstat, mpstat, iostat and How to Interpret Outputs
Learn to collect and interpret live Linux performance metrics using top, htop, vmstat, mpstat, and iostat. Understand CPU, memory, and I/O views, key fields, refresh intervals, and how to spot bottlenecks in real time.
Reading CPU usage in top and htop
Monitoring memory and swap in top
Using vmstat for system-wide snapshots
Analyzing CPU stats with mpstat
Checking disk I/O patterns with iostat
Choosing sampling intervals and filters

Lesson 9: Using perf, strace, and ltrace for Deep Process Analysis and When to Use Each
Understand when and how to use perf, strace, and ltrace for deep process analysis. Learn to profile CPU hotspots, trace system calls, inspect library calls, and minimize overhead while capturing actionable diagnostics.
Profiling CPU hotspots with perf record
Viewing perf reports and call graphs
Tracing syscalls with strace safely
Filtering noisy strace output
Inspecting library calls using ltrace
Choosing the right tool for each symptom

Lesson 10: Using Lightweight Profiling and Tracing Tools (py-spy, gdb, flamegraphs) for Python Apps
Focus on lightweight profiling and tracing for Python applications using py-spy, gdb, and flamegraphs. Capture stack samples in production, locate hot code paths, and interpret flamegraphs without stopping services.
Sampling Python stacks with py-spy
Generating and reading flamegraphs
Attaching gdb safely to live Python
Handling stripped or optimized builds
Profiling async and multithreaded code
Reducing profiler overhead in production