Lesson 1
When to kill, restart, or throttle a process: safe kill practices, systemctl restart, and using cgroups and nice/renice
Understand when to kill, restart, or throttle a process and how to do it safely. Learn signal types, safe kill patterns, systemctl restart behavior, and how to apply cgroups and nice or renice to limit impact.
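
One common safe kill pattern is to send SIGTERM first, give the process a grace period to clean up, and only then fall back to SIGKILL. The sketch below illustrates that idea with Python's standard subprocess module. The function name terminate_gracefully, the grace_seconds parameter, and the sleeping child process are assumptions made for illustration; they do not come from the course material.

```python
import signal
import subprocess
import sys


def terminate_gracefully(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the return code; on POSIX it is the negative signal number
    when the process was ended by a signal.
    """
    proc.terminate()  # SIGTERM on POSIX: the process may catch it and clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        return proc.wait()


if __name__ == "__main__":
    # Hypothetical stand-in for a misbehaving process: a child that only sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = terminate_gracefully(child, grace_seconds=2.0)
    print("ended by SIGTERM:", code == -signal.SIGTERM)
```

The grace period is a judgment call: too short and a service may lose in-flight work, too long and the remedy is delayed. For services managed by systemd, systemctl applies its own stop timeout, so a wrapper like this is mainly relevant to processes started outside a unit.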
Choosing SIGTERM, SIGKILL, and others
Using kill and pkill with safeguards
Restarting services with systemctl
Throttling CPU with nice and renice
Limiting resources using cgroups
Documenting and automating remedies

Lesson 2
Analyzing swap usage and OOM events: dmesg, kernel OOM killer logs, and /var/log/kern.log
Investigate swap usage and out-of-memory (OOM) events using free, dmesg, kernel OOM logs, and /var/log/kern.log. Learn to recognize thrashing, tune swappiness, and decide when to add RAM or adjust limits.
Checking swap usage with free and /proc
Recognizing swap thrashing symptoms
Reading dmesg for OOM killer entries
Parsing /var/log/kern.log details
Tuning swappiness and vm overcommit
Deciding when to add RAM or adjust limits

Lesson 3
Identifying hot processes: ps, ps aux --sort, pgrep, pidstat and mapping PIDs to services
Learn to quickly identify hot or misbehaving processes using ps, pgrep, pidstat, and sorting options. Map PIDs back to services, units, and containers to connect resource usage with the responsible components.
Sorting ps output by CPU and memory
Using pgrep and pkill name filters
Monitoring per-process stats with pidstat
Mapping PIDs to systemd units
Relating PIDs to containers or cgroups
Tracking short-lived bursty processes

Lesson 4
Identifying recurring resource spikes: inspecting cron, systemd timers, at jobs, and application schedulers
Explore methods to detect recurring CPU, memory, and I/O spikes by correlating metrics with scheduled tasks. Inspect cron, systemd timers, at jobs, and in-app schedulers to find and fix noisy or overlapping jobs.
Listing and reading user and system crontabs
Inspecting systemd timers and calendar units
Reviewing at jobs and one-off schedules
Tracing app-level schedulers and workers
Correlating spikes with job execution times
Refining or staggering noisy recurring jobs

Lesson 5
Memory troubleshooting: free, /proc/meminfo, smem, pmap and checking for memory leaks
Gain skills to troubleshoot memory issues using free, /proc/meminfo, smem, and pmap. Learn to distinguish cache from real pressure, find per-process usage, and recognize patterns that indicate memory leaks or fragmentation.
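
To make the difference between cache and real memory pressure concrete, the sketch below parses /proc/meminfo-style text and compares MemAvailable with the page cache and buffers. The helper names parse_meminfo and memory_summary are hypothetical, and the SAMPLE values are made up for illustration rather than taken from a real host. It assumes a kernel that reports MemAvailable (Linux 3.14 or later).

```python
# Illustrative sample in /proc/meminfo format; the numbers are invented.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    6000000 kB
Buffers:          200000 kB
Cached:          5000000 kB
SwapTotal:       2000000 kB
SwapFree:        2000000 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse 'Name:   value kB' lines into {name: int value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'Name:' prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(fields: dict) -> dict:
    """Express available memory and cache as a percentage of MemTotal."""
    total = fields["MemTotal"]
    cache = fields.get("Cached", 0) + fields.get("Buffers", 0)
    return {
        "available_pct": round(100 * fields["MemAvailable"] / total, 1),
        "cache_pct": round(100 * cache / total, 1),
    }


if __name__ == "__main__":
    # On a Linux host, open("/proc/meminfo").read() could replace SAMPLE.
    print(memory_summary(parse_meminfo(SAMPLE)))
```

In the sample, MemFree is low but most of the used memory is reclaimable cache, so available memory remains high. A host under real pressure would instead show a low available percentage.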
Interpreting free and available memory
Reading /proc/meminfo key fields
Using smem for per-process breakdowns
Inspecting process maps with pmap
Spotting memory leak growth patterns
Differentiating cache from real pressure

Lesson 6
Integrating with monitoring data (Prometheus, Grafana) and using historical metrics to determine trends
Learn to combine local troubleshooting with Prometheus and Grafana data. Use historical metrics, dashboards, and alerts to identify trends, regressions, and slow drifts, and to validate the impact of performance fixes.
Reviewing key CPU and load dashboards
Inspecting memory, cache, and swap panels
Analyzing disk and network latency graphs
Using PromQL to slice historical metrics
Correlating deploys with metric changes
Validating fixes with before and after views

Lesson 7
Load vs CPU saturation: uptime, load average interpretation and relation to CPU cores
Clarify the meaning of system load averages and their relation to CPU cores and run queues. Learn to distinguish healthy high load from CPU saturation, and correlate load with I/O wait, context switches, and latency.
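
The relation between load average and core count can be sketched as a simple ratio. The helper names load_per_core and describe_load are hypothetical, and the 1.0 cutoff is a rule-of-thumb assumption, not a hard limit. On Linux the load average also counts tasks in uninterruptible sleep, so a ratio above 1.0 is only a prompt to check the run queue and I/O wait, not proof of CPU saturation.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def describe_load(load_avg: float, cores: int) -> str:
    """Give a rough reading of the ratio (the 1.0 cutoff is a rule of thumb)."""
    if load_per_core(load_avg, cores) <= 1.0:
        return "within core count"
    return "above core count: check run queue and I/O wait"


if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()  # same values uptime shows
    cores = os.cpu_count() or 1  # cpu_count() may return None
    for label, value in (("1m", one_min), ("5m", five_min), ("15m", fifteen_min)):
        print(label, round(load_per_core(value, cores), 2), describe_load(value, cores))
```

A load of 8.0 on a 4-core machine gives a ratio of 2.0, while the same load on a 16-core machine gives 0.5, which is why raw load numbers are hard to compare across hosts.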
Reading uptime and load averages
Relating load to CPU core counts
Separating runnable and blocked tasks
Identifying CPU-bound saturation cases
Recognizing I/O wait driven load
Using vmstat and mpstat to confirm

Lesson 8
Collecting live system metrics: top, htop, vmstat, mpstat, iostat and how to interpret outputs
Learn to collect and interpret live Linux performance metrics using top, htop, vmstat, mpstat, and iostat. Understand CPU, memory, and I/O views, key fields, refresh intervals, and how to spot bottlenecks in real time.
Reading CPU usage in top and htop
Monitoring memory and swap in top
Using vmstat for system-wide snapshots
Analyzing CPU stats with mpstat
Checking disk I/O patterns with iostat
Choosing sampling intervals and filters

Lesson 9
Using perf, strace, and ltrace for deep process analysis and when to use each
Understand when and how to use perf, strace, and ltrace for deep process analysis. Learn to profile CPU hotspots, trace system calls, inspect library calls, and minimize overhead while capturing actionable diagnostics.
Profiling CPU hotspots with perf record
Viewing perf reports and call graphs
Tracing syscalls with strace safely
Filtering noisy strace output
Inspecting library calls using ltrace
Choosing the right tool for each symptom

Lesson 10
Using lightweight profiling and tracing tools (py-spy, gdb, flamegraphs) for Python apps
Focus on lightweight profiling and tracing for Python applications using py-spy, gdb, and flamegraphs. Capture stack samples in production, locate hot code paths, and interpret flamegraphs without stopping services.
Sampling Python stacks with py-spy
Generating and reading flamegraphs
Attaching gdb safely to live Python
Handling stripped or optimized builds
Profiling async and multithreaded code
Reducing profiler overhead in production