Lesson 1. Interpreting logs for web apps and system services: /var/log/syslog, /var/log/messages, journald (journalctl), application-specific logs and how to filter them
Learn to read and filter Linux and macOS logs for web apps and services. You will work with syslog, journald, and app logs, using journalctl, grep, and other tools to isolate issues, correlate events, and build repeatable log queries.
- Syslog layout and common log locations
- Using journalctl filters and time ranges
- Reading web server access and error logs
- Filtering logs with grep, awk, and sed
- Correlating multi-service events by timestamp

Lesson 2. Process identification and analysis: ps aux, pstree, pmap, lsof, strace; finding the offending process and inspecting behaviour
Develop techniques to find and inspect problematic processes. Using ps, pstree, pmap, lsof, and strace, you will map process hierarchies, examine open files and ports, trace system calls, and link resource spikes to specific PIDs.
- Locating heavy processes with ps and top
- Visualizing parents and children with pstree
- Inspecting memory maps using pmap
- Finding open files and ports with lsof
- Tracing system calls and hangs with strace

Lesson 3. Network diagnostics on Linux: ss, netstat, ip a, ip route, ethtool, ifconfig; how to read interface and route information
Gain skills to diagnose Linux and macOS network issues. You will use ss, netstat, ip, ifconfig, and ethtool to inspect sockets, routes, and interfaces, identify listening services, and verify connectivity and throughput problems.
- Listing listening ports with ss and netstat
- Inspecting IP addresses and routes with ip
- Checking link speed and duplex with ethtool
- Using ifconfig and ip for interface status
- Detecting common routing and DNS issues

Lesson 4. Disk I/O investigation: iostat, iotop, blktrace, checking filesystem types and mount options
Investigate disk I/O bottlenecks and filesystem behaviour. You will use iostat, iotop, and blktrace, inspect filesystem types and mount options, and relate I/O patterns to application workloads and latency symptoms.
- Spotting I/O saturation with iostat
- Finding I/O-heavy processes using iotop
- Tracing block-level activity with blktrace
- Comparing filesystem types and tradeoffs
- Reviewing mount options for performance

Lesson 5. Commands for live monitoring: top, htop, vmstat, iostat, sar, mpstat, with exact usage examples and interpretation
Explore live monitoring tools to understand real-time system behaviour. You will use top, htop, vmstat, iostat, sar, and mpstat to spot CPU, memory, and I/O bottlenecks, interpret key fields, and capture short performance snapshots.
- Reading CPU and load in top and htop
- Using vmstat for memory and swap insight
- Monitoring disk I/O with iostat and iotop
- Historical snapshots with sar and mpstat
- Exporting command output for later review

Lesson 6. Analysing web server performance: nginx/apache status modules, access/error logs, slow request analysis, HTTP status patterns
Learn to evaluate web server performance on nginx and Apache. You will read status modules, analyse access and error logs, detect slow requests, and interpret HTTP status patterns to distinguish client issues from server bottlenecks.
- Enabling and reading nginx status endpoints
- Using Apache mod_status and server-status
- Identifying slow requests and timeouts
- Analyzing HTTP status code distributions
- Detecting bots, scans, and abusive traffic

Lesson 7. Long-term remediation: capacity planning, resource limits (systemd, cgroups), tuning kernel and web server configs, application profiling tools and when to use them
Plan long-term fixes instead of repeated firefighting. You will practice capacity planning, set resource limits with systemd and cgroups, tune kernel and web server parameters, and choose profiling tools to guide code and config changes.
- Collecting data for capacity planning
- Configuring systemd unit resource limits
- Applying basic cgroup constraints safely
- Tuning kernel and web server parameters
- Selecting and using app profiling tools

Lesson 8. Understanding system resource metrics: CPU, memory, I/O, network; what to monitor and why
Understand core system metrics and what they reveal about health. You will interpret CPU, memory, disk, and network indicators, learn safe thresholds, and decide which metrics matter most for web workloads and background services.
- CPU utilization, load average, and run queues
- Memory usage, cache, and swap behavior
- Disk throughput, latency, and queue depth
- Network bandwidth, errors, and drops
- Choosing alert thresholds for key metrics

Lesson 9. Temporary mitigation techniques: restarting services, adjusting process niceness, freeing caches, taking services offline gracefully, with commands and expected outcomes
Apply safe, temporary mitigations during incidents. You will restart services, adjust niceness, manage caches, and gracefully take services offline, understanding commands, risks, and how to verify that mitigations are effective.
- Safely restarting critical services
- Adjusting process priority with nice and renice
- Freeing page cache and dentries carefully
- Putting web apps into maintenance mode
- Verifying mitigation impact on metrics