Lesson 1. Asynchronous and batched I/O: using io_uring, liburing, AIO, and batching writes to reduce blocking

Learn to reduce blocking by issuing asynchronous and batched I/O. We cover io_uring, liburing, POSIX AIO, batching writes, and how to redesign call paths to keep threads busy while I/O completes in the background.
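io_uring itself is not reachable from portable stdlib code, but the batching half of this lesson can be sketched with a vectored write: `os.writev` submits many small buffers in one syscall, the same per-call amortization an io_uring submission queue provides (io_uring additionally makes submission and completion asynchronous). A minimal sketch; the `batched_write` helper and the record format are illustrative, not from the lesson:

```python
import os
import tempfile

def batched_write(fd, chunks):
    # One vectored-write syscall covers every buffer, instead of one
    # write() per chunk. io_uring extends the same batching idea by
    # also decoupling submission from completion.
    return os.writev(fd, chunks)

# Demo: 100 tiny records flushed with a single syscall.
chunks = [b"record-%03d\n" % i for i in range(100)]
fd, path = tempfile.mkstemp()
written = batched_write(fd, chunks)
os.close(fd)
```

With a real io_uring path, each chunk would instead become a submission-queue entry and the thread would go on with other work until completions arrive.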
- Designing non-blocking I/O call patterns
- Using io_uring and liburing submission queues
- Batching small writes into larger segments
- Handling completion events and error paths
- Comparing io_uring with legacy AIO APIs

Lesson 2. Queue depth and client-side throttling: limiting outstanding requests, token buckets, and backpressure mechanisms

Understand how queue depth and client throttling shape latency and throughput. This section covers safe limits, token bucket design, backpressure signals, and how to avoid overload collapse in shared storage systems.
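The token bucket described above fits in a few lines; the `TokenBucket` class name and its `try_acquire` interface are illustrative choices, not a standard API:

```python
import time

class TokenBucket:
    """Client-side throttle: at most `rate` requests/s, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_acquire(self, n=1):
        # Refill proportionally to elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # backpressure signal: caller should back off or queue
```

Returning False instead of blocking is what lets the caller propagate backpressure upward rather than silently piling requests onto the device queue.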
- Choosing safe queue_depth for each device type
- Implementing token bucket rate limiters
- Designing backpressure signals to callers
- Coordinating limits across many clients
- Monitoring tail latency under throttling

Lesson 3. I/O scheduler and block layer tuning: switching schedulers, elevator tuning, setting appropriate queue_depth, blk-mq and multiqueue settings

Tune the block layer and I/O scheduler to match your workload. You will learn how to select schedulers, adjust queue_depth, configure blk-mq multiqueue, and validate improvements with realistic benchmarks.
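On Linux the active scheduler is the bracketed entry in `/sys/block/<dev>/queue/scheduler` (e.g. `mq-deadline kyber [none] bfq`), and switching is done by writing a scheduler name into that file. A small helper that parses that sysfs format; `parse_scheduler` is a hypothetical name, not a kernel interface:

```python
def parse_scheduler(line):
    """Parse the contents of /sys/block/<dev>/queue/scheduler.

    Example input: 'mq-deadline kyber [none] bfq'
    Returns (active_scheduler, available_schedulers).
    Switching would be e.g.: echo bfq > /sys/block/sda/queue/scheduler
    """
    active, available = None, []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            active = name[1:-1]      # brackets mark the active scheduler
            available.append(active)
        else:
            available.append(name)
    return active, available
```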
- Comparing mq-deadline, none, and BFQ schedulers
- Setting queue_depth for SSDs and HDDs
- Configuring blk-mq multiqueue parameters
- Isolating noisy neighbors at the block layer
- Benchmarking scheduler changes safely

Lesson 4. Use of dedicated SSDs vs HDDs and understanding alignment, overprovisioning, and TRIM/reclaim behaviour

Understand when to use SSDs versus HDDs and how to deploy them correctly. Topics include alignment, overprovisioning, TRIM and reclaim behaviour, and mixed-tier designs for cost-efficient performance.
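Two of the checks in this lesson reduce to simple arithmetic. This sketch assumes a 1 MiB alignment boundary, which modern partitioning tools use by default; `is_aligned` and `overprovision_pct` are illustrative helpers, not standard functions:

```python
def is_aligned(offset_bytes, boundary=1 << 20):
    # Partitions (and large writes) should start on a multiple of the
    # device's internal boundary; 1 MiB is the common conservative default.
    return offset_bytes % boundary == 0

def overprovision_pct(raw_capacity, usable_capacity):
    # Share of raw flash held back from the filesystem, giving the FTL
    # room for garbage collection and wear levelling.
    return 100.0 * (raw_capacity - usable_capacity) / raw_capacity
```

For example, the old 63-sector partition start (offset 32256 bytes) fails the alignment check, while a 1 MiB start passes; exposing 480 GB of a 512 GB drive leaves 6.25% overprovisioning.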
- Choosing SSD or HDD for each workload type
- Ensuring partition and filesystem alignment
- Planning SSD overprovisioning capacity
- Configuring TRIM, discard, and reclaim safely
- Designing hybrid SSD–HDD storage tiers

Lesson 5. Caching and buffering strategies: application-level caches (LRU), Linux page cache tuning, use of tmpfs, and write-back vs write-through considerations

Understand caching and buffering strategies from application to kernel. This section explains LRU caches, Linux page cache tuning, tmpfs usage, and trade-offs between write-back and write-through for durability and latency.
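An application-level LRU cache of the kind discussed here can be built on `collections.OrderedDict`. A minimal sketch showing only the eviction policy (no byte-based sizing, TTLs, or thread safety):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key, default=None):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        return default

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```

For read-mostly function results, the stdlib `functools.lru_cache` decorator provides the same policy without the bookkeeping.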
- Designing effective application LRU caches
- Tuning Linux page cache and dirty ratios
- When to use tmpfs for transient hot data
- Write-back vs write-through trade-offs
- Avoiding double caching across layers

Lesson 6. Storage layout changes: separate hot data, cold data, and logs onto different devices or partitions

Learn how to separate hot, cold, and log data across devices or partitions. We cover workload analysis, layout patterns, migration strategies, and how to measure latency, throughput, and contention improvements after changes.
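Identifying hot data from a workload trace often starts as plain frequency counting. A sketch, assuming the trace is a list of accessed block or file ids and that the most-accessed 10% of distinct ids form the hot-tier candidate set; `classify_hot` and the 10% cutoff are illustrative choices:

```python
from collections import Counter

def classify_hot(trace, hot_fraction=0.1):
    """Return the set of ids accounting for the most accesses in `trace`.

    These are candidates for placement on the fast (hot) tier; everything
    else stays on cheaper media. Real classifiers also weight recency.
    """
    counts = Counter(trace)
    n_hot = max(1, int(len(counts) * hot_fraction))
    return {item for item, _ in counts.most_common(n_hot)}
```

A skewed trace makes the point: if one id dominates the accesses, it alone ends up in the hot set.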
- Identifying hot vs cold data from workload traces
- Placing logs on low-latency dedicated devices
- Separating random and sequential I/O workloads
- Partitioning schemes for mixed media arrays
- Measuring gains from layout reorganization

Lesson 7. Concurrency control and I/O patterns: batching small writes, coalescing fsyncs, group commit, and non-blocking design patterns

Learn how concurrency patterns interact with storage behaviour. We cover batching small writes, coalescing fsyncs, group commit, and non-blocking design patterns that reduce contention and improve throughput.
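Group commit can be sketched as a buffered log that pays one write and one fsync per batch of records instead of per record. `GroupCommitLog` is an illustrative name; a production version would add locking and a commit trigger (a timer or a batch-size threshold):

```python
import os
import tempfile

class GroupCommitLog:
    """Buffer appends from many logical sessions, then make them durable
    with a single write + fsync (group commit)."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_APPEND, 0o600)
        self.pending = []

    def append(self, record):
        self.pending.append(record)       # cheap: no syscall yet

    def commit(self):
        if not self.pending:
            return 0
        os.writev(self.fd, self.pending)  # one syscall for the whole batch
        os.fsync(self.fd)                 # one durability barrier for everyone
        n, self.pending = len(self.pending), []
        return n

# Demo: three sessions' records share a single fsync.
log_path = tempfile.mktemp()
log = GroupCommitLog(log_path)
for rec in (b"s1:put\n", b"s2:del\n", b"s3:put\n"):
    log.append(rec)
committed = log.commit()
```

Each caller still observes its record as durable once commit() returns; the fsync cost is simply amortized across the batch.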
- Batching tiny writes into aligned blocks
- Coalescing fsync calls across sessions
- Implementing group commit in log systems
- Event-driven non-blocking server designs
- Avoiding thundering herd on shared files

Lesson 8. Reliability trade-offs: data loss risks with async writes and disabled barriers, cache consistency with NAS, and testing durability guarantees

Analyze the reliability trade-offs introduced by aggressive I/O tuning. You will learn the risks of asynchronous writes and disabled barriers, cache inconsistency with NAS, how to test durability guarantees, and how to document supported failure modes.
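A durability test ultimately exercises the write-then-fsync contract: once fsync returns, the data must survive a crash or power loss. This sketch shows the pattern under test; a real harness would cut power or kill the VM between the two functions, and `durable_write` / `verify_after_restart` are illustrative names:

```python
import os
import tempfile

def durable_write(path, payload):
    """The pattern whose guarantee a durability test exercises."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # without this (or with barriers/write caches disabled
                      # and no battery backup), acknowledged data may still
                      # sit only in volatile cache at crash time
    finally:
        os.close(fd)

def verify_after_restart(path, payload):
    # A real harness would power-cycle the machine before this step;
    # here we can only re-read the file and compare.
    with open(path, "rb") as f:
        return f.read() == payload
```

Mounting with async options or disabling barriers makes this test fail under real power loss, which is exactly the risk this lesson asks you to document.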
- Risks of disabling barriers and write caches
- NAS cache consistency and stale reads
- Designing durability and crash tests
- Documenting supported data loss scenarios
- Balancing SLAs between latency and safety

Lesson 9. Network storage optimizations: mount options for NFS (async/sync, rsize/wsize, noac, actimeo), TCP tuning and jumbo frames, and multipathing

Optimize networked storage stacks such as NFS and iSCSI. This section covers NFS mount options, TCP tuning, jumbo frames, and multipathing to improve throughput, latency, and resilience to link failures.
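The options named above are real nfs(5) mount options; larger rsize/wsize values amortize per-RPC overhead, while actimeo trades attribute-cache staleness against GETATTR traffic. This small helper for assembling an option string is purely illustrative (`nfs_mount_opts` is not a standard API, and the 1 MiB rsize/wsize defaults are just common modern values):

```python
def nfs_mount_opts(rsize=1048576, wsize=1048576, actimeo=None, sync=False):
    """Build an option string of the form passed to `mount -t nfs -o ...`.

    sync forces each write to be committed server-side before returning
    (safer, slower); async lets the client acknowledge from cache.
    """
    opts = [f"rsize={rsize}", f"wsize={wsize}", "sync" if sync else "async"]
    if actimeo is not None:
        opts.append(f"actimeo={actimeo}")  # attribute cache timeout, seconds
    return ",".join(opts)
```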
- Choosing NFS async, sync, and commit modes
- Tuning rsize, wsize, and attribute caching
- Configuring TCP buffers and congestion control
- Using jumbo frames safely in storage networks
- Designing multipath and failover policies

Lesson 10. Tuning filesystem and mount options: noatime, nodiratime, barrier/discard, inode allocation and journaling settings

Explore the key filesystem and mount options that affect I/O latency. You will learn when to use noatime, barrier, discard, and journaling modes, plus how inode allocation and directory options influence metadata overhead.
- Impact of atime, noatime, and relatime modes
- Journaling modes and barrier configuration
- Safe use of discard and background TRIM
- Inode density and directory layout choices
- Per-mount options for latency-sensitive paths
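Whether a path is actually mounted with an option like noatime can be verified from `/proc/mounts`, where the fourth whitespace-separated field is the comma-separated option list. A parsing sketch; `mount_has_option` is an illustrative helper, not a stdlib function:

```python
def mount_has_option(mounts_line, option):
    """Check one /proc/mounts line for a mount option.

    Example line: '/dev/sda1 /data ext4 rw,noatime,discard 0 0'
    Fields: device, mountpoint, fstype, options, dump, pass.
    """
    fields = mounts_line.split()
    return len(fields) >= 4 and option in fields[3].split(",")
```

Running this over every line of `/proc/mounts` is a cheap way to audit that latency-sensitive mounts really carry the options you configured in fstab.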