Lesson 1. Backup strategies: full vs incremental, logical vs physical DB backups, filesystem vs application backups, retention policies
Design backup strategies for servers and databases, comparing full and incremental backups, logical and physical methods, and filesystem versus application-level approaches. Define retention, encryption, and verification practices for local needs.
Full versus incremental backup plans
Logical versus physical DB backups
Filesystem and application-level backups
Backup encryption and access control
Retention schedules and legal needs

Lesson 2. Centralized logging: syslog vs Logstash vs Fluentd, log rotation, retention, index strategies, and sizing for an internal app
Learn how to centralize logs for an internal app using syslog, Logstash, or Fluentd. We cover log collection, rotation, retention, indexing strategies, and capacity sizing to support troubleshooting and compliance needs in our region.
Choosing log shippers and collectors
Designing log formats and metadata
Log rotation and retention policies
Indexing strategies for fast search
Sizing storage and ingestion rates

Lesson 3. Designing a 4-node architecture: roles and separation (2 web servers, 1 app worker, 1 DB, optional central log/monitor)
Design a practical four-node architecture with clear role separation. You will place two web servers, one application worker, and one database node, and consider adding centralized logging and monitoring for observability and resilience in small setups.
Defining roles for each server node
Web tier design and reverse proxies
Application worker patterns and queues
Database placement and connectivity
Adding shared logging and monitoring

Lesson 4. Network design: private networks, security groups, firewall rules, NAT, and routing between on-prem and cloud
Learn how to design secure, resilient networks for hybrid environments. We cover private subnets, security groups, firewall policies, NAT patterns, and routing between on‑premises and cloud to support scalable server deployments for Liberia.
Designing private and public subnets
Security groups and firewall rule design
NAT gateways and outbound internet access
Routing between on-prem and cloud VPCs
Network segmentation for app tiers

Lesson 5. Patch management: OS package lifecycle, configuration management tools (Ansible, Puppet, Salt), scheduled windows and rollback plans
Plan and operate patch management for operating systems and applications. Learn about package lifecycles, how to use Ansible, Puppet, or Salt, how to schedule maintenance windows and test updates, and how to prepare rollback and communication plans for safe operations.
OS and package lifecycle concepts
Using Ansible, Puppet, or Salt
Patch testing and staging environments
Scheduling maintenance windows
Rollback strategies and communication

Lesson 6. Access control and authentication: SSH key management, bastion host patterns, jumpboxes, VPN placement, MFA considerations
Design secure access control for servers using SSH keys, bastion hosts, and VPNs. Learn key lifecycle management, jumpbox patterns, MFA integration, and logging of administrative access for compliance and incident response in our context.
SSH key generation and rotation
Bastion hosts and jumpbox patterns
VPN placement and traffic flows
Integrating MFA for admin access
Auditing and logging remote sessions

Lesson 7. Restore testing: recovery drills, point-in-time restore for databases, RTO/RPO concepts and how to validate restores
Understand how to prove backups are usable through structured restore testing. You will practice recovery drills, database point-in-time restores, and validation steps aligned with RTO and RPO targets for critical internal services here.
Defining RTO and RPO objectives
Planning and running recovery drills
Testing database point-in-time restores
Validating application-level restores
Documenting and reviewing test results

Lesson 8. Runbooks and operational playbooks: creating and storing runbooks, change management, runbook examples for common tasks
Create effective runbooks and operational playbooks for routine and emergency tasks. Learn structure, storage, and change control, and review concrete examples for deployments, restarts, incident triage, and rollback steps in a practical way.
Runbook structure and required details
Versioning and storing runbooks
Change management and approvals
Runbooks for common maintenance tasks
Incident response and escalation playbooks

Lesson 9. Basic monitoring and alerting architecture: metrics, logs, traces; choosing a monitoring stack (Prometheus, Grafana, Alertmanager, Nagios, Zabbix)
Build a basic monitoring and alerting architecture using metrics, logs, and traces. Compare Prometheus, Grafana, Alertmanager, Nagios, and Zabbix, and design alert rules, dashboards, and escalation paths for internal services in Liberia.
Key metrics, logs, and tracing signals
Selecting a monitoring tool stack
Designing dashboards for operators
Alert rules, thresholds, and noise control
Escalation policies and on-call flows

Lesson 10. High-availability and redundancy tradeoffs for a mid-size internal app (load balancing, sticky sessions, session stores)
Explore high-availability patterns and redundancy tradeoffs for a mid-size internal app. Learn load balancing options, handling sticky sessions, external session stores, and failure scenarios to balance cost, complexity, and uptime for our apps.
Identifying availability requirements
Load balancer types and health checks
Sticky sessions versus stateless design
External session stores and caching
Failure modes and graceful degradation