Lesson 1
Differential expression analysis: DESeq2, edgeR, limma-voom - model design, contrasts, and multiple-testing correction
This section details differential expression workflows using DESeq2, edgeR, and limma-voom, focusing on model design, contrasts, dispersion estimation, and multiple-testing correction to obtain reliable gene lists and effect size estimates.
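As a rough illustration of the multiple-testing correction mentioned above, the sketch below implements the Benjamini-Hochberg adjustment in plain Python. The function name `benjamini_hochberg` and the example p-values are assumptions made for illustration; in practice DESeq2, edgeR, and limma report BH-adjusted values themselves, so this is only meant to show what the adjustment does.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
padj = benjamini_hochberg(p_values)
# Genes kept at a 5% false discovery rate.
significant = [p <= 0.05 for p in padj]
```

Here the third and fourth genes have raw p-values below 0.05 but adjusted values just above it, which is why gene lists are usually filtered on the adjusted column rather than the raw one.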
- Designing experimental models and covariates
- Setting contrasts for complex comparisons
- Running DESeq2 end-to-end workflow
- Using edgeR and limma-voom pipelines
- Multiple-testing correction and FDR control
- Interpreting log2 fold changes and shrinkage

Lesson 2
Data organisation and file naming conventions: sample sheets, raw/processed separation, consistent identifiers
This section describes best practices for organising RNA-seq project files, including sample sheets, directory layouts, raw versus processed data separation, and consistent identifiers that simplify scripting, tracking, and reproducibility in Zimbabwe.
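One way to enforce consistent identifiers is to check the sample sheet before any scripts depend on it. The sketch below is a minimal example under stated assumptions: the column names (`sample_id`, `condition`, `fastq_1`), the identifier pattern, and the file paths are all hypothetical conventions, not ones prescribed by this course.

```python
import csv
import io
import re

# Assumed convention: identifiers use only letters, digits, and underscores.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def validate_sample_sheet(handle, required=("sample_id", "condition", "fastq_1")):
    """Return a list of human-readable problems found in a CSV sample sheet."""
    reader = csv.DictReader(handle)
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column: {col}" for col in missing]
    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        sample_id = row["sample_id"]
        if not ID_PATTERN.fullmatch(sample_id):
            problems.append(f"line {line_no}: invalid sample_id {sample_id!r}")
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
    return problems


# Hypothetical sheet with one malformed and one duplicated identifier.
sheet = io.StringIO(
    "sample_id,condition,fastq_1\n"
    "ctrl_1,control,raw/ctrl_1_R1.fastq.gz\n"
    "ctrl 2,control,raw/ctrl_2_R1.fastq.gz\n"
    "ctrl_1,treated,raw/trt_1_R1.fastq.gz\n"
)
problems = validate_sample_sheet(sheet)
```

Running a check like this at the start of a pipeline tends to surface naming mistakes before they propagate into output file names.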
- Designing a clear directory hierarchy
- Separating raw and processed data
- Creating robust sample sheets and metadata
- Consistent sample and library identifiers
- Versioning reference genomes and indices
- Backing up and archiving project data

Lesson 3
Gene-level quantification strategies: featureCounts, htseq-count, tximport for transcript-to-gene summarisation
This section explains gene-level quantification from aligned or pseudo-aligned reads, comparing featureCounts and htseq-count, and detailing how tximport aggregates transcript-level estimates into robust gene-level matrices for downstream statistical analysis.
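The core idea of transcript-to-gene summarisation can be sketched in a few lines: estimated counts for all transcripts of a gene are summed using a transcript-to-gene map. This is a simplified illustration, not a reimplementation of tximport, which also tracks abundances and average transcript lengths; the identifiers and the function name `summarise_to_gene` are hypothetical.

```python
from collections import defaultdict


def summarise_to_gene(transcript_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for tx_id, count in transcript_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            # Transcripts absent from the annotation map are skipped here.
            continue
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical transcript-to-gene map and one sample's estimated counts.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
transcript_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0, "tx4": 2.0}
gene_counts = summarise_to_gene(transcript_counts, tx2gene)
```

Silently skipping unmapped transcripts, as done for `tx4` here, is a design choice for brevity; a real workflow would more likely log or count them, since a mismatch between quantification index and annotation version is a common cause.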
- Counting reads with featureCounts options
- Using htseq-count modes and annotations
- Handling strandedness and multimapping reads
- Importing Salmon and kallisto with tximport
- Building gene-level count matrices
- Assessing quantification quality and coverage

Lesson 4
Tools for data download and organisation: SRA Toolkit (prefetch/fastq-dump), ENA FTP/Aspera, wget/rsync, and recommended inputs/outputs
This section covers reliable strategies for downloading and organising RNA-seq data, focusing on SRA Toolkit, ENA access, command-line transfer tools, and defining consistent input and output structures that support automation and reproducibility.
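A simple safeguard after any download is to compare the file's checksum against the value published by the archive. The sketch below assumes an MD5 checksum is available for each file; the helper names `md5sum` and `verify_download` are illustrative and not part of any of the tools named above.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat for large FASTQ files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the published checksum."""
    return md5sum(path) == expected_md5.strip().lower()
```

A download script could call `verify_download` for every file and write the result to a log, so that a failed or truncated transfer is caught before QC starts.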
- Using SRA Toolkit prefetch and fasterq-dump
- Accessing ENA via FTP and Aspera
- Downloading with wget and rsync safely
- Choosing raw and processed file formats
- Documenting download metadata and checksums
- Automating downloads with scripts and logs

Lesson 5
Quality control tools and outputs: FastQC, MultiQC, key metrics to inspect (per-base quality, adapter content, duplication, GC)
This section focuses on RNA-seq quality control, using FastQC and MultiQC to summarise key metrics such as per-base quality, adapter contamination, duplication, and GC content, and to decide whether trimming or resequencing is required locally.
- Running FastQC on raw and trimmed reads
- Interpreting per-base quality profiles
- Detecting adapters and overrepresented sequences
- Evaluating duplication and GC content
- Aggregating reports with MultiQC
- Defining QC thresholds and actions

Lesson 6
Read trimming and filtering: when to trim, tools (Trim Galore/Cutadapt/fastp), main parameters and outputs
This section explains when and how to trim RNA-seq reads, covering adapter and quality trimming, length filtering, and key parameters in tools such as Trim Galore, Cutadapt, and fastp, while avoiding over-trimming that harms downstream analyses.
- Deciding whether trimming is necessary
- Adapter detection and removal strategies
- Quality-based trimming thresholds
- Minimum length and complexity filters
- Using Trim Galore and Cutadapt options
- Fastp for integrated QC and trimming

Lesson 7
Basic downstream analyses: GO/KEGG enrichment (clusterProfiler), GSEA preranked, pathway visualisation, and gene set selection
This section introduces downstream functional analyses after differential expression, including GO and KEGG enrichment with clusterProfiler, preranked GSEA, pathway visualisation, and principled strategies for selecting and filtering gene sets.
- Preparing ranked gene lists for GSEA
- GO and KEGG enrichment with clusterProfiler
- Choosing appropriate gene set databases
- Visualizing enriched pathways and networks
- Filtering and prioritizing gene sets
- Reporting functional results reproducibly

Lesson 8
High-level pipeline layout: data download, QC, trimming, alignment/pseudo-alignment, quantification, differential expression, downstream analysis
This section presents the overall RNA-seq pipeline structure, from data acquisition and QC through trimming, alignment or pseudo-alignment, quantification, normalisation, differential expression, and downstream functional analysis, emphasising modular, scripted workflows.
- Defining pipeline stages and dependencies
- Planning inputs, outputs, and file flow
- Integrating QC, trimming, and alignment
- Linking quantification to DE analysis
- Connecting DE to enrichment workflows
- Documenting the pipeline with diagrams

Lesson 9
Normalisation and exploratory data analysis: TPM/FPKM limits, DESeq2 normalisation, PCA, sample-sample distance heatmaps
This section covers normalisation and exploratory analysis of RNA-seq data, discussing limitations of TPM and FPKM, DESeq2-based normalisation, variance stabilisation, principal component analysis, and sample distance heatmaps for detecting batch effects.
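The DESeq2-based normalisation mentioned above rests on median-of-ratios size factors. The sketch below shows that calculation in plain Python on a toy matrix; it is a simplified illustration under stated assumptions (the function name `size_factors` and the counts are made up), and DESeq2's own `estimateSizeFactors` should be used in real analyses.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of gene rows, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have no finite log geometric mean,
    # so they are excluded from the calculation.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to its geometric mean across samples.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced about twice as deeply as sample 1.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 5],
]
factors = size_factors(counts)
# Normalised counts divide each sample's column by its size factor.
normalised = [[c / f for c, f in zip(row, factors)] for row in counts]
```

In this toy case the second size factor is twice the first, so after division the first three genes have equal normalised counts in both samples, which is the behaviour expected when the only difference is sequencing depth.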
- Limitations of TPM and FPKM measures
- DESeq2 size factors and normalization
- Variance-stabilizing and rlog transforms
- Principal component analysis of samples
- Sample-sample distance heatmaps
- Detecting batch effects and outliers

Lesson 10
Basic visualisation best practices: MA plots, volcano plots, heatmaps, pathway dotplots, and interactive report options (R Markdown, Jupyter)
This section introduces effective visualisation strategies for RNA-seq results, emphasising clear communication of differential expression, sample structure, and pathway changes using static plots and interactive, reproducible reports built in R Markdown or Jupyter.
- Constructing and interpreting MA plots
- Designing clear volcano plots for DE genes
- Building publication-quality heatmaps
- Pathway dotplots for enrichment results
- Interactive R Markdown RNA-seq reports
- Jupyter-based exploratory visualization

Lesson 11
Alignment vs pseudo-alignment: STAR, HISAT2, Salmon, kallisto - tradeoffs and outputs (BAM, transcript/gene counts)
This section compares alignment-based tools such as STAR and HISAT2 with pseudo-alignment tools like Salmon and kallisto, highlighting tradeoffs in speed, accuracy, resource use, and outputs including BAM files and transcript or gene-level counts.
- When to choose STAR or HISAT2 aligners
- Configuring genome indexes and annotations
- Using Salmon in quasi-mapping mode
- Running kallisto for rapid quantification
- Comparing BAM and quant.sf style outputs
- Benchmarking speed, memory, and accuracy