Lesson 1. Differential expression analysis with DESeq2, edgeR, and limma-voom: model design, contrasts, and multiple-testing correction
This section details differential expression workflows using DESeq2, edgeR, and limma-voom, focusing on model design, contrasts, dispersion estimation (DESeq2, edgeR) or observation-level precision weights (limma-voom), and multiple-testing correction to obtain reliable gene lists and effect-size estimates.
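All three packages report Benjamini-Hochberg (BH) adjusted p-values by default (DESeq2's `results()`, edgeR's `topTags()`, limma's `topTable()`). The minimal Python sketch below reimplements the BH step-up adjustment, matching R's `p.adjust(p, method = "BH")`, so the arithmetic behind an FDR cutoff is visible. It is illustrative only and omits DESeq2's independent filtering, which removes low-count genes from the adjustment and reports their adjusted p-value as NA.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = n - offset  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.2]
    print([round(q, 4) for q in bh_adjust(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```

Genes are then called significant when their adjusted value falls below the chosen FDR, for example 0.05.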
- Designing experimental models and covariates
- Setting contrasts for complex comparisons
- Running the DESeq2 workflow end to end
- Using edgeR and limma-voom pipelines
- Multiple-testing correction and FDR control
- Interpreting log2 fold changes and shrinkage

Lesson 2. Data organisation and file naming conventions: sample sheets, raw/processed separation, consistent identifiers
This section outlines best practices for organising RNA-seq project files, including sample sheets, directory layouts, separation of raw and processed data, and consistent identifiers that make scripting, tracking, and reproducibility easier.
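Sample sheets are easiest to trust when a script checks them before any analysis runs. A minimal sketch, assuming a comma-separated sheet with the hypothetical columns `sample_id`, `condition`, and `fastq_1`, and an assumed identifier rule (a leading letter followed only by letters, digits, or underscores, so IDs stay safe in file names and R column names):

```python
import csv
import io
import re

# Hypothetical column names and identifier rule; adapt them to the project's conventions.
REQUIRED_COLUMNS = ("sample_id", "condition", "fastq_1")
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def validate_sample_sheet(handle):
    """Return a list of human-readable problems found in a CSV sample sheet."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]

    problems = []
    seen = set()
    # Line 1 is the header, so data rows start at line 2 (assumes no multi-line quoted fields).
    for line_no, row in enumerate(reader, start=2):
        sample_id = (row["sample_id"] or "").strip()
        if not ID_PATTERN.fullmatch(sample_id):
            problems.append(f"line {line_no}: invalid sample_id {sample_id!r}")
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
        if not (row["condition"] or "").strip():
            problems.append(f"line {line_no}: empty condition")
    return problems


EXAMPLE_SHEET = """sample_id,condition,fastq_1
ctrl_1,control,raw/ctrl_1_R1.fastq.gz
ctrl 2,control,raw/ctrl_2_R1.fastq.gz
ctrl_1,treated,raw/trt_1_R1.fastq.gz
"""

if __name__ == "__main__":
    for problem in validate_sample_sheet(io.StringIO(EXAMPLE_SHEET)):
        print(problem)
```

Running the check at the start of every pipeline run means a renamed or duplicated sample fails loudly instead of silently shifting columns in a count matrix.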
- Designing a clear directory hierarchy
- Separating raw and processed data
- Creating robust sample sheets and metadata
- Consistent sample and library identifiers
- Versioning reference genomes and indices
- Backing up and archiving project data

Lesson 3. Gene-level quantification strategies: featureCounts, htseq-count, and tximport for transcript-to-gene summarisation
This section explains gene-level quantification from aligned or pseudo-aligned reads. It compares featureCounts and htseq-count, and shows how tximport aggregates transcript-level estimates into gene-level matrices for downstream statistical analysis.
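With tximport's default `countsFromAbundance = "no"`, gene-level counts are sums of the transcript-level estimated counts, and the accompanying gene length is an abundance (TPM) weighted average of transcript effective lengths, which DESeq2 can use as a length offset. The Python sketch below reproduces only that arithmetic for a single sample. The transcript and gene IDs are hypothetical, and the fallback for genes whose transcripts all have zero TPM (a plain mean of lengths) is an assumption of this sketch, not necessarily tximport's rule.

```python
from collections import defaultdict


def summarise_to_genes(transcripts, tx2gene):
    """Sum counts and TPM-weight effective lengths per gene for a single sample.

    transcripts: {transcript_id: (est_counts, tpm, effective_length)}
    tx2gene:     {transcript_id: gene_id}
    """
    counts = defaultdict(float)
    weighted_len = defaultdict(float)
    tpm_total = defaultdict(float)
    lengths_seen = defaultdict(list)
    unmapped = []
    for tx, (est_counts, tpm, eff_len) in transcripts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            unmapped.append(tx)  # report transcripts missing from tx2gene instead of dropping them silently
            continue
        counts[gene] += est_counts
        weighted_len[gene] += tpm * eff_len
        tpm_total[gene] += tpm
        lengths_seen[gene].append(eff_len)

    gene_lengths = {}
    for gene in counts:
        if tpm_total[gene] > 0:
            gene_lengths[gene] = weighted_len[gene] / tpm_total[gene]
        else:
            # Assumption: unweighted mean when no transcript of the gene is expressed.
            gene_lengths[gene] = sum(lengths_seen[gene]) / len(lengths_seen[gene])
    return dict(counts), gene_lengths, unmapped


if __name__ == "__main__":
    transcripts = {
        "txA1": (100.0, 10.0, 1000.0),
        "txA2": (50.0, 30.0, 500.0),
        "txB1": (20.0, 5.0, 2000.0),
        "txX9": (3.0, 1.0, 800.0),
    }
    tx2gene = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}
    print(summarise_to_genes(transcripts, tx2gene))
```

Here geneA's length (625) sits closer to its more abundant short isoform, which is exactly why the weighted length, rather than a fixed annotation length, is useful as an offset.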
- Counting reads with featureCounts options
- Using htseq-count modes and annotations
- Handling strandedness and multimapping reads
- Importing Salmon and kallisto output with tximport
- Building gene-level count matrices
- Assessing quantification quality and coverage

Lesson 4. Tools for data download and organisation: SRA Toolkit (prefetch, fastq-dump/fasterq-dump), ENA FTP/Aspera, wget/rsync, and recommended inputs/outputs
This section covers reliable ways to download and organise RNA-seq data, focusing on the SRA Toolkit, ENA access, and command-line transfer tools, and on defining consistent input and output structures for automation and reproducibility.
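Checksums are the simplest way to confirm that a transfer completed intact; ENA, for example, publishes MD5 values for its FASTQ files. The Python sketch below streams a file through `hashlib.md5` in chunks, so large FASTQ files never need to fit in memory, and compares the digest with an expected value. Where the expected MD5 comes from (a file report, a manifest) is left to the caller.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches the expected digest (whitespace and case ignored)."""
    return md5sum(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.bin"
        demo.write_bytes(b"abc")  # MD5 of b"abc" is an RFC 1321 test vector
        print(verify(demo, "900150983cd24fb0d6963f7d28e17f72"))  # True
```

Logging each file name, expected digest, observed digest, and result alongside the download log keeps the provenance record in one place.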
- Using SRA Toolkit prefetch and fasterq-dump
- Accessing ENA via FTP and Aspera
- Downloading safely with wget and rsync
- Choosing raw and processed file formats
- Documenting download metadata and checksums
- Automating downloads with scripts and logs

Lesson 5. Quality control tools and outputs: FastQC, MultiQC, and key metrics to inspect (per-base quality, adapter content, duplication, GC content)
This section focuses on RNA-seq quality control, using FastQC and MultiQC to summarise key metrics such as per-base quality, adapter contamination, duplication, and GC content, and to decide whether trimming or resequencing is needed.
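FastQC's per-base quality and GC-content modules summarise values that can be computed directly from FASTQ records. As a minimal sketch, assuming Phred+33 encoded quality strings (the encoding of current Illumina output; FastQC itself detects the encoding), the functions below decode qualities, average them per read position, and compute a read's GC fraction. They illustrate the metrics only and do not reproduce FastQC's position binning or its pass/warn/fail thresholds.

```python
def phred33(quality_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def per_base_mean_quality(quality_strings):
    """Mean quality at each read position; shorter reads only contribute to their own positions."""
    totals, counts = [], []
    for qual in quality_strings:
        for pos, score in enumerate(phred33(qual)):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += score
            counts[pos] += 1
    return [total / n for total, n in zip(totals, counts)]


def gc_fraction(sequence):
    """Fraction of G/C among unambiguous bases (N and other IUPAC codes are ignored)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0


if __name__ == "__main__":
    print(phred33("II#"))                       # [40, 40, 2]
    print(per_base_mean_quality(["II", "I#"]))  # [40.0, 21.0]
    print(gc_fraction("ATGCNN"))                # 0.5
```

A falling per-position mean toward the 3' end is the pattern that usually motivates quality trimming; a GC distribution with a second peak often points to contamination or adapter dimers.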
- Running FastQC on raw and trimmed reads
- Interpreting per-base quality profiles
- Detecting adapters and overrepresented sequences
- Evaluating duplication and GC content
- Aggregating reports with MultiQC
- Defining QC thresholds and actions

Lesson 6. Read trimming and filtering: when to trim, tools (Trim Galore, Cutadapt, fastp), main parameters and outputs
This section explains when and how to trim RNA-seq reads, covering adapter and quality trimming, length filtering, and key parameters in Trim Galore, Cutadapt, and fastp, while avoiding over-trimming, which can harm downstream analyses.
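Cutadapt's quality trimming (which Trim Galore runs with a default cutoff of Phred 20) uses the same algorithm as BWA: subtract the cutoff from every quality, sum from the 3' end inwards, and cut where that running sum is lowest. A minimal Python sketch of the 3' part, written from that published description rather than taken from Cutadapt's source, with qualities supplied as already-decoded integers. The `min_length=20` default in `trim_read` is an assumed value for illustration.

```python
def quality_trim_index(qualities, cutoff):
    """Return n such that qualities[:n] is kept after 3' quality trimming (BWA-style)."""
    running = 0
    lowest = 0
    cut = len(qualities)
    # Walk in from the 3' end; stop once the partial sum turns positive (good bases outweigh bad).
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            cut = i
    return cut


def trim_read(sequence, qualities, cutoff=20, min_length=20):
    """Quality-trim one read; return None if it ends up shorter than min_length (assumed default)."""
    n = quality_trim_index(qualities, cutoff)
    if n < min_length:
        return None
    return sequence[:n], qualities[:n]


if __name__ == "__main__":
    quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
    print(quality_trim_index(quals, 10))  # 4
```

The running sum tolerates an isolated good base (the 11 above) inside a low-quality tail, which a simple "cut at the first base below the cutoff" rule would not.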
- Deciding whether trimming is necessary
- Adapter detection and removal strategies
- Quality-based trimming thresholds
- Minimum length and complexity filters
- Using Trim Galore and Cutadapt options
- fastp for integrated QC and trimming

Lesson 7. Basic downstream analyses: GO/KEGG enrichment (clusterProfiler), preranked GSEA, pathway visualisation, and gene set selection
This section introduces the functional analyses that follow differential expression, including GO and KEGG enrichment with clusterProfiler, preranked GSEA, pathway visualisation, and strategies for selecting and filtering gene sets.
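Preranked GSEA (for example clusterProfiler's `GSEA()` or fgsea) expects one score per tested gene, sorted from most up-regulated to most down-regulated. One common score, among several in use (the Wald statistic and shrunken log2 fold changes are alternatives), is sign(log2FC) * -log10(p). A minimal Python sketch of that particular choice, assuming results arrive as `(gene_id, log2fc, pvalue)` tuples:

```python
import math


def rank_score(log2fc, pvalue, floor=1e-300):
    """sign(log2FC) * -log10(p); p is floored so that p == 0 does not give infinity."""
    if log2fc == 0:
        return 0.0
    magnitude = -math.log10(max(pvalue, floor))
    return math.copysign(magnitude, log2fc)


def ranked_gene_list(results):
    """Drop genes without a p-value and sort the rest by decreasing score."""
    scored = [(gene, rank_score(lfc, p)) for gene, lfc, p in results if p is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    results = [("g1", 2.0, 1e-3), ("g2", -1.5, 1e-2), ("g3", 0.5, 0.5), ("g4", 1.0, None)]
    for gene, score in ranked_gene_list(results):
        print(gene, round(score, 3))
```

Whatever score is chosen, duplicate gene identifiers should be collapsed before ranking, since most GSEA tools expect unique names.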
- Preparing ranked gene lists for GSEA
- GO and KEGG enrichment with clusterProfiler
- Choosing appropriate gene set databases
- Visualising enriched pathways and networks
- Filtering and prioritising gene sets
- Reporting functional results reproducibly

Lesson 8. High-level pipeline layout: data download, QC, trimming, alignment/pseudo-alignment, quantification, differential expression, downstream analysis
This section presents the overall RNA-seq pipeline structure, from data acquisition and QC through trimming, alignment or pseudo-alignment, quantification, normalisation, differential expression, and downstream functional analysis, with an emphasis on modular, scripted workflows.
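Writing stages and their dependencies down as a graph makes the execution order explicit and catches circular dependencies early; workflow managers resolve the same kind of graph automatically. A minimal sketch using Python's standard-library `graphlib` (Python 3.9+), with hypothetical stage names that follow this lesson's outline:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical stages: each maps to the set of stages whose outputs it consumes.
PIPELINE = {
    "download": set(),
    "qc_raw": {"download"},
    "trim": {"download"},
    "qc_trimmed": {"trim"},
    "quantify": {"trim"},  # alignment plus counting, or pseudo-alignment
    "differential_expression": {"quantify"},
    "enrichment": {"differential_expression"},
    "multiqc_report": {"qc_raw", "qc_trimmed", "quantify"},
}


def execution_order(graph):
    """Return one valid run order, raising ValueError if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of err.args.
        raise ValueError(f"pipeline has a circular dependency: {err.args[1]}") from err


if __name__ == "__main__":
    print(execution_order(PIPELINE))
```

Several valid orders usually exist (the two QC stages are independent, for instance), which is what lets a workflow manager run such stages in parallel.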
- Defining pipeline stages and dependencies
- Planning inputs, outputs, and file flow
- Integrating QC, trimming, and alignment
- Linking quantification to DE analysis
- Connecting DE to enrichment workflows
- Documenting the pipeline with diagrams

Lesson 9. Normalisation and exploratory data analysis: TPM/FPKM limits, DESeq2 normalisation, PCA, sample-sample distance heatmaps
This section covers normalisation and exploratory analysis of RNA-seq data, discussing the limits of TPM and FPKM, DESeq2 size-factor normalisation, variance stabilisation, principal component analysis, and sample distance heatmaps for spotting batch effects and outliers.
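DESeq2's size factors use the median-of-ratios method: for each gene, take the geometric mean of its counts across samples (genes with a zero in any sample are skipped); a sample's size factor is the median of that sample's counts divided by those geometric means. Unlike TPM or FPKM, which rescale each sample to a fixed total, this is robust to a few highly expressed or strongly changing genes, provided most genes are not differentially expressed. A minimal Python sketch (not DESeq2's code), working on the log scale as DESeq2 does:

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts with one entry per sample.
    """
    n_samples = len(counts[0])
    # Only genes with a positive count in every sample have a finite geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has a zero in some sample; median-of-ratios is undefined")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


if __name__ == "__main__":
    # Sample 2 is sample 1 sequenced twice as deeply; the last gene has a zero and is ignored.
    counts = [[10, 20], [100, 200], [50, 100], [0, 7]]
    sf = size_factors(counts)
    print([round(f, 4) for f in sf])                         # [0.7071, 1.4142]
    print([round(c / f, 2) for c, f in zip(counts[0], sf)])  # [14.14, 14.14]
```

Dividing each column by its size factor puts the two samples on a common scale, which is what DESeq2's `counts(dds, normalized = TRUE)` reports.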
- Limitations of TPM and FPKM measures
- DESeq2 size factors and normalisation
- Variance-stabilising and rlog transforms
- Principal component analysis of samples
- Sample-sample distance heatmaps
- Detecting batch effects and outliers

Lesson 10. Basic visualisation best practices: MA plots, volcano plots, heatmaps, pathway dotplots, and interactive report options (R Markdown, Jupyter)
This section introduces effective visualisation strategies for RNA-seq results, focusing on clear communication of differential expression, sample structure, and pathway changes through static plots and interactive, reproducible reports in R Markdown or Jupyter.
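A volcano plot places each gene at x = log2 fold change and y = -log10(adjusted p-value), usually highlighting genes that pass both an FDR and a fold-change threshold. The sketch below prepares those coordinates and colour classes in plain Python so they can be handed to any plotting library. The 0.05 and 1.0 thresholds are illustrative assumptions, and genes with a missing adjusted p-value (as DESeq2 reports for independently filtered genes) are dropped.

```python
import math


def volcano_points(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (gene, x, y, status) tuples for a volcano plot.

    results: iterable of (gene_id, log2fc, padj); padj may be None.
    Thresholds are illustrative defaults, not fixed rules.
    """
    points = []
    for gene, lfc, padj in results:
        if padj is None:  # e.g. genes DESeq2 removed by independent filtering
            continue
        y = -math.log10(max(padj, 1e-300))  # floor avoids infinity when padj == 0
        significant = padj < padj_cutoff
        if significant and lfc >= lfc_cutoff:
            status = "up"
        elif significant and lfc <= -lfc_cutoff:
            status = "down"
        else:
            status = "ns"
        points.append((gene, lfc, y, status))
    return points


if __name__ == "__main__":
    demo = [("a", 2.0, 0.01), ("b", -1.5, 0.001), ("c", 0.2, 1e-4), ("d", 3.0, None)]
    for gene, x, y, status in volcano_points(demo):
        print(gene, x, round(y, 2), status)
```

Gene "c" shows why both thresholds matter: it is highly significant but its fold change is too small to be highlighted.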
- Constructing and interpreting MA plots
- Designing clear volcano plots for DE genes
- Building publication-quality heatmaps
- Pathway dotplots for enrichment results
- Interactive R Markdown RNA-seq reports
- Jupyter-based exploratory visualisation

Lesson 11. Alignment vs pseudo-alignment with STAR, HISAT2, Salmon, and kallisto: trade-offs and outputs (BAM, transcript- and gene-level counts)
This section compares the aligners STAR and HISAT2 with the pseudo-alignment tools Salmon and kallisto, highlighting trade-offs in speed, accuracy, and resource use, as well as their outputs, including BAM files and transcript- or gene-level counts.
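Salmon's `quant.sf` and kallisto's `abundance.tsv` carry the same kind of per-transcript information under different column names (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads` versus `target_id`, `length`, `eff_length`, `est_counts`, `tpm`). A minimal Python sketch that reads either file into one common structure; the example rows are made-up values for illustration only.

```python
import csv
import io

# Column names in Salmon quant.sf and kallisto abundance.tsv, mapped to common keys.
COLUMNS = {
    "salmon": {"id": "Name", "length": "Length", "eff_length": "EffectiveLength",
               "tpm": "TPM", "counts": "NumReads"},
    "kallisto": {"id": "target_id", "length": "length", "eff_length": "eff_length",
                 "tpm": "tpm", "counts": "est_counts"},
}


def read_quant_table(handle, tool):
    """Parse a tab-separated quantification table into {transcript_id: {field: value}}."""
    cols = COLUMNS[tool]
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row[cols["id"]]] = {
            "length": int(row[cols["length"]]),
            "eff_length": float(row[cols["eff_length"]]),
            "tpm": float(row[cols["tpm"]]),
            "counts": float(row[cols["counts"]]),
        }
    return table


# Illustrative rows only; real files contain one row per transcript in the index.
SALMON_EXAMPLE = "Name\tLength\tEffectiveLength\tTPM\tNumReads\ntx1\t1500\t1325.000\t120.5\t340.000\n"
KALLISTO_EXAMPLE = "target_id\tlength\teff_length\test_counts\ttpm\ntx1\t1500\t1325\t340\t120.5\n"

if __name__ == "__main__":
    salmon = read_quant_table(io.StringIO(SALMON_EXAMPLE), "salmon")
    kallisto = read_quant_table(io.StringIO(KALLISTO_EXAMPLE), "kallisto")
    print(salmon["tx1"] == kallisto["tx1"])  # True
```

Note that the column order differs between the two tools, so reading by header name rather than by position avoids silently swapping TPM and counts.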
- When to choose STAR or HISAT2 aligners
- Configuring genome indexes and annotations
- Using Salmon in quasi-mapping mode
- Running kallisto for rapid quantification
- Comparing BAM and quant.sf-style outputs
- Benchmarking speed, memory, and accuracy