Lesson 1
Differential expression analysis with DESeq2, edgeR, and limma-voom: model design, contrasts, and multiple-testing correction
This part details differential expression workflows using DESeq2, edgeR, and limma-voom, focusing on model setup, contrasts, dispersion estimation, and multiple-testing correction to obtain reliable gene lists and effect sizes.
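As a small illustration of the multiple-testing correction named above, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The function name `benjamini_hochberg` and the example p-values are made up for illustration; in practice DESeq2, edgeR, and limma report adjusted p-values themselves, so this only shows the idea.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
pvals = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(pvals)
```

Genes whose adjusted value falls below a chosen FDR threshold (0.05 or 0.1 are common choices) would then be reported as differentially expressed.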
Designing experimental models and covariates
Setting contrasts for complex comparisons
Running DESeq2 end-to-end workflow
Using edgeR and limma-voom pipelines
Multiple-testing correction and FDR control
Interpreting log2 fold changes and shrinkage

Lesson 2
Data organization and file naming conventions: sample sheets, raw/processed separation, consistent identifiers
This part describes best practices for organizing RNA-seq project files, including sample sheets, directory layouts, separation of raw and processed data, and consistent identifiers that make scripting, tracking, and reproducibility easier.
Designing a clear directory hierarchy
Separating raw and processed data
Creating robust sample sheets and metadata
Consistent sample and library identifiers
Versioning reference genomes and indices
Backing up and archiving project data

Lesson 3
Gene-level quantification strategies: featureCounts, htseq-count, tximport for transcript-to-gene summarization
This part explains gene-level quantification from aligned or pseudo-aligned reads, comparing featureCounts and htseq-count, and showing how tximport summarizes transcript-level estimates into robust gene-level matrices for statistical analysis.
Counting reads with featureCounts options
Using htseq-count modes and annotations
Handling strandedness and multimapping reads
Importing Salmon and kallisto with tximport
Building gene-level count matrices
Assessing quantification quality and coverage

Lesson 4
Tools for data download and organization: SRA Toolkit (prefetch/fastq-dump), ENA FTP/Aspera, wget/rsync, and recommended inputs/outputs
This part covers reliable ways to download and organize RNA-seq data, focusing on SRA Toolkit, ENA access, command-line transfer tools, and defining consistent input and output structures for automation and reproducibility.
Using SRA Toolkit prefetch and fasterq-dump
Accessing ENA via FTP and Aspera
Downloading with wget and rsync safely
Choosing raw and processed file formats
Documenting download metadata and checksums
Automating downloads with scripts and logs

Lesson 5
Quality control tools and outputs: FastQC, MultiQC, key metrics to inspect (per-base quality, adapter content, duplication, GC)
This part focuses on RNA-seq quality control, using FastQC and MultiQC to summarize key metrics such as per-base quality, adapter content, duplication, and GC content, and to decide whether trimming or resequencing is needed.
Running FastQC on raw and trimmed reads
Interpreting per-base quality profiles
Detecting adapters and overrepresented sequences
Evaluating duplication and GC content
Aggregating reports with MultiQC
Defining QC thresholds and actions

Lesson 6
Read trimming and filtering: when to trim, tools (Trim Galore/Cutadapt/fastp), main parameters and outputs
This part explains when and how to trim RNA-seq reads, covering adapter and quality trimming, length filtering, and key settings in tools like Trim Galore, Cutadapt, and fastp, while avoiding over-trimming that harms downstream analyses.
Deciding whether trimming is necessary
Adapter detection and removal strategies
Quality-based trimming thresholds
Minimum length and complexity filters
Using Trim Galore and Cutadapt options
Fastp for integrated QC and trimming

Lesson 7
Basic downstream analyses: GO/KEGG enrichment (clusterProfiler), GSEA preranked, pathway visualization, and gene set selection
This part introduces downstream functional analyses after differential expression, including GO and KEGG enrichment with clusterProfiler, preranked GSEA, pathway visualization, and sound practices for selecting and filtering gene sets.
Preparing ranked gene lists for GSEA
GO and KEGG enrichment with clusterProfiler
Choosing appropriate gene set databases
Visualizing enriched pathways and networks
Filtering and prioritizing gene sets
Reporting functional results reproducibly

Lesson 8
High-level pipeline layout: data download, QC, trimming, alignment/pseudo-alignment, quantification, differential expression, downstream analysis
This part shows the overall RNA-seq pipeline structure, from data acquisition and QC through trimming, alignment or pseudo-alignment, quantification, normalization, differential expression, and downstream functional analysis, emphasizing modular, scripted workflows.
Defining pipeline stages and dependencies
Planning inputs, outputs, and file flow
Integrating QC, trimming, and alignment
Linking quantification to DE analysis
Connecting DE to enrichment workflows
Documenting the pipeline with diagrams

Lesson 9
Normalization and exploratory data analysis: TPM/FPKM limitations, DESeq2 normalization, PCA, sample-sample distance heatmaps
This part covers normalization and exploratory analysis of RNA-seq data, discussing the limitations of TPM and FPKM, DESeq2-based normalization, variance stabilization, principal component analysis, and sample distance heatmaps for spotting batch effects.
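To make the DESeq2-based normalization mentioned above more concrete, here is a minimal sketch of the median-of-ratios idea behind DESeq2 size factors, written in plain Python. The function name `size_factors` and the toy count matrix are illustrative assumptions. This is not DESeq2's own code; a real analysis would use DESeq2 itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Assumes at least one gene has nonzero counts in every sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero count in any sample are skipped, because their
        # geometric mean across samples would be zero.
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's size factor is the median ratio to the per-gene reference
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix (made up): sample 2 is sequenced about twice as deeply as sample 1
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
sf = size_factors(counts)
```

Dividing each sample's counts by its size factor puts the samples on a common scale. Unlike TPM or FPKM, this approach does not adjust for gene length, since the same gene is being compared across samples.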
Limitations of TPM and FPKM measures
DESeq2 size factors and normalization
Variance-stabilizing and rlog transforms
Principal component analysis of samples
Sample-sample distance heatmaps
Detecting batch effects and outliers

Lesson 10
Basic visualization best practices: MA plots, volcano plots, heatmaps, pathway dotplots, and interactive report options (R Markdown, Jupyter)
This part introduces good visualization strategies for RNA-seq results, emphasizing clear communication of differential expression, sample structure, and pathway changes using static plots and interactive, reproducible reports in R Markdown or Jupyter.
Constructing and interpreting MA plots
Designing clear volcano plots for DE genes
Building publication-quality heatmaps
Pathway dotplots for enrichment results
Interactive R Markdown RNA-seq reports
Jupyter-based exploratory visualization

Lesson 11
Alignment vs pseudo-alignment (STAR, HISAT2, Salmon, kallisto): tradeoffs and outputs (BAM, transcript/gene counts)
This part compares alignment tools like STAR and HISAT2 with pseudo-alignment tools like Salmon and kallisto, highlighting tradeoffs in speed, accuracy, resource use, and outputs including BAM files and transcript or gene counts.
When to choose STAR or HISAT2 aligners
Configuring genome indexes and annotations
Using Salmon in quasi-mapping mode
Running kallisto for rapid quantification
Comparing BAM and quant.sf style outputs
Benchmarking speed, memory, and accuracy