Lesson 1. Differential expression analysis: DESeq2, edgeR, limma-voom (model design, contrasts, and multiple-testing correction)
Workflows for differential expression with DESeq2, edgeR, and limma-voom, covering model setup, contrasts, dispersion estimation, and p-value correction to produce reliable gene lists.
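
As a small illustration of the p-value correction this lesson covers, the sketch below implements the Benjamini-Hochberg adjustment in plain Python. It is not taken from DESeq2, edgeR, or limma; the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values only, not real results.
pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)
print(padj)
```

Genes would then typically be called significant when their adjusted p-value falls below a chosen FDR threshold such as 0.05 or 0.1.
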
- Designing experimental models and covariates
- Setting contrasts for complex comparisons
- Running the DESeq2 end-to-end workflow
- Using edgeR and limma-voom pipelines
- Multiple-testing correction and FDR control
- Interpreting log2 fold changes and shrinkage

Lesson 2. Data organization and file naming conventions: sample sheets, raw/processed separation, consistent identifiers
Best practices for organising RNA-seq files: sample sheets, a clear folder layout, separation of raw and processed data, and stable identifiers, so that scripts stay simple and analyses are repeatable.
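
One way to keep identifiers consistent is to check the sample sheet automatically before any analysis runs. The sketch below is a minimal example using only the Python standard library; the column names (`sample_id`, `condition`, `fastq_1`), the naming rule, and the file paths are all hypothetical and should be adapted to the project's own conventions.

```python
import csv
import io
import re

# Hypothetical required columns; adapt to the project's own sample sheet.
REQUIRED_COLUMNS = ("sample_id", "condition", "fastq_1")
# Hypothetical naming rule: letters, digits, and underscores only.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def validate_sample_sheet(handle):
    """Return a list of human-readable problems found in a CSV sample sheet."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = (row["sample_id"] or "").strip()
        if not ID_PATTERN.match(sample_id):
            problems.append(f"line {line_no}: invalid sample_id {sample_id!r}")
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
        if not (row["condition"] or "").strip():
            problems.append(f"line {line_no}: empty condition")
    return problems


# Made-up sheet with one badly formed ID and one duplicate.
sheet = io.StringIO(
    "sample_id,condition,fastq_1\n"
    "ctrl_1,control,raw/ctrl_1.fastq.gz\n"
    "ctrl 2,control,raw/ctrl_2.fastq.gz\n"
    "ctrl_1,treated,raw/trt_1.fastq.gz\n"
)
issues = validate_sample_sheet(sheet)
for issue in issues:
    print(issue)
```

Running a check like this at the start of a pipeline catches naming problems before they propagate into file names and count matrix columns.
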
- Designing a clear directory hierarchy
- Separating raw and processed data
- Creating robust sample sheets and metadata
- Consistent sample and library identifiers
- Versioning reference genomes and indices
- Backing up and archiving project data

Lesson 3. Gene-level quantification strategies: featureCounts, htseq-count, tximport for transcript-to-gene summarization
Gene-level counting from aligned reads with featureCounts and htseq-count, and summarising transcript-level estimates to genes with tximport, ready for statistical analysis.
- Counting reads with featureCounts options
- Using htseq-count modes and annotations
- Handling strandedness and multimapping reads
- Importing Salmon and kallisto with tximport
- Building gene-level count matrices
- Assessing quantification quality and coverage

Lesson 4. Tools for data download and organization: SRA Toolkit (prefetch/fastq-dump), ENA FTP/Aspera, wget/rsync, and recommended inputs/outputs
Reliable download tools such as SRA Toolkit, ENA, and wget/rsync, with consistent input/output layouts that support automation.
- Using SRA Toolkit prefetch and fasterq-dump
- Accessing ENA via FTP and Aspera
- Downloading with wget and rsync safely
- Choosing raw and processed file formats
- Documenting download metadata and checksums
- Automating downloads with scripts and logs

Lesson 5. Quality control tools and outputs: FastQC, MultiQC, key metrics to inspect (per-base quality, adapter content, duplication, GC)
Using FastQC and MultiQC to check base quality, adapter content, duplication, and GC content, and to decide whether trimming or re-sequencing is needed.
- Running FastQC on raw and trimmed reads
- Interpreting per-base quality profiles
- Detecting adapters and overrepresented sequences
- Evaluating duplication and GC content
- Aggregating reports with MultiQC
- Defining QC thresholds and actions

Lesson 6. Read trimming and filtering: when to trim, tools (Trim Galore/Cutadapt/fastp), main parameters and outputs
When and how to trim reads with Trim Galore, Cutadapt, or fastp, while avoiding over-trimming that can distort downstream analysis.
- Deciding whether trimming is necessary
- Adapter detection and removal strategies
- Quality-based trimming thresholds
- Minimum length and complexity filters
- Using Trim Galore and Cutadapt options
- Fastp for integrated QC and trimming

Lesson 7. Basic downstream analyses: GO/KEGG enrichment (clusterProfiler), GSEA preranked, pathway visualization, and gene set selection
Post-DE functional analysis with clusterProfiler for GO/KEGG enrichment, preranked GSEA, pathway visualisation, and careful gene set selection.
- Preparing ranked gene lists for GSEA
- GO and KEGG enrichment with clusterProfiler
- Choosing appropriate gene set databases
- Visualizing enriched pathways and networks
- Filtering and prioritizing gene sets
- Reporting functional results reproducibly

Lesson 8. High-level pipeline layout: data download, QC, trimming, alignment/pseudo-alignment, quantification, differential expression, downstream analysis
The overall pipeline from download through QC, trimming, alignment, quantification, normalisation, DE, and functional analysis, built from modular scripts.
- Defining pipeline stages and dependencies
- Planning inputs, outputs, and file flow
- Integrating QC, trimming, and alignment
- Linking quantification to DE analysis
- Connecting DE to enrichment workflows
- Documenting the pipeline with diagrams

Lesson 9. Normalization and exploratory data analysis: TPM/FPKM limits, DESeq2 normalization, PCA, sample-sample distance heatmaps
Normalisation and exploratory data analysis: the limits of TPM/FPKM, DESeq2 normalisation, PCA, and sample-sample distance heatmaps for spotting batch effects.
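
To make the DESeq2 normalisation topic more concrete, the sketch below illustrates the median-of-ratios idea behind DESeq2 size factors in plain Python. It is a simplified illustration, not DESeq2's own code; the function name `size_factors` and the toy count matrix are assumptions made for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # Size factor = median ratio of a sample's counts to the gene-wise reference.
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # skipped: contains a zero
]
factors = size_factors(counts)
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale. In this toy case the second factor is twice the first, reflecting the doubled sequencing depth.
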
- Limitations of TPM and FPKM measures
- DESeq2 size factors and normalization
- Variance-stabilizing and rlog transforms
- Principal component analysis of samples
- Sample-sample distance heatmaps
- Detecting batch effects and outliers

Lesson 10. Basic visualization best practices: MA plots, volcano plots, heatmaps, pathway dotplots, and interactive report options (R Markdown, Jupyter)
Visualisation tips for DE results, samples, and pathways, using static plots and interactive R Markdown or Jupyter reports.
- Constructing and interpreting MA plots
- Designing clear volcano plots for DE genes
- Building publication-quality heatmaps
- Pathway dotplots for enrichment results
- Interactive R Markdown RNA-seq reports
- Jupyter-based exploratory visualization

Lesson 11. Alignment vs pseudo-alignment (STAR, HISAT2, Salmon, kallisto): tradeoffs and outputs (BAM, transcript/gene counts)
Comparison of STAR/HISAT2 alignment with Salmon/kallisto pseudo-alignment in terms of speed, accuracy, and outputs.
- When to choose STAR or HISAT2 aligners
- Configuring genome indexes and annotations
- Using Salmon in quasi-mapping mode
- Running kallisto for rapid quantification
- Comparing BAM and quant.sf style outputs
- Benchmarking speed, memory, and accuracy