Lesson 1
Differential expression analysis (DESeq2, edgeR, limma-voom): model design, contrasts, and multiple-testing correction

This section details differential expression workflows using DESeq2, edgeR, and limma-voom, focusing on model design, contrasts, dispersion estimation, and multiple-testing correction to obtain reliable gene lists and effect size estimates.
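As a small illustration of the multiple-testing step mentioned above, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The p-values are hypothetical, and in practice the adjusted values reported by the DE tools themselves would normally be used rather than a hand-rolled function.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
padj = benjamini_hochberg(pvals)
significant = [p <= 0.05 for p in padj]
```

Note that two genes with raw p-values below 0.05 are no longer significant after adjustment in this toy example, which is the behaviour FDR control is meant to provide.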
- Designing experimental models and covariates
- Setting contrasts for complex comparisons
- Running the DESeq2 end-to-end workflow
- Using edgeR and limma-voom pipelines
- Multiple-testing correction and FDR control
- Interpreting log2 fold changes and shrinkage

Lesson 2
Data organization and file naming conventions: sample sheets, raw/processed separation, consistent identifiers

This section describes best practices for organizing RNA-seq project files, including sample sheets, directory layouts, separation of raw and processed data, and consistent identifiers that simplify scripting, tracking, and reproducibility.
- Designing a clear directory hierarchy
- Separating raw and processed data
- Creating robust sample sheets and metadata
- Consistent sample and library identifiers
- Versioning reference genomes and indices
- Backing up and archiving project data

Lesson 3
Gene-level quantification strategies: featureCounts, htseq-count, tximport for transcript-to-gene summarization

This section explains gene-level quantification from aligned or pseudo-aligned reads, comparing featureCounts and htseq-count, and detailing how tximport aggregates transcript-level estimates into robust gene-level matrices for downstream statistical analysis.
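The core idea behind transcript-to-gene summarization can be sketched in a few lines of Python: estimated counts for transcripts are summed per gene using a transcript-to-gene map. This is a deliberately simplified illustration, not a reimplementation of tximport, which does more (for example, it can account for transcript lengths). The identifiers and values below are hypothetical.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level totals."""
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            # In this sketch, transcripts without a gene mapping are skipped.
            continue
        gene_counts[gene] += count
    return dict(gene_counts)


# Hypothetical transcript estimates and mapping, for illustration only.
tx_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0, "tx4": 2.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_counts = summarize_to_gene(tx_counts, tx2gene)
```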
- Counting reads with featureCounts options
- Using htseq-count modes and annotations
- Handling strandedness and multimapping reads
- Importing Salmon and kallisto with tximport
- Building gene-level count matrices
- Assessing quantification quality and coverage

Lesson 4
Tools for data download and organization: SRA Toolkit (prefetch/fastq-dump), ENA FTP/Aspera, wget/rsync, and recommended inputs/outputs

This section covers reliable strategies for downloading and organizing RNA-seq data, focusing on SRA Toolkit, ENA access, command-line transfer tools, and defining consistent input and output structures that support automation and reproducibility.
- Using SRA Toolkit prefetch and fasterq-dump
- Accessing ENA via FTP and Aspera
- Downloading with wget and rsync safely
- Choosing raw and processed file formats
- Documenting download metadata and checksums
- Automating downloads with scripts and logs

Lesson 5
Quality control tools and outputs: FastQC, MultiQC, key metrics to inspect (per-base quality, adapter content, duplication, GC)

This section focuses on RNA-seq quality control, using FastQC and MultiQC to summarize key metrics such as per-base quality, adapter contamination, duplication, and GC content, and to decide whether trimming or resequencing is required.
- Running FastQC on raw and trimmed reads
- Interpreting per-base quality profiles
- Detecting adapters and overrepresented sequences
- Evaluating duplication and GC content
- Aggregating reports with MultiQC
- Defining QC thresholds and actions

Lesson 6
Read trimming and filtering: when to trim, tools (Trim Galore/Cutadapt/fastp), main parameters and outputs

This section explains when and how to trim RNA-seq reads, covering adapter and quality trimming, length filtering, and key parameters in tools such as Trim Galore, Cutadapt, and fastp, while avoiding over-trimming that harms downstream analyses.
- Deciding whether trimming is necessary
- Adapter detection and removal strategies
- Quality-based trimming thresholds
- Minimum length and complexity filters
- Using Trim Galore and Cutadapt options
- Fastp for integrated QC and trimming

Lesson 7
Basic downstream analyses: GO/KEGG enrichment (clusterProfiler), GSEA preranked, pathway visualization, and gene set selection

This section introduces downstream functional analyses after differential expression, including GO and KEGG enrichment with clusterProfiler, preranked GSEA, pathway visualization, and principled strategies for selecting and filtering gene sets.
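Over-representation tests of the kind used for GO and KEGG enrichment are commonly based on the hypergeometric distribution. The sketch below computes the one-sided p-value directly, assuming a hypothetical gene universe and gene set sizes; it is meant only to show the calculation, not to replace clusterProfiler.

```python
from math import comb


def hypergeom_enrichment_p(universe, in_set, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `universe`
    genes, of which `in_set` belong to the gene set of interest."""
    total = comb(universe, selected)
    upper = min(in_set, selected)
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes in the universe, 5 in the pathway,
# 4 differentially expressed genes, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(universe=20, in_set=5, selected=4, overlap=2)
```

In a real analysis the resulting p-values would still need multiple-testing correction across all gene sets tested.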
- Preparing ranked gene lists for GSEA
- GO and KEGG enrichment with clusterProfiler
- Choosing appropriate gene set databases
- Visualizing enriched pathways and networks
- Filtering and prioritizing gene sets
- Reporting functional results reproducibly

Lesson 8
High-level pipeline layout: data download, QC, trimming, alignment/pseudo-alignment, quantification, differential expression, downstream analysis

This section presents the overall RNA-seq pipeline structure, from data acquisition and QC through trimming, alignment or pseudo-alignment, quantification, normalization, differential expression, and downstream functional analysis, emphasizing modular, scripted workflows.
- Defining pipeline stages and dependencies
- Planning inputs, outputs, and file flow
- Integrating QC, trimming, and alignment
- Linking quantification to DE analysis
- Connecting DE to enrichment workflows
- Documenting the pipeline with diagrams

Lesson 9
Normalization and exploratory data analysis: TPM/FPKM limits, DESeq2 normalization, PCA, sample-sample distance heatmaps

This section covers normalization and exploratory analysis of RNA-seq data, discussing limitations of TPM and FPKM, DESeq2-based normalization, variance stabilization, principal component analysis, and sample distance heatmaps for detecting batch effects.
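The DESeq2-style normalization mentioned above rests on median-of-ratios size factors. The following is a minimal sketch of that idea in plain Python, using a hypothetical toy count matrix; DESeq2's own implementation should be used for real data.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (genes), each a list of per-sample raw counts.
    """
    n_samples = len(counts[0])
    # Log geometric mean per gene; genes with any zero count are skipped.
    log_geo_means = [
        sum(log(c) for c in row) / n_samples if all(c > 0 for c in row) else None
        for row in counts
    ]
    factors = []
    for j in range(n_samples):
        log_ratios = [
            log(row[j]) - lgm
            for row, lgm in zip(counts, log_geo_means)
            if lgm is not None
        ]
        factors.append(exp(median(log_ratios)))
    return factors


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
sf = size_factors(counts)
normalized = [[c / f for c, f in zip(row, sf)] for row in counts]
```

Dividing each sample's counts by its size factor puts the samples on a comparable scale without relying on total library size alone.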
- Limitations of TPM and FPKM measures
- DESeq2 size factors and normalization
- Variance-stabilizing and rlog transforms
- Principal component analysis of samples
- Sample-sample distance heatmaps
- Detecting batch effects and outliers

Lesson 10
Basic visualization best practices: MA plots, volcano plots, heatmaps, pathway dotplots, and interactive report options (R Markdown, Jupyter)

This section introduces effective visualization strategies for RNA-seq results, emphasizing clear communication of differential expression, sample structure, and pathway changes using static plots and interactive, reproducible reports built in R Markdown or Jupyter.
- Constructing and interpreting MA plots
- Designing clear volcano plots for DE genes
- Building publication-quality heatmaps
- Pathway dotplots for enrichment results
- Interactive R Markdown RNA-seq reports
- Jupyter-based exploratory visualization

Lesson 11
Alignment vs pseudo-alignment (STAR, HISAT2, Salmon, kallisto): tradeoffs and outputs (BAM, transcript/gene counts)

This section compares alignment-based tools such as STAR and HISAT2 with pseudo-alignment tools like Salmon and kallisto, highlighting tradeoffs in speed, accuracy, resource use, and outputs including BAM files and transcript or gene-level counts.
- When to choose STAR or HISAT2 aligners
- Configuring genome indexes and annotations
- Using Salmon in quasi-mapping mode
- Running kallisto for rapid quantification
- Comparing BAM and quant.sf style outputs
- Benchmarking speed, memory, and accuracy