tfcb_2021

Lecture 18: Introduction to single-cell RNA-seq

In this lecture, we will analyze a single-cell RNA-seq dataset using scanpy. The lecture will introduce AnnData objects, plotting and interacting with single-cell RNA-seq data, QC and analysis of the data and, as time permits, batch correction.

We will use two PBMC datasets made available by 10X Genomics. Please download the following to the data/ directory:

Pre-analyzed data from here.
Count matrix from here. See here for description of the data.
A second count matrix of PBMCs to be used for batch correction. Download it from here; a description is available here.

Learning Objectives

Gain insight into the why and how of single-cell RNA-seq.
Learn how to process and analyze single-cell RNA-seq datasets.
Single-cell RNA-seq data is highly interactive. Learn different ways to visualize and interact with the data.
Perform batch correction of scRNA-seq data.
Understand the reasoning behind various QC, preprocessing and analysis approaches for scRNA-seq.

Class materials

The lecture slides are available here.
The Jupyter notebook which will be used for the lecture is available here: Lecture18-scRNA-seq-analysis.ipynb. If you have difficulty performing a git pull to obtain the materials for this class, it is likely because you have a conflict between your local copy of the notebook and the version in the public GitHub repo. You can resolve this by making a copy of your notebook under a different name (for example, my_Lecture18-scRNA-seq-analysis.ipynb) and then discarding changes to the original notebook file.

Data Download

Download the following datasets and copy them to a folder called data/:

Pre-analyzed data from here.
Count matrix from here. See here for description of the data.
A second count matrix of PBMCs to be used for batch correction. Download it from here; a description is available here.
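
As a quick check that the downloads ended up in the right place, the following minimal Python sketch creates the data/ folder if it does not exist yet and lists whatever is in it. The helper name `list_data_files` is hypothetical, and the sketch makes no assumption about the names of the downloaded files; it simply reports what it finds.

```python
from pathlib import Path


def list_data_files(data_dir="data"):
    """Create data_dir if needed and return the sorted names of its entries.

    Both files and sub-directories are listed, since a count matrix may be
    downloaded either as a single file or as an unpacked folder.
    """
    path = Path(data_dir)
    path.mkdir(parents=True, exist_ok=True)
    return sorted(entry.name for entry in path.iterdir())


# Run from the directory that should contain data/.
print(list_data_files())
```

An empty list means the folder exists but nothing has been copied into it yet.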

Environment setup

We will be using cellxgene and scanpy for Lectures 18 and 19 and for Homework 8. All the packages and dependencies can be installed using conda and pip.

The following commands install all the required dependencies. The environment is the same one you used for the bulk RNA-seq analysis.

# Activate conda environment
conda activate tfcb2021_rna


# Scanpy installation 
conda install seaborn scikit-learn statsmodels numba pytables
conda install -c conda-forge python-igraph leidenalg
pip install scanpy


# cellxgene installation 
pip install cellxgene

# harmonypy installation
pip install harmonypy

# umap version
pip install umap-learn==0.5.1

# jupyter/ipython installation 
conda install -c conda-forge jupyterlab
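
Once the commands above have finished, it can help to confirm that everything is visible from the activated environment. The sketch below is one possible check, not part of the official course setup. It assumes the usual import names for the installed packages (for example `sklearn` for scikit-learn, `tables` for pytables, `igraph` for python-igraph and `umap` for umap-learn) and assumes that cellxgene and JupyterLab expose `cellxgene` and `jupyter` executables on the PATH. The helper names `find_missing_modules` and `find_missing_commands` are hypothetical.

```python
import importlib.util
import shutil

# Import names assumed to correspond to the packages installed above.
# Several differ from the install name (e.g. python-igraph -> igraph).
REQUIRED_MODULES = [
    "scanpy", "seaborn", "sklearn", "statsmodels", "numba", "tables",
    "igraph", "leidenalg", "harmonypy", "umap",
]

# Command-line tools assumed to be installed alongside the packages.
REQUIRED_COMMANDS = ["cellxgene", "jupyter"]


def find_missing_modules(modules):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


def find_missing_commands(commands):
    """Return the command names that are not found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


print("Missing modules:", find_missing_modules(REQUIRED_MODULES))
print("Missing commands:", find_missing_commands(REQUIRED_COMMANDS))
```

Run this with the tfcb2021_rna environment activated. Two empty lists suggest the installation succeeded; any name that is printed points at the install command to rerun.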