Low-level RNA detection
Patch-seq RRR | Patch-seq Reduced-Rank Regression
Provides an interpretable visualization of the relationship between high-dimensional single-cell transcriptomes and electrophysiological information obtained using techniques such as patch-seq. Patch-seq RRR is a framework based on sparse reduced-rank regression that yields a low-dimensional representation of the key features of the data, relating the dominant gene expression patterns to the variation they predict in the electrophysiological space.
Publications:
- (Kobak et al., ) Sparse reduced-rank regression for exploratory visualization of single cell patch-seq recordings. BioRxiv.
Institution(s):
Institute for Ophthalmic Research, Center for Integrative Neuroscience and Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany; Graduate Training Center for Neuroscience, University of Tübingen, Tübingen, Germany
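The reduced-rank regression idea behind Patch-seq RRR can be illustrated with a minimal sketch. It assumes NumPy and implements only plain reduced-rank regression; the sparsity penalty of the published method is omitted, and the function name `reduced_rank_regression` and the matrix names are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Plain (non-sparse) reduced-rank regression of Y on X.

    X: cells x genes matrix (assumed centred).
    Y: cells x electrophysiological features matrix (assumed centred).
    Returns W (genes x rank) and V (features x rank) with Y ~ X @ W @ V.T.
    """
    # Unconstrained least-squares coefficients.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # The leading right singular vectors of the fitted values
    # define the low-dimensional subspace in the response space.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = Vt[:rank].T
    # Project the coefficients onto that subspace.
    W = B_ols @ V
    return W, V
```

The columns of `X @ W` give the low-dimensional representation of the cells in gene expression space, and `V` maps it back to the electrophysiological features.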
scRNA-seq analysis
The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.
Noise reduction
scLVM | single-cell Latent Variable Model
A modelling framework for single-cell RNA-seq data that can be used to dissect the observed heterogeneity into different sources, thereby allowing for the correction of confounding sources of variation. scLVM was primarily designed to account for cell-cycle induced variations in single-cell RNA-seq data where cell cycle is the primary source of variability.
Publications:
- (Buettner et al., 2015) Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol.
Institution(s):
Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
OEFinder
A statistical method and software to identify a sorted list of ordering effect (OE) genes. OEFinder is available as an R package along with user-friendly graphical interface implementations that allow users to check for potential artifacts in scRNA-seq data generated by the Fluidigm C1 platform.
Publications:
- (Leng et al., 2016) OEFinder: a user interface to identify and visualize ordering effects in single-cell RNA-seq data. Bioinformatics.
Institution(s):
Morgridge Institute for Research, Madison, WI, USA; Department of Statistics, University of Wisconsin, Madison, WI, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
URSM | Unified RNA-Sequencing Model
Analyzes two types of RNA-seq data: single cell data and bulk data. URSM adjusts for dropout events in single cell data and simultaneously performs deconvolution of bulk data. The single cell and bulk data do not need to be measured on the same subjects. It can (1) obtain reliable estimates of cell type specific gene expression profiles; (2) infer the dropout entries in single cell data; and (3) infer the mixing proportions of different cell types in bulk samples.
Publications:
- (Zhu et al., 2017) A Unified Statistical Framework for Single Cell and Bulk RNA Sequencing Data. ArXiv.
Institution(s):
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Psychiatry and Human Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
Transcript quantification
TraCeR
A computational method to reconstruct full-length, paired T cell receptor (TCR) sequences from T lymphocyte single-cell RNA sequence data. TraCeR links T cell specificity with functional response by revealing clonal relationships between cells alongside their transcriptional profiles. TraCeR extracts TCR-derived sequencing reads for each cell by alignment against ‘combinatorial recombinomes’ comprising all possible combinations of V and J segments. Reads are then assembled into contiguous sequences that are analyzed to find full-length, recombined TCR sequences. Importantly, the reconstructed recombinant sequences typically contain nearly the complete length of the TCR V(D)J region and so allow high-confidence discrimination between closely related gene segments. The method is sensitive, accurate and easy to adapt to any species for which annotated TCR gene sequences are available.
Publications:
- (Stubbington et al., 2016) T cell fate and clonality inference from single-cell transcriptomes. Nat Methods.
Institution(s):
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK; Wellcome Trust Sanger Institute, Cambridge, UK
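The ‘combinatorial recombinome’ used by TraCeR, a reference containing every V and J segment combination, can be sketched as follows. This is an illustration only: the function name `build_recombinome`, the `spacer` parameter and the toy sequences are hypothetical, and the way TraCeR itself joins the segments may differ.

```python
from itertools import product

def build_recombinome(v_segments, j_segments, spacer=""):
    """Return one reference sequence per V-J pairing.

    v_segments, j_segments: dicts mapping segment name -> sequence.
    spacer: optional sequence placed between V and J (an assumption).
    """
    return {
        f"{v_name}_{j_name}": v_seq + spacer + j_seq
        for (v_name, v_seq), (j_name, j_seq)
        in product(v_segments.items(), j_segments.items())
    }

# Toy segments, not real TCR sequences.
recombinome = build_recombinome({"V1": "ACG", "V2": "TTG"}, {"J1": "GGA"})
```

With n V segments and m J segments the reference contains n × m entries, against which reads can then be aligned.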
BRIE | Bayesian Regression for Isoform Estimation
Quantifies splicing in individual single cells. BRIE is a flexible framework that detects differential splicing between individual cells from scRNA-seq data. The model uses sequence features as side information; while these are particularly appealing due to their ease of use and availability, additional side information, such as DNA methylation and chromatin accessibility, could easily be incorporated.
Publications:
- (Huang and Sanguinetti, 2017) BRIE: transcriptome-wide splicing quantification in single cells. Genome Biol.
Institution(s):
School of Informatics, University of Edinburgh, Edinburgh, UK; Centre for Synthetic and Systems Biology (SynthSys), University of Edinburgh, Edinburgh, UK
UMIS | Unique Molecular IdentifierS
Provides a way of removing amplification biases, although the assumed absolute quantification does not appear to hold perfectly. Umis is a flexible tool for counting the number of unique molecular identifiers. There are four steps in this method: (i) formatting reads, (ii) filtering noisy cellular barcodes, (iii) pseudo-mapping to cDNAs, and (iv) counting molecular identifiers. The quantitation used in umis handles reads that could come from multiple transcripts by assigning a fractional count to each transcript and then filtering for a minimum count at the end.
Publications:
- (Svensson et al., 2017) Power analysis of single-cell RNA-sequencing experiments. Nat Methods.
Institution(s):
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK; Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
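The fractional counting described for umis can be sketched in a few lines. This is a simplified illustration of the idea, not the umis implementation: the function name `count_umis`, the input layout and the `min_count` parameter are assumptions.

```python
from collections import defaultdict

def count_umis(reads, min_count=1.0):
    """Count unique molecular identifiers per (cell, transcript).

    reads: iterable of (cell_barcode, umi, candidate_transcripts).
    A UMI compatible with n transcripts adds 1/n to each of them.
    Entries whose final count is below min_count are dropped.
    """
    seen = set()
    counts = defaultdict(float)
    for cell, umi, transcripts in reads:
        targets = frozenset(transcripts)
        key = (cell, umi, targets)
        if key in seen:
            # Duplicate read of the same molecule: count it once.
            continue
        seen.add(key)
        weight = 1.0 / len(targets)
        for transcript in targets:
            counts[(cell, transcript)] += weight
    return {k: v for k, v in counts.items() if v >= min_count}

reads = [
    ("c1", "AAA", ["t1"]),
    ("c1", "AAA", ["t1"]),        # amplification duplicate
    ("c1", "CCC", ["t1", "t2"]),  # ambiguous: 0.5 to each transcript
    ("c1", "GGG", ["t2"]),
    ("c2", "TTT", ["t1", "t2"]),  # filtered out by min_count
]
umi_counts = count_umis(reads)
```

Collapsing duplicate reads before counting is what removes the amplification bias; the fractional weights only spread ambiguous molecules across their candidate transcripts.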
Cell lineage and pseudotime inference / Marker gene detection
Seurat
Enables the study of spatial patterning of gene expression at the single-cell level. Seurat is an R package that enables quality control (QC), analysis, and exploration of single cell RNA-seq data. The software includes three computational methods: (1) unsupervised clustering and discovery of cell types and states, (2) spatial reconstruction of single cell data, and (3) integrated analysis of single cell RNA-seq across conditions, technologies, and species. It can also localize rare subpopulations, and map both spatially restricted and scattered groups.
Publications:
- (Satija et al., 2015) Spatial reconstruction of single-cell gene expression data. Nat Biotechnol.
- (Butler et al., 2018) Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol.
Institution(s):
Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
SPADE | Spanning tree Progression of Density normalized Events
Facilitates the analysis of cellular heterogeneity, the identification of cell types, and comparison of functional markers in response to perturbations, based on a versatile method. SPADE helps to organize high-dimensional cytometry data in an unsupervised manner, and to investigate natural and pathogenic cellular heterogeneity for biological insight. The SPADE algorithm consists of four components: (i) density-dependent downsampling, (ii) clustering, (iii) linking clusters with a minimum spanning tree, and (iv) upsampling to restore all cells in the final result. This modularized process allows more efficient sub-algorithms to replace the current components. In this sense, SPADE can be viewed as a framework for cytometric data analysis and visualization that has the capacity to be evolved and adapted.
Publications:
- (Qiu et al., 2011) Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat Biotechnol.
- (Bendall et al., 2011) Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science.
Institution(s):
Department of Radiology, Stanford University, Stanford, CA, USA
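Step (iii) of the SPADE algorithm, linking clusters with a minimum spanning tree, can be sketched with Prim's algorithm over cluster centroids. This is a minimal illustration that assumes Euclidean distances and precomputed centroids; the function name is hypothetical and the downsampling, clustering and upsampling steps are not shown.

```python
import math

def minimum_spanning_tree(centroids):
    """Link cluster centroids with a minimum spanning tree (Prim).

    centroids: list of coordinate tuples, one per cluster.
    Returns a list of edges (i, j) indexing into centroids.
    """
    n = len(centroids)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge from the current tree to a cluster outside it.
        _, i, j = min(
            (math.dist(centroids[i], centroids[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges

tree = minimum_spanning_tree([(0, 0), (1, 0), (5, 0), (1, 1)])
```

Because each step is a separate function in a design like this, a faster tree or clustering routine could replace the current one without touching the other components.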
Monocle
Analyzes single-cell gene expression experiments. Monocle can perform differential expression analysis, clustering, visualization, and other useful tasks on single cell expression data. The software orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. It is designed to work with RNA-Seq and qPCR data, but could be used with other types as well. The tools Census and BEAM are implemented in Monocle.
Publications:
- (Trapnell et al., 2014) The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol.
- (Qiu et al., 2017) Reversed graph embedding resolves complex single-cell developmental trajectories. BioRxiv.
Institution(s):
Department of Genome Sciences, University of Washington, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Applied Mathematics, University of Washington, Seattle, WA, USA
Single-cell qPCR
DPT
Measures progression through branching lineages using a random-walk-based distance in diffusion map space. DPT allows for branching and pseudotime analysis on large-scale RNA-seq data sets. This package is significantly more robust with respect to noise in low-density regions and cell outliers than existing methods, which rely on the estimation of minimum spanning trees or sampling-based distances. Furthermore, DPT is able to remove asynchrony from scRNA-seq snapshot data collected over several days, aligning cells in terms of their degree of differentiation.
Publications:
- (Haghverdi et al., 2016) Diffusion pseudotime robustly reconstructs lineage branching. Nat Methods.
Institution(s):
Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
MAST | Model-based Analysis of Single-cell Transcriptomics
A flexible statistical framework for the analysis of single-cell RNA sequencing data. MAST is suitable for supervised analyses of differential expression of genes and gene modules, as well as unsupervised analyses of model residuals, to generate hypotheses regarding co-expression of genes. MAST accounts for the bimodality of single-cell data by jointly modeling rates of expression (discrete) and positive mean expression (continuous) values. Information from the discrete and continuous parts is combined to infer changes in expression levels using gene or gene set-based statistics. Because the approach uses a generalized linear framework, it can be used to jointly estimate nuisance variation from biological and technical sources, as well as biological effects of interest.
Publications:
- (Finak et al., 2015) MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol.
- (McDavid et al., 2013) Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics.
Institution(s):
Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Benaroya Research Institute at Virginia Mason, Seattle, WA, USA
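The two parts that MAST models jointly, the rate of expression (discrete) and the mean of the positive values (continuous), can be made concrete with a small sketch. It only computes the two summaries for one gene in one group of cells; it does not fit the generalized linear models or combine the parts into a test statistic, and the function name is hypothetical.

```python
from statistics import mean

def hurdle_summary(values):
    """Split one gene's expression values into the two hurdle parts.

    Returns (rate, positive_mean):
      rate          - fraction of cells with expression > 0 (discrete part)
      positive_mean - mean over expressing cells only (continuous part),
                      NaN when no cell expresses the gene.
    """
    positive = [v for v in values if v > 0]
    rate = len(positive) / len(values)
    positive_mean = mean(positive) if positive else float("nan")
    return rate, positive_mean

# Toy log-expression values for one gene in two groups of cells.
group_a = hurdle_summary([0, 0, 2.0, 4.0])
group_b = hurdle_summary([0, 3.0, 3.0, 3.0])
```

Here the two groups have the same positive mean but different expression rates, a difference that a single overall mean would blur.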
SCATTOME | Single Cell Analysis of Targeted TranscriptOME
Predicts drug sensitivity of single cells within human tumors. SCATTome is an R software package that uses machine learning approaches to build binary classification models, selects top significant genes, predicts drug response of individual cells and computes each cell’s probability of response based on the targeted transcriptome signature. This package can be used in other cancer models to predict heterogeneity of drug response in individual cells based on targeted single-cell gene expression analysis and may also be used to identify minimal residual disease (MRD).
Publications:
- (Mitra et al., 2016) Single-cell analysis of targeted transcriptome predicts drug sensitivity of single cells within human myeloma tumors. Leukemia.
Institution(s):
Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN, USA; School of Statistics, University of Minnesota, Minneapolis, MN, USA
dPCR
ddPCRclust
An R package and Shiny app for the automated analysis of multiplexed ddPCR data.
Publications:
- (Brink et al., ) ddPCRclust — An R package and Shiny app for automated analysis of multiplexed ddPCR data. Bioinformatics.
Institution(s):
International Research Training Group “Computational Methods for the Analysis of the Diversity and Dynamics of Genomes” and Biodata Mining Group, Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Definetherain
Allows users to cluster droplet digital polymerase chain reaction (ddPCR) data to define positive and negative responses. Definetherain is an open-access web-based JavaScript program based on k-nearest neighbour clustering. It defines indeterminate rain droplets using the positive population rather than the negative population. By defining the cut-off for positive droplets from the positive as opposed to the negative cluster and removing ambiguous rain droplets, this application provides improved accuracy at low template copy numbers.
Publications:
- (Jones et al., 2014) Low copy target detection by Droplet Digital PCR through application of a novel open access bioinformatic pipeline, ‘definetherain’. J Virol Methods.
Institution(s):
Peter Medawar Building for Pathogen Research, Nuffield Department of Clinical Medicine, Oxford, UK; IUH, Hôpital St. Louis, Paris, France; Institute for Emerging Infections, The Oxford Martin School, Oxford, UK
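The idea of deriving the positive cut-off from the positive cluster and discarding rain can be sketched as below. This is not the definetherain code (which is JavaScript): the clustering step is skipped by passing in droplets already assigned to each cluster, and the three-standard-deviation margin, the function name and the amplitudes are assumptions for illustration.

```python
from statistics import mean, stdev

def classify_droplets(amplitudes, positive_cluster, negative_cluster, n_sd=3.0):
    """Label droplets as 'positive', 'negative' or 'rain'.

    positive_cluster / negative_cluster: amplitudes of droplets already
    assigned to each cluster (for example from a control well).
    n_sd: margin in standard deviations (an assumed default).
    """
    # The positive cut-off comes from the positive cluster itself.
    pos_cutoff = mean(positive_cluster) - n_sd * stdev(positive_cluster)
    neg_cutoff = mean(negative_cluster) + n_sd * stdev(negative_cluster)
    labels = []
    for amplitude in amplitudes:
        if amplitude >= pos_cutoff:
            labels.append("positive")
        elif amplitude <= neg_cutoff:
            labels.append("negative")
        else:
            # Ambiguous droplets between the clusters are rain.
            labels.append("rain")
    return labels

labels = classify_droplets(
    [950, 5000, 9050, 8700],
    positive_cluster=[9000, 9100, 8900, 9000],
    negative_cluster=[1000, 1100, 900, 1000],
)
```

Droplets labelled as rain would be removed before the positive fraction is calculated.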
ddpcr | droplet digital polymerase chain reaction
Analyzes droplet digital PCR (ddPCR) data; ddPCR is a DNA quantification technique that holds great promise in clinical diagnostics. ddpcr allows users to analyze ddPCR data using an interactive graphical interface. It includes an interactive web application with a visual user interface to facilitate analysis for anyone who is not comfortable with using R. The counting of droplets emitting a sequence-specific fluorescent signal permits the number of copies of that sequence present in the sample to be quantified with excellent sensitivity and precision.
Publications:
- (Attali et al., 2016) ddpcr: an R package and web application for analysis of droplet digital PCR data. F1000Res.
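How droplet counts translate into copy numbers can be illustrated with the standard Poisson correction used in digital PCR. This is a generic sketch, not necessarily the exact computation performed by the ddpcr package, and the function names are hypothetical.

```python
import math

def copies_per_droplet(n_positive, n_total):
    """Mean number of target copies per droplet.

    A droplet is negative only if it contains zero copies, so under a
    Poisson model P(negative) = exp(-lambda) and
    lambda = -ln(1 - n_positive / n_total).
    """
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1 - n_positive / n_total)

def estimated_copies(n_positive, n_total):
    """Estimated number of target copies across all counted droplets."""
    return copies_per_droplet(n_positive, n_total) * n_total

lam = copies_per_droplet(5000, 10000)
```

The correction matters because a positive droplet may hold more than one copy, so the raw positive count underestimates the number of molecules as the positive fraction grows.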