DNA-binding proteins are proteins that have DNA-binding domains and thus have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair. However, there are some known minor groove DNA-binding ligands such as netropsin, distamycin, Hoechst 33258, pentamidine, DAPI and others.
HT-SELEX Analysis
AptaTools
Provides a comprehensive analysis pipeline for HT-SELEX data comprising four fully automated core steps: data preprocessing, sequence analysis, cluster extraction, and data visualization. AptaTools is also modular enough to be extended with additional features. The pipeline handles most file formats generated by modern high-throughput sequencing devices, including paired-end data, and also accepts pre-partitioned pools as input.
Publications
- (2015 Nucleic Acids Res) Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery.
- (2014 Res Comput Mol Bio) AptaCluster - A Method to Cluster HT-SELEX Aptamer Pools and Lessons from its Application.
Institution(s)
National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
SelexGLM
Determines binding free energy coefficients. SelexGLM employs a generalized linear model based on the Poisson distribution. It was used to uncover and characterize intrinsic differences in DNA binding specificity between the androgen receptor (AR) and the glucocorticoid receptor (GR). The tool can account for offsets within the probe that partially cover the fixed sequences upstream and downstream of the variable region.
Publications
- (2017 Genome Res) SelexGLM differentiates androgen and glucocorticoid receptor DNA-binding preference over an extended binding site.
Institution(s)
Department of Biochemistry, Carver College of Medicine, University of Iowa, IA, USA
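The Poisson-based model mentioned in the SelexGLM description can be illustrated with a minimal sketch. This is not SelexGLM's own code (SelexGLM is distributed as an R package); it is a two-parameter Poisson regression fitted by Newton-Raphson in plain Python. The function name `fit_poisson_glm`, the single 0/1 feature (for example, "probe carries a given mutation") and the toy counts are all hypothetical. Reading the fitted coefficient as a relative binding free energy in units of RT is an assumption made for illustration.

```python
import math

def fit_poisson_glm(xs, ys, iters=50):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (illustrative only)."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the Poisson log-likelihood
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mu))
        # Fisher information matrix (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(m * x for x, m in zip(xs, mu))
        h11 = sum(m * x * x for x, m in zip(xs, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical probe counts: x = 1 marks probes carrying a mutation
xs = [0, 0, 1, 1]
ys = [10, 10, 5, 5]
b0, b1 = fit_poisson_glm(xs, ys)
# Assumed interpretation: relative free energy (in units of RT) is -b1
ddG_over_RT = -b1
```

In this toy case the mutated probes are observed half as often, so the fitted coefficient is log(0.5) and the assumed free energy penalty is about 0.69 RT.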
DNase-seq Analysis
Sequencing of DNase I hypersensitive sites (DNase-seq) is a powerful technique for identifying cis-regulatory elements across the genome. Many people currently analyzing DNase-seq data use tools designed for ChIP-seq work, which may be inappropriate for DNase-seq data.
HOMER
Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides tools and methods for ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, and analysis of differential features, enrichment and expression.
Publications
- (Heinz et al., 2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell.
Institution(s)
Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
DFilter
Detects functional signals in tag profiles from different assays such as histone ChIP-seq, TF ChIP-seq, DNase-seq and FAIRE-seq. DFilter is based on a single algorithm that optimizes the receiver operating characteristic area under the curve (ROC-AUC). The software is well suited to genomic signals from individual cell types diluted in a cellular mixture, where marginal signals can be mistaken for noise by suboptimal algorithms.
Publications
- (2013 Nat Biotechnol) Uniform, optimal signal processing of mapped deep-sequencing data.
Institution(s)
Computational and Systems Biology, Genome Institute of Singapore, Singapore
ChIP-seq Analysis
Chromatin immunoprecipitation followed by sequencing (ChIP-seq), first described in 2007 (Barski et al., 2007; Johnson et al., 2007; Mikkelsen et al., 2007; Robertson et al., 2007), allows in vivo determination of where a protein binds the genome; the targets can be transcription factors, DNA-binding enzymes, histones, chaperones, or nucleosomes (Park, 2009; Furey, 2012).
De novo assembly
denovochipseq
A computational pipeline that extracts transcription factor binding motifs from ChIP-seq data when no reference genome is available. denovochipseq combines de novo assembly with statistical tests, enabling motif discovery without a reference genome. Its performance was validated on human and mouse data, and analysis of fly data indicates that denovochipseq outperforms alignment-based methods that rely on closely related species.
Publications
- (He et al., 2015) De novo ChIP-seq analysis. Genome Biol.
Institution(s)
Department of Human Genetics, The University of Chicago, CLSC, Chicago, IL, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
Peak calling
HOMER
Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides tools and methods for ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, and analysis of differential features, enrichment and expression.
Publications
- (Heinz et al., 2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell.
Institution(s)
Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
MACS
Analyzes data generated by short-read sequencers. MACS is standalone software for predicting protein-DNA interaction sites from ChIP-Seq data. The application can: (i) model the distance d between the forward- and reverse-strand tag peaks and shift tags by d/2 to improve the spatial resolution of the predicted sites; (ii) capture local biases in the genome using a dynamic local parameter λ; and (iii) estimate the false discovery rate (FDR) for each detected peak.
Publications
- (Zhang et al., 2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol.
Institution(s)
Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA, USA.
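The first two MACS steps described above (shifting tags by d/2 and using a dynamic local λ) can be sketched in a few lines of plain Python. This is not the MACS implementation: the function names, the window sizes and the toy coordinates are hypothetical, a single chromosome is assumed, and scaling between ChIP and control library sizes is ignored.

```python
import math

def shift_tags(tags, d):
    """Shift each tag by d/2 toward its 3' end: + strand right, - strand left."""
    half = d // 2
    return sorted(pos + half if strand == "+" else pos - half
                  for pos, strand in tags)

def count_in_window(positions, center, width):
    """Number of positions within width/2 of center."""
    lo, hi = center - width // 2, center + width // 2
    return sum(1 for p in positions if lo <= p <= hi)

def local_lambda(control, center, peak_width, genome_size,
                 windows=(1000, 5000, 10000)):
    """Dynamic background rate: the largest of the genome-wide rate and
    the rates from control windows, each rescaled to the peak width.
    The window sizes are assumptions chosen for illustration."""
    lam_bg = len(control) * peak_width / genome_size
    rates = [count_in_window(control, center, w) * peak_width / w
             for w in windows]
    return max([lam_bg] + rates)

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), used as a peak p-value."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

# Hypothetical example: two tags flanking a binding site, d = 200
shifted = shift_tags([(100, "+"), (300, "-")], 200)
lam = local_lambda([500] * 10, center=500, peak_width=200,
                   genome_size=1_000_000)
p_value = poisson_sf(8, lam)
```

Taking the maximum over several window sizes means that a locally dense control region raises the background rate for a candidate peak, which makes the Poisson test more conservative there.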
SICER
Recognizes ChIP-enriched regions in histone modification data. SICER is based on a mathematical theory for the score distribution in a genomic background model of random reads. It finds spatial clusters, large and small, that are unlikely to appear by chance. The tool takes the enrichment context of a local window into account when determining its significance, and it helps users reduce the effect of sampling fluctuations in the control library.
Publications
- (Zang et al., 2009) A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics.
Institution(s)
Department of Physics, The George Washington University, Washington, DC, USA; Laboratory of Molecular Immunology, National Heart Lung and Blood Institute, NIH, Bethesda, MD, USA
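The spatial clustering idea behind SICER can be sketched as follows: windows with enough reads are grouped into "islands", and short runs of low-count windows between them are bridged. This is an illustrative sketch only, not SICER's code. The function name, the thresholds and the counts are hypothetical, and the island score here is a plain read sum rather than the statistical score SICER derives from its background model.

```python
def find_islands(counts, min_count, max_gap):
    """Group eligible windows (count >= min_count) into islands,
    bridging runs of at most max_gap ineligible windows.
    Returns (first_window, last_window, total_reads) tuples."""
    islands = []
    start = last = None
    for i, c in enumerate(counts):
        if c < min_count:
            continue
        if start is None:
            start = last = i
        elif i - last - 1 <= max_gap:
            last = i  # gap is small enough: extend the current island
        else:
            islands.append((start, last, sum(counts[start:last + 1])))
            start = last = i
    if start is not None:
        islands.append((start, last, sum(counts[start:last + 1])))
    return islands

# Hypothetical per-window read counts along one chromosome
counts = [0, 5, 6, 0, 7, 0, 0, 0, 4, 0]
islands = find_islands(counts, min_count=4, max_gap=1)
```

With these toy values, windows 1 to 4 form one island despite the empty window 3, while window 8 is too far away and forms an island of its own.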
Differential peak calling
DiffBind
Processes ChIP-seq data enriched for genomic loci where specific protein/DNA binding occurs. DiffBind works with peak sets identified by ChIP-seq peak callers and with aligned sequence read datasets. It finds sites that are differentially bound between two sample groups. The tool is also useful for managing the results of multiple peak callers: it can merge peak sets and count sequencing reads overlapping the intervals in those peak sets.
Publications
- (Ross-Innes et al., 2012) Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature.
Institution(s)
Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK; UCL Cancer Institute, University College London, London, UK.
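The two bookkeeping operations mentioned in the DiffBind description, merging peak sets and counting reads that overlap the merged intervals, can be sketched in plain Python. DiffBind itself is an R/Bioconductor package, so this is not its API. The function names and coordinates are hypothetical, intervals are treated as half-open (start, end) pairs on a single chromosome, and each read is reduced to one position.

```python
from bisect import bisect_left

def merge_peaks(*peak_sets):
    """Union overlapping or touching (start, end) intervals
    from several peak sets into one consensus set."""
    intervals = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]

def count_reads(peaks, read_positions):
    """Count reads with start <= position < end for each peak."""
    reads = sorted(read_positions)
    return [bisect_left(reads, end) - bisect_left(reads, start)
            for start, end in peaks]

# Hypothetical peak calls from two peak callers
caller_a = [(100, 200), (400, 500)]
caller_b = [(150, 250), (600, 700)]
consensus = merge_peaks(caller_a, caller_b)
counts = count_reads(consensus, [120, 249, 250, 450, 650, 660])
```

A count vector like this, computed per sample over the same consensus intervals, is the kind of input a differential binding test would compare between sample groups.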
GenoGAM
Offers methods for modeling ChIP-Seq experiments with factorial designs. GenoGAM is an R package that uses generalized additive models (GAMs) to perform statistical analysis of genome-wide data. It provides base-level and region-level significance testing with a controlled type I error rate. Additionally, the software can fit GAMs to very long longitudinal data, such as whole chromosomes at base-pair resolution.
Publications
- (Stricker, 2016) Genome-wide generalized additive models. BioRxiv.
- (Stricker et al., 2017) GenoGAM: genome-wide generalized additive models for ChIP-Seq analysis. Bioinformatics.
- (Stricker et al., 2018) GenoGAM 2.0: Scalable and efficient implementation of genome-wide generalized additive models for gigabase-scale genomes. BioRxiv.
Institution(s)
Department of Informatics, Technical University Munich, Munich, Germany
MAnorm
A simple and effective method for quantitative comparison of ChIP-Seq data sets describing transcription factor binding sites and epigenetic modifications. The quantitative binding differences inferred by MAnorm showed strong correlation with both the changes in expression of target genes and the binding of cell type-specific regulators.
Publications
- (Shao et al., 2012) MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets. Genome Biol.
Institution(s)
Departments of Pediatric Oncology and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Division of Pediatric Hematology-Oncology, The Karp Family Research Laboratories, Children’s Hospital, Boston, MA, USA; Division of Cell and Molecular Biology, Department of Biology, Boston University, Boston, MA, USA
De novo motif discovery
HOMER
Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides tools and methods for ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, and analysis of differential features, enrichment and expression.
Publications
- (Heinz et al., 2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell.
Institution(s)
Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
Weeder
Belongs to the motif enumeration family of motif discovery tools, in which occurrences of motifs in the query sequences are counted and, in this case, compared to a pre-calculated set of genome-specific background motifs. This avoids having to construct a background set of sequences, which is no easy task. Weeder was initially used to identify common motifs in defined promoter regions, but evolved to handle first ChIP-chip and then ChIP-seq data.
Publications
- (Pavesi et al., 2004) Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res.
- (Pavesi et al., 2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res.
- (Zambelli et al., 2014) Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences. Curr Protoc Bioinformatics.
Institution(s)
University of Milan, Italy
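The enumeration approach described for Weeder, counting motif occurrences in the query sequences and comparing them to pre-calculated background values, can be illustrated with a minimal sketch. This is not Weeder's algorithm: only exact k-mer matches are counted here, the function names are hypothetical, and the background frequencies are made-up values standing in for a genome-specific table.

```python
from collections import Counter
import math

def kmer_counts(sequences, k):
    """Count every overlapping k-mer in the query sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def enrichment(sequences, k, background_freq, floor=1e-6):
    """Rank k-mers by log2(observed frequency / background frequency).
    k-mers missing from the background table get a small floor value."""
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    scores = {}
    for kmer, n in counts.items():
        expected = background_freq.get(kmer, floor)
        scores[kmer] = math.log2((n / total) / expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical query sequences and a made-up background table
queries = ["ACGTACGT", "TTACGT"]
background = {kmer: 0.125 for kmer in ["ACGT", "TACG", "TTAC", "CGTA", "GTAC"]}
ranked = enrichment(queries, 4, background)
```

Because the background table is pre-computed, no matched set of background sequences has to be assembled for each query, which is the benefit noted in the description above.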
kmerHMM
Discovers DNA motifs in protein binding microarray (PBM) data. kmerHMM is a computational pipeline for PBM motif discovery in which hidden Markov models (HMMs) are trained to model DNA motifs, and belief propagation is used to elucidate multiple motif models from each trained HMM. The software models the dependence between adjacent nucleotide positions and can also deduce multiple binding modes for a given transcription factor (TF).
Publications
- (Wong et al., 2013) DNA motif elucidation using belief propagation. Nucleic Acids Res.
Institution(s)
Department of Computer Science, University of Toronto, Toronto, ON, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.