RNA Modifications Analysis
From a biosynthetic point of view, the roughly 100 known different RNA nucleotide modifications can be easily divided according to their complexity. There are simple chemical transformations, such as addition of a methyl group or bond isomerization of uridine to yield pseudouridine, and there are more complex multistep transformations involving the action of several enzymes in a defined order.
Protein-binding Region Prediction
RNAcompete
Allows the study of RNA-protein interactions. RNAcompete provides an estimate of relative preference for a large number of individual sequences using a single binding reaction. It also permits the design of arrays that focus on variants of a specific type of sequence and structure. Finally, this method can be used to reliably identify preferred binding sequences for RNA-binding proteins (RBPs), whether these are in structured or unstructured RNA contexts.
Publications:
- (Ray et al., 2009) Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat Biotechnol.
Institutions(s):
Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
catRAPID omics
A server for large-scale calculations of protein-RNA interactions. catRAPID omics allows (i) predictions at proteomic and transcriptomic level; (ii) use of protein and RNA sequences without size restriction; (iii) analysis of nucleic acid binding regions in proteins; and (iv) detection of RNA motifs involved in protein recognition.
Publications:
- (Agostini et al., 2013) catRAPID omics: a web server for large-scale prediction of protein-RNA interactions. Bioinformatics.
- (Bellucci et al., 2011) Predicting protein associations with long noncoding RNAs. Nat Methods.
Institutions(s):
Gene Function and Evolution, Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and Universitat Pompeu Fabra (UPF), Barcelona, Spain
RBPmap
A web server for accurate prediction and mapping of RBP binding sites. RBPmap has been developed specifically for mapping RBPs in human, mouse and Drosophila melanogaster genomes, though it supports other organisms too. RBPmap lets users select motifs from a large database of experimentally defined motifs. In addition, users can provide any motif of interest, given as either a consensus sequence or a position-specific scoring matrix (PSSM).
Publications:
- (Paz et al., 2014) RBPmap: a web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res.
Institutions(s):
Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa, Israel
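To make the PSSM input mentioned in the RBPmap entry concrete, the following is a minimal sketch of how a sequence can be scanned with a position-specific scoring matrix. It is not RBPmap's algorithm: the motif values, the uniform background, the threshold and the function names (`window_score`, `scan`) are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-position motif: per-position probabilities for A, C, G, U.
# The numbers are illustrative only and do not come from any RBPmap motif.
PSSM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "U": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "U": 0.05},
]
BACKGROUND = 0.25  # assumed uniform background frequency per base


def window_score(window, pssm=PSSM):
    """Sum of per-position log-odds scores for one window."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pssm, window))


def scan(sequence, pssm=PSSM, threshold=5.0):
    """Return (position, window, score) for every window scoring above threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = window_score(window, pssm)
        if score > threshold:
            hits.append((i, window, score))
    return hits


hits = scan("AAUGCAGGUGCAUU")
hit_positions = [pos for pos, _, _ in hits]
```

A consensus motif can be seen as the special case in which each column puts almost all of its probability on a single base.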
Alternative Splicing Annotation
APPRIS
The main goal of developing the APPRIS WebServer and WebServices is to allow users to annotate splice isoforms and select a principal isoform for vertebrate genome species beyond those that are annotated in the APPRIS Database, to annotate genes and variants that are missing from the APPRIS Database, and to annotate their experimental results with existing annotations. The APPRIS WebServer has been designed to be used for the comparison of splice isoform annotations for individual genes, while the APPRIS WebServices have been created to allow access to the APPRIS Database and to run an automatic version of the APPRIS server, using REST architecture to be portable, modular and flexible in the automation of programmatic scripts.
Publications:
- (Rodriguez et al., 2015) APPRIS WebServer and WebServices. Nucleic Acids Res.
- (Rodriguez et al., 2013) APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res.
Institutions(s):
Spanish National Bioinformatics Institute (INB) and Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
rPGA
Allows users to discover hidden splice junctions by mapping personal RNA-seq data to the matching personal genome sequence. rPGA personalizes the reference genome according to an individual’s single nucleotide polymorphisms (SNPs) and then maps the individual’s transcriptome to the corresponding personal genome, and discovers novel splice variants specific to the individual. This software was applied to analyze RNA-seq data from individuals with whole-genome genotype data in the 1000 Genomes project.
Publications:
- (Stein et al., 2015) Discover hidden splicing variations by mapping personal transcriptomes to personal genomes. Nucleic Acids Res.
- (Stein et al., 2017) Using RNA-Seq to Discover Genetic Polymorphisms That Produce Hidden Splice Variants. Methods Mol Biol.
Institutions(s):
Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, USA
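The genome personalization step described in the rPGA entry can be sketched as a simple substitution of SNP alleles into the reference sequence. This is a toy illustration under stated assumptions, not rPGA's implementation: the function name `personalize`, the 0-based coordinates and the example sequence are hypothetical, and haplotype phasing is ignored.

```python
def personalize(reference, snps):
    """Return a copy of `reference` with SNP alleles substituted.

    `snps` maps 0-based positions to (ref_allele, alt_allele) tuples.
    The reference allele is checked before substitution so that a
    coordinate mismatch is reported instead of silently applied.
    """
    bases = list(reference)
    for pos, (ref, alt) in snps.items():
        if bases[pos] != ref:
            raise ValueError(
                f"reference mismatch at {pos}: expected {ref}, found {bases[pos]}"
            )
        bases[pos] = alt
    return "".join(bases)


# Hypothetical reference fragment and two SNPs for one individual.
reference = "ACGTAGGTAAGT"
snps = {4: ("A", "C"), 9: ("A", "G")}
personal = personalize(reference, snps)
```

Reads from the same individual would then be aligned to `personal` rather than to `reference`, so that junctions depending on the individual's alleles are not penalized as mismatches.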
Exogean
Predicts transcripts from human mRNA and mouse protein sequence alignments. Exogean enables prediction of several alternative transcripts per gene. It can be useful for annotation of eukaryote protein coding genes based on alignments with proteins from a different species and/or mRNAs from the same species. This tool produces information on each predicted gene and transcript that summarizes their structure, the evidence used, the problems and conflicts encountered and the solutions applied.
Publications:
- (Djebali et al., 2006) Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA. Genome Biol.
Institutions(s):
Dyogen Lab, CNRS UMR8541, Ecole Normale Superieure, Paris, France; IBISC Lab, CNRS FRE2873, Universite d’Evry Val d’Essonne, Genopole, Evry, France
RNA Methylation Prediction
SRAMP
A mammalian m6A site predictor. SRAMP achieves promising performance both in cross-validation tests on its training dataset and in rigorous independent tests. Another notable trait of this predictor is that only RNA sequences are required to run a prediction; no external -omics data are loaded. SRAMP serves as a useful tool for predicting m6A modification sites on RNA sequences of interest.
Publications:
- (Zhou et al., 2016) SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res.
Institutions(s):
Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
iRNAm5C-PseDNC
Identifies RNA 5-methylcytosine modification sites. iRNAm5C-PseDNC is a web server and predictor developed by incorporating ten types of physical-chemical properties into pseudo dinucleotide composition. To obtain the predicted result with the anticipated success rate, the user has to supply the entire sequence of the query RNA, rather than a fragment, as input.
Publications:
- (Qiu et al., 2017) iRNAm5C-PseDNC: identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition. Oncotarget.
Institutions(s):
Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
m6Apred
A support vector machine-based method for identifying m(6)A sites in the Saccharomyces cerevisiae genome. In this model, RNA sequences are encoded by their nucleotide chemical properties and accumulated nucleotide frequency information. In the jackknife test, the model identified m(6)A sites with an accuracy of 78.15%.
Publications:
- (Chen et al., 2015) Identification and analysis of the N(6)-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci Rep.
Institutions(s):
Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan, China
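The encoding named in the m6Apred entry (nucleotide chemical property plus accumulated nucleotide frequency) can be sketched as follows. The three-bit assignment (ring structure, functional group, hydrogen bonding) and the density formula are assumptions based on a commonly used formulation of this encoding, and the function name `encode` is hypothetical; the published model may differ in detail.

```python
# Assumed chemical-property bits per nucleotide:
# (ring structure: purine=1, functional group: amino=1, hydrogen bonding: weak=1)
CHEMICAL = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(sequence):
    """Encode each nucleotide as three chemical bits plus a running density.

    The density at position i (1-based) is the number of times the current
    nucleotide has occurred in positions 1..i, divided by i.
    """
    vectors = []
    counts = {}
    for i, base in enumerate(sequence, start=1):
        counts[base] = counts.get(base, 0) + 1
        density = counts[base] / i
        vectors.append(CHEMICAL[base] + (density,))
    return vectors


features = encode("GGACU")
```

Each position yields a 4-dimensional vector, so a window of length n becomes a 4n-dimensional input that a classifier such as a support vector machine could consume.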
Methylated RNA Immunoprecipitation Sequencing Data Analysis (MeRIP-seq Analysis)
RNA modifications, especially methylation of the N6 position of adenosine (m6A), represent an emerging research frontier in RNA biology. With the development of MeRIP-Seq (Meyer et al., 2012), also called m6A-seq (Dominissini et al., 2012), researchers are now able to carry out in-depth studies of m6A distribution and the function of related genes.
Spliced Read Alignment
TopHat
Aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie. TopHat also analyzes the mapping results to identify splice junctions between exons. It can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. The tool combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes.
Publications:
- (Kim et al., 2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol.
- (Trapnell et al., 2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics.
Institutions(s):
Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA; Department of Computer Science, University of Maryland, College Park, MD, USA
HISAT | Hierarchical Indexing for Spliced Alignment of Transcripts
A fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome).
Publications:
- (Kim et al., 2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods.
Institutions(s):
Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
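The local index figures quoted in the HISAT entry can be checked with simple arithmetic. The sketch below assumes a human genome length of about 3.1 billion bp and non-overlapping regions; both are simplifying assumptions, and the helper `local_index_id` is hypothetical rather than part of HISAT.

```python
import math

GENOME_LENGTH = 3_100_000_000  # assumed approximate human genome size in bp
REGION_SIZE = 64_000           # region covered by one local index (from the entry)

# Number of fixed-size regions needed to tile the genome without overlap.
n_local_indexes = math.ceil(GENOME_LENGTH / REGION_SIZE)


def local_index_id(position, region_size=REGION_SIZE):
    """Return the 0-based id of the region that contains a genomic position."""
    return position // region_size
```

Under these assumptions `n_local_indexes` comes out close to the ~48,000 indexes stated in the entry, and a read anchored at a given position only needs the small index for its own region to extend across nearby exons.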
Subread
Assists users in mapping reads to a reference genome. Subread is built around a seed-and-vote step that performs local alignment simultaneously in multiple parts of the read. This strategy extracts a large number of short, equally spaced seeds from each read, called subreads, and lets all the subreads vote on the optimal location for the read.
Publications:
- (Liao et al., 2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics.
- (Liao et al., 2013) The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res.
Institutions(s):
Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
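The seed-and-vote idea from the Subread entry can be illustrated with a toy example. This is a minimal sketch using exact k-mer lookups in a hash index, not Subread's actual implementation; the function names, the seed length `k`, the spacing `step` and the sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_vote(read, index, k, step):
    """Let equally spaced subreads vote for the read's start position.

    Each subread that matches the reference at position `hit` votes for
    `hit - offset`, the read start implied by that match. Returns the
    winning (start, votes) pair, or (None, 0) if no subread matches.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        subread = read[offset:offset + k]
        for hit in index.get(subread, ()):
            votes[hit - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


reference = "TTGACCGTAGGCTAACGTTAGCCATGGA"
# The read is reference[5:21] with one substitution (C -> A) at read position 6.
read = "CGTAGGATAACGTTAG"
index = build_index(reference, k=4)
location, n_votes = seed_and_vote(read, index, k=4, step=3)
```

Three of the five subreads still match exactly and agree on start position 5, so the read is placed correctly despite the mismatch; the two subreads that overlap the substitution simply cast no vote.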
Peak Calling
ExomePeak | Peak calling : MeRIP-seq analysis
The package is developed for the analysis of affinity-based epitranscriptome shotgun sequencing data from MeRIP-seq (m6A-seq).
Publications:
- (Meng et al., 2013) Exome-based analysis for RNA epigenome sequencing data. Bioinformatics.
Institutions(s):
Picower Institute for Learning and Memory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, MA, USA
MeTPeak
A graphical model-based peak calling method for transcriptome-wide detection of m6A sites from MeRIP-seq data. MeTPeak models the read count of an m6A site and introduces a hierarchical layer of Beta variables to capture the variances and a hidden Markov model (HMM) to characterize the read dependency across a site. MeTPeak predictions on real MeRIP-seq datasets suggest that it precisely recapitulates the motif and distribution of m6A sites and correctly predicts the methylation differences among methyltransferases.
Publications:
- (Cui et al., 2016) A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data. Bioinformatics.
Institutions(s):
Department of Electrical and Computer Engineering, University of Texas at San Antonio, TX, USA; Department of Biological Science, Xi’an Jiaotong-Liverpool University, Suzhou, China
De novo motif discovery
Weeder
Falls into the motif enumeration family of motif discovery tools, in which the occurrences of motifs in the query sequences are counted and, in this case, compared to a pre-calculated set of genome-specific background motifs. This has the benefit of not having to construct a background set of sequences (no easy task). Weeder was initially used to identify common motifs in defined promoter regions, but evolved to consider first ChIP-chip and then ChIP-seq data.
Publications:
- (Pavesi and Pesole, 2006) Using Weeder for the discovery of conserved transcription factor binding sites. Curr Protoc Bioinformatics.
- (Zambelli et al., 2014) Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences. Curr Protoc Bioinformatics.
Institutions(s):
University of Milan, Italy
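The motif enumeration approach described in the Weeder entry can be sketched as counting k-mers in the query sequences and comparing their frequencies with pre-calculated background frequencies. This is a simplified illustration that uses exact matches only; it is not Weeder's algorithm, and the function names, the query sequences and the background values are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer occurrence across all query sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment(sequences, background_freq, k):
    """Rank k-mers by observed frequency divided by background frequency.

    k-mers missing from `background_freq` fall back to a uniform
    expectation of 0.25 ** k (an assumption made for this sketch).
    """
    counts = count_kmers(sequences, k)
    total = sum(counts.values())
    scores = {}
    for kmer, n in counts.items():
        observed = n / total
        expected = background_freq.get(kmer, 0.25 ** k)
        scores[kmer] = observed / expected
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical query sequences and background frequencies.
queries = ["ACGTGACA", "TTACGTGG", "CACGTGTA"]
background = {"ACGT": 0.004, "CGTG": 0.02}
ranked = enrichment(queries, background, k=4)
top_kmer, top_score = ranked[0]
```

Because the background table is computed once per genome, the same table can be reused for any set of query sequences, which is the benefit the entry describes.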
HOMER | Hypergeometric Optimization of Motif EnRichment
Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides several tools and methods for working with ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats or differential features, enrichment and expression.
Publications:
- (Heinz et al., 2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell.
Institutions(s):
Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
MEME Suite
Provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms–MAST, FIMO and GLAM2SCAN–allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and TOMTOM), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters.
Publications:
- (Bailey et al., 2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res.
Institutions(s):
Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
Degradome-seq Analysis
sPARTA package
A powerful tool for plant miRNA target prediction and PARE validation. It can search for targets in unannotated genomic regions, which is useful to discover novel regulatory modules, independent of genome annotations that may be incomplete.
Publications:
- (Kakrana et al., 2014) sPARTA: a parallelized pipeline for integrated analysis of plant miRNA and cleaved mRNA data sets, including new miRNA target-identification software. Nucleic Acids Res.
Institutions(s):
Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA; Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA; Department of Plant and Soil Sciences, University of Delaware, Newark, DE, USA
CleaveLand
A generalizable computational pipeline for the detection of cleaved miRNA targets from degradome data.
Publications:
- (Addo-Quaye et al., 2009) CleaveLand: a pipeline for using degradome data to find cleaved small RNA targets. Bioinformatics.
Institutions(s):
Department of Computer Science and Engineering and Department of Biology, Pennsylvania State University, University Park, PA, USA