Protein Sequence/Expression


Protein Sequence Analysis

Protein sequence analysis tools predict the function, activity, origin, or localization of proteins from their amino-acid sequence. Software tools are also used to analyze high-throughput proteomics data obtained by mass spectrometry.

Amino Acid Repeat Prediction

RADAR

A tool that identifies short, composition-biased and gapped approximate repeats, as well as complex repeat architectures involving many different types of repeats in a query sequence. RADAR is useful because many large proteins have evolved by internal duplication, and many internal sequence repeats correspond to functional and structural units.

Official Website

Documentation

Publications:

Institution(s):

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK


REPRO

REPRO recognizes distant repeats in a single query sequence. The technique relies on a variation of the Smith-Waterman local alignment strategy to find non-overlapping top-scoring local alignments, followed by a graph-based iterative clustering procedure that delineates the repeat set(s) based on the consistency of the pairwise top alignments.

Official Website

Documentation

Publications:

Institution(s):

European Molecular Biology Laboratory, Heidelberg, Germany
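
The local alignment step that REPRO builds on can be illustrated with a minimal Smith-Waterman sketch. This is plain Smith-Waterman with a linear gap penalty, not REPRO's variation: it returns only the single top-scoring alignment and does not search for non-overlapping alignments or cluster them. The scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions; a real protein tool would normally use a substitution matrix.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the top local alignment.

    Scoring parameters are illustrative assumptions, not REPRO's values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic programming matrix; cells never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


result = smith_waterman("ACDEFG", "XXCDEYY")
```

To look for internal repeats in this spirit, one would align a sequence against itself and exclude the trivial diagonal, which this sketch does not attempt.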


Subcellular Localization Prediction

TargetP

Predicts the subcellular location of eukaryotic proteins. TargetP is a web application that scores N-terminal presequences in a submitted protein and reports whether a chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP) is predicted. Its parameters let users choose between the Plant and Non-Plant versions, set custom cutoffs and optionally predict cleavage sites.

Official Website

Documentation

Publications:

Institution(s):

Stockholm Bioinformatics Center, Albanova, Stockholm University, Stockholm, Sweden; Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark


pRoloc

Provides machine learning and visualization methods for analyzing quantitative mass spectrometry (MS) data to infer protein sub-cellular localization. pRoloc is an R package for spatial proteomics data analysis that predicts sub-cellular localization from experimental and condition-specific MS-based quantitative proteomics data. The software can classify proteins into tens of sub-cellular compartments.

Official Website

Documentation

Publications:

Institution(s):

Computational Proteomics Unit, Department of Biochemistry, University of Cambridge, Cambridge, UK


Protein Clustering

CD-HIT

A widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques has been developed to allow efficient clustering of such datasets.

Official Website

Publications:

Institution(s):

Center for Research in Biological Systems, University of California San Diego, La Jolla, CA, USA
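
The redundancy reduction that CD-HIT performs can be pictured as greedy incremental clustering: sequences are processed from longest to shortest, and each one either joins the first cluster whose representative it resembles closely enough or founds a new cluster. The sketch below is a toy illustration under stated assumptions. The `identity` function is a hypothetical stand-in that compares ungapped positions, whereas CD-HIT itself relies on short-word filtering and alignment, and the `threshold` value is arbitrary.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions over the shorter
    sequence, without any alignment (an assumption for illustration)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / shorter


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering: longest sequences become
    representatives; later sequences join the first cluster whose
    representative meets the identity threshold."""
    clusters = []  # list of (representative, members)
    for seq in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if identity(rep, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters


clusters = greedy_cluster(["MKTAYIAKQ", "GGGGGGGG", "MKTAYIAKQR"])
representatives = [rep for rep, _ in clusters]
```

Keeping only the representatives yields the non-redundant set; the member lists record which sequences each representative stands for.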


USEARCH

Provides search and clustering algorithms that can be orders of magnitude faster than BLAST. USEARCH is sequence analysis software that combines different algorithms into a single package. It searches a database for top global hits and provides several NGS read-processing features such as dereplication, paired-read overlapping, quality filtering, FASTQ file statistics and chimeric sequence filtering.

Official Website

Galaxy

Publications:

Institution(s):

Department of Micro- and Nanotechnology, Technical University of Denmark, Lyngby, Denmark
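
Of the read-processing features listed for USEARCH, dereplication is the easiest to show in a few lines: identical reads are collapsed into one unique sequence with an abundance count. The sketch below is a generic illustration of full-length exact-match dereplication, not USEARCH's implementation, and the function name `dereplicate` is hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads and return (sequence, count) pairs,
    most abundant first; ties are broken alphabetically."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


unique_reads = dereplicate(["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "CCCC"])
```

Sorting by decreasing abundance is a common convention because downstream clustering steps often process the most abundant sequences first.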


MS-based (Mass Spectrometry) Untargeted Proteomics

A number of technologies can be used to study proteomes, but arguably none is more powerful than mass spectrometry. There are two fundamentally different MS-based strategies for analyzing proteomes: discovery-based identification and targeted quantification (Doerr, 2013; Picotti and Aebersold, 2012). With a discovery-based strategy, the goal is usually to identify as many proteins as possible. The goal of a targeted proteomics experiment is to monitor a select few proteins of interest with high sensitivity, reproducibility and quantitative accuracy. (Source: Wikipedia)

2D gel Image Analysis

digeR

Provides a graphical user interface (GUI) for visualizing changes in protein post-translational modifications (PTMs) between different biological states. digeR is an R package that uses spot correlation to analyze potential protein PTM changes in 2D-DIGE (2D gel) studies. It can also support the analysis of other ‘omic’ data in a similar manner and help biologists look for panels of biomarkers that would improve the diagnosis and prognosis of disease.

Official Website

Publications:

Institution(s):

UCD School of Medicine and Medical Science, University College Dublin, Dublin, Ireland; UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland


Plasmo2D

Performs gel alignment and protein identification computationally. Plasmo2D assists in identifying proteins of the P. falciparum 3D7 proteome from a 2-DE gel image. Users overlay the markers of their gel image on the markers provided by the software using the image resize buttons. The tool can then display a list of candidate proteins corresponding to a spot of interest.

Official Website

Publications:

Institution(s):

Department of Biochemistry, Indian Institute of Science, Bangalore, India


In Silico Analysis

OpenMS

Manages and analyzes liquid chromatography coupled to mass spectrometry (LC-MS) data. OpenMS is a programming library and tool collection integrated into full-featured workflow systems, such as KNIME, Galaxy and WS-PGRADE, to facilitate bioinformatics research in the field of MS on all levels. The software provides pre-built, ready-to-use tools for the analysis of both proteomics and non-targeted metabolomics data.

Official Website

Github

Publications:

Institution(s):

Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Tuebingen, Germany; Center for Bioinformatics, University of Tuebingen, Tuebingen, Germany


multiplierz

Manages proteomic mass spectrometry workflows and data analysis. Multiplierz provides a toolset of methods for peptide identification, quantitation and reporting, as well as tools for easily manipulating standard data formats. It is a Python library that supports new reporting formats and offers high-level tools for proteomic analyses. The software environment integrates seamlessly with native data files via mzAPI.

Official Website

Documentation

Publications:

Institution(s):

Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, Boston, MA, USA


Phosphorylation Site Assignment

Ascore

Computes a localization probability for every phosphorylation site in a data set. Ascore reports the degree of certainty of each site assignment and flags peptides whose sites cannot be localized. The software determines the most likely phosphorylation site assignment by comparing candidate site placements at the level of the site-determining ions. It avoids the inconsistencies that can arise from manual validation.

Official Website

Publications:

Institution(s):

Department of Cell Biology, Harvard Medical School, Boston, MA, USA; Department of Genetics and Norris Cotton Cancer Center, Lebanon, NH, USA


Phosphonormalizer

Provides pairwise normalization of phosphoproteomics data. Phosphonormalizer allows analysis of large-scale experiments across various treatment conditions. It uses non-enriched data as a reference for normalizing the enriched data. The tool can also be applied to post-translational modifications (PTMs) other than phosphorylation, provided they are commonly studied using enrichment and can also be detected in non-enriched samples with sensitive methods.

Official Website

Publications:

Institution(s):

Turku Centre for Biotechnology, University of Turku and Åbo Akademi, Turku, Finland; Department of Future Technologies, University of Turku, Turku, Finland


PTM Identification

PIQED

Provides a complete, automated workflow for post-translational modification (PTM) identification, quantification, and statistical testing exclusively from data-independent acquisition mass spectrometry (DIA-MS) data. PIQED is an open-source workflow that enables a two-fold reduction of acquisition time because both identification and quantification are achieved with a single DIA analysis. PTM-specific capabilities of this package include site localization scoring and filtering, peptide consolidation to the modification-site level, and optional local or global total-ion chromatogram (TIC) normalization.

Official Website

Documentation

Publications:

Institution(s):

Buck Institute for Research on Aging, Novato, CA, USA; Department of Pathology, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
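
The TIC normalization mentioned for PIQED can be illustrated generically: each sample's intensities are rescaled so that all samples end up with the same total signal. The sketch below is an assumption-laden illustration, not PIQED's code. It does not model the distinction between local and global normalization, and the function name, the sample names and the choice of the mean total as the common target are all hypothetical.

```python
def tic_normalize(samples):
    """Rescale each sample so its summed intensity equals the mean
    total across samples.

    `samples` maps a sample name to a {peptide: intensity} dict.
    """
    totals = {name: sum(values.values()) for name, values in samples.items()}
    if any(total <= 0 for total in totals.values()):
        raise ValueError("every sample needs a positive total intensity")

    # Common target: the mean of the per-sample totals (an assumption).
    target = sum(totals.values()) / len(totals)

    return {
        name: {
            peptide: intensity * target / totals[name]
            for peptide, intensity in values.items()
        }
        for name, values in samples.items()
    }


normalized = tic_normalize({
    "control": {"pepA": 100.0, "pepB": 300.0},
    "treated": {"pepA": 300.0, "pepB": 900.0},
})
```

In this toy input the treated sample simply carries three times more total signal, so after normalization both samples show identical values and no peptide appears changed.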


SUMmOn

An automated pattern recognition tool that detects diagnostic PTM fragment ion series within complex MS/MS spectra, to identify modified peptides and modification sites within these peptides. The results of the SUMmOn analysis are stored in an XML instance document. A dynamically generated XSLT style sheet is then used to create an HTML file that is formatted via CSS (cascading style sheet), and viewed in a web-browser.

Official Website

Publications:

Institution(s):

Institute for Systems Biology, Seattle, WA, USA


Glycosylation Identification

SugarQb

Enables genome-wide insights into protein glycosylation and glycan modifications in complex biological systems. SugarQb identifies intact glycopeptides, and maps and quantifies changes in protein glycosylation at a proteome scale. This method can assist users in investigating the glycoproteome of mouse embryonic stem cells.

Official Website

Documentation

Publications:

Institution(s):

IMBA, Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna, Austria; Institute of Molecular Pathology (IMP), Vienna, Austria


MAGIC

Identifies intact N-glycosylated peptides from a public protein database without requiring any prior information about proteins or glycans. MAGIC supports untargeted glycopeptide analysis, while the newly implemented MAGIC+ performs targeted glycopeptide analysis and lets users upload their own protein sequence file to find glycopeptides in their data. The search results from Mascot can be integrated with the results from MAGIC-web via Reports Integrator to generate a complete protein/peptide-glycan summary report. Independent of the above three modules, the fourth module, Glycan Search, allows users to find various glycans in a large glycan database stored on the web server, regardless of the type of glycosylation. MAGIC-web has a user-friendly visualization interface for easy data uploading, processing and result interpretation. MAGIC-web is free and open to all users, with no login requirement.

Official Website

Publications:

Institution(s):

Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan; Institute of Information Science, Academia Sinica, Taipei, Taiwan


MS-based Metaproteomics

Unipept

An open source web application that is designed for metaproteomics analysis with a focus on interactive data-visualization. Unipept is underpinned by a fast index built from UniProtKB and the NCBI taxonomy that enables quick retrieval of all UniProt entries in which a given tryptic peptide occurs. Unipept provides programmatic access to the metaproteomics analysis features. This enables integration of Unipept functionality in custom applications and data processing pipelines.

Official Website

Publications:

Institution(s):

Department of Applied Mathematics, Computer Science and Statistics, Faculty of Sciences, Ghent University, Ghent, Belgium
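
The idea behind Unipept's peptide index can be sketched on a toy scale: digest each protein in silico with trypsin rules, then map every resulting peptide to the set of proteins that contain it. This is an illustration only. Unipept's real index is built from UniProtKB and the NCBI taxonomy, while the protein identifiers below are hypothetical and the `min_length` cutoff is an assumption. The digestion rule used is the standard one: cleave after K or R unless the next residue is P.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=5):
    """Split a protein after K or R, except when followed by P, and
    keep peptides of at least `min_length` residues."""
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_length]


def build_peptide_index(proteins):
    """Map each tryptic peptide to the set of protein IDs containing it.

    `proteins` maps a protein ID to its amino-acid sequence.
    """
    index = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(protein_id)
    return dict(index)


# Hypothetical protein IDs and sequences, for illustration only.
index = build_peptide_index({
    "P1": "GASVKLLEERPTTTK",
    "P2": "MMMMMKGASVK",
})
```

Looking up a peptide in `index` returns every protein in which it occurs; a taxonomic analysis would then summarize the organisms those proteins come from.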


MPA

An intuitive open-source tool for metaproteomics data analysis and interpretation, which includes multiple search engines and the feature to decrease data redundancy by grouping protein hits to so-called meta-proteins. The functionality of the MetaProteomeAnalyzer is demonstrated using a sample of a microbial community taken from a biogas plant.

Official Website

Documentation

Publications:

Institution(s):

Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
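
Grouping protein hits into meta-proteins can be sketched as a connected-components problem. The grouping rule used here, that two proteins belong together when they share at least one identified peptide, is an assumption chosen for illustration; the description above does not state which rules MetaProteomeAnalyzer applies. The function name and the input values are hypothetical.

```python
def group_meta_proteins(protein_peptides):
    """Group proteins that are linked, directly or transitively, by at
    least one shared peptide. Returns a sorted list of sorted groups.

    `protein_peptides` maps a protein ID to its set of peptides.
    """
    parent = {protein: protein for protein in protein_peptides}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # peptide -> first protein seen with that peptide
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            if peptide in first_owner:
                # Shared peptide: merge the two proteins' groups.
                parent[find(protein)] = find(first_owner[peptide])
            else:
                first_owner[peptide] = protein

    groups = {}
    for protein in protein_peptides:
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(group) for group in groups.values())


meta_proteins = group_meta_proteins({
    "A": {"p1", "p2"},
    "B": {"p2", "p3"},
    "C": {"p3"},
    "D": {"p9"},
})
```

Here A and C share no peptide directly but both overlap with B, so all three collapse into one group, which is how this kind of grouping reduces redundancy among homologous hits.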


Antibody Array Analysis

PANDA

A web-based software program developed at Emory University for analyzing phosphorylation antibody arrays.

Official Website

Publications:

Institution(s):

The Winship Cancer Institute, Emory University School of Medicine, Atlanta, Georgia, USA


Find more tools: OMICTOOLS
