Genome Annotation



Genome annotation is a key process for identifying the coding and non-coding regions of a genome, along with gene locations and functions. Analyzing DNA sequences with genome annotation software makes it possible to find and map genes, exons and introns, regulatory elements, repeats and mutations. Genome databases are essential for retrieving information on gene names, protein products and DNA sequence functions.


Gene Ontology Annotation

BLASTX

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.
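To make "local similarity" concrete, here is a minimal sketch of local alignment scoring using the Smith-Waterman dynamic programming recurrence. This is not the BLAST algorithm itself: BLAST uses a faster seed-and-extend heuristic and attaches statistical significance to its hits. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme (assumed, not BLAST's defaults):
    +2 for a match, -1 for a mismatch, -2 per gap position.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared "ACG" core is found even though the flanks differ.
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Because the score is clamped at zero, the result depends only on the best-matching region, not on the unrelated flanking sequence.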

Different types of BLAST are available according to the query sequence. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on sequence similarity. The BLAST algorithm and program were designed by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers, and David J. Lipman at the National Institutes of Health. The work was published in the Journal of Molecular Biology in 1990 and has been cited over 50,000 times.

Input: Input sequences (in FASTA or GenBank format) and weight matrix.

Output: BLAST output can be delivered in a variety of formats, including HTML, plain text, and XML.

Details about the process and algorithm can be found here and on the official website.


NCBI BLAST:

BLAST stands for Basic Local Alignment Search Tool. The emphasis of this tool is on finding regions of sequence similarity, which can yield functional and evolutionary clues about the structure and function of your sequence.

This tool can be used in the following contexts: Protein, Nucleotide, Vectors.


PSI-BLAST:

Position-Specific Iterative BLAST (PSI-BLAST) refers to a feature of BLAST 2.0 in which a profile is automatically constructed from the first set of BLAST alignments. PSI-BLAST is similar to NCBI BLAST2 except that it uses position-specific scoring matrices derived during the search, which makes it suited to detecting distant evolutionary relationships. PHI-BLAST functionality is also available, allowing patterns to be used to restrict search results.
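As a rough illustration of a position-specific scoring matrix, the sketch below builds log-odds scores column by column from a small set of already aligned, gap-free protein sequences. It is a simplified toy and not PSI-BLAST's procedure: the uniform background frequencies, the flat pseudocount, and the function names are all assumptions made for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds matrix from equal-length aligned sequences.

    Assumes a uniform background (1/20 per residue) and a flat
    pseudocount; real profile construction is more elaborate.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


def score_sequence(pssm, seq):
    """Sum the per-position scores of seq against the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


if __name__ == "__main__":
    profile = build_pssm(["ACD", "ACE", "ACD"])
    print(score_sequence(profile, "ACD"), score_sequence(profile, "WWW"))
```

A residue that recurs in a column scores positively at that position only, which is what lets a profile pick up weak similarity that a single general-purpose substitution matrix would miss.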

How to use

Publications

Institution(s)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

citation: Sequence Similarity Searching



Blast2GO

Permits functional annotation, management, and data mining of novel sequence data. Blast2GO is based on a common controlled vocabulary schema, the Gene Ontology (GO). It takes into consideration similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations. This tool is suitable for plant genomics research. It generates functional annotation and helps researchers assess the functional meaning of their experimental results.


Official Website

Documentation

Publications:

  • (Conesa et al., 2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research.
    PMID: 16081474 DOI: 10.1093/bioinformatics/bti610
  • (Conesa and Götz, 2008) Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics.
    PMID: 18483572 DOI: 10.1155/2008/619832
  • (Götz et al., 2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res.
    PMID: 18445632 DOI: 10.1093/nar/gkn176
  • (Aparicio et al., 2006) Blast2GO goes grid: developing a grid-enabled prototype for functional genomics analysis. Stud Health Technol Inform.
    PMID: 16823138

Institution(s): Bioinformatics Department, Centro de Investigación Príncipe Felipe, Valencia, Spain



GO Semantic Similarity Analysis

G-SESAME | Gene Semantic Similarity Analysis and Measurement Tools

A set of online tools for measuring the semantic similarities of Gene Ontology (GO) terms and the functional similarities of gene products, and for further discovering biomedical knowledge from the GO database. Visualization techniques are provided in these tools to allow users to inspect the locations of the GO terms within the GO graph and to visually determine the semantic similarity. A batch command interface is also provided for users to execute the tools to measure the semantic similarity of a group of GO terms or functional similarities of a group of genes. Web based APIs are also provided for advanced users.
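To give a feel for what a graph-based semantic similarity between two GO terms looks like, here is a minimal sketch that compares the ancestor sets of two terms in a toy ontology using the Jaccard index. This is a deliberately simple stand-in and not the measure implemented by G-SESAME; the term names and the `parents` mapping are hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the graph."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_similarity(term_a, term_b, parents):
    """Shared ancestors divided by all ancestors of either term."""
    a = ancestors(term_a, parents)
    b = ancestors(term_b, parents)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical toy ontology: child term -> list of parent terms.
    toy_parents = {
        "binding": ["root"],
        "dna_binding": ["binding"],
        "rna_binding": ["binding"],
        "catalysis": ["root"],
    }
    print(jaccard_similarity("dna_binding", "rna_binding", toy_parents))
    print(jaccard_similarity("dna_binding", "catalysis", toy_parents))
```

Terms that share a specific parent score higher than terms that only meet at the root, which is the intuition behind inspecting term locations in the GO graph.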

Official Website

Publications:

Institution(s): School of Computing, Clemson University, Clemson, SC, USA


clusterProfiler

Automates the process of biological-term classification and the enrichment analysis of gene clusters. clusterProfiler supports three species: human, mouse, and yeast. It offers a gene classification method, groupGO, to sort genes based on their projection at a specific level of the Gene Ontology (GO) corpus. This tool can perform enrichment tests for GO terms and KEGG pathways based on the hypergeometric distribution.
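The hypergeometric enrichment test mentioned above can be sketched in a few lines. This is a generic, standard-library illustration of the statistic and not clusterProfiler's code (clusterProfiler is an R package); the function and parameter names are assumptions.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """One-sided p-value P(X >= hits) under the hypergeometric distribution.

    population: number of background genes
    annotated:  background genes carrying the term
    sample:     genes in the cluster of interest
    hits:       cluster genes carrying the term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 5 annotated, cluster of 5, all 5 annotated.
    print(hypergeom_enrichment_pvalue(10, 5, 5, 5))
```

In practice such a p-value is computed for every term and then corrected for multiple testing, a step this sketch leaves out.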

Official Website

Github

Documentation

Publications:

Institution(s): State Key Laboratory of Emerging Infectious Diseases and Centre of Influenza Research, School of Public Health, The University of Hong Kong, Hong Kong, China


GOToolBox

Provides a series of programs allowing the functional investigation of groups of genes, based on the Gene Ontology resource. GOToolBox allows 1) the identification of statistically relevant over- or under-represented terms in a gene dataset, 2) the clustering of functionally related genes within a set and 3) the retrieval of genes sharing annotations with a query gene. The user can also constrain the GO annotations to a slim hierarchy or to a given level of the ontology, in order to facilitate the interpretation of the results.

Official Website

Publications:

Institution(s): Laboratoire de Génétique et Physiologie du Développement, IBDM, CNRS/INSERM/Université de la Méditerranée, Parc Scientifique de Luminy, Marseille, France


SML-Toolkit | Semantic Measures Library Toolkit

A Java Toolkit dedicated to semantic measures computation and analysis. SML-Toolkit is composed of various tools related to semantic measures computation and analysis. Those tools are provided through a common command-line interface.

Official Website

Github

Publications:

Institution(s): LGI2P/EMA Research Centre, Site EERIE, Parc Scientifique G Besse, Nîmes, France



Promoter Prediction

FirstEF | First Exon Finder

A 5’ terminal exon and promoter prediction program. FirstEF consists of different discriminant functions structured as a decision tree. The probabilistic models are optimized to find potential first donor sites and CpG-related and non-CpG-related promoter regions based on discriminant analysis. For every potential first donor site (GT) and an upstream promoter region, FirstEF decides whether or not the intermediate region can be a potential first exon, based on a set of quadratic discriminant functions.

Official Website

Documentation

Publications:

Institution(s): Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA


CpGpromoter

A program for large-scale human promoter mapping using CpG islands. CpGpromoter is based on the results of discriminant analysis between promoter-associated CpG islands and non-associated ones. It enables efficient mapping of human promoters with 2 kb resolution, provided there is a CpG island inside an interval (-500…+1,500) around a transcription start site.
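The two ingredients described above, CpG island statistics and the (-500, +1,500) window around a transcription start site, can be sketched as follows. This is an illustration only and not CpGpromoter's discriminant model. The function names are hypothetical, and treating "inside the interval" as any overlap with the window is an assumption.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_content = (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / n.
    obs_exp = cpg_count * n / (c_count * g_count) if c_count and g_count else 0.0
    return gc_content, obs_exp


def island_in_tss_window(island_start, island_end, tss, strand="+",
                         upstream=500, downstream=1500):
    """True if the island overlaps the window around the TSS.

    Overlap (rather than full containment) is an assumption of this sketch.
    """
    if strand == "+":
        win_start, win_end = tss - upstream, tss + downstream
    else:
        # On the minus strand, "upstream" lies at higher coordinates.
        win_start, win_end = tss - downstream, tss + upstream
    return island_start <= win_end and island_end >= win_start


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(island_in_tss_window(900, 1200, tss=1000))
```

A commonly used rule of thumb calls a region a CpG island when it is at least 200 bp long with GC content above 0.5 and an observed/expected ratio above 0.6. The thresholds CpGpromoter itself applies are not stated here.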

Official Website

Publications:

Institution(s): Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA


PROmiRNA

A program for annotating miRNA promoters in human as well as other species. PROmiRNA uses deepCAGE data from the FANTOM4 Consortium and integrates CAGE tag counts with other promoter features, such as CpG content, conservation and TATA box affinity, to score the potential of a candidate region to be a promoter. Given a list of genomic regions of interest, in the form of a GFF file, PROmiRNA returns the most probable promoter locations, together with the posterior probabilities calculated by the model.
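For readers unfamiliar with the GFF input format, the sketch below reads tab-separated GFF lines into simple region records. It relies only on the standard GFF column order (sequence ID, source, type, start, end, score, strand, phase, attributes). The `Region` type and function name are hypothetical, and any extra requirements PROmiRNA places on its input file are not covered.

```python
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str


def parse_gff_regions(lines: Iterable[str]) -> List[Region]:
    """Extract genomic regions from GFF lines, skipping comments and blanks."""
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            # Too few columns to be a GFF feature line; skip it.
            continue
        regions.append(Region(fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return regions


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "chr1\texample\tregion\t1000\t2500\t.\t+\t.\tID=candidate1",
        "chr2\texample\tregion\t500\t900\t.\t-\t.\tID=candidate2",
    ]
    for region in parse_gff_regions(example):
        print(region)
```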

Official Website

Publications:

Institution(s): Max Planck Institute for Molecular Genetics, Berlin, Germany; Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China



Annotation Workflows

RAST | Rapid Annotation using Subsystem Technology

Assists in annotating complete or nearly complete bacterial and archaeal genomes. RAST is a fully automated application that provides high-quality annotations for these genomes across the whole phylogenetic tree. It includes a user interface that allows registered users to make manual changes to their genomes before retrieving them. It was designed to extend annotations to as many protein-encoding genes in as many genomes as possible.

Official Website

Publications:


Prokka

A command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Prokka uses parallel processing to decrease running time on multicore computers. The most time-consuming steps are BLAST+ and hmmscan, which both support multiple CPUs natively. However, Prokka is more efficient if it runs multiple single CPU threads on subsets of the data, which it achieves using GNU parallel.

Official Website

Publications:

Institution(s): Victorian Bioinformatics Consortium, Monash University, Clayton; Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative, Carlton, Australia


BlastKOALA

Assigns K numbers to the user’s sequence data by BLAST searches against a nonredundant set of KEGG GENES. KOALA (KEGG Orthology And Links Annotation) is KEGG’s internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to this server and can be executed in an interactive mode. BlastKOALA is suitable for annotating fully sequenced genomes.

Official Website

Publications:

Institution(s): Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan; Healthcare Solutions Department, Fujitsu Kyushu Systems Ltd, Hakata-ku, Fukuoka, Japan



Find more tools: OMICTOOLS
