Ensembl Variant Effect Predictor

Name	Ensembl Variant Effect Predictor
Developer	European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI)
Released	2010
Programming language	Perl
Operating system	Linux, macOS, Microsoft Windows
Genre	Bioinformatics, genomics, variant annotation

Contents

Overview
Features and Functionality
Input Formats and Annotation Outputs
Supported Species and Data Sources
Performance and Scalability
Use Cases and Applications
Development, Extensibility, and Integration

Ensembl Variant Effect Predictor is a bioinformatics tool developed by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) to predict the effects of genomic variants on genes, transcripts, and proteins. Researchers at institutions such as the Wellcome Sanger Institute, the Broad Institute, the National Institutes of Health, and Cold Spring Harbor Laboratory use it to annotate single nucleotide variants, insertions, deletions, and structural variants in cohorts from projects such as the 1000 Genomes Project, UK Biobank, and The Cancer Genome Atlas. It draws on major resources including Ensembl, GENCODE, RefSeq, dbSNP, and ClinVar to provide standardized consequence predictions for clinical and research workflows.

Overview

Ensembl Variant Effect Predictor (VEP) annotates variants by mapping their coordinates to transcripts and other features from databases such as Ensembl and GENCODE. It integrates population allele frequencies from resources such as gnomAD and the Exome Aggregation Consortium (ExAC), together with clinical assertions from ClinVar and HGMD, to support interpretation in contexts ranging from rare disease studies by contributors to the European Nucleotide Archive to cancer genomics consortia such as the International Cancer Genome Consortium. VEP is distributed as a command-line tool, a REST API used by services such as UCSC Genome Browser mirrors, and a web interface; it is also available on platforms including Galaxy and DNAnexus.
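
As a rough illustration of the REST access route mentioned above, the following Python sketch builds a request URL for a single variant identifier. The server address and the `/vep/:species/id/:id` path layout are taken from the public Ensembl REST documentation as understood at the time of writing and should be treated as assumptions; the helper name `vep_id_url` is hypothetical, and no network request is made.

```python
from urllib.parse import quote, urlencode

# Public Ensembl REST server (assumed; check the current REST documentation).
REST_SERVER = "https://rest.ensembl.org"


def vep_id_url(variant_id: str, species: str = "human") -> str:
    """Build a GET URL for the VEP 'id' endpoint (path layout is an assumption)."""
    # Percent-encode path components so unusual identifiers stay valid in a URL.
    path = f"/vep/{quote(species, safe='')}/id/{quote(variant_id, safe='')}"
    # Ask for JSON; the REST server also honours a Content-Type request header.
    query = urlencode({"content-type": "application/json"}, safe="/")
    return f"{REST_SERVER}{path}?{query}"


if __name__ == "__main__":
    print(vep_id_url("rs56116432"))
```

The resulting URL can be passed to any HTTP client; large batches are usually better served by the command-line tool with a local cache than by repeated REST calls.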

Features and Functionality

VEP describes consequences using terms from the Sequence Ontology and reports effects on coding sequences, splice sites, regulatory regions, and non-coding RNAs cataloged by resources such as the Ensembl Regulatory Build and miRBase. It predicts amino-acid changes and integrates pathogenicity scores from tools such as SIFT, PolyPhen-2, CADD, REVEL, and MetaSVM. VEP supports transcript prioritization strategies used by ClinGen, ACMG-aligned pipelines, and clinical labs such as Mayo Clinic and Johns Hopkins Hospital. The tool also annotates conservation scores such as phastCons and phyloP, which are used in evolutionary studies at institutions such as the Max Planck Institute and Harvard Medical School.
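
Because a variant can carry several Sequence Ontology terms across transcripts, downstream code often needs to pick the most severe one. The sketch below shows one way to do that in Python. The term list is a small illustrative subset, and its ordering is an assumption modelled on the general pattern of Ensembl's published severity ranking; `SEVERITY_ORDER` and `most_severe` are hypothetical names, not part of VEP.

```python
# Illustrative subset of Sequence Ontology terms, most severe first.
# ASSUMPTION: this ordering only approximates Ensembl's published ranking;
# consult the current Ensembl documentation for the complete, authoritative list.
SEVERITY_ORDER = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "intron_variant",
    "intergenic_variant",
]
RANK = {term: index for index, term in enumerate(SEVERITY_ORDER)}


def most_severe(consequences):
    """Return the highest-ranked term, or None for an empty input.

    Terms missing from SEVERITY_ORDER are treated as least severe.
    """
    if not consequences:
        return None
    return min(consequences, key=lambda term: RANK.get(term, len(SEVERITY_ORDER)))


if __name__ == "__main__":
    print(most_severe(["intron_variant", "missense_variant", "synonymous_variant"]))
```

Ranking by a fixed list keeps the rule transparent and easy to audit, which matters when the choice of reported consequence feeds into clinical interpretation.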

Input Formats and Annotation Outputs

VEP accepts common genomic data formats produced by projects such as the 1000 Genomes Project and by sequencing centers such as the Wellcome Sanger Institute: Variant Call Format (VCF), GVF, and simple tab-delimited lists. Outputs include annotated VCF, JSON, and tab-delimited reports suited to integration with pipelines at the Broad Institute and the European Genome-phenome Archive. Commonly reported annotation fields mirror entries from dbNSFP, dbSNP, and ClinVar, along with transcript metadata from Ensembl and RefSeq. VEP also outputs HGVS nomenclature, which is used in clinical reporting by organizations such as the American College of Medical Genetics and Genomics and by laboratory networks such as EuroGentest.
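
In VCF output, VEP typically writes its annotations into a `CSQ` entry of the INFO column: one comma-separated block per annotation, with pipe-separated fields whose names are declared in the file header. The following Python sketch parses that layout. The header and record strings are made-up examples, the field list is shortened for readability, and the helper names `csq_fields` and `parse_csq` are hypothetical.

```python
import re


def csq_fields(header_line: str) -> list[str]:
    """Extract the pipe-separated field names declared in the CSQ header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' declaration found in header line")
    return match.group(1).strip().split("|")


def parse_csq(info: str, fields: list[str]) -> list[dict[str, str]]:
    """Return one dict per annotation block in the CSQ entry of an INFO column."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "CSQ":
            # Blocks are comma-separated; fields inside a block are pipe-separated.
            return [dict(zip(fields, block.split("|"))) for block in value.split(",")]
    return []  # record carries no CSQ annotation


if __name__ == "__main__":
    # Hypothetical header and INFO values, shortened for illustration.
    header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
              'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
    info = "DP=35;CSQ=T|missense_variant|MODERATE|GENE1,T|intron_variant|MODIFIER|GENE1"
    for record in parse_csq(info, csq_fields(header)):
        print(record["SYMBOL"], record["Consequence"], record["IMPACT"])
```

Reading the field names from the header rather than hard-coding them keeps the parser working when a run adds or reorders annotation fields.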

Supported Species and Data Sources

VEP supports annotation for model organisms and other species cataloged by Ensembl and Ensembl Genomes, including Homo sapiens, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster, Saccharomyces cerevisiae, and agricultural species studied at the International Rice Research Institute and the USDA. Data sources integrated into VEP releases include gene annotation from Ensembl, GENCODE, and RefSeq; allele frequency datasets from the 1000 Genomes Project and gnomAD; clinical datasets such as ClinVar and gene-disease catalogs such as OMIM; and regulatory annotations from ENCODE and the Roadmap Epigenomics Project.

Performance and Scalability

VEP is designed for high-throughput environments at sequencing centers such as the Broad Institute and the Wellcome Sanger Institute, and on cloud platforms such as Amazon Web Services and Google Cloud Platform. Parallelization through forked worker processes (the `--fork` option) and annotation from a local cache rather than a remote database enable processing of millions of variants from cohorts such as UK Biobank and other population-scale studies. Benchmarking studies comparing VEP with tools from groups such as the GATK maintainers and developers at Illumina show trade-offs between annotation completeness and runtime. VEP’s modular architecture supports deployment on high-performance computing clusters at institutions such as Argonne National Laboratory and Lawrence Berkeley National Laboratory.
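
A cache-backed, parallel run of the kind described above might be launched as in the Python sketch below, which only assembles the command line and does not execute it. The options shown (`--input_file`, `--output_file`, `--vcf`, `--cache`, `--offline`, `--fork`, `--buffer_size`) are documented VEP options; the file names, the default values chosen here, and the helper name `build_vep_command` are assumptions for illustration.

```python
import shlex


def build_vep_command(input_vcf, output_vcf, forks=4, buffer_size=5000):
    """Assemble an argument list for a cache-backed, forked VEP run."""
    if forks < 1 or buffer_size < 1:
        raise ValueError("forks and buffer_size must be positive")
    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",                            # write annotated VCF output
        "--cache",                          # read annotation from a local cache
        "--offline",                        # never fall back to a remote database
        "--fork", str(forks),               # number of parallel worker processes
        "--buffer_size", str(buffer_size),  # variants held in memory per batch
    ]


if __name__ == "__main__":
    # Hypothetical file names.
    command = build_vep_command("cohort.vcf.gz", "cohort.vep.vcf")
    print(shlex.join(command))
```

Returning a list rather than a single string lets the command be handed directly to `subprocess.run` without shell quoting problems. Suitable values for the fork count and buffer size depend on available cores and memory, so they are best tuned per system.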

Use Cases and Applications

VEP is used in clinical sequencing pipelines at hospitals such as Mayo Clinic and Johns Hopkins Hospital, by research consortia such as the International Cancer Genome Consortium and the Psychiatric Genomics Consortium, and in population-genetics projects such as the 1000 Genomes Project and UK Biobank. Applications include rare disease gene discovery at centers such as Genomics England, somatic mutation annotation in oncology initiatives run by Memorial Sloan Kettering Cancer Center, and agricultural genomics programs at CIMMYT. VEP outputs support downstream analysis with tools and databases such as VCFtools, ANNOVAR, SnpEff, Bioconductor, and Galaxy, and with clinical interpretation frameworks such as the ACMG guidelines.

Development, Extensibility, and Integration

VEP is developed by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) with community contributions from groups at the Wellcome Sanger Institute, the Broad Institute, and academic labs worldwide. Its plugin architecture allows integration of third-party annotations from resources such as dbNSFP, LOEUF constraint datasets curated by the Genome Aggregation Database (gnomAD) team, and pathogenicity predictors developed at institutions such as Cold Spring Harbor Laboratory and Stanford University. APIs and code compatibility make it possible to embed VEP in platforms used by Google Health, DNAnexus, and Illumina BaseSpace, and in national and international infrastructures such as ELIXIR and the National Center for Biotechnology Information service ecosystem.
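
Plugins themselves are Perl modules, but they are enabled from the command line with the `--plugin` option, whose value is the plugin name followed by comma-separated arguments. The Python sketch below formats such an option for use alongside other VEP arguments. The plugin name `ExamplePlugin`, its arguments, and the helper name `plugin_option` are hypothetical; each real plugin documents its own argument format.

```python
def plugin_option(name, *args):
    """Format a VEP --plugin option as a two-element argument list.

    Commas separate the plugin name from its arguments, so a comma inside
    the name or inside any single argument would be ambiguous and is rejected.
    """
    parts = [name, *args]
    for part in parts:
        if "," in part:
            raise ValueError(f"comma not allowed inside plugin argument: {part!r}")
    return ["--plugin", ",".join(parts)]


if __name__ == "__main__":
    # Hypothetical plugin name and data file path.
    print(plugin_option("ExamplePlugin", "/data/scores.tsv.gz", "verbose"))
```

Several plugins can be enabled in one run by repeating the option, so the lists returned by multiple calls can simply be concatenated onto the main argument list.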
