LLMpedia: The first transparent, open encyclopedia generated by LLMs

JGI Genome Portal

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: JGI Genome Portal
Established: 2004
Type: Genomic data portal
Owner: US Department of Energy
Location: Walnut Creek, California

The JGI Genome Portal is a centralized online resource for genomic, metagenomic, transcriptomic, and functional data produced by the US Department of Energy Joint Genome Institute and affiliated projects. It aggregates sequence assemblies, annotations, and metadata to serve researchers at institutions such as Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, Los Alamos National Laboratory, and Argonne National Laboratory, and at university partners including the University of California, Berkeley, Stanford University, and the Massachusetts Institute of Technology. The portal supports collaborations with consortia such as the Human Microbiome Project, the Earth Microbiome Project, and initiatives from the era of the 1000 Genomes Project, while integrating standards from organizations like the National Institutes of Health, the National Science Foundation, and the European Bioinformatics Institute.

Overview

The portal provides searchable catalogs of genomes from organisms studied by projects linked to the DOE Office of Science, including plants (e.g., Arabidopsis thaliana, Zea mays), fungi (e.g., Neurospora crassa, Saccharomyces cerevisiae), bacteria (e.g., Escherichia coli, Pseudomonas fluorescens), archaea (e.g., Methanopyrus kandleri), and microbial communities from environments sampled in campaigns like the Census of Marine Life and the International Census of Marine Microbes. It cross-references records in databases such as GenBank, RefSeq, UniProt, KEGG, Pfam, and InterPro, and adheres to metadata frameworks promoted by the Global Biodiversity Information Facility and the Genomic Standards Consortium.

History and Development

The portal evolved from early sequencing repositories at the Joint Genome Institute during initiatives funded by the Department of Energy in response to programs like the Human Genome Project and the Microbial Earth Project. Its development involved collaborations with computational centers such as the National Energy Research Scientific Computing Center and drew on software projects like BLAST, MAKER, GeneMark, Glimmer, and Augustus. Over time the portal incorporated pipelines influenced by workflows used in the 1000 Genomes Project, the ENCODE Project, The Cancer Genome Atlas, and metagenomics studies from Tara Oceans. Major milestones included support for data from high-throughput sequencing platforms sold by vendors such as Illumina, Pacific Biosciences, and Oxford Nanopore Technologies, and adoption of standards from the Open Biological and Biomedical Ontologies community and the MIxS specification.

Features and Data Resources

Available resources include whole-genome assemblies, gene models, functional annotations, expression datasets, and environmental metadata linked to sampling campaigns like Long Term Ecological Research and Critical Zone Observatory sites. The portal links to curated collections such as reference genomes for Arabidopsis halleri, model fungi from the Fungal Genome Initiative, and microbial isolate assemblies connected to collections like the American Type Culture Collection. Annotation pipelines incorporate models and databases from Swiss-Prot, TrEMBL, the COG database, eggNOG, MetaCyc, Reactome, and BioCyc. The portal hosts large-scale project pages for efforts akin to Plant 2030 and the BioEnergy Research Centers, and for synthetic biology partnerships with institutions like the Joint BioEnergy Institute.
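As an illustration of working with a downloaded whole-genome assembly, the following minimal sketch parses FASTA-formatted text and reports a few summary statistics. It is not part of the portal's own tooling; the function names (parse_fasta, summarize) and the example records are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            # The ID is the first token of the header; fall back to a
            # positional name if the header is empty.
            current_id = tokens[0] if tokens else f"record_{len(records) + 1}"
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def summarize(records: Dict[str, str]) -> Dict[str, float]:
    """Return sequence count, total length, longest sequence, and GC fraction."""
    lengths = [len(seq) for seq in records.values()]
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for seq in records.values())
    return {
        "sequences": len(lengths),
        "total_bases": total,
        "longest": max(lengths, default=0),
        "gc_fraction": gc / total if total else 0.0,
    }


# Hypothetical two-scaffold assembly used only to exercise the functions.
EXAMPLE = """>scaffold_1 hypothetical example
ACGTACGT
GGCC
>scaffold_2
ATAT
"""

if __name__ == "__main__":
    print(summarize(parse_fasta(EXAMPLE.splitlines())))
```

For real assemblies, an established parser such as the one in Biopython would usually be preferable; the sketch only shows the shape of the data.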

Access and Tools

Users access data via web interfaces, command-line clients, and programmatic APIs modeled on standards used by the European Nucleotide Archive and NCBI. Analytical tools integrated or linked include sequence search engines like BLAST and HMMER, genome browsers influenced by the UCSC Genome Browser and Ensembl, and comparative platforms similar to OrthoMCL and Mauve. Compatible workflow systems are analogous to the Galaxy Project, Cromwell, and Snakemake, while high-performance computing support leverages clusters at the Oak Ridge Leadership Computing Facility and Lawrence Livermore National Laboratory. Training and documentation draw on materials from Cold Spring Harbor Laboratory and workshop networks including the Gordon Research Conferences.
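Programmatic access of the kind described above typically amounts to sending parameterized HTTP requests. The sketch below only builds a query URL and performs no network call. The base URL and the parameter names (q, type, limit) are hypothetical placeholders, not the portal's documented API; the real endpoint and parameters should be taken from the official JGI documentation.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; this is NOT a real JGI URL.
BASE_URL = "https://portal.example.org/api/search"


def build_search_url(organism: str, data_type: str = "assembly", limit: int = 10) -> str:
    """Build a search URL for an organism name (parameter names are assumed)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    params = {"q": organism, "type": data_type, "limit": limit}
    # urlencode percent-encodes values and joins them as key=value pairs.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Neurospora crassa", limit=5))
```

Keeping URL construction in a separate function makes it easy to test the query logic without contacting any server.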

Usage and Impact

Researchers in fields associated with the Department of Energy, such as bioenergy, carbon cycling, and biogeochemistry, use the portal for projects comparable to the Bioenergy Research Centers, the National Microbiome Data Collaborative, and environmental genomics campaigns like the Earth Microbiome Project and Tara Oceans. Findings supported by the portal have contributed to publications in journals such as Nature, Science, Cell, PNAS, and Genome Research, and have informed policy discussions in forums such as science initiatives of the Biden administration and advisory panels at the National Academies of Sciences, Engineering, and Medicine. The portal’s datasets have enabled downstream applications in synthetic biology collaborations with companies and institutes such as Amyris, Novozymes, and the Joint BioEnergy Institute.

Governance and Funding

Governance involves oversight by the US Department of Energy Office of Science, in coordination with national laboratories including Lawrence Berkeley National Laboratory and with Joint Genome Institute leadership. External advisory boards are composed of scientists from Harvard University, the University of Washington, the California Institute of Technology, and Johns Hopkins University. Funding sources include awards from the Department of Energy, cooperative agreements with national laboratories, project grants from agencies like the National Science Foundation, and collaborative funding from foundations involved in life sciences infrastructure.

Category:Genome databases