CAZy — LLMpedia

CAZy
Name	CAZy
Type	Biological database
Founded	1999
Founder	Bernard Henrissat
Discipline	Glycobiology
Country	France
Language	English


CAZy (Carbohydrate-Active enZYmes) is a specialized online database of enzymes that build, modify, and break down complex carbohydrates. It combines a curated, sequence-based family classification with biochemical and structural information for glycoside hydrolases, glycosyltransferases, polysaccharide lyases, carbohydrate esterases, and auxiliary activities, together with the associated carbohydrate-binding modules. By linking each enzyme family to its member sequences, three-dimensional structures, taxonomic sources, and literature, it serves researchers in molecular biology, microbiology, biochemistry, structural biology, and biotechnology.

Overview

CAZy catalogs enzyme families involved in carbohydrate metabolism and modification, placing enzyme sequences alongside experimentally characterized members of the same family. Its sequence and annotation data come from public repositories, notably GenBank at the National Center for Biotechnology Information (NCBI), UniProt (including its curated Swiss-Prot section), and the Protein Data Bank, together with related resources at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The resource is cross-referenced with model organism databases, including the Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Informatics, and The Arabidopsis Information Resource, and draws on NCBI Taxonomy and functional ontologies such as the Gene Ontology. It is widely used by researchers at institutions such as the Max Planck Society, Cold Spring Harbor Laboratory, the Massachusetts Institute of Technology, Harvard University, Stanford University, the University of Cambridge, the University of Oxford, and Institut Pasteur.

Classification and nomenclature

CAZy groups enzymes into families on the basis of amino acid sequence similarity. Because similar sequences adopt the same fold, members of a family generally share a catalytic mechanism, so the scheme complements rather than duplicates the Enzyme Commission (EC) numbers defined under IUBMB conventions, which classify enzymes by the reaction they catalyze; it is comparable in spirit to the sequence-based family classification that MEROPS provides for peptidases. Family names combine a class prefix with a number: glycoside hydrolases (GH), glycosyltransferases (GT), polysaccharide lyases (PL), carbohydrate esterases (CE), and auxiliary activities (AA), as in GH13 or GT2. Some large families are divided into numbered subfamilies, written with an underscore suffix such as GH5_4, and glycoside hydrolase families that share a fold are grouped into clans. Family boundaries are informed by comparisons with entries in databases such as Pfam, InterPro, TIGRFAMs, COG, and OrthoDB. The classification is tied to experimental characterizations published in journals including Nature, Science, Cell, the Journal of Biological Chemistry, Biochemistry, and Glycobiology, and to studies of enzyme mechanisms by researchers at CNRS, the European Molecular Biology Laboratory, Scripps Research, Johannes Gutenberg University Mainz, and ETH Zurich.
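Family labels follow a compact convention that is easy to handle programmatically. The sketch below assumes labels of the form class prefix, family number, and an optional underscore plus subfamily number (for example GH13, GH5_4, or AA9), and also accepts the CBM prefix used for carbohydrate-binding modules. The regular expression, the `FamilyLabel` type, and `parse_family` are illustrative names, not part of any CAZy software, and the pattern is not guaranteed to cover every label CAZy has published.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Class prefixes; CBM is included on the assumption that carbohydrate-binding
# module labels follow the same prefix + number pattern.
CLASS_NAMES = {
    "GH": "glycoside hydrolase",
    "GT": "glycosyltransferase",
    "PL": "polysaccharide lyase",
    "CE": "carbohydrate esterase",
    "AA": "auxiliary activity",
    "CBM": "carbohydrate-binding module",
}

# Prefix, family number, optional "_<subfamily>" suffix, e.g. "GH5_4".
LABEL_RE = re.compile(r"^(GH|GT|PL|CE|AA|CBM)(\d+)(?:_(\d+))?$")


@dataclass(frozen=True)
class FamilyLabel:
    cazy_class: str
    family: int
    subfamily: Optional[int] = None

    def __str__(self) -> str:
        base = f"{self.cazy_class}{self.family}"
        return base if self.subfamily is None else f"{base}_{self.subfamily}"


def parse_family(label: str) -> FamilyLabel:
    """Split a label such as 'GH13_8' into class, family and subfamily."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised CAZy family label: {label!r}")
    prefix, family, sub = match.groups()
    return FamilyLabel(prefix, int(family), int(sub) if sub else None)


if __name__ == "__main__":
    for text in ["GH13", "GH5_4", "GT2", "CBM20", "AA9"]:
        label = parse_family(text)
        print(text, "->", CLASS_NAMES[label.cazy_class], label.family, label.subfamily)
```

Keeping the subfamily as a separate optional field lets the same record represent both a family (GH5) and one of its subfamilies (GH5_4), and `str()` round-trips the label.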

Database structure and content

The CAZy database stores family pages, lists of member sequences, accession mappings, and links to structural entries in the Protein Data Bank, with curated annotations that cite primary literature from publishers such as Oxford University Press, Springer Nature, Elsevier, and Wiley-Blackwell. Sequence data trace back to submissions to GenBank, functional annotations to UniProtKB, and structural coordinates to depositions in the Protein Data Bank, distributed in the United States by RCSB PDB. Taxonomic coverage spans archaea, bacteria, eukaryotes, and viruses, including organisms represented in projects such as the Human Microbiome Project, the Earth Microbiome Project, and Genome 10K and in initiatives at the Joint Genome Institute (JGI). Biochemical annotations draw on assays reported by laboratories at the Max Planck Institute for Terrestrial Microbiology, Wageningen University, INRAE, and Lawrence Berkeley National Laboratory.
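To make the family-page structure concrete, here is a minimal in-memory model under a simplified, assumed schema: each entry carries a GenBank protein accession, an optional UniProtKB accession, and any PDB identifiers. `Entry`, `FamilyIndex`, and their fields are hypothetical and do not describe CAZy's internal storage. Because CAZy assigns families per module, a multi-module protein (for instance a catalytic domain fused to a carbohydrate-binding module) can belong to more than one family, which is why the index keeps a reverse map from accession to families.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    """One protein as it might appear on a family page (hypothetical schema)."""
    genbank: str                          # GenBank protein accession
    organism: str
    uniprot: Optional[str] = None         # UniProtKB accession, if mapped
    pdb_ids: List[str] = field(default_factory=list)
    characterized: bool = False           # True if experimental evidence exists


class FamilyIndex:
    """Family label -> entries, plus a reverse accession -> families map."""

    def __init__(self) -> None:
        self.families: Dict[str, List[Entry]] = defaultdict(list)
        self._by_accession: Dict[str, Set[str]] = defaultdict(set)

    def add(self, family: str, entry: Entry) -> None:
        self.families[family].append(entry)
        self._by_accession[entry.genbank].add(family)

    def families_of(self, genbank: str) -> Set[str]:
        # A multi-module protein may be listed under several families.
        return set(self._by_accession.get(genbank, set()))

    def with_structures(self, family: str) -> List[Entry]:
        # Entries of a family that link to at least one PDB structure.
        return [e for e in self.families.get(family, []) if e.pdb_ids]
```

For example, adding the same placeholder accession under `GH5_4` and `CBM3` makes `families_of` return both labels, while `with_structures("GH5_4")` lists only entries carrying at least one PDB identifier.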

Methods and annotation criteria

CAZy curators delineate families and subfamilies using sequence similarity thresholds, multiple sequence alignments, and phylogenetic inference, with tools such as BLAST, HMMER, Clustal Omega, MAFFT, MUSCLE, RAxML, IQ-TREE, and PhyML. They also take structural evidence into account: structure comparisons made with programs such as DALI and TM-align, and experimental models built with crystallographic software such as COOT and Phenix and examined in viewers such as PyMOL. Annotation decisions integrate experimental evidence from enzyme kinetics studies by groups at the University of São Paulo, Kyoto University, the University of California, Berkeley, the University of British Columbia, and the University of Tokyo, together with the IUBMB Enzyme Nomenclature and community standards promoted by the Metabolomics Society. Curation also accounts for horizontal gene transfer documented in research from the Broad Institute, the Wellcome Sanger Institute, and the Max Planck Institute for Evolutionary Anthropology, and for phylogenomic frameworks such as the Tree of Life Web Project.
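Placing a new sequence in a family can be approximated, very roughly, by a best-hit rule over a similarity search. The sketch below is not CAZy's procedure, which combines several alignment and phylogenetic tools with expert judgment; it is a simplified stand-in. It reads BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in `COLUMNS`) and gives each query the family of its highest-bitscore hit against a reference set. The `reference_family` mapping and the e-value and identity cut-offs are hypothetical parameters chosen for illustration.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def assign_families(
    hits: TextIO,
    reference_family: Dict[str, str],   # hypothetical: reference ID -> family label
    max_evalue: float = 1e-10,          # hypothetical cut-off
    min_identity: float = 30.0,         # hypothetical cut-off, in percent
) -> Dict[str, str]:
    """Assign each query the family of its best-scoring hit that passes both cut-offs."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(hits, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        family = reference_family.get(rec["sseqid"])
        if family is None:
            continue  # subject has no known family
        if float(rec["evalue"]) > max_evalue or float(rec["pident"]) < min_identity:
            continue  # hit too weak under the chosen cut-offs
        bits = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][0]:
            best[query] = (bits, family)
    return {query: fam for query, (_, fam) in best.items()}


if __name__ == "__main__":
    # Made-up hits: q1 matches two references, q2 fails both cut-offs.
    demo = io.StringIO(
        "q1\tref_a\t45.0\t300\t150\t3\t1\t300\t1\t300\t1e-50\t250\n"
        "q1\tref_b\t35.0\t280\t170\t5\t1\t280\t1\t280\t1e-30\t180\n"
        "q2\tref_b\t22.0\t200\t150\t8\t1\t200\t1\t200\t1e-5\t60\n"
    )
    print(assign_families(demo, {"ref_a": "GH13", "ref_b": "GH5"}))  # {'q1': 'GH13'}
```

A single pairwise best hit can miss or misassign divergent family members, which is one reason profile hidden Markov model searches (for example with HMMER) are often used alongside BLAST for distant homologs.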

Applications and impact in research

Researchers in bioenergy, microbiome science, structural enzymology, and industrial biotechnology use CAZy data for enzyme discovery, protein engineering, and pathway reconstruction. These data inform work at the DOE Bioenergy Research Centers, the Industrial Biotechnology Innovation Centre, and companies such as Novozymes, DuPont, BASF, and Cargill, as well as synthetic biology communities associated with Addgene and the iGEM Foundation. CAZy family assignments underpin metagenomic analyses by teams at the European Bioinformatics Institute, the Broad Institute, and the J. Craig Venter Institute, published in journals such as PLOS ONE and the Proceedings of the National Academy of Sciences, and support structure-function research appearing in Nature Communications, ACS Catalysis, Biotechnology for Biofuels, and Applied and Environmental Microbiology. The resource has also aided the development of enzyme assays, thermostable biocatalysts used by DSM-Firmenich, and carbohydrate-processing applications at pharmaceutical companies such as GlaxoSmithKline and Roche.
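Metagenomic studies often summarize CAZy assignments as a profile of how annotated genes are distributed across classes or families. A minimal sketch of a class-level profile follows, assuming the input is a list of family labels with one label per annotated module (for example produced by an external annotation tool). The function name `class_profile` and the sample data are hypothetical, and the sketch reports simple fractions; real analyses may also need to account for sequencing depth or genome size.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Class prefix at the start of a family label, e.g. "GH" in "GH13_8".
PREFIX_RE = re.compile(r"^(GH|GT|PL|CE|AA|CBM)\d")


def class_profile(family_labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of annotated modules falling into each CAZy class."""
    counts: Counter = Counter()
    for label in family_labels:
        match = PREFIX_RE.match(label.strip())
        if match:  # labels that do not look like CAZy families are ignored
            counts[match.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical per-sample annotations.
    samples = {
        "sample_A": ["GH13_8", "GH5_4", "GT2", "CBM20"],
        "sample_B": ["GT2", "GT4", "CE1", "PL1"],
    }
    for name, labels in samples.items():
        print(name, class_profile(labels))
```

Working at the class level keeps profiles comparable across samples with very different family repertoires; the same counting loop applied to full labels instead of prefixes gives a family-level profile.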

History and development

CAZy began as a curated classification effort led by Bernard Henrissat and colleagues affiliated with CNRS and Aix-Marseille Université. It expanded through collaborations with groups at EMBL-EBI, RCSB PDB, and NCBI, and grew alongside sequencing initiatives such as the Human Genome Project, the Global Ocean Sampling Expedition, and bacterial genomics efforts at the Wellcome Sanger Institute. Its development paralleled methodological advances from laboratories at the European Molecular Biology Laboratory, the Max Planck Institute for Biophysical Chemistry, the Howard Hughes Medical Institute, Institut Pasteur, and the University of California, San Diego. Work building on and citing CAZy has appeared in venues including Nature Reviews Microbiology, Annual Review of Biochemistry, and Trends in Biotechnology, and has been presented at Gordon Research Conferences and meetings of the International Glycoconjugate Organization.

Category:Biological databases