LLMpedia: The first transparent, open encyclopedia generated by LLMs

Primary Structures

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: Primary structure
Caption: Example of an amino acid sequence
Field: Biochemistry
Introduced: 20th century

Primary structure denotes the linear sequence of monomeric units in a biopolymer, most commonly the sequence of amino acids in a polypeptide chain (for nucleic acids, the sequence of nucleotides). The sequence largely determines how a chain folds into higher-order structures and therefore how it functions. Its study is associated with figures such as Frederick Sanger, Linus Pauling, James Watson, and Francis Crick, and with institutions such as the National Institutes of Health, the Max Planck Society, and Cold Spring Harbor Laboratory. Research in this area draws on methods developed at the Massachusetts Institute of Technology, Stanford University, Harvard University, the University of Cambridge, and organizations such as the European Molecular Biology Laboratory.

Definition and Overview

In proteins, the term refers to the specific order of amino acid residues, written from the N-terminus (free amino group) to the C-terminus (free carboxyl group), in which adjacent residues are joined by peptide bonds. Foundational work by Frederick Sanger, who determined the amino acid sequence of insulin at the University of Cambridge and later worked at the MRC Laboratory of Molecular Biology, established techniques that revealed the sequences of enzymes, hormones, and structural proteins. Knowing primary sequences enabled the molecular characterization of key biomolecules studied by researchers at the University of Oxford, the Weizmann Institute of Science, and Yale University, and led to sequence databases maintained by institutions such as the National Center for Biotechnology Information and the European Bioinformatics Institute.
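The N-to-C convention can be made concrete with a small helper that writes a sequence in hyphenated three-letter notation and looks up residues by their 1-based position from the N-terminus. This is a minimal sketch assuming only the 20 standard amino acids; the function names `to_three_letter` and `residue_at` are hypothetical, and the example uses the first five residues of the insulin A chain.

```python
"""Hypothetical helpers for writing a protein's primary structure N-to-C."""

# One-letter to three-letter codes for the 20 standard amino acids only.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence (N-terminus first) to hyphenated
    three-letter notation; raise on non-standard residues."""
    residues = []
    for position, code in enumerate(sequence.upper(), start=1):
        if code not in THREE_LETTER:
            raise ValueError(f"non-standard residue {code!r} at position {position}")
        residues.append(THREE_LETTER[code])
    return "-".join(residues)


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position counted from the N-terminus."""
    if not 1 <= position <= len(sequence):
        raise IndexError("position outside the chain")
    return sequence[position - 1]


if __name__ == "__main__":
    fragment = "GIVEQ"  # first five residues of the insulin A chain, N-terminus first
    print(to_three_letter(fragment))  # Gly-Ile-Val-Glu-Gln
    print(residue_at(fragment, 4))    # E
```

Because the string is always read from the N-terminus, position 1 is the residue with the free amino group, matching the way sequences are conventionally written and numbered.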

Molecular Basis and Formation

The chemical basis of a primary sequence is the peptide bond, a covalent amide linkage formed by condensation between the carboxyl group of one amino acid and the amino group of the next. During translation the ribosome catalyzes this reaction, as studied by groups at the Max Planck Institute for Molecular Genetics and Rockefeller University. In eukaryotes, messenger RNAs transcribed by RNA polymerase II carry a series of codons; each codon is read by a transfer RNA that has been charged with its amino acid by an aminoacyl-tRNA synthetase, enzymes characterized in research at the University of California, San Francisco. Post-translational modifications, including those occurring in the endoplasmic reticulum and Golgi apparatus of cells from organisms investigated at the Salk Institute and Johns Hopkins University, can further alter residues after the chain is synthesized. Synthetic peptide chemistry, advanced by researchers at institutions such as Rockefeller University (where Bruce Merrifield pioneered solid-phase peptide synthesis) and ETH Zurich and applied by companies like Genentech, allows laboratory assembly of sequences using solid-phase peptide synthesis and native chemical ligation.
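The codon-by-codon reading described above can be sketched as a lookup into the standard genetic code. This minimal sketch assumes the standard code (NCBI translation table 1) and an mRNA string that already begins at the start codon; it ignores untranslated regions, start-codon scanning, recoding such as selenocysteine insertion, and post-translational modification. The names `translate` and `codon_to_amino_acid` are hypothetical.

```python
"""Sketch: translate an mRNA open reading frame into a primary sequence."""

# Standard genetic code (NCBI translation table 1). Bases are ordered U, C, A, G
# for the first, second, and third codon positions; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_to_amino_acid(codon: str) -> str:
    """Look up one codon: index = 16*first + 4*second + third."""
    i, j, k = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def translate(mrna: str) -> str:
    """Translate from the first base until a stop codon is reached.

    Trailing bases that do not form a complete codon are ignored.
    """
    mrna = mrna.upper()
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        residue = codon_to_amino_acid(mrna[start:start + 3])
        if residue == "*":
            break  # stop codon: no amino acid is added
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGUUUGGCUAA"))  # MFG
```

The 64-character string encodes the whole table compactly; each residue is appended in order, so the output is written N-terminus first, mirroring the direction in which the ribosome extends the chain.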

Functional Roles in Proteins

The linear sequence dictates the local propensity to form secondary structures such as alpha helices and beta sheets, which were first proposed by Linus Pauling and his colleagues at Caltech. Primary sequences determine the composition of active sites in enzymes, such as those studied at the Max Planck Institute for Biochemistry, and the binding specificity of receptors characterized at Imperial College London and Scripps Research. Sequence motifs direct proteins to cellular destinations, as with the signal peptides described by Günter Blobel at Rockefeller University, and mediate interactions within multiprotein complexes investigated at Cold Spring Harbor Laboratory. Therapeutic biologics developed by firms like Amgen and Roche rely on engineered sequences to tune pharmacokinetics and reduce immunogenicity.
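One way to see how local sequence patterns carry information is a hydropathy profile, which averages a per-residue hydrophobicity value over a sliding window; hydrophobic stretches such as the core of a signal peptide stand out as peaks. This sketch swaps in the Kyte-Doolittle hydropathy scale as an illustrative choice not named in the text above; the function name `hydropathy_profile` is hypothetical, and the window size is a free parameter the caller must pick.

```python
"""Sketch: Kyte-Doolittle hydropathy profile of a primary sequence."""

# Kyte-Doolittle hydropathy values; positive numbers are more hydrophobic.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence: str, window: int) -> list[float]:
    """Mean hydropathy of every complete window of `window` consecutive
    residues, sliding one residue at a time from the N-terminus.

    Returns an empty list if the sequence is shorter than the window.
    """
    if window < 1:
        raise ValueError("window must be positive")
    values = [KYTE_DOOLITTLE[res] for res in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]
```

For example, `hydropathy_profile("IIIKKK", 3)` yields four window averages that fall from strongly positive at the isoleucine-rich N-terminal end to strongly negative at the lysine-rich C-terminal end. Scales and window sizes vary between tools, so the profile is a qualitative guide rather than a prediction.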

Methods of Determination

Classical protein sequencing was established by researchers at the University of Cambridge and the MRC, and modern proteomics is dominated by mass spectrometry platforms refined at the University of Washington and the University of Manchester. Edman degradation, developed by Pehr Edman and applied at laboratories including the Carlsberg Laboratory, removes and identifies one residue at a time from the N-terminus, providing early residue-by-residue information. High-throughput nucleotide sequencing technologies from companies like Illumina and institutions such as the Broad Institute now allow protein sequences to be inferred from genomic data produced by projects like the Human Genome Project and the 1000 Genomes Project. Tandem mass spectrometry identifies peptides by fragmenting them and can confirm such inferred sequences, while cryo-electron microscopy workflows from EMBL and computational structure-prediction methods advanced at DeepMind and the University of Toronto build on known primary sequences to determine or predict three-dimensional structure.
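Mass-spectrometry workflows commonly digest proteins with trypsin before analysis and then compare measured peptides against peptides predicted from a candidate sequence. The sketch below predicts those peptides using the common simplified trypsin rule (cleave after lysine or arginine unless the next residue is proline); it is an illustrative assumption rather than any specific platform's method, ignores missed cleavages, and the name `tryptic_digest` is hypothetical.

```python
"""Sketch: in-silico tryptic digestion of a protein sequence."""


def tryptic_digest(sequence: str) -> list[str]:
    """Split after every K or R unless the next residue is P.

    Missed cleavages and other protease exceptions are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKWVRPAKLR"))  # ['MK', 'WVRPAK', 'LR']
```

The R followed by P in the example is deliberately left uncleaved, which is why `WVRPAK` survives as one peptide.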

Variations, Mutations, and Disorders

Single-residue substitutions, insertions, deletions, and frameshifts in coding sequences alter primary sequences and underlie diseases characterized by groups at the National Cancer Institute, Mayo Clinic, and Cleveland Clinic. A classic example is sickle cell disease, in which a single nucleotide change (GAG to GTG) replaces glutamate with valine at position 6 of beta-globin; others include substitutions causing enzyme deficiencies studied in families at Johns Hopkins Hospital and mutations in receptors linked to syndromes investigated at Mount Sinai Hospital. Mutational catalogs such as The Cancer Genome Atlas and the ClinVar database annotate pathogenic variants that change sequence-derived features and can disrupt folding pathways, as elucidated in studies at the University of California, San Diego and Washington University in St. Louis. Therapies such as enzyme replacement, along with sequence- and folding-correcting approaches developed by Vertex Pharmaceuticals and academic partners, target the consequences of primary-sequence defects.
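How a single nucleotide change does or does not alter the primary sequence can be sketched by translating a codon before and after the change. This minimal sketch assumes the standard genetic code (NCBI translation table 1) written on the DNA coding strand, handles only one substitution within one codon, and uses hypothetical names `translate_codon` and `classify_substitution`; it does not model insertions, deletions, or frameshifts.

```python
"""Sketch: classify a single-nucleotide substitution within one codon."""

# Standard genetic code (NCBI table 1) on the DNA coding strand, bases ordered
# T, C, A, G for the first, second, and third positions; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Return 'synonymous', 'missense', 'nonsense', or 'stop-lost' for a
    change at a 1-based position (1 to 3) within a single codon."""
    if not 1 <= position <= 3:
        raise ValueError("position must be 1, 2, or 3")
    mutant = codon[:position - 1] + new_base + codon[position:]
    before, after = translate_codon(codon), translate_codon(mutant)
    if before == after:
        return "synonymous"   # same amino acid, primary sequence unchanged
    if after == "*":
        return "nonsense"     # premature stop truncates the chain
    if before == "*":
        return "stop-lost"    # translation continues past the original end
    return "missense"         # one residue replaced by another


if __name__ == "__main__":
    # GAG (Glu) -> GTG (Val): the substitution underlying sickle cell disease.
    print(classify_substitution("GAG", 2, "T"))  # missense
```

Only the missense, nonsense, and stop-lost outcomes change the primary sequence; synonymous changes leave the encoded protein identical.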

Evolutionary and Comparative Aspects

Comparative sequencing across species by teams at the Smithsonian Institution, the Zoological Society of London, and the Wellcome Sanger Institute tracks the conservation and divergence of sequences, from which evolutionary biologists at the University of Chicago and Princeton University infer phylogenetic relationships. Conserved motifs in orthologous proteins, identified in studies at ETH Zurich and the University of Edinburgh, reveal functional constraints, while adaptive changes highlighted in work on Darwin’s finches and organisms cataloged by the Natural History Museum, London demonstrate selection on sequence variants. Large-scale resources such as UniProt and Ensembl support mapping of sequence evolution and correlating primary-sequence changes with phenotypic shifts studied at the Carnegie Institution for Science.
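Conservation between orthologs is usually quantified on sequences that have already been aligned. The sketch below computes pairwise percent identity and a simple per-column conservation score for such an alignment; it assumes pre-aligned input with `-` as the gap character, uses one of several percent-identity conventions (gap columns excluded from the denominator), and the function names are hypothetical.

```python
"""Sketch: simple conservation measures for pre-aligned protein sequences."""
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Identical residues as a percentage of columns where neither
    sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


def column_conservation(alignment: list[str]) -> list[float]:
    """For each column, the fraction of sequences carrying the most common
    non-gap residue (gaps count against conservation)."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        top = max(counts.values()) if counts else 0
        scores.append(top / len(column))
    return scores


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

Columns scoring 1.0 are invariant across the aligned sequences, the kind of position the text describes as reflecting functional constraint.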

Category:Biochemistry