LLMpedia: The first transparent, open encyclopedia generated by LLMs

SGML

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
SGML
Name: Standard Generalized Markup Language
Owner: International Organization for Standardization
Released: October 1986
Genre: Markup language
Extended from: GML (IBM)
Extended to: HTML, XML
Standard: ISO 8879

The Standard Generalized Markup Language (SGML) is an international standard for defining markup languages that describe the structure and content of documents. It provides a formal framework for creating such languages and separates the logical structure of a document from its physical presentation. Developed from earlier work at IBM, it became a foundational technology for electronic document processing and paved the way for the modern web.

History and development

The origins of the language can be traced to the work of Charles Goldfarb, Edward Mosher, and Raymond Lorie at IBM in the late 1960s, who created the Generalized Markup Language (GML). This early system demonstrated the power of descriptive markup, in which tags identify what a piece of text is rather than how it should be printed. Standardization work based on these ideas began under the American National Standards Institute and later became an international project under the auspices of the International Organization for Standardization. After a lengthy development and review process involving experts from many countries, the standard was formally published as ISO 8879 in October 1986. The underlying philosophy also owes much to William W. Tunnicliffe of the Graphic Communications Association, an early advocate of separating a document's information content from its format.

Technical overview

The specification defines a metalanguage for declaring document types by means of a Document Type Definition (DTD). A DTD formally declares the elements, attributes, and entities that may appear in a conforming document instance, along with a content model for each element that states which child elements are allowed and in what order. A core concept is the separation of a document's logical structure, expressed through tags, from the rules used to process and present it. SGML also supports more complex features such as marked sections, short references, and the management of external entities. Reading an instance requires a specialized engine, usually called an SGML parser; a validating parser checks the document against its DTD. The standard specifies SGML's own syntax through formal syntax productions, written in a notation comparable to Backus–Naur form.
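
As a rough illustration of what checking an instance against a DTD involves, the sketch below models a tiny document type in Python. Everything in it is an assumption made up for this example: the element names (memo, to, from, body, para), the CONTENT_MODELS table, and the model_to_regex and validate helpers. It only handles sequence, alternation, and the occurrence indicators ?, *, and +. A real SGML parser also deals with the SGML declaration, attributes, entities, marked sections, short references, tag omission, and the & connector, none of which are modeled here.

```python
import re

# Hypothetical content models in DTD-like notation:
#   ","  sequence   "|"  alternative   "?" "*" "+"  occurrence indicators
CONTENT_MODELS = {
    "memo": "(to, from, body)",
    "to": "(#PCDATA)",
    "from": "(#PCDATA)",
    "body": "(para+)",
    "para": "(#PCDATA)",
}


def model_to_regex(model):
    """Translate a simple content model into a regex over child-name tokens."""
    if model.strip() == "(#PCDATA)":
        # Text only: the sequence of child elements must be empty.
        return re.compile(r"")
    # Each element name becomes the token "name;" so names cannot run together.
    pattern = re.sub(
        r"[A-Za-z][\w.-]*",
        lambda m: "(?:%s;)" % re.escape(m.group(0)),
        model,
    )
    # Sequence is plain concatenation in a regex, so commas and spaces go away.
    return re.compile(pattern.replace(",", "").replace(" ", ""))


def validate(element):
    """Return a list of error messages for an element given as (name, children).

    Children are either nested (name, children) tuples or plain strings.
    Strings are ignored when matching, which is a simplification: a real
    parser would also reject stray text inside element-only content.
    """
    name, children = element
    if name not in CONTENT_MODELS:
        return ["undeclared element: %s" % name]
    child_elements = [c for c in children if not isinstance(c, str)]
    tokens = "".join(child[0] + ";" for child in child_elements)
    errors = []
    if not model_to_regex(CONTENT_MODELS[name]).fullmatch(tokens):
        errors.append(
            "content of <%s> does not match %s" % (name, CONTENT_MODELS[name])
        )
    for child in child_elements:
        errors.extend(validate(child))
    return errors


good = ("memo", [("to", ["Staff"]), ("from", ["Editor"]),
                 ("body", [("para", ["Hello."])])])
bad = ("memo", [("from", ["Editor"]), ("to", ["Staff"]), ("body", [])])

print(validate(good))
print(validate(bad))
```

The first call returns an empty list because the children of memo appear in the declared order. The second reports two problems: to and from are swapped, and body contains no para. Mapping content models onto regular expressions works here only because the example models are so small; it is a teaching shortcut, not how the standard describes parsing.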

Relationship to other markup languages

SGML served as the direct progenitor of many widely used formats. Tim Berners-Lee drew on its conventions when creating the initial HTML for the World Wide Web, and HTML 2.0 was formally defined as an SGML application. Dissatisfaction with SGML's complexity for web use led the World Wide Web Consortium to develop XML as a simplified subset. Other notable SGML applications include DocBook, used for technical documentation, and the Text Encoding Initiative guidelines for humanities research, both of which later moved to XML. Publishing tools such as early versions of Adobe FrameMaker and Arbortext's software also offered SGML support.

Applications and usage

For many years, SGML was a leading standard for large-scale technical documentation and publishing in both government and industry. The United States Department of Defense's Continuous Acquisition and Life-cycle Support (CALS) initiative specified SGML for technical manuals. Corporations such as Boeing, General Motors, and IBM used it to manage very large bodies of aircraft, automotive, and computer documentation. In the publishing world, it was part of the workflows of major academic publishers like Oxford University Press and news organizations like the Associated Press. Its use declined with the rise of XML and web-based technologies in the late 1990s.

Standardization and specifications

The primary governing document is the international standard ISO 8879:1986, maintained by ISO/IEC JTC 1/SC 34. The standard was corrected and amended over time; a notable addition is the Web SGML Adaptations annex, introduced through a technical corrigendum to align SGML with the needs of XML and the web. The HyTime standard, ISO/IEC 10744, extended SGML for hypermedia and time-based documents, and the Document Style Semantics and Specification Language (DSSSL), ISO/IEC 10179, defined how SGML documents can be transformed and formatted. Within the Internet Engineering Task Force, RFC 1874 proposed media types for exchanging SGML over the Internet. The World Wide Web Consortium did not maintain SGML itself, but it drove the later evolution of the technology through XML and its related specifications. Conformance testing was historically provided by organizations like the University of Leeds.