Markup languages — LLMpedia

Markup languages
Name	Markup languages
Paradigm	declarative, presentation, data serialization
First appeared	1960s–1990s
Designers	IBM, Xerox PARC, Tim Berners-Lee, Charles Goldfarb, Donald Knuth
File ext	.xml, .html, .xhtml, .markdown, .tex
Influences	SGML, GML, TeX
Influenced	HTML5, XML Schema, RSS, JSON-LD

Contents

Overview and Definitions
History and Evolution
Types and Examples
Syntax and Structure
Applications and Uses
Tools and Processing
Standards and Interoperability

Markup languages are systems for annotating text to convey structure, semantics, and presentation. They originated in computing and publishing and have since been adapted for the web, document processing, and data interchange. Their development and interoperability are shaped by the software that implements them, the protocols that carry them, and the standards bodies that specify them.

Overview and Definitions

Markup systems provide tags, elements, or tokens that mark portions of a document so that rendering agents, parsers, or other processors can interpret their meaning. Early industrial work at IBM, where Charles Goldfarb co-developed GML, and research at Xerox PARC influenced later work, including Tim Berners-Lee's design of HTML. Key specifications have been produced by organizations including the World Wide Web Consortium (W3C), the International Organization for Standardization (ISO), and the Internet Engineering Task Force (IETF). Implementations appear in products from Microsoft, Apple, and Google, and in open-source projects such as those distributed by the Apache Software Foundation.

History and Evolution

Origins trace to typesetting and publishing systems and to generalized markup. GML (Generalized Markup Language) was developed at IBM in 1969 by Charles Goldfarb with Edward Mosher and Raymond Lorie, and Donald Knuth's TeX (1978) became a widely used macro-based typesetting language, especially for mathematics. GML led to the Standard Generalized Markup Language (SGML), standardized by ISO as ISO 8879 in 1986, which informed Tim Berners-Lee's design of HTML at CERN around 1990–1991. Hypertext research, including experiments at Xerox PARC, also influenced hypermedia formats. HTML was first standardized by the IETF as HTML 2.0 and later by the World Wide Web Consortium, founded in 1994. Subsequent milestones include XML standardization at the W3C, adoption of RSS and Atom for syndication, and HTML5, driven by the Web Hypertext Application Technology Working Group (WHATWG), founded in 2004 by Apple, Mozilla, and Opera, and by other browser vendors including Google.

Types and Examples

Markup families span document, web, and domain-specific formats. Prominent document systems include LaTeX and troff; the web relies on HyperText Markup Language (HTML), rendered by browsers such as Mozilla Firefox and Microsoft Edge. Data-centric markup includes the Extensible Markup Language (XML) and XML-based document formats such as Office Open XML, used by Microsoft Office, and OpenDocument, standardized by OASIS. Lightweight syntaxes such as Markdown (popularized through GitHub) and reStructuredText (used for Python documentation) coexist with specialized XML vocabularies like SVG (graphics), MathML (mathematics), and XBRL (financial reporting). Metadata and semantic-annotation examples include Dublin Core, FOAF, RDFa, and Schema.org annotations, which are read by the crawlers of search engines such as Bing and Google Search.
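To show how a lightweight syntax maps onto HTML elements, here is a deliberately tiny Python converter for a Markdown subset: ATX headings (`#` to `######`), blank-line-separated paragraphs, and `*emphasis*` / `**strong**`. It is an illustrative sketch, not a CommonMark-conformant implementation (it ignores lists, links, code spans, raw HTML, and backslash escapes), and the function names are hypothetical.

```python
import html
import re


def _inline(text: str) -> str:
    """Escape HTML special characters, then turn **strong** and *em* spans into tags."""
    text = html.escape(text)
    # Handle the double-asterisk form first so it is not consumed as two <em> spans.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def tiny_markdown_to_html(source: str) -> str:
    """Convert ATX headings and blank-line-separated paragraphs to HTML."""
    blocks = []
    paragraph = []

    def flush():
        # Emit any pending paragraph lines as a single <p> element.
        if paragraph:
            blocks.append("<p>" + _inline(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for line in source.splitlines():
        stripped = line.strip()
        heading = re.match(r"(#{1,6})\s+(.*)", stripped)
        if not stripped:
            flush()
        elif heading:
            flush()
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{_inline(heading.group(2))}</h{level}>")
        else:
            paragraph.append(stripped)
    flush()
    return "\n".join(blocks)


print(tiny_markdown_to_html("# Title\n\nSome *emphasis* and **strong** text."))
# <h1>Title</h1>
# <p>Some <em>emphasis</em> and <strong>strong</strong> text.</p>
```

Escaping before converting means that characters such as `<` in the source always come out as text, which is simpler than CommonMark's rules for passing raw HTML through.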

Syntax and Structure

Markup syntax typically uses delimited tags, attributes, and hierarchical nesting. SGML's element-oriented grammar shaped both XML, which was designed as a simplified subset of SGML, and HTML. Document Type Definitions (DTDs), inherited from SGML, and schema languages such as XML Schema and RELAX NG define which document structures are valid. Attribute/value pairs and namespaces allow several vocabularies to be combined in one document, as in RDF/XML and the SOAP envelopes processed by Microsoft Exchange and Apache Axis. A document is well-formed when it obeys the basic syntax rules (for example, properly nested tags and a single root element) and valid when it also conforms to a DTD or schema; these checks, along with different parsing modes, are implemented in parsers such as libxml2 and Expat and in browser engines such as WebKit and Blink.
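Well-formedness and namespaces can both be explored with Python's standard `xml.etree.ElementTree` module, as in the sketch below. The default namespace URI `http://example.com/hypothetical-doc`, the sample document, and the helper name `is_well_formed` are hypothetical; the SVG namespace URI is the one SVG itself defines. The standard library checks only well-formedness; validation against a DTD, XML Schema, or RELAX NG needs other tools, such as the third-party lxml package.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the string parses as XML.

    This is purely a syntactic check (matched and nested tags, one root element,
    quoted attributes); it says nothing about validity against a DTD or schema.
    """
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Two vocabularies mixed in one document: a hypothetical default namespace plus SVG.
DOC = """<doc xmlns="http://example.com/hypothetical-doc"
     xmlns:svg="http://www.w3.org/2000/svg">
  <title>Figure</title>
  <svg:svg width="10" height="10"><svg:rect width="10" height="10"/></svg:svg>
</doc>"""

# Prefix map used for lookups; these prefixes need not match the document's own.
NS = {"d": "http://example.com/hypothetical-doc", "svg": "http://www.w3.org/2000/svg"}

root = ET.fromstring(DOC)
print(is_well_formed(DOC))                   # True
print(is_well_formed("<a><b></a></b>"))      # False: improperly nested tags
print(root.tag)                              # {http://example.com/hypothetical-doc}doc
print(root.find("d:title", NS).text)         # Figure
print(len(root.findall(".//svg:rect", NS)))  # 1
```

ElementTree reports namespaced names in `{uri}local` form, which is why the prefix map is passed to `find` and `findall`: names are matched by namespace URI, not by the prefix written in the document.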

Applications and Uses

Markup underpins web publishing, electronic documents, scholarly publishing, and enterprise data interchange. Content management systems such as WordPress and Drupal transform stored content into rendered HTML pages; e-commerce platforms such as Magento and Shopify use structured data to expose product metadata. Scholarly workflows use TEI for encoding texts and JATS for journal articles, while legal and governmental bodies adopt Akoma Ntoso and XHTML derivatives. Publishing toolchains incorporate LaTeX for typesetting, DocBook for technical manuals, and DITA for modular documentation, which is used by corporations such as IBM and Adobe. Mapping, GIS, and visualization rely on KML and GeoJSON, which are supported in the Esri and QGIS stacks.
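Sites that expose product metadata commonly embed Schema.org vocabulary in their HTML, and one common serialization is JSON-LD inside a `<script>` element. The Python sketch below builds such an element with only the standard library. The helper name `product_jsonld` and the sample values are hypothetical, and this is not a description of how Magento, Shopify, or any particular platform generates its output.

```python
import json


def product_jsonld(name: str, price: str, currency: str) -> str:
    """Return a <script> element carrying Schema.org Product/Offer data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,  # ISO 4217 code, e.g. "EUR"
        },
    }
    # Escape "</" so a value such as "</script>" cannot end the element early;
    # "\/" is a legal JSON escape, so JSON parsers still recover the original text.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(product_jsonld("Example Mug", "12.50", "EUR"))
```

Real product markup usually carries more properties, such as availability or images; they are omitted here to keep the structure visible.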

Tools and Processing

Parsing, transformation, validation, and rendering are carried out by toolchains of processors and libraries. XML parsing APIs differ in style: SAX delivers a stream of events, while DOM builds an in-memory tree. Java runtimes provide both through JAXP, and the Microsoft .NET framework offers a DOM implementation alongside its pull-based XmlReader. The transformation language XSLT and the query languages XPath and XQuery enable data manipulation in XML databases such as BaseX and eXist-db. Converters include Pandoc, wkhtmltopdf, and PrinceXML; editors and IDEs range from Visual Studio Code and Eclipse to Emacs and Vim. Continuous integration platforms such as Jenkins and Travis CI often validate markup as part of documentation pipelines.
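The difference between event-driven (SAX) and tree-based (DOM) processing can be seen with Python's standard library, which ships both APIs as well as ElementTree, whose `find`/`findall` methods accept a limited subset of XPath. The sketch below extracts the same data three ways; the sample catalog and the helper names are hypothetical, and full XPath 1.0, XSLT, or XQuery would require other tools, for example the third-party lxml package or an XML database.

```python
import xml.etree.ElementTree as ET
import xml.sax
from xml.dom import minidom

# Hypothetical sample document shared by all three approaches.
DOC = b"<catalog><book id='b1'><title>TeX</title></book><book id='b2'><title>SGML</title></book></catalog>"


class TitleCollector(xml.sax.ContentHandler):
    """SAX handler: reacts to parse events and never builds a tree."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may be delivered in several chunks, so collect it in a buffer.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(data: bytes) -> list:
    handler = TitleCollector()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_dom(data: bytes) -> list:
    # DOM: the whole document is parsed into an in-memory tree first.
    tree = minidom.parseString(data)
    return [node.firstChild.data for node in tree.getElementsByTagName("title")]


def title_by_id(data: bytes, book_id: str) -> str:
    # ElementTree's XPath subset includes attribute predicates like [@id='...'].
    root = ET.fromstring(data)
    return root.find(f"./book[@id='{book_id}']/title").text


print(titles_with_sax(DOC))    # ['TeX', 'SGML']
print(titles_with_dom(DOC))    # ['TeX', 'SGML']
print(title_by_id(DOC, "b2"))  # SGML
```

SAX suits very large inputs because the parser does not keep the whole document in memory, while DOM and ElementTree trade memory for random access to the tree.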

Standards and Interoperability

Interoperability depends on standards bodies and their specifications: the W3C published HTML5 and maintains SVG, while the HTML Living Standard is now maintained by the WHATWG; OASIS governs OpenDocument and DocBook; and ISO publishes SGML (ISO 8879) and related standards. W3C semantic web specifications, including RDF and OWL, underpin linked data projects such as DBpedia and Wikidata. Shared conformance test suites, developed by browser vendors together with the WHATWG and W3C, influence implementation behavior, while IETF specifications define related protocols and media types. Standards organizations such as the IEEE and governmental interoperability frameworks aim to align exchange formats across enterprise platforms such as SAP and Salesforce.
