LLMpedia: The first transparent, open encyclopedia generated by LLMs

XML (markup language)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: XML
Caption: Extensible Markup Language logo
Developer: World Wide Web Consortium
Released: 1998
Latest release: 1.0 (Fifth Edition)
OS: Cross-platform
Genre: Markup language
License: W3C Recommendation

XML (Extensible Markup Language) is a text-based format for representing structured data using custom tags and a hierarchical tree model. Created to enable platform- and application-independent data interchange, XML became a foundation for many web, publishing, and enterprise standards. It is derived from SGML and in turn underlies formats such as XHTML, SVG, MathML, RSS, and Atom.

History

XML emerged from a World Wide Web Consortium (W3C) effort, begun in 1996, to define a simplified subset of SGML (ISO 8879) suitable for use on the Web. SGML itself descended from the Generalized Markup Language developed at IBM for document processing. The working group was chaired by Jon Bosak of Sun Microsystems; the specification was edited by Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen, with James Clark serving as technical lead. XML 1.0 became a W3C Recommendation in February 1998. Subsequent editions incorporated errata and tracked changes in the Unicode Standard, and the Fifth Edition was published in 2008. XML’s uptake paralleled the growth of the World Wide Web, and it was adopted in projects of the Apache Software Foundation, in Microsoft’s .NET platform, and in Object Management Group specifications such as XMI.

Design and Concepts

XML’s design goals, as stated in the XML 1.0 specification edited by Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen, emphasize straightforward use over the Internet, support for a wide variety of applications, compatibility with SGML, and documents that are human-legible and reasonably clear. A document has exactly one root element and every other element is nested inside it, so the data model is an ordered, labeled tree comparable to hierarchical models studied in database research. Namespaces, defined in a separate W3C Recommendation published in 1999, qualify element and attribute names with URIs so that vocabularies can be mixed without name collisions; later standards such as SOAP, WSDL, and W3C XML Schema rely on them. XML’s emphasis on human-readable text matches the plain-text tradition of protocol specifications at bodies such as the IETF.
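The tree model and namespace mechanism described above can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree` module. The document, the `http://example.org/...` namespace URIs, and the `depth` helper are hypothetical and chosen only for illustration; two vocabularies both define a `title` element, and the namespace URI is what tells them apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define <title>.
DOC = """\
<catalog xmlns:bk="http://example.org/book" xmlns:hr="http://example.org/person">
  <bk:book>
    <bk:title>Markup Basics</bk:title>
    <hr:author>
      <hr:title>Dr.</hr:title>
      <hr:name>A. Writer</hr:name>
    </hr:author>
  </bk:book>
</catalog>
"""

root = ET.fromstring(DOC)

# The prefixes used in queries are local to this script; only the URIs
# have to match the ones declared in the document.
NS = {"b": "http://example.org/book", "p": "http://example.org/person"}

book_title = root.find("b:book/b:title", NS).text
honorific = root.find("b:book/p:author/p:title", NS).text

# ElementTree stores every qualified name in expanded "{uri}local" form.
expanded_tag = root.find("b:book", NS).tag


def depth(element):
    """Return the height of the subtree rooted at element (a leaf has depth 1)."""
    return 1 + max((depth(child) for child in element), default=0)


print(book_title)      # title from the book vocabulary
print(honorific)       # title from the person vocabulary
print(expanded_tag)
print(depth(root))
```

Because names are compared by URI rather than by prefix, the script is free to use `b` and `p` where the document uses `bk` and `hr`.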

Syntax and Components

XML documents consist of elements, attributes, processing instructions, comments, CDATA sections, and entity declarations. Elements form a nested tree in which each start-tag and end-tag pair, or each empty-element tag, defines a node; this is the structure that DOM implementations, including those in browsers such as Netscape and Mozilla, expose to programs. A document that follows the syntax rules is well-formed; a well-formed document is additionally valid if it conforms to a grammar written in a schema language such as DTD, W3C XML Schema, or RELAX NG. DTDs are defined in the XML specification itself, XML Schema was produced by W3C, and RELAX NG was designed by James Clark and Murata Makoto and standardized through OASIS and ISO/IEC. Documents are sequences of Unicode characters, and every XML processor must accept the UTF-8 and UTF-16 encodings. Parsers commonly expose SAX, an event-based interface developed by the xml-dev mailing list community, or DOM, a tree interface specified by W3C; both are available in implementations such as Apache Xerces and Microsoft MSXML.
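As a rough illustration of the constructs and parser interfaces mentioned above, the following sketch reads one small document through Python's standard SAX and DOM modules and then checks well-formedness with `xml.etree.ElementTree`. The order document, the `ItemCollector` class, and the `is_well_formed` helper are hypothetical names introduced for this example. Note that the standard library checks well-formedness only; it does not validate against a DTD, W3C XML Schema, or RELAX NG grammar.

```python
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical document using a declaration, a comment, attributes,
# and a CDATA section (which lets "&" and "<" appear unescaped).
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical order document -->
<order id="42">
  <item sku="A1">Widget</item>
  <item sku="B2"><![CDATA[Nuts & bolts < 5 mm]]></item>
</order>
"""


class ItemCollector(xml.sax.ContentHandler):
    """SAX: the parser pushes events; the handler keeps only what it needs."""

    def __init__(self):
        super().__init__()
        self.skus = []
        self.texts = []
        self._in_item = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "item":
            self.skus.append(attrs["sku"])
            self._in_item = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._in_item:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "item":
            self.texts.append("".join(self._chunks))
            self._in_item = False


handler = ItemCollector()
xml.sax.parseString(DOC, handler)

# DOM: the whole document is loaded into a tree that can be navigated freely.
dom = minidom.parseString(DOC)
order_id = dom.documentElement.getAttribute("id")
dom_skus = [el.getAttribute("sku") for el in dom.getElementsByTagName("item")]


def is_well_formed(data):
    """Return True if data parses as XML; this does not check validity."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(handler.skus, handler.texts)
print(order_id, dom_skus)
print(is_well_formed(DOC), is_well_formed(b"<order><item></order>"))
```

The SAX version never holds the full document in memory, which is the usual reason to prefer it for large inputs; the DOM version is more convenient when the program needs random access to the tree.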

XML sits within a broad ecosystem that includes namespaces, XPath, XSLT, XQuery, SOAP, WSDL, RSS, Atom, SVG, MathML, and XHTML. Most of these specifications come from W3C working groups; others were produced or maintained by bodies such as OASIS, the IETF (which published Atom), and Ecma International. The transformation and query languages XSLT and XQuery were developed in W3C working groups that drew on academic research and on vendors including Oracle and IBM. JSON, popularized by Douglas Crockford and later standardized as ECMA-404 and RFC 8259, emerged as a lighter-weight alternative for data interchange and is widely used in web APIs, including those of companies such as Google and Facebook. XML Signature was standardized jointly by W3C and the IETF and XML Encryption by W3C; both are implemented in toolkits from Microsoft and the Apache Software Foundation, among others.
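To give a feel for how XPath-style queries address parts of a document, here is a minimal sketch that runs path expressions against a small Atom feed. The feed content is invented for the example; the namespace URI `http://www.w3.org/2005/Atom` is the real Atom namespace. Python's `xml.etree.ElementTree` supports only a limited subset of XPath, so this is not a demonstration of full XPath, XSLT, or XQuery, which require a dedicated processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; only the namespace URI is taken from the Atom format.
FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://example.org/1"/>
    <category term="news"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link rel="alternate" href="http://example.org/2"/>
    <category term="tech"/>
  </entry>
</feed>
"""

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
feed = ET.fromstring(FEED)

# Child-axis path: every <title> directly inside an <entry>.
entry_titles = [t.text for t in feed.findall("atom:entry/atom:title", ATOM)]

# Attribute predicate: only links whose rel attribute equals "alternate".
links = [
    link.get("href")
    for link in feed.findall("atom:entry/atom:link[@rel='alternate']", ATOM)
]

# Filter entries by a child's attribute, done in two steps to stay within
# the subset of XPath that ElementTree supports.
tech_titles = [
    entry.findtext("atom:title", namespaces=ATOM)
    for entry in feed.findall("atom:entry", ATOM)
    if entry.find("atom:category[@term='tech']", ATOM) is not None
]

print(entry_titles)
print(links)
print(tech_titles)
```

The feed-level `<title>` is not returned by the first query because the path starts at `atom:entry`; this kind of positional precision is what path languages add over simple tag-name searches.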

Applications and Use Cases

XML has been adopted across publishing, finance, government, telecommunications, and software configuration, with schemas and vocabularies created by bodies such as the OASIS UBL Committee, W3C, and ISO committees. Among office document formats, Office Open XML originated at Microsoft and was standardized as ECMA-376 and ISO/IEC 29500, while OpenDocument was developed at OASIS and standardized as ISO/IEC 26300; they are the native formats of Microsoft Office and LibreOffice respectively. Web services architectures based on SOAP and WSDL were promoted by vendors like IBM and Microsoft and used in enterprise systems from SAP and Oracle. Geospatial data exchange relies on XML vocabularies from the Open Geospatial Consortium, such as the Geography Markup Language; news syndication uses RSS and Atom, the latter specified by the IETF in RFC 4287. Tooling for XML parsing, validation, and transformation includes Apache Xerces, Saxon, libxml2, and the System.Xml libraries in Microsoft .NET.

Criticism and Limitations

Critics, including engineers at Google and other Internet companies, have argued that XML is verbose and heavier than alternatives such as JSON and Protocol Buffers, the latter developed at Google. Concerns about document size and parsing overhead prompted work on binary encodings, such as W3C's Efficient XML Interchange (EXI), and on faster parsers from vendors including IBM and Sun Microsystems. The coexistence of several schema languages, layered transformation technologies, and intricate namespace rules added complexity and fragmented tooling, a point debated within W3C working groups and other standards committees. Constrained environments have tended to adopt lighter-weight formats or restricted XML profiles, while large enterprises have continued to use XML where its extensibility and document orientation are advantageous.
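The verbosity argument can be made concrete with a small, admittedly artificial comparison. The sketch below serializes the same hypothetical records as compact JSON, as attribute-style XML, and as element-style XML, then compares byte counts. The record fields and element names are invented for the example, and the result says nothing about real workloads, where schema design and compression change the picture considerably.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records used only to compare serialized sizes.
records = [
    {"sku": "A1", "name": "Widget", "qty": 3},
    {"sku": "B2", "name": "Gadget", "qty": 7},
]

# Compact JSON: no whitespace between tokens.
json_bytes = json.dumps(records, separators=(",", ":")).encode("utf-8")

# Element-style XML: every field becomes a child element, so each field
# name is written twice (start-tag and end-tag).
elem_root = ET.Element("items")
for rec in records:
    item = ET.SubElement(elem_root, "item")
    for key, value in rec.items():
        ET.SubElement(item, key).text = str(value)
elem_xml = ET.tostring(elem_root)

# Attribute-style XML: fields become attributes, which avoids end-tags.
attr_root = ET.Element("items")
for rec in records:
    ET.SubElement(attr_root, "item", {k: str(v) for k, v in rec.items()})
attr_xml = ET.tostring(attr_root)

# Round-trip the element-style document to confirm no information was lost.
# XML text is untyped, so qty must be converted back to int explicitly.
parsed = [
    {
        "sku": item.findtext("sku"),
        "name": item.findtext("name"),
        "qty": int(item.findtext("qty")),
    }
    for item in ET.fromstring(elem_xml)
]

print(len(json_bytes), len(attr_xml), len(elem_xml))
print(parsed == records)
```

For this payload the JSON form is the smallest and the element-style XML the largest. The explicit `int(...)` conversion in the round trip also illustrates a related point: plain XML carries no data types without a schema, whereas JSON distinguishes numbers from strings in the syntax itself.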

Category:Markup languages