LLMpedia: The first transparent, open encyclopedia generated by LLMs

XML (format)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: XML
Filename extension: .xml
MIME type: application/xml, text/xml
Developer: World Wide Web Consortium
Initial release: 1998
Genre: Markup language

XML (Extensible Markup Language) is a markup language for encoding documents and data in a format that is both human-readable and machine-readable. It was developed as a simplified subset of SGML (Standard Generalized Markup Language) to facilitate data interchange across systems, platforms, and applications. XML influenced later web standards and data formats and remains in use alongside alternatives such as JSON.

History

XML emerged from efforts at the World Wide Web Consortium (W3C) to simplify SGML for use on the Web and to enable structured data exchange between systems, with contributors from companies such as Sun Microsystems, Microsoft, and IBM. The W3C published the XML 1.0 Recommendation in February 1998; the working group was chaired by Jon Bosak of Sun Microsystems, and the specification was edited by Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen. XML was developed alongside contemporaneous W3C work on HTML 4.0 and later served as the basis for technologies such as XSLT, SOAP, and WSDL, which played roles in early Web Services architectures.

Design and syntax

XML's syntax is derived from SGML and requires documents to be well-formed: elements must nest properly inside a single root element, every start tag needs a matching end tag (or the element uses empty-element syntax such as <br/>), and attribute values must be quoted. Documents may begin with an optional XML declaration (for example, <?xml version="1.0" encoding="UTF-8"?>) and use angle-bracketed tags to delimit elements, producing a hierarchical tree that can be transformed with XSLT or queried with XPath. XML Namespaces, published as a separate W3C Recommendation in 1999, qualify element and attribute names with URIs to avoid naming collisions when vocabularies are combined. XML documents are sequences of Unicode characters, and every conforming parser must accept the UTF-8 and UTF-16 encodings. Parsing follows the grammar productions defined in the XML Recommendation; a conforming processor that encounters a well-formedness error must report it and stop normal processing rather than attempt a repair. Later schema languages such as XML Schema and RELAX NG build on these rules.
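The following minimal sketch uses Python's standard-library xml.etree.ElementTree module (chosen here as an assumption; any conforming parser would do) to illustrate the points above: an XML declaration, attributes, a default namespace, a simple XPath-style query, and a parser refusing input that is not well-formed. The namespace URI http://example.org/library and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and vocabulary, used only for illustration.
NS = "http://example.org/library"

# Bytes input, so the parser honours the encoding named in the declaration.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<library xmlns="http://example.org/library">
  <book id="b1" lang="en">
    <title>Example Title</title>
  </book>
</library>"""

root = ET.fromstring(DOC)

# ElementTree reports namespace-qualified names as {uri}localname.
print(root.tag)

# ElementTree supports a limited subset of XPath; the prefix "lib" is
# bound to the namespace URI only for these queries.
titles = [t.text for t in root.findall(".//lib:title", {"lib": NS})]
book = root.find("lib:book", {"lib": NS})
# Unprefixed attributes are not in any namespace, so plain names work.
print(book.get("id"), titles)

def is_well_formed(text: str) -> bool:
    """Return False if the parser rejects the text as not well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b></b></a>"))  # True
print(is_well_formed("<a><b></a>"))      # False: mismatched end tag
print(is_well_formed("<a/><b/>"))        # False: two root elements
```

Note that the parser does not try to repair the broken inputs; it simply reports an error, which matches the strict error handling the XML Recommendation requires.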

Data types and schema

To provide typed data and validation, several schema languages arose: the Document Type Definition (DTD), inherited from SGML and defined in the XML 1.0 specification itself; W3C XML Schema (often called XSD); and alternatives such as RELAX NG and Schematron. XSD added namespace-aware validation, a distinction between simple types (attribute values and text-only content) and complex types (elements with child elements or attributes), and a library of built-in datatypes such as xs:integer, xs:boolean, and xs:date, whose date and time formats are based on ISO 8601. DTDs are simpler, offer almost no datatyping, and remain common in older vocabularies such as XHTML 1.0. RELAX NG and Schematron were later standardized as parts of ISO/IEC 19757 (Document Schema Definition Languages), and shared schemas underpin interoperability in vocabularies maintained by standards bodies such as OASIS and ISO/IEC, as well as by governments and industry consortia.
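Python's standard library has no XSD or RELAX NG validator, so the following is only a hand-written sketch of the idea behind typed validation, not a schema processor. The BOOK_TYPES table plays the role of a schema and is hypothetical, and the checks cover simplified lexical forms of three XSD built-in types (in particular, xs:date here accepts only YYYY-MM-DD, without a time zone or negative years).

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical "schema": child element name -> XSD built-in type name.
BOOK_TYPES = {"id": "xs:integer", "published": "xs:date", "title": "xs:string"}

def is_valid(type_name: str, text: str) -> bool:
    """Check text against a simplified lexical form of an XSD type."""
    if type_name == "xs:string":
        return True
    if type_name == "xs:integer":
        # Optional sign followed by one or more digits.
        return re.fullmatch(r"[+-]?[0-9]+", text) is not None
    if type_name == "xs:date":
        # Simplified: YYYY-MM-DD only, then a real calendar check.
        m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
        if m is None:
            return False
        try:
            date(int(m[1]), int(m[2]), int(m[3]))
        except ValueError:
            return False  # e.g. month 13 or February 29 in a non-leap year
        return True
    raise ValueError(f"unsupported type {type_name}")

def validate(book: ET.Element) -> list[str]:
    """Return error messages for a <book> element (empty list if valid)."""
    errors = []
    for name, type_name in BOOK_TYPES.items():
        child = book.find(name)
        if child is None:
            errors.append(f"missing <{name}>")
        elif not is_valid(type_name, (child.text or "").strip()):
            errors.append(f"<{name}> is not a valid {type_name}")
    return errors

good = ET.fromstring(
    "<book><id>42</id><published>2024-02-29</published><title>T</title></book>")
bad = ET.fromstring("<book><id>4x</id><published>2023-02-29</published></book>")
print(validate(good))  # []
print(validate(bad))
```

A real deployment would use an actual schema processor instead; the third-party lxml library, for example, provides XMLSchema and RelaxNG validator classes.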

Processing and APIs

XML processing models split between tree-based and event-based parsing. The Document Object Model (DOM), a W3C specification, standardizes an in-memory tree API and is implemented in languages such as Java, C#, and Python. SAX (Simple API for XML), which originated in the Java community, instead pushes events such as "start element" and "end element" to application callbacks while streaming through the input, without building a tree, so memory use can stay roughly constant for large documents. StAX (Streaming API for XML), a later Java API, uses a pull model in which the application requests the next event itself; it has been used in enterprise software from vendors such as Oracle Corporation and IBM. Higher-level data-binding APIs, including JAXB (Java Architecture for XML Binding) and Apache XMLBeans, map XML documents to and from Java objects, a common requirement in the service-oriented designs promoted by Sun Microsystems and others.
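The three processing models can be compared side by side with Python's standard library, used here as an assumption since the text names Java-centric APIs: xml.dom.minidom provides a DOM tree, xml.sax delivers push-style events, and ElementTree's XMLPullParser stands in for a StAX-like pull interface (StAX itself is a Java API). The <orders> document and the OrderHandler class are hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.sax
from xml.dom import minidom

DOC = "<orders><order id='1'/><order id='2'/><order id='3'/></orders>"

# Tree-based (DOM): the whole document is parsed into memory first,
# then navigated with W3C DOM methods.
dom = minidom.parseString(DOC)
dom_ids = [el.getAttribute("id") for el in dom.getElementsByTagName("order")]

# Event-based push (SAX): the parser calls handler methods as it reads.
class OrderHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "order":
            self.ids.append(attrs["id"])

handler = OrderHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# Pull style: the application feeds input and asks for pending events.
pull = ET.XMLPullParser(events=("start",))
pull.feed(DOC)
pull.close()  # flush the parser so every event is available
pull_ids = [el.get("id") for _, el in pull.read_events() if el.tag == "order"]

print(dom_ids, handler.ids, pull_ids)
```

The DOM version is the simplest to navigate but holds the whole tree in memory; the SAX and pull versions suit large or streamed inputs, at the cost of the application tracking its own state between events.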

Applications and usage

XML has been used across a wide range of domains and standards: document formats like Office Open XML and OpenDocument, publishing standards managed by the W3C and ISO, configuration files in systems ranging from Apache Software Foundation projects to enterprise platforms by IBM and Oracle Corporation, message formats for SOAP-based Web Services, and metadata standards such as Dublin Core. Scientific and government organizations, including projects associated with NASA, the European Space Agency, and national archives, have used XML for archival metadata and data interchange. Industry-specific standards also define XML vocabularies and schemas: HL7 Version 3 and its Clinical Document Architecture (CDA) in healthcare, FIXML (the XML encoding of the FIX protocol) in finance, and XBRL, maintained by XBRL International, for business and financial reporting.
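As an illustration of how a metadata vocabulary is embedded through namespaces, the following sketch writes a small Dublin Core record with Python's xml.etree.ElementTree. The Dublin Core element namespace http://purl.org/dc/elements/1.1/ is real; the unqualified <record> wrapper and the field values are hypothetical, since real application profiles define their own container elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"
# Serialize this namespace with the conventional "dc" prefix.
ET.register_namespace("dc", DC)

# Hypothetical container element and values, for illustration only.
record = ET.Element("record")
fields = [("title", "Annual Report"), ("creator", "Example Agency"),
          ("date", "2024-01-15")]
for name, value in fields:
    ET.SubElement(record, f"{{{DC}}}{name}").text = value

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Round trip: parse the output and read a field back by its qualified name.
parsed = ET.fromstring(xml_text)
print(parsed.findtext(f"{{{DC}}}creator"))  # Example Agency
```

Because the Dublin Core elements carry their own namespace, the same dc:title and dc:creator elements can be mixed into many host formats without clashing with those formats' own element names.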

Criticism and limitations

Critics cite verbosity, parsing overhead, and the complexity of related standards (XSLT, XSD, SOAP) compared with alternatives such as JSON and binary serialization formats such as Protocol Buffers and Apache Avro. The multiplicity of schema languages and optional features led to interoperability problems between implementations, notably among SOAP and WSDL toolkits from vendors including Microsoft and Sun Microsystems; the Web Services Interoperability Organization (WS-I) responded with profiles such as the Basic Profile that constrain those options. Security concerns, including XML external entity (XXE) attacks, entity-expansion ("billion laughs") denial-of-service attacks, and signature-wrapping attacks against XML Signature, prompted guidance from organizations such as OWASP and influenced secure-parsing guidelines used by vendors such as IBM and Oracle Corporation.
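One common defence, sketched below under stated assumptions, is to refuse any document that contains a DTD before building a tree. It uses Python's standard-library xml.parsers.expat and xml.etree.ElementTree; the ForbiddenDTD exception and the reject_dtd and safe_fromstring helpers are hypothetical names, and the policy is stricter than many applications need. Recent Python versions already avoid resolving external entities in these modules, so this acts as defence in depth; the third-party defusedxml package implements similar policies in maintained form.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

class ForbiddenDTD(ValueError):
    """Raised when input contains a DOCTYPE (hypothetical policy)."""

def reject_dtd(data: bytes) -> None:
    """Scan data with expat and raise ForbiddenDTD if it declares a DOCTYPE.

    Entity declarations, used by XXE and "billion laughs" payloads, can
    only appear inside a DTD, so refusing DTDs blocks both attack classes,
    at the cost of also rejecting legitimate DTD-based documents.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at the start of <!DOCTYPE ...>, before any entity declarations.
        raise ForbiddenDTD(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)

def safe_fromstring(data: bytes) -> ET.Element:
    reject_dtd(data)            # first pass: policy check only
    return ET.fromstring(data)  # second pass: build the tree

# A classic XXE payload that tries to pull a local file into the document.
EVIL = b"""<?xml version="1.0"?>
<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<data>&xxe;</data>"""

print(safe_fromstring(b"<data>ok</data>").text)  # ok
try:
    safe_fromstring(EVIL)
except ForbiddenDTD as err:
    print("rejected:", err)
```

Parsing twice is wasteful but keeps the sketch short; a production parser would enforce the same policy in a single pass.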

Category:Markup languages