LLMpedia: The first transparent, open encyclopedia generated by LLMs

XML (eXtensible Markup Language)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: XML
Developer: World Wide Web Consortium
Initial release: 1998
Genre: Markup language

XML (eXtensible Markup Language) is a markup language and a set of rules for encoding documents in a format that is both human-readable and machine-readable. It was developed by the World Wide Web Consortium to enable structured data interchange across disparate systems, and it has been implemented by vendors such as Microsoft, IBM, and Sun Microsystems. XML is a simplified profile of SGML, an International Organization for Standardization standard, and it has coexisted with HTML, which grew out of Tim Berners-Lee's work at CERN.

History

XML emerged from efforts to simplify Standard Generalized Markup Language (SGML) and to provide a web-friendly alternative to the complex SGML tooling used by institutions such as ISO committees and companies like IBM. Key contributors included W3C staff, practitioners at Netscape Communications, and engineers associated with Sun Microsystems and Microsoft; the World Wide Web Consortium published the first Recommendation, XML 1.0, in February 1998. XML's development paralleled other standards work of the period such as HTML 4 and informed later specifications like XSLT and XPath, while discussions among organizations including IETF, W3C, and OASIS shaped its adoption in enterprise frameworks such as SOAP-based web services and UDDI registries.

Design and syntax

XML's design centers on a small set of constructs: elements, attributes, entities, processing instructions, and comments. Documents use start-tags and end-tags nested in a strict hierarchy under a single root element, a structure inherited from Standard Generalized Markup Language but simplified so that parsers are easier to implement. Namespaces, defined in the W3C Recommendation Namespaces in XML, allow a document to mix vocabularies such as MathML, SVG, and XHTML without name collisions. Well-formedness is defined by the XML specification itself and is checked by every conforming parser, whereas validity is checked against a DTD or against a schema language developed by organizations including W3C and OASIS. Parsers and validators from vendors such as IBM, Sun Microsystems, and Google check conformance to these rules.
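
The constructs described above can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree` module. The sample document, the `http://example.org/report` namespace, the prefix map `NS`, and the helper `is_well_formed` are all assumptions made for illustration; only the SVG namespace URI is a real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a "report" vocabulary that embeds one SVG element.
DOC = """<report xmlns="http://example.org/report"
        xmlns:svg="http://www.w3.org/2000/svg" id="r1">
  <!-- comments are allowed between elements -->
  <title>Quarterly figures</title>
  <svg:circle r="5"/>
</report>"""

# Prefix map used only for lookups; the prefixes here are our own choice.
NS = {"r": "http://example.org/report", "svg": "http://www.w3.org/2000/svg"}


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(DOC)
title = root.find("r:title", NS)
circle = root.find("svg:circle", NS)

print(root.tag)        # names are stored as {namespace-uri}local-name
print(root.get("id"))  # an unprefixed attribute is in no namespace
print(title.text, circle.get("r"))
print(is_well_formed("<a><b></a></b>"))  # overlapping tags are rejected
```

This sketch checks well-formedness only. Checking validity against a DTD or schema requires a validating processor, which the standard library module used here does not provide.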

Serialization and data types

XML documents are serialized as Unicode text. Every XML processor must accept UTF-8 and UTF-16, encodings defined by the Unicode Consortium and also described in IETF RFCs; other encodings can be declared in the XML declaration. To express typed data and structural constraints, schema languages such as XML Schema (W3C) and RELAX NG (from the OASIS community) provide data typing, while DTDs constrain structure but offer only limited attribute types. Implementations from Apache Software Foundation projects and corporate libraries serialize and deserialize between XML and native types on platforms such as Java, the .NET Framework, and Python. Binary XML initiatives, pursued in W3C working groups and by companies like Microsoft and Oracle Corporation, explored more compact encodings for constrained environments, alongside alternatives such as JSON favored by organizations like Mozilla and Google.
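
As a rough sketch of serialization between native types and XML text, the following round-trips a flat record through UTF-8 and UTF-16. The `item` element, the `FIELD_TYPES` mapping, and the helpers `to_xml` and `from_xml` are hypothetical; the mapping stands in for type information that a schema would normally supply. The `xml_declaration` argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical mapping from element name to native type; in a real system
# this information would usually come from a schema.
FIELD_TYPES = {"id": int, "name": str, "price": Decimal}


def to_xml(record: dict, encoding: str = "utf-8") -> bytes:
    """Serialize a flat record as <item> with one child element per field."""
    root = ET.Element("item")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    # xml_declaration=True writes the chosen encoding into the declaration.
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


def from_xml(data: bytes) -> dict:
    """Parse bytes back into a record, converting text to native types."""
    root = ET.fromstring(data)
    return {child.tag: FIELD_TYPES[child.tag](child.text or "") for child in root}


record = {"id": 7, "name": "café", "price": Decimal("9.50")}
utf8_bytes = to_xml(record, "utf-8")
utf16_bytes = to_xml(record, "utf-16")

# The byte streams differ, but both decode to the same record.
print(from_xml(utf8_bytes) == from_xml(utf16_bytes) == record)
```

XML itself carries only text, so the conversion to `int` or `Decimal` happens in application code here; a schema-aware binding library would derive the same conversions from the schema's declared types.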

Related standards and technologies

XML sits within a family of interoperable standards. XSLT and XPath transform and navigate XML trees; XQuery provides query capabilities standardized by W3C and implemented in products such as MarkLogic and BaseX; SOAP uses XML as the message format for web services in ecosystems including those of Microsoft and IBM; and the RSS and Atom syndication formats are XML vocabularies shaped by companies such as Netscape Communications and by groups at Harvard University. Other related specifications include MathML for mathematics, SMIL for multimedia, and SVG for vector graphics, each maintained or influenced by bodies like W3C and implemented in browsers produced by the Mozilla Foundation and Apple Inc.
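
To give a feel for XPath-style navigation, the sketch below queries a small Atom-style feed. It relies on the limited XPath subset built into Python's `xml.etree.ElementTree`, not a full XPath implementation, and the feed content and the `ATOM` prefix map are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom-style feed used only for illustration.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://example.org/1"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link rel="alternate" href="http://example.org/2"/>
    <link rel="enclosure" href="http://example.org/2.mp3"/>
  </entry>
</feed>"""

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(FEED)

# Child step: titles of the entries directly under the root.
titles = [entry.findtext("atom:title", namespaces=ATOM)
          for entry in root.findall("atom:entry", ATOM)]

# Descendant step with an attribute predicate.
alternates = [link.get("href")
              for link in root.findall(".//atom:link[@rel='alternate']", ATOM)]

# Positional predicate: the second entry (XPath positions start at 1).
second_title = root.findtext("atom:entry[2]/atom:title", namespaces=ATOM)

print(titles)
print(alternates)
print(second_title)
```

Queries that need XPath functions, axes beyond child and descendant, or XSLT and XQuery processing require a dedicated engine such as those named in the paragraph above.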

Security and common vulnerabilities

XML processing raises security considerations that standards bodies and vendors have addressed. Entity-expansion attacks and XML external entity (XXE) resolution vulnerabilities were described in advisories from organizations such as the CERT Coordination Center and are mitigated by configuration options in libraries from the Apache Software Foundation, Oracle Corporation, and Microsoft. Schema-based attacks, misuse of XML Signature and XML Encryption, and denial-of-service vectors informed guidance from agencies like the National Institute of Standards and Technology and from security vendors such as Symantec and McAfee. Best practices promoted by OWASP and standards groups include disabling DTDs where they are unnecessary, applying strict parsing policies in the libraries used from Java, the .NET Framework, and Python, and using vetted cryptographic implementations from projects like OpenSSL and from platform vendors.
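
One way to apply the "disable DTDs" advice is to refuse any document that carries a DOCTYPE declaration. The following is a minimal two-pass sketch using only the Python standard library; the helper `parse_without_dtd`, the exception `DTDForbidden`, and the sample payloads are hypothetical names introduced here, and this is not a substitute for a hardened parser configuration.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE declaration for {name!r} is not allowed")


def parse_without_dtd(data: bytes) -> ET.Element:
    """Parse XML only if it carries no DOCTYPE (hypothetical helper)."""
    # Pass 1: scan with expat and abort at the first DOCTYPE declaration,
    # before any entity declared in it can be used.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(data, True)
    # Pass 2: no DOCTYPE means no custom entities, so build the tree normally.
    return ET.fromstring(data)


# Illustrative external-entity payload; the file path is only an example.
XXE = (b'<?xml version="1.0"?>'
       b'<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/passwd">]>'
       b"<d>&x;</d>")

try:
    parse_without_dtd(XXE)
    rejected = False
except DTDForbidden:
    rejected = True

safe_root = parse_without_dtd(b"<d><v>1</v></d>")
print(rejected, safe_root.find("v").text)
```

Parsing twice keeps the sketch short at the cost of extra work, and malformed input surfaces as `expat.ExpatError` from the first pass. In practice, a maintained third-party package such as defusedxml, or the parser's own hardening options, is usually a better choice than a hand-written check.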

Implementations and tooling

A wide ecosystem of parsers, validators, editors, and processors supports XML. Parser libraries such as libxml2 from the GNOME project, Xerces from the Apache Software Foundation, and MSXML from Microsoft are used in applications by companies like IBM and Oracle Corporation. Transformation and query engines such as Saxon, developed by Michael Kay, Xalan from the Apache Software Foundation, and the XQuery processors in MarkLogic and BaseX support enterprise workflows. Development tools from JetBrains, the Eclipse Foundation, and Microsoft (Visual Studio), along with services hosted by Amazon Web Services and Google, provide XML editing, schema design, and integration features. Standards governance and ongoing maintenance occur through groups like the World Wide Web Consortium and OASIS, while community projects and vendors continue to produce interoperable implementations used across industries.

Category:Markup languages