XML — LLMpedia

XML
Name	XML
Paradigm	Markup language
Designed by	World Wide Web Consortium
First appeared	1998
File extensions	.xml
MIME types	application/xml, text/xml

Contents

History
Design and Syntax
Data Types and Schemas
Processing and APIs
Applications and Use Cases
Security and Limitations

XML (Extensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It was developed to facilitate data interchange among disparate systems, to enable document storage and transformation, and to serve as a foundation for many web technologies and standards.

History

XML was developed by the World Wide Web Consortium (W3C) in the late 1990s as a simplified subset of Standard Generalized Markup Language (SGML), drawing on the experience of the HTML and SGML communities and of software vendors such as Netscape and Microsoft. XML 1.0 was published as a W3C Recommendation in 1998. Influential working group participants included contributors from Sun Microsystems, IBM, Oracle, and Adobe, who collaborated with editors of IETF specifications and with implementers familiar with browsers such as Internet Explorer and Mozilla. Early adoption was driven by standards efforts at organizations such as the Internet Engineering Task Force (IETF) and by enterprise initiatives led by companies including Hewlett-Packard, SAP, and Boeing. XML both shaped and was shaped by contemporaneous technologies such as SOAP and W3C specifications such as XHTML, as well as formats adopted by governments and institutions for archival exchange.

Design and Syntax

The design emphasizes simplicity, generality, and usability across the Internet; its syntax is derived from SGML and shares that lineage with HTML. Documents are composed of elements delimited by start-tags and end-tags, optionally carrying attributes, and nested to form a hierarchical tree. Parser libraries such as Xerces, libxml2, and MSXML can expose that tree as a document object model similar to the DOM implementations in browsers such as Firefox and Chrome. Character encoding conventions build on ISO and Unicode standards, and a namespace mechanism was introduced to avoid name collisions in composite documents that combine vocabularies, such as those published by OASIS and the IETF. Two levels of correctness apply: a document must be well-formed (syntactically correct) for any conforming processor to accept it, while validity, meaning conformance to a DTD or schema, is checked only by validating processors. Canonicalization rules developed by the W3C and IETF define a normalized serialization, so that logically equivalent documents yield identical output; digital signature standards such as XML Signature depend on this.
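The ideas in this section (elements and attributes forming a tree, namespaces, well-formedness, and canonicalization) can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree` module. The document, its element names, and the `urn:example:*` namespace URIs are hypothetical and chosen only for illustration; `ET.canonicalize` assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for this illustration.
ORDERS_NS = "urn:example:orders"
META_NS = "urn:example:meta"

DOC = f"""<order xmlns="{ORDERS_NS}" xmlns:m="{META_NS}" id="42">
  <item sku="A1" m:source="web">Widget</item>
  <item sku="B2">Gadget</item>
</order>"""


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def item_skus(xml_text):
    """Walk the element tree and collect the sku attribute of each item."""
    root = ET.fromstring(xml_text)
    # The prefix "o" is local to this query; only the namespace URI matters.
    items = root.findall("o:item", {"o": ORDERS_NS})
    return [item.get("sku") for item in items]


def canonical_equal(a, b):
    """Compare two documents by their canonical (C14N 2.0) serialization."""
    return ET.canonicalize(xml_data=a) == ET.canonicalize(xml_data=b)
```

ElementTree reports namespaced names in `{uri}local` form, so the root tag of `DOC` is `{urn:example:orders}order` regardless of which prefix the author chose. `canonical_equal` treats `<a x="1" y="2"/>` and `<a y='2' x='1'></a>` as the same document, because canonicalization normalizes attribute order, quoting, and empty-element syntax.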

Data Types and Schemas

XML itself treats content as text; typed content is layered on through schema languages standardized and promoted by bodies including W3C and OASIS. Prominent schema languages include XML Schema Definition (XSD), RELAX NG, and Document Type Definitions (DTDs), which XML inherited from SGML. XSD, developed within W3C working groups with contributors from Microsoft and IBM, defines built-in primitive and derived data types, complex type models, and namespace-aware validation, and is used in enterprise products from SAP, Oracle, and IBM. RELAX NG, championed by Sun Microsystems and others, offers both an XML syntax and a compact non-XML syntax and is favored in communities producing specifications such as DocBook and in projects at organizations such as the Apache Software Foundation. Validation tools and schema-aware parsers from vendors and open-source projects facilitate mapping to the programming models used in .NET, in Java runtimes such as OpenJDK, and on platforms maintained by Red Hat and Canonical.
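As a rough illustration of what "typed content" means, the sketch below converts the text of a few elements into Python values using the lexical forms of three XSD built-in types (`xs:integer`, `xs:boolean`, and the timezone-less form of `xs:date`). It is not a schema validator: the standard library has no XSD or RELAX NG support, and the `FIELD_TYPES` table and element names here are hypothetical stand-ins for what a real schema would declare.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

_INTEGER = re.compile(r"[+-]?[0-9]+")
_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_BOOLEANS = {"true": True, "1": True, "false": False, "0": False}


def parse_xs_integer(text):
    text = text.strip()
    if not _INTEGER.fullmatch(text):
        raise ValueError(f"not an xs:integer: {text!r}")
    return int(text)


def parse_xs_boolean(text):
    text = text.strip()
    if text not in _BOOLEANS:
        raise ValueError(f"not an xs:boolean: {text!r}")
    return _BOOLEANS[text]


def parse_xs_date(text):
    # Simplification: only the form without a timezone suffix is handled.
    text = text.strip()
    if not _DATE.fullmatch(text):
        raise ValueError(f"not a supported xs:date: {text!r}")
    return date.fromisoformat(text)


# Hypothetical mapping from element name to type; a real schema would
# declare this, and a schema-aware processor would enforce it.
FIELD_TYPES = {
    "quantity": parse_xs_integer,
    "shipped": parse_xs_boolean,
    "due": parse_xs_date,
}


def typed_record(xml_text):
    """Convert each known child element's text to a typed Python value."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        convert = FIELD_TYPES.get(child.tag)
        if convert is not None:
            record[child.tag] = convert(child.text or "")
    return record
```

Given `<line><quantity>3</quantity><shipped>true</shipped><due>2024-01-31</due></line>`, `typed_record` yields an `int`, a `bool`, and a `date`; a value such as `3.5` in `quantity` raises `ValueError`, which is roughly the kind of error a validating processor would report against a schema.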

Processing and APIs

Processing models for parsing, transforming, and querying XML were largely standardized by W3C and implemented across ecosystems maintained by companies such as Microsoft, IBM, and Oracle and by foundations such as Apache. Streaming APIs such as SAX, which grew out of early XML parsers as a community effort rather than a W3C specification, deliver a document as a sequence of events and have been used in server-side applications at Amazon and Google. Tree-based models such as DOM hold the whole document in memory and were integrated into browsers by projects such as Mozilla and Chromium. Transformations rely on XSLT and XPath, developed in W3C working groups and applied in enterprise middleware from companies such as TIBCO and MuleSoft. Query and update languages including XQuery and XUpdate gained traction in database products such as those from Oracle and MarkLogic and in eXist-db. APIs for languages such as Java, C#, and Python provide bindings through libraries maintained by Apache, GNOME, and Microsoft.
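The difference between the streaming and tree-based models can be sketched with Python's standard library: `xml.sax` delivers events to a handler as the document is read, while `xml.etree.ElementTree` builds the whole tree and then answers path queries. The feed document and the function names are hypothetical; note that ElementTree implements only a limited subset of XPath, not the full W3C language.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
FEED = """<feed>
  <entry id="1"><title>First</title></entry>
  <entry id="2"><title>Second</title></entry>
</feed>"""


class TitleCollector(xml.sax.ContentHandler):
    """Streaming handler: sees one event at a time and keeps only titles."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <title> element"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one run of text in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_streaming(xml_text):
    """SAX style: memory use does not grow with the size of the document tree."""
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_tree(xml_text):
    """Tree style: parse everything, then query with a path expression."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./entry/title")]


def title_by_id(xml_text, entry_id):
    """Select one title with an attribute predicate (XPath subset)."""
    root = ET.fromstring(xml_text)
    node = root.find(f"./entry[@id='{entry_id}']/title")
    return None if node is None else node.text
```

Both `titles_streaming(FEED)` and `titles_tree(FEED)` return the same list of titles. The streaming version suits very large inputs where only part of the data is needed, while the tree version is more convenient when the document fits in memory and needs random access.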

Applications and Use Cases

XML underpins many formats, protocols, and standards adopted by international bodies, corporations, and projects. Examples include the Office Open XML document formats introduced by Microsoft and later standardized by ISO, publishing formats such as DocBook used by O'Reilly and the Linux Documentation Project, and financial interchange formats shaped by ISO standards and implemented by banks and institutions such as SWIFT. In web services, SOAP messages and WSDL definitions specified by W3C, together with related specifications from OASIS, drove integrations used by companies such as IBM and Microsoft. Industry-specific schemas appear in healthcare standards from HL7, geospatial formats from OGC, and metadata standards maintained by institutions such as the Library of Congress, as well as in registries kept by the IETF. Tooling and pipelines at enterprises such as Amazon, Facebook, and Netflix historically used XML for configuration, serialization, and interoperability, alongside competing serialization formats such as those championed by Google and Facebook.

Security and Limitations

XML processing introduces attack surfaces discussed in advisories from CERT, the IETF, and national cybersecurity centers. Notable risks include entity expansion attacks, in which nested entity definitions inflate a small document into an enormous one; XML External Entity (XXE) vulnerabilities, identified by security researchers and tracked by organizations such as OWASP, in which a parser is induced to fetch local files or remote resources; and schema-related issues exploited in vulnerable middleware. Performance and verbosity concerns, cited by platform teams at Google and Facebook, led to alternative binary and text formats, including Protocol Buffers and JSON, the latter popularized by Douglas Crockford and standardized by Ecma International and the IETF. Mitigations and best practices recommended by security teams at Microsoft, Red Hat, and Apache include restrictive parser configuration, input validation, use of secure APIs, and schema constraints; nevertheless, legacy systems in enterprises and in institutions such as government archives and financial services continue to contend with compatibility and scalability limitations.
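One conservative form of the parser configuration mentioned above is to refuse any untrusted document that carries a DOCTYPE declaration, since both entity expansion and external entity attacks depend on declarations made there. The sketch below does this with Python's standard `xml.parsers.expat` module before building a tree. The policy, the `parse_untrusted` function, and the `DoctypeForbidden` exception are assumptions made for illustration, not a recommendation taken from any particular advisory; the policy also rejects legitimate documents that use a DTD.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an untrusted document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat when it reaches "<!DOCTYPE", before the declarations
    # inside it are processed.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML from an untrusted source, refusing any DTD.

    Malformed input raises expat.ExpatError from the first pass.
    """
    # First pass: a bare expat parser whose only job is to stop at a DOCTYPE.
    guard = expat.ParserCreate()
    guard.StartDoctypeDeclHandler = _reject_doctype
    guard.Parse(data, True)
    # Second pass: build the tree once the document is known to be DTD-free.
    return ET.fromstring(data)
```

Parsing the input twice is a deliberate simplification to keep the sketch within the standard library. In practice a hardened third-party parser, or the secure-processing options of the platform's own XML API, would usually be preferable.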
