XPath — LLMpedia

Contents

Overview
Syntax and expressions
Data model
Functions and operators
Versions and standards
Usage and applications

XPath (XML Path Language) is a query language for selecting nodes from an XML document. It was developed by the World Wide Web Consortium (W3C) and is a fundamental component of several core XML technologies, including XSLT, XQuery, and XML Schema. It is also used within other languages and frameworks for navigating document structures. The language provides a path notation for addressing parts of an XML tree, enabling the retrieval of specific elements, attributes, and text.

Overview

XPath models an XML document as a tree of nodes, which forms the basis for its navigation and selection capabilities. It was created to serve as a common syntax and semantics for functionality shared between XSLT and XPointer, a specification for addressing fragments of XML documents. The language's design allows it to be used both as a standalone tool for querying and as an embedded expression language within host environments like XSLT stylesheets or the Document Object Model APIs. Its primary role in the World Wide Web Consortium's suite of standards is to provide a precise method for locating data within the hierarchical structure of XML.

Syntax and expressions

The syntax of XPath is based on a compact, non-XML notation that resembles paths in a computer file system, using forward slashes to separate location steps. A core concept is the location path, which consists of a sequence of steps, each composed of an axis, a node test, and optional predicates. For example, in the expression `//book[@year > 2000]/title`, the leading `//` is shorthand for a step along the descendant-or-self axis, `book` is a node test that selects `book` elements, the predicate `[@year > 2000]` keeps only those whose `year` attribute is greater than 2000, and `/title` is a step along the child axis that selects their `title` children. Expressions can be absolute, starting from the document root with a leading slash, or relative, starting from a context node. Other expression types include boolean expressions using operators like `and` and `or`, and numeric expressions for calculations.
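The following is a minimal sketch of relative paths, descendant steps, and predicates, using Python's standard `xml.etree.ElementTree` module. The catalog document and variable names are invented for illustration. ElementTree implements only a subset of XPath: it accepts attribute-equality predicates but not comparisons such as `@year > 2000`, so the numeric filter from the example expression is applied in Python here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
CATALOG = """
<catalog>
  <book year="1999"><title>XML Basics</title></book>
  <book year="2003"><title>Learning XPath</title></book>
  <section>
    <book year="2010"><title>Advanced XSLT</title></book>
  </section>
</catalog>
""".strip()

root = ET.fromstring(CATALOG)  # the context node is the <catalog> element

# Relative path with two child steps: only books directly under <catalog>.
child_titles = [t.text for t in root.findall("book/title")]

# ".//" searches all descendants of the context node, in document order.
all_titles = [t.text for t in root.findall(".//book/title")]

# Predicate on an attribute value (equality is supported by ElementTree).
titles_2003 = [t.text for t in root.findall(".//book[@year='2003']/title")]

# Equivalent of //book[@year > 2000]/title, with the comparison done in Python
# because ElementTree's XPath subset has no relational operators.
recent_titles = [
    book.findtext("title")
    for book in root.iter("book")
    if int(book.get("year")) > 2000
]

print(child_titles)
print(all_titles)
print(titles_2003)
print(recent_titles)
```

A full XPath 1.0 engine would evaluate the original expression directly; the split between path selection and Python filtering is a limitation of this particular library, not of XPath.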

Data model

XPath operates on a formal data model. In XPath 1.0, this model represents an XML document as a tree composed of seven node types: element, attribute, text, namespace, processing instruction, comment, and the root node. The model is a logical abstraction, not necessarily a direct representation of the physical document structure. XPath 1.0 shares it with XSLT 1.0, while XPath 2.0 and later share the XQuery and XPath Data Model (XDM) with XQuery, in which the root node is called a document node. Each node has a unique identity and a defined string value; for instance, the string value of an element node is the concatenation, in document order, of all its descendant text nodes. The model also defines the concept of a node's expanded name, which consists of its local name and namespace URI and is crucial for processing documents that use XML Namespaces.
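As a rough illustration of string values and expanded names, the sketch below uses Python's standard `xml.etree.ElementTree` module. The helper names `string_value` and `expanded_name`, the sample document, and the `http://example.org/library` namespace URI are all hypothetical. ElementTree stores an element's expanded name in its `tag` as `{namespace-uri}local-name`, which is what the second helper takes apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI is an invented example.
DOC = (
    '<lib:book xmlns:lib="http://example.org/library">'
    "<lib:title>XPath <em>in</em> Practice</lib:title>"
    "</lib:book>"
)

def string_value(element):
    """Concatenate all descendant text in document order."""
    return "".join(element.itertext())

def expanded_name(element):
    """Split ElementTree's '{uri}local' tag into (namespace URI, local name)."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element is in no namespace

root = ET.fromstring(DOC)
title = root[0]

print(string_value(title))      # text of <lib:title>, including the nested <em>
print(expanded_name(title))     # namespace URI plus local name
print(expanded_name(title[0]))  # <em> has no namespace

# The prefix is only a local alias: a different prefix bound to the same
# URI selects the same element, because matching uses the expanded name.
same = root.findall("l:title", {"l": "http://example.org/library"})
print(same[0] is title)
```

The last lookup shows why the expanded name, rather than the prefixed name as written in the document, is what path expressions compare against.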

Functions and operators

The language includes a core library of functions and operators for manipulating the data selected by path expressions. These are organized into several categories: node-set functions like `count()` and `position()`; string functions such as `concat()`, `substring()`, and `contains()`; boolean functions including `true()` and `false()`; and numeric functions and operators like `sum()` and the standard arithmetic operators. A significant addition in XPath 2.0 was a rich set of functions for sequences, dates, and times, which expanded its utility beyond basic XML navigation. The XPath 1.0 core library is defined in the XPath 1.0 recommendation itself, whereas the functions for XPath 2.0 and later are defined in a separate Functions and Operators specification maintained by the World Wide Web Consortium.
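The core functions can be tried out with any XPath 1.0 engine. The sketch below assumes the third-party `lxml` library is installed (the standard library's ElementTree does not evaluate XPath functions); the sample catalog and the helper name `core_function_examples` are invented for illustration. In `lxml`, numeric results come back as Python floats, boolean results as `bool`, and string results as `str`.

```python
try:
    from lxml import etree  # third-party; assumed installed (pip install lxml)
except ImportError:
    etree = None  # lets the sketch load where lxml is unavailable

# Hypothetical sample document, made up for this illustration.
CATALOG = b"""
<catalog>
  <book year="2003"><title>Learning XML</title><price>30.00</price></book>
  <book year="2010"><title>XPath Cookbook</title><price>45.50</price></book>
</catalog>
""".strip()

def core_function_examples(xml_bytes):
    """Evaluate one function from each XPath 1.0 category."""
    doc = etree.fromstring(xml_bytes)
    return {
        # Node-set function: number of nodes selected.
        "count": doc.xpath("count(//book)"),
        # Number function: sum of the string values converted to numbers.
        "sum": doc.xpath("sum(//book/price)"),
        # String functions: a node-set argument contributes the string
        # value of its first node.
        "label": doc.xpath("concat(//book[1]/title, ' (', //book[1]/@year, ')')"),
        "has_xpath": doc.xpath("contains(//book[2]/title, 'XPath')"),
        # position() and last() inside a predicate pick the final book.
        "last_title": doc.xpath("string(//book[position() = last()]/title)"),
    }

if etree is not None:
    for name, value in core_function_examples(CATALOG).items():
        print(name, "=", repr(value))
```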

Versions and standards

The first official recommendation, XPath 1.0, was published by the World Wide Web Consortium in 1999 and became widely implemented in parsers and processors like those in Java and the .NET Framework. XPath 2.0, released in 2007, represented a major expansion, introducing a much richer type system based on XML Schema, sequences as a fundamental data type, and a greatly enlarged function library; it is formally a subset of XQuery 1.0. XPath 3.0, published in 2014, added support for higher-order functions, and XPath 3.1, published in 2017, added maps and arrays along with improved JSON processing, aligning with developments in XQuery. Each version is maintained as a separate specification by the World Wide Web Consortium.

Usage and applications

Beyond its original role in XSLT for template matching and value selection, XPath is extensively used in XQuery as its foundational navigation language and in XML Schema for defining identity constraints. It is also embedded within programming languages and libraries; for example, the Document Object Model Level 3 XPath specification defines an interface for evaluating XPath expressions over a DOM tree, and XPath is supported in Python via libraries like `lxml`. In web testing and automation, tools such as Selenium use XPath to locate elements within HTML documents by evaluating expressions against the browser's DOM. XPath expressions are also used to define data extraction rules in web scraping frameworks and to configure message routing in enterprise integration platforms such as MuleSoft's Mule.
