| Linked Data | |
|---|---|
| Name | Linked Data |
| Key people | Tim Berners-Lee |
| Related concepts | Semantic Web, Resource Description Framework, Web Ontology Language |
Linked Data is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. The concept builds upon standard World Wide Web technologies but applies them to data rather than just documents. Its core aim is to extend the web into a global data space, enabling machines to process information meaningfully. The principles were famously outlined by Tim Berners-Lee in his 2006 design note, "Linked Data".
The term is defined by a set of best practices for publishing and connecting structured data on the World Wide Web. These practices are encapsulated in four rules, often called the Linked Data principles, articulated by Tim Berners-Lee. The first rule mandates the use of Uniform Resource Identifiers (URIs) to name things, providing a global identification system. Secondly, these should be Hypertext Transfer Protocol (HTTP) URIs, so that they can be dereferenced, that is, looked up to retrieve useful information. The third principle states that when a URI is dereferenced, it should provide data in standard formats such as the Resource Description Framework (RDF). Finally, the data should include links to other URIs, allowing the discovery of more related information across distributed sources.
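The four principles can be sketched in a small Turtle fragment; the `example.org` URIs here are hypothetical, used only for illustration:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# An HTTP URI names the thing (principles 1 and 2); dereferencing it
# should return an RDF description like this one (principle 3).
<http://example.org/people/alice>
    rdfs:label "Alice" ;
    # A link to a URI in another dataset enables discovery (principle 4).
    rdfs:seeAlso <http://dbpedia.org/resource/Linked_data> .
```

Anyone who looks up either URI can retrieve further RDF statements, which is what allows data published by independent parties to form a connected graph.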
The technical implementation relies heavily on a suite of standards developed by the World Wide Web Consortium. The foundational layer is the Resource Description Framework, a graph-based data model for expressing information about resources. Data is expressed as triples, consisting of a subject, predicate, and object, which can be serialized in formats like RDF/XML or Turtle. To define the vocabularies and relationships used in these triples, specifications like the Web Ontology Language and Simple Knowledge Organization System are employed. For querying interconnected datasets, the SPARQL protocol and query language serves as the standard, analogous to Structured Query Language for traditional databases, enabling complex queries across endpoints like those provided by the DBpedia project.
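As an illustration of such a query, the following SPARQL sketch asks for triples of the kind a public endpoint like DBpedia's serves (the `dbo:birthPlace` property and the English-label filter reflect DBpedia's conventions, but the exact vocabulary should be checked against the endpoint):

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Find people born in Berlin and return their human-readable names.
SELECT ?person ?name WHERE {
  ?person dbo:birthPlace <http://dbpedia.org/resource/Berlin> ;
          rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
LIMIT 10
```

Each line of the `WHERE` clause is a triple pattern (subject, predicate, object) matched against the underlying RDF graph, which is what makes SPARQL the graph-shaped analogue of SQL.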
A prominent application is in government and cultural heritage, where projects like data.gov in the United States and the European Union's European Data Portal publish official statistics as interconnected datasets. In academia, major research institutions use it to create vast knowledge graphs, such as the CrossRef service for scholarly publications and the Global Biodiversity Information Facility for species data. The life sciences community leverages it through resources like the UniProt database and the DrugBank repository, linking genetic information to pharmaceutical compounds. Commercial entities, including Google and Microsoft, utilize structured data through initiatives like Schema.org to enhance search engine results and knowledge panels.
The ecosystem is governed by a robust set of open standards maintained primarily by the World Wide Web Consortium. Core specifications include the Resource Description Framework and its various serializations, alongside the Web Ontology Language for creating complex ontologies. For data interchange, formats like JSON-LD, standardized as a World Wide Web Consortium recommendation, have gained popularity for embedding linked data in JavaScript Object Notation. The SPARQL protocol defines how clients can query remote data stores. Community-driven vocabularies, such as those published by the Dublin Core Metadata Initiative and the Friend of a Friend project, provide essential schemas for describing common entities like people and documents.
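A minimal JSON-LD document of the kind embedded in web pages, here using the Schema.org vocabulary, might look like the following (the `example.org` identifiers are hypothetical):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.org/people/alice",
  "name": "Alice Example",
  "sameAs": "https://example.org/other-dataset/alice"
}
```

The `@context` maps plain JSON keys like `name` onto full URIs, the `@id` gives the entity a dereferenceable identifier, and `sameAs` links it to a description in another dataset, so ordinary JSON becomes an RDF graph without changing how web developers write it.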
Significant challenges include issues of data quality, consistency, and the difficulty of establishing reliable links between disparate sources maintained by different organizations. The performance of querying distributed data via SPARQL endpoints can be a bottleneck, leading to research into federated query processing. Critics argue that the vision has been slow to achieve widespread adoption beyond specific domains like bioinformatics. Furthermore, practitioners in fields like digital humanities and library science have raised concerns about the complexity of the Resource Description Framework stack and the overhead of maintaining interlinked datasets.
The conceptual origins are deeply intertwined with the development of the Semantic Web, a vision for a machine-readable web championed by Tim Berners-Lee since the late 1990s. A pivotal moment was Berners-Lee's 2006 publication of the design note outlining the core principles. Early flagship projects that demonstrated its potential included the Linking Open Data community project and the creation of DBpedia, which extracted structured data from Wikipedia. The movement gained institutional support through initiatives like the European Union's ISA2 programme and the United Kingdom's Open Data Institute, co-founded by Berners-Lee and Nigel Shadbolt. Its influence is now seen in the architecture of major knowledge graphs used by Facebook, IBM, and Amazon Web Services.
Category:World Wide Web
Category:Semantic Web
Category:Data management