LLMpedia: The first transparent, open encyclopedia generated by LLMs

Linked Open Data

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Dublin Core (Hop 3)
Expansion Funnel: Raw 85 → Dedup 6 → NER 5 → Enqueued 3
1. Extracted: 85
2. After dedup: 6
3. After NER: 5 (rejected 1: not a named entity)
4. Enqueued: 3 (similarity rejected: 2)
Linked Open Data
Name: Linked Open Data
Launched: 2007
Status: Active


Linked Open Data is an approach to publishing structured data on the Web that emphasizes machine-readable formats, the use of Uniform Resource Identifiers (URIs), and explicit, interlinked metadata to enable discovery and reuse across domains. It builds on standards from the World Wide Web Consortium, practices promoted by the Semantic Web community, and initiatives exemplified by projects from the European Union, the United States National Institutes of Health, and cultural institutions such as the British Library and the Library of Congress. Advocates include researchers from the Massachusetts Institute of Technology and the University of Oxford, as well as companies such as Google and Facebook that leverage linked data principles for integration and search.

Definition and Principles

The core principles derive from the work of figures in the Semantic Web movement and are informed by standards from the World Wide Web Consortium and designs used in projects at the World Bank, the United Nations, and the National Aeronautics and Space Administration. They prescribe the use of Uniform Resource Identifiers to name entities, dereferenceable HTTP identifiers as standardized by the IETF, and the structuring of relationships using vocabularies such as the Resource Description Framework (RDF), RDF Schema, and the Web Ontology Language (OWL). Interlinking between datasets encourages alignment with identifiers from authorities such as the Library of Congress, the VIAF consortium, and registries used by the European Data Portal. Transparency and access echo policy frameworks promoted by the Open Knowledge Foundation, the Open Government Partnership, and national open data programs in the United Kingdom, Germany, and Canada.
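The principles above (name entities with HTTP URIs, describe them with standard vocabularies, and link them to other datasets) can be illustrated with a minimal plain-Python sketch. The `example.org` URIs are hypothetical; the vocabulary namespaces (RDF, RDFS, OWL, FOAF) and the DBpedia/Wikidata link targets are real but used here purely for illustration:

```python
# Minimal illustration of the Linked Data principles as plain triples.
# Each statement is an RDF-style (subject, predicate, object) tuple, and
# every entity is named with an HTTP URI so it can, in principle, be
# dereferenced for more information.

# Well-known vocabulary terms (real RDF, RDFS, and OWL namespaces).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Principles 1 and 2: name the entity with a dereferenceable HTTP URI
# (hypothetical dataset).
ADA = "http://example.org/person/ada-lovelace"

triples = [
    # Principle 3: describe the entity using standard vocabularies.
    (ADA, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    (ADA, RDFS_LABEL, "Ada Lovelace"),
    # Principle 4: link to other datasets so clients can discover more.
    (ADA, OWL_SAME_AS, "http://dbpedia.org/resource/Ada_Lovelace"),
    (ADA, OWL_SAME_AS, "http://www.wikidata.org/entity/Q7259"),
]

# A consumer can follow the outgoing owl:sameAs links to other datasets.
external_links = [o for s, p, o in triples if p == OWL_SAME_AS]
print(external_links)
```

In a real deployment the triples would be served over HTTP in an RDF serialization rather than held in a Python list, but the shape of the data is the same.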

Data Formats and Technologies

Implementation relies on serialization formats and protocols standardized by the World Wide Web Consortium and related bodies: RDF/XML, Turtle, JSON-LD, and N-Triples as RDF serializations; SPARQL as the W3C-standardized query language; and HTTP content negotiation for serving representations compatible with clients from projects at the European Bioinformatics Institute and the National Center for Biotechnology Information. Ontologies developed in contexts such as the Gene Ontology project, the Dublin Core Metadata Initiative, and the Friend of a Friend (FOAF) vocabulary provide reusable schemas, while tools like Apache Jena, OpenLink Virtuoso, and GraphDB support storage and reasoning. Provenance is modeled using standards such as the W3C's PROV, and vocabularies from institutions including the Museum of Modern Art and the Smithsonian Institution inform cultural heritage datasets.
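To make the serializations concrete, the following sketch renders one statement in two of the formats named above, N-Triples and JSON-LD, using only the standard library. The subject URI is hypothetical; the `dcterms:title` property is a real Dublin Core term:

```python
import json

# One RDF statement, serialized two ways.
subject = "http://example.org/book/1"      # hypothetical resource URI
predicate = "http://purl.org/dc/terms/title"  # real Dublin Core property
obj = "Linked Data Primer"

# N-Triples: one triple per line, URIs in angle brackets, terminated by " .".
ntriples = f'<{subject}> <{predicate}> "{obj}" .'

# JSON-LD: the same statement as ordinary JSON, where the @context maps
# the short key "title" onto the full property URI.
jsonld = {
    "@context": {"title": predicate},
    "@id": subject,
    "title": obj,
}

print(ntriples)
print(json.dumps(jsonld, indent=2))
```

A server using HTTP content negotiation would return one or the other representation depending on the client's `Accept` header (e.g. `application/n-triples` versus `application/ld+json`).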

Publishing and Linking Practices

Publishers adopt best practices from pilots by the BBC, the Wikimedia Foundation, and national libraries such as the Bibliothèque nationale de France to expose datasets with persistent identifiers and inter-dataset links to resources such as DBpedia, Wikidata, and domain hubs like the European Data Portal. Typical workflows include minting URIs, documenting vocabulary terms via registries such as the Linked Open Vocabularies initiative, and creating owl:sameAs or SKOS mappings to align concepts with authority files at the Getty Research Institute and the International Standard Name Identifier. Harvesting and federation use protocols and tools developed in projects at Stanford University, the University of Leipzig, and the Zentral- und Hochschulbibliothek Luzern to provide SPARQL endpoints, dataset dumps, and APIs for consumers including Semantic MediaWiki deployments and commercial platforms by Microsoft and Oracle.
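The owl:sameAs mappings mentioned above are what allow a consumer to reconcile identifiers for the same entity across datasets. A common way to do this is to compute equivalence classes over the asserted pairs; the sketch below does so with a small union-find structure (the URIs are illustrative, and real pipelines would also track provenance and handle erroneous links):

```python
# Sketch of owl:sameAs reconciliation: identifiers asserted to denote the
# same real-world entity are merged into equivalence classes using a
# union-find (disjoint-set) structure.

def sameas_clusters(pairs):
    """Group identifiers into equivalence classes from owl:sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())

links = [
    ("http://example.org/id/shakespeare",
     "http://dbpedia.org/resource/William_Shakespeare"),
    ("http://dbpedia.org/resource/William_Shakespeare",
     "http://www.wikidata.org/entity/Q692"),
    ("http://example.org/id/austen",
     "http://dbpedia.org/resource/Jane_Austen"),
]

for cluster in sameas_clusters(links):
    print(sorted(cluster))
```

Because owl:sameAs is transitive, the local identifier, the DBpedia resource, and the Wikidata entity all fall into one class even though no single pair links all three directly.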

Licensing and Legal Frameworks

Legal frameworks intersect with practices from institutions such as the European Commission, the United States Copyright Office, and the World Intellectual Property Organization. Licensing choices often reuse standard instruments crafted by Creative Commons and legal models promoted by the Open Data Institute and the Open Knowledge Foundation to clarify reuse rights. National laws in jurisdictions such as France, Australia, and Brazil influence public sector data release policies, while agreements with rightsholders such as the British Broadcasting Corporation and publishers like Elsevier affect access to bibliographic and scientific datasets. Data protection considerations involve compliance with regulatory regimes exemplified by the General Data Protection Regulation and guidance from agencies such as the European Data Protection Board and the U.S. Department of Health and Human Services.

Applications and Use Cases

Adoption spans domains represented by major institutions and firms: cultural heritage linking across the British Museum, the Metropolitan Museum of Art, and the Getty Research Institute; scientific integration among the European Bioinformatics Institute, the National Institutes of Health, and consortia such as the Human Genome Project; government transparency portals in the United Kingdom, the United States, and New Zealand; and enterprise knowledge graphs developed by Google, Facebook, and Microsoft. Use cases include enhanced search and discovery for projects such as DBpedia and Wikidata, citation and bibliographic reconciliation involving the Library of Congress and CrossRef, and public health data aggregation practiced by the World Health Organization and national public health agencies.
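The search-and-discovery use cases rest on querying interlinked triples, and the core of SPARQL is triple-pattern matching. The sketch below (with illustrative, abbreviated URIs and a hypothetical `match` helper) shows how a single pattern with variables selects bindings from a small in-memory graph:

```python
# Minimal sketch of SPARQL-style triple-pattern matching over an
# in-memory set of triples. Terms starting with "?" are variables;
# everything else must match exactly. The ex: URIs are illustrative.

TRIPLES = [
    ("ex:mona_lisa", "ex:creator", "ex:da_vinci"),
    ("ex:mona_lisa", "ex:heldBy", "ex:louvre"),
    ("ex:last_supper", "ex:creator", "ex:da_vinci"),
]

def match(pattern, triples=TRIPLES):
    """Return one variable-binding dict per triple matching the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind consistently.
                if term in binding and binding[term] != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results

# Analogue of: SELECT ?work WHERE { ?work ex:creator ex:da_vinci }
works = match(("?work", "ex:creator", "ex:da_vinci"))
print(works)
```

A full SPARQL engine additionally joins multiple patterns, handles filters and optional clauses, and evaluates against billions of triples, but the binding mechanism is the same idea.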

Challenges and Criticisms

Critiques arise from researchers at institutions including the Massachusetts Institute of Technology, the University of Amsterdam, and Carnegie Mellon University, who note issues with data quality, provenance, and scalability in deployments such as early linked data cloud experiments and university catalogs. Interoperability is hindered by divergent ontologies from initiatives such as the Gene Ontology and by bespoke schemas used by archives including the National Archives (United Kingdom), while commercial interests from firms such as Elsevier and Thomson Reuters complicate open aggregation. Privacy and legal tensions involve regulators such as the European Data Protection Board and enforcement actions in jurisdictions represented by the United States Department of Justice. Performance and governance problems have been highlighted in evaluations by the European Commission and in scholarly assessments published through presses such as Springer and MIT Press.

Category:Data integration