XYZ file format — LLMpedia

XYZ file format
Name	XYZ file format
Extension	.xyz
Type	Data file
Owner	Various organizations
Released	1990s–2000s (varies)
Latest release	Multiple revisions (vendor-dependent)
Genre	Binary and plain-text data interchange

The XYZ file format is a loosely defined family of data interchange formats that share the .xyz extension and are used in multiple domains to represent point clouds, coordinate sets, metadata blocks, and structured records. Distinct implementations appear across scientific computing, graphics, chemistry, and geographic systems, where competing definitions coexist. Implementations often differ in encoding, metadata conventions, and intended consumer software.

Overview

The format is implemented by vendors, standards bodies, and open-source projects, each producing variant specifications and profiles. Notable institutions and projects associated with different profiles include the National Aeronautics and Space Administration, the European Space Agency, Lawrence Berkeley National Laboratory, Google, and the Open Geospatial Consortium. Implementations frequently target workflows tied to Autodesk, Blender, Esri, Agisoft, and Trimble. In academic contexts, the format is referenced in publications from Stanford University, the Massachusetts Institute of Technology, ETH Zurich, the University of Cambridge, and the California Institute of Technology on point-cloud processing, molecular modeling, and surface reconstruction.

Multiple ecosystems adopt the format in input/output pipelines, including those centered on the Point Cloud Library, PDAL, VTK, ParaView, and MeshLab. Commercial software from vendors such as Autodesk (ReCap), Bentley Systems, Leica Geosystems, FARO Technologies, and Hexagon AB may use its own .xyz variant for interoperability. Open-source toolchains built around GNU Project utilities and Python libraries such as SciPy, NumPy, and pandas also read and write variants.
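As a minimal sketch of how the Python tooling mentioned above can handle the simplest point-cloud variant, the following assumes plain text with one point per line as whitespace-separated `x y z` values and `#` comment lines; both conventions are assumptions, since profiles differ. It relies only on `numpy.loadtxt` and `numpy.savetxt`; the function names `read_xyz_points` and `write_xyz_points` are hypothetical.

```python
import io

import numpy as np

# Illustrative point-cloud XYZ text: one point per line, whitespace-separated
# x y z values. Treating '#' lines as comments is an assumption; profiles differ.
SAMPLE = """# x y z
0.0 0.0 0.0
1.5 2.0 -0.5
3.25 4.0 1.0
"""


def read_xyz_points(text: str) -> np.ndarray:
    """Parse whitespace-delimited rows into an (N, 3) float array.

    usecols keeps only the first three columns, so trailing attribute
    columns (intensity, color, ...) are ignored.
    """
    return np.loadtxt(io.StringIO(text), comments="#", usecols=(0, 1, 2), ndmin=2)


def write_xyz_points(points: np.ndarray) -> str:
    """Serialize an (N, 3) array as 'x y z' text rows."""
    buf = io.StringIO()
    np.savetxt(buf, points, fmt="%.6f", delimiter=" ")
    return buf.getvalue()


points = read_xyz_points(SAMPLE)
centroid = points.mean(axis=0)  # per-axis mean of the three points
round_trip = read_xyz_points(write_xyz_points(points))
print(points.shape)  # (3, 3)
```

Selecting columns with `usecols` rather than assuming exactly three means files that append intensity or color columns still load; interpreting those extra columns requires the profile's own schema.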

File structure

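Structure varies: some profiles specify plain-text ASCII rows of coordinate tuples, while others define compact binary records or tagged sections with metadata headers. Common components in many variants include a header block, coordinate arrays, attribute columns, and an optional footer or checksum. In the widely used chemistry layout, the first line gives the number of atoms, the second line is a free-form comment, and each following line lists an element symbol followed by x, y, and z Cartesian coordinates, conventionally in ångströms. Profiles used with the Crystallographic Open Database or computational chemistry tools align fields with atom types, and some extended variants add per-atom properties such as occupancy; profiles linked to geographic information systems (GIS) align with geodetic datums and projection identifiers from the EPSG Geodetic Parameter Dataset.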
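The common chemistry layout (an atom-count line, a free-form comment line, then one `symbol x y z` line per atom) is simple enough to parse by hand. The sketch below reads and writes a single frame under that assumption; multi-frame trajectory files, which concatenate such blocks, and extended variants with extra per-atom columns are out of scope. The `Atom` class and the function names are hypothetical, and the water coordinates are approximate and purely illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


def parse_xyz(text: str) -> tuple[str, list[Atom]]:
    """Parse a single-frame chemistry XYZ block into (comment, atoms)."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected an atom-count line and a comment line")
    n_atoms = int(lines[0])        # line 1: number of atoms
    comment = lines[1]             # line 2: free-form comment (may be empty)
    atom_lines = lines[2:2 + n_atoms]
    if len(atom_lines) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom lines, found {len(atom_lines)}")
    atoms = []
    for line in atom_lines:
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed atom line: {line!r}")
        # Columns after z (charges, forces, ...) occur in some extended
        # variants; this sketch ignores them.
        x, y, z = (float(v) for v in fields[1:4])
        atoms.append(Atom(fields[0], x, y, z))
    return comment, atoms


def format_xyz(comment: str, atoms: list[Atom]) -> str:
    """Write atoms back out in the same layout with six decimal places."""
    rows = [str(len(atoms)), comment]
    rows += [f"{a.symbol} {a.x:.6f} {a.y:.6f} {a.z:.6f}" for a in atoms]
    return "\n".join(rows) + "\n"


# Approximate water geometry, for illustration only.
WATER = """3
water molecule
O 0.000000 0.000000 0.117300
H 0.000000 0.757200 -0.469200
H 0.000000 -0.757200 -0.469200
"""

comment, atoms = parse_xyz(WATER)
```

Writing the atom count explicitly on the first line is what lets a reader detect truncation: the sketch raises rather than returning fewer atoms than declared.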

Headers in text variants may include provenance fields referencing organizations such as the National Institute of Standards and Technology, the U.S. Geological Survey, Ordnance Survey, and Natural Resources Canada, along with timestamps formatted according to ISO 8601. Binary variants may include segment tables and byte offsets, analogous to those in container formats specified by W3C or IETF working groups. Metadata blocks sometimes embed persistent identifiers such as Digital Object Identifiers (DOIs) or identifiers from dataset registries such as the European Data Portal and Data.gov.
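Segment tables and offsets can be illustrated with a deliberately invented binary layout: a fixed header, a table of `(tag, offset, length)` entries, then the payloads. The magic bytes `XYZB`, the `META` and `PNTS` tags, and every field width below are hypothetical and do not describe any published profile. The point is that a reader locates each segment through the table and must bounds-check every offset before slicing.

```python
from __future__ import annotations

import struct

# HYPOTHETICAL binary layout, invented for this sketch; it does not describe
# any published .xyz profile. All integers are little-endian.
HEADER = struct.Struct("<4sHH")  # magic, format version, segment count
ENTRY = struct.Struct("<4sII")   # segment tag, absolute byte offset, byte length
POINT = struct.Struct("<ddd")    # one x, y, z record as float64
MAGIC = b"XYZB"


def pack(points, metadata: bytes, version: int = 1) -> bytes:
    """Build header + segment table + segment payloads."""
    segments = [
        (b"META", metadata),
        (b"PNTS", b"".join(POINT.pack(*p) for p in points)),
    ]
    # The first payload starts immediately after the segment table.
    offset = HEADER.size + ENTRY.size * len(segments)
    table, body = [], []
    for tag, payload in segments:
        table.append(ENTRY.pack(tag, offset, len(payload)))
        body.append(payload)
        offset += len(payload)
    return HEADER.pack(MAGIC, version, len(segments)) + b"".join(table) + b"".join(body)


def unpack(blob: bytes) -> dict[bytes, bytes]:
    """Return {tag: payload}, checking every offset against the file size."""
    if len(blob) < HEADER.size:
        raise ValueError("file shorter than header")
    magic, _version, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("unrecognized magic bytes")
    table_end = HEADER.size + count * ENTRY.size
    if table_end > len(blob):
        raise ValueError("segment table runs past end of file")
    segments = {}
    for i in range(count):
        tag, offset, length = ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        if offset < table_end or offset + length > len(blob):
            raise ValueError(f"segment {tag!r} points outside the payload area")
        segments[tag] = blob[offset:offset + length]
    return segments


def read_points(payload: bytes) -> list[tuple[float, float, float]]:
    """Decode a PNTS payload into (x, y, z) tuples."""
    if len(payload) % POINT.size:
        raise ValueError("point segment length is not a multiple of the record size")
    return list(POINT.iter_unpack(payload))
```

Validating offsets against the file length before slicing lets a reader reject truncated or tampered files instead of silently returning short segments.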

Attributes beyond coordinates commonly include color channels linked to colorimetric standards from the International Commission on Illumination (CIE), intensity values recorded by lidar sensors from Velodyne, RIEGL, or Leica, and classification codes influenced by Federal Geographic Data Committee categories. Chemistry-oriented variants label atoms with element symbols standardized by IUPAC. Bond descriptions, where a variant includes them, follow conventions from resources such as the Protein Data Bank, although the basic chemistry layout stores no bonds and leaves connectivity to the reading software.
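Because attribute columns differ between profiles, a reader generally needs the column order supplied from outside the data. This sketch takes the schema as a parameter. The default order (`x y z r g b intensity classification`), the choice of integer-typed columns, and the acceptance of both comma and whitespace delimiters are assumptions chosen for illustration, and the function name is hypothetical.

```python
# Hypothetical default column order; real profiles differ, so callers
# should pass the schema their data actually uses.
DEFAULT_COLUMNS = ("x", "y", "z", "r", "g", "b", "intensity", "classification")
INTEGER_COLUMNS = frozenset({"r", "g", "b", "classification"})


def parse_attributed_points(text, columns=DEFAULT_COLUMNS):
    """Parse whitespace- or comma-delimited rows into dicts keyed by column name."""
    records = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments (an assumption)
        fields = line.replace(",", " ").split()
        if len(fields) != len(columns):
            raise ValueError(
                f"line {lineno}: expected {len(columns)} fields, got {len(fields)}"
            )
        records.append({
            name: int(value) if name in INTEGER_COLUMNS else float(value)
            for name, value in zip(columns, fields)
        })
    return records


SAMPLE = """# x y z r g b intensity classification
10.0 20.0 1.5 255 0 0 0.82 2
10.5,20.1,1.6,0,255,0,0.40,5
"""

points = parse_attributed_points(SAMPLE)
```

Rejecting rows whose field count does not match the declared schema is deliberate: silently padding or truncating rows would misassign attributes without any visible error.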

Usage and applications

Variants serve diverse applications: 3D scanning and survey workflows from Topcon and Trimble; photogrammetry pipelines employed by researchers at the University of Oxford and firms such as Agisoft; molecular modeling at Brookhaven National Laboratory and the European Molecular Biology Laboratory; and visualization in engines such as Unity and Unreal Engine. In environmental science, datasets produced by NOAA and NASA are sometimes exported to compatible variants for analysis in MATLAB or R.

Analytic workflows built on TensorFlow or PyTorch may ingest point sets for machine learning tasks, such as those developed at Carnegie Mellon University or Google Research. Cultural heritage projects run by the British Museum and the Bibliothèque nationale de France export scan data, such as vertices sampled from scanned meshes, to the format for archiving and dissemination. Geotechnical and civil engineering projects executed by Bechtel or AECOM use variants for as-built verification and clash detection alongside buildingSMART standards.

Software and tool support

Broad tool support exists, but it depends on the specific profile. Open-source readers and writers are available in GitHub repositories maintained by organizations such as the Open Source Geospatial Foundation and by contributors to Apache Software Foundation projects. Libraries in C++, Python, Java, and C# facilitate integration into pipelines used by developers at Microsoft Research, Intel, NVIDIA, and startups incubated at Y Combinator.

Visualization and editing tools that offer import and export include MeshLab, CloudCompare, ParaView, Blender, and proprietary suites such as Autodesk Maya and Bentley MicroStation. Conversion utilities provided by PDAL, together with command-line tools in Linux distributions, enable batch processing on clusters at institutions such as CERN and Lawrence Livermore National Laboratory.

History and development

The family of formats evolved organically rather than through a single standardization process. Early use cases in computational chemistry trace to research groups at Brookhaven National Laboratory and to publications in the Journal of Chemical Physics, while point-cloud uses expanded with the adoption of lidar by the USGS and airborne mapping firms in the 1990s–2000s. Commercial vendors such as Autodesk, Leica Geosystems, and FARO formalized internal variants as they built products around the format. Community-driven efforts toward harmonization appear in working groups within the Open Geospatial Consortium and in collaborative projects coordinated by the Research Data Alliance.

Standardization attempts have produced multiple specifications and profiles, some endorsed by national mapping agencies such as Ordnance Survey and Geoscience Australia, and others by academic consortia convened by the National Science Foundation. This fragmented lineage explains the coexistence of incompatible variants and has spurred the development of converters and canonical schemas.

Security and compatibility considerations

Compatibility depends on exact profile alignment: mismatched coordinate reference systems or attribute schemas can produce semantic errors, such as points placed in the wrong location, when files are consumed by Esri software or QGIS. Malformed binary segments or embedded executable payloads can create attack surfaces in parsers, including those built on libraries from the Apache Software Foundation or Boost C++ Libraries. Best practices advocated by agencies such as NIST include validation against schema profiles, checksum verification, and sandboxed parsing, for example under SELinux or AppArmor confinement policies.
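Schema validation and checksum verification from the practices above can be sketched with the Python standard library alone. The example assumes the chemistry layout (count line, comment line, `symbol x y z` rows), a SHA-256 digest published alongside the file, and a hypothetical `MAX_ATOMS` cap; it rejects non-ASCII input, implausible atom counts, truncated files, and non-finite coordinates. The function names are hypothetical, and sandboxing with SELinux or AppArmor is configured at the operating-system level, so it is not shown.

```python
import hashlib
import hmac
import math

MAX_ATOMS = 1_000_000  # hypothetical resource cap; tune per deployment


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise if the SHA-256 digest of data differs from the published one."""
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError("checksum mismatch")


def validate_chem_xyz(data: bytes, max_atoms: int = MAX_ATOMS):
    """Strictly validate a single-frame chemistry XYZ file before use.

    Every failure surfaces as ValueError (UnicodeDecodeError is a subclass).
    """
    lines = data.decode("ascii").splitlines()
    if len(lines) < 2:
        raise ValueError("missing atom-count or comment line")
    try:
        n_atoms = int(lines[0])
    except ValueError:
        raise ValueError("first line must be an integer atom count") from None
    if not 0 <= n_atoms <= max_atoms:
        raise ValueError(f"atom count {n_atoms} outside 0..{max_atoms}")
    if len(lines) < 2 + n_atoms:
        raise ValueError("file ends before all declared atoms")
    atoms = []
    for lineno, line in enumerate(lines[2:2 + n_atoms], start=3):
        fields = line.split()
        # Strict policy for this sketch: the label must start with a letter.
        if len(fields) < 4 or not fields[0][:1].isalpha():
            raise ValueError(f"line {lineno}: expected 'symbol x y z'")
        coords = tuple(float(v) for v in fields[1:4])
        if not all(math.isfinite(c) for c in coords):
            raise ValueError(f"line {lineno}: non-finite coordinate")
        atoms.append((fields[0], *coords))
    return atoms
```

Checking the declared atom count against a cap before iterating keeps a hostile header from driving the reader into unbounded work, and funnelling every failure into `ValueError` gives callers a single exception type to handle.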

Interoperability improves when metadata follows established vocabularies such as Dublin Core and when coordinate reference systems are identified by EPSG codes. Backward-compatibility concerns are managed through explicit version fields in headers and through migration tools developed by Open Geospatial Consortium working groups and community contributors hosted on GitHub.
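Explicit version fields and EPSG identifiers let a reader refuse data it cannot interpret before datasets are mixed. The header syntax in this sketch (`# version: 1.2`, `# crs: EPSG:32633`) is hypothetical, as are the function names and the supported major version. The check only compares the declared EPSG codes; it does not reproject anything.

```python
from __future__ import annotations

SUPPORTED_MAJOR_VERSION = 1  # hypothetical: this reader understands 1.x headers


def read_header(text: str) -> dict[str, str]:
    """Collect leading '# key: value' comment lines (a hypothetical convention)."""
    header = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first data line
        key, sep, value = line[1:].partition(":")
        if sep:
            header[key.strip().lower()] = value.strip()
    return header


def epsg_code(header: dict[str, str]) -> int:
    """Extract the integer code from a 'crs: EPSG:<code>' entry."""
    authority, sep, code = header.get("crs", "").partition(":")
    if authority.strip().upper() != "EPSG" or not sep or not code.strip().isdigit():
        raise ValueError(f"missing or non-EPSG CRS: {header.get('crs')!r}")
    return int(code)


def check_mergeable(a: dict[str, str], b: dict[str, str]) -> int:
    """Return the shared EPSG code, or raise if the files cannot be combined as-is."""
    for h in (a, b):
        major = h.get("version", "").split(".")[0]
        if not major.isdigit() or int(major) != SUPPORTED_MAJOR_VERSION:
            raise ValueError(f"unsupported version {h.get('version')!r}")
    code_a, code_b = epsg_code(a), epsg_code(b)
    if code_a != code_b:
        raise ValueError(f"CRS mismatch: EPSG:{code_a} vs EPSG:{code_b}; reproject first")
    return code_a
```

Failing loudly on a CRS mismatch is the conservative choice: merging coordinates from different reference systems produces files that parse cleanly but place points in the wrong location.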

Category:Computer file formats