LLMpediaThe first transparent, open encyclopedia generated by LLMs

tar (software)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
tar (software)
Name: tar
Developer: Various (originally AT&T Bell Labs, later GNU Project)
Released: 1979
Operating system: Unix-like, Windows
Platform: Cross-platform
Genre: File archiver
License: Various (proprietary, BSD, GNU GPL)

tar is a computer program used to collect many files into a single archive file, commonly for backup and distribution. Initially developed at Bell Labs for Version 7 Unix and later standardized by the IEEE through its POSIX working groups, tar remains a core utility on Unix, Linux, and other Unix-like systems and has ports to Microsoft Windows and other platforms. It is closely associated with compression utilities such as gzip, bzip2, and xz, and with the package formats used by distributions such as Debian and Red Hat Enterprise Linux.

Overview

tar combines multiple files and directory trees into a single archive, preserving metadata such as permissions, timestamps, and ownership for portability between systems like Solaris, FreeBSD, and NetBSD. It is an essential tool in the GNU Project toolchain and in system administration tasks on Ubuntu, Fedora, and CentOS. tar archives are commonly exchanged as tarballs, files typically named with extensions like .tar, .tar.gz, .tgz, and .tar.bz2. tar itself does not compress data; integration with compression utilities developed by projects such as GNU gzip and XZ Utils allows efficient storage and transfer for uses ranging from source code distribution for projects hosted on GitHub or GitLab to system backup workflows used by enterprises and open-source communities.
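The packing step described above can be sketched with Python's standard `tarfile` module. This is a minimal illustration rather than how any particular tar implementation works; the helper names `make_tarball` and `describe` and the sample file `run.sh` are assumptions made for the example.

```python
import os
import tarfile
import tempfile


def make_tarball(src_dir: str, out_path: str) -> None:
    """Pack src_dir (recursively) into a gzip-compressed tar archive."""
    with tarfile.open(out_path, "w:gz") as archive:
        # arcname stores paths relative to the directory name instead of
        # the absolute path on the machine that created the archive.
        archive.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))


def describe(out_path: str) -> list[tuple[str, int, int]]:
    """Return (name, permission bits, size) for every archive member."""
    # "r:*" lets tarfile detect the compression scheme on its own.
    with tarfile.open(out_path, "r:*") as archive:
        return [(m.name, m.mode & 0o777, m.size) for m in archive.getmembers()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = os.path.join(tmp, "project")
        os.mkdir(project)
        script = os.path.join(project, "run.sh")
        with open(script, "wb") as fh:
            fh.write(b"#!/bin/sh\necho hello\n")
        os.chmod(script, 0o755)

        tarball = os.path.join(tmp, "project.tar.gz")
        make_tarball(project, tarball)
        for name, mode, size in describe(tarball):
            print(f"{name} mode={mode:o} size={size}")
```

The permission bits read back from the archive match those set on the source file, which is the metadata preservation the paragraph refers to.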

History

The original implementation of tar was created at Bell Labs for early Unix releases circa 1979 to write files sequentially to tape drives such as those produced by DEC. As Unix variants proliferated across AT&T, the Berkeley Software Distribution, and later vendors like IBM and Sun Microsystems, multiple implementations and extensions emerged. Standards efforts by the IEEE's Portable Operating System Interface (POSIX) committees produced specifications to harmonize behavior, leading to the POSIX.1-1988 and later POSIX.1-2001 standards. The GNU Project produced a free-software implementation that integrated with the Free Software Foundation’s ecosystem, while commercial and BSD-derived systems maintained compatible versions. Over decades, new header formats and features were added to accommodate filesystem enhancements introduced by platforms including Linux and macOS.

Features and usage

tar supports operations such as create, extract, append, list, and compare, enabling workflows used by administrators of Red Hat, Ubuntu Server, and cloud providers like Amazon Web Services and Google Cloud Platform. Options allow preservation of metadata relevant to systems like AIX and HP-UX, including POSIX ACLs, and efficient handling of sparse files such as the disk images used by database appliances and virtualization platforms like VMware ESXi and KVM. Common command-line patterns combine tar with compressors from projects such as gzip, bzip2, and XZ Utils to produce compressed tarballs for distribution by projects like Debian and the Fedora Project. GNU tar also supports incremental backups driven by a snapshot file that records directory state between runs (the --listed-incremental option), and tar can be combined with synchronization tools such as rsync and backup suites like Bacula and Amanda.
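The create, append, and list operations can be sketched with Python's `tarfile` module, which mirrors them as open modes. This is an illustrative sketch only; the helper names `add_bytes` and `demo` and the member names are hypothetical, and the compare operation has no direct counterpart in `tarfile`, so it is omitted.

```python
import io
import tarfile


def add_bytes(archive: tarfile.TarFile, name: str, data: bytes) -> None:
    """Add an in-memory file to an open archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))


def demo(path: str) -> tuple[list[str], bytes]:
    # Create (like `tar -c`): start a new, uncompressed archive.
    with tarfile.open(path, "w") as archive:
        add_bytes(archive, "docs/readme.txt", b"first file\n")

    # Append (like `tar -r`): only possible on uncompressed archives.
    with tarfile.open(path, "a") as archive:
        add_bytes(archive, "docs/changelog.txt", b"second file\n")

    # List (like `tar -t`) and read one member without writing it to disk.
    with tarfile.open(path, "r") as archive:
        names = archive.getnames()
        member = archive.extractfile("docs/changelog.txt")
        content = member.read() if member is not None else b""
    return names, content
```

Appending works here because the archive is uncompressed; a compressed tarball would have to be rewritten in full to add a member.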

File format and standards

The tar archive format stores each member as a header followed by its data, both laid out in fixed-size 512-byte blocks suited to sequential media such as the magnetic tape systems sold by companies like EMC Corporation and Hewlett-Packard. The original format was extended by the USTAR header, designed during POSIX standardization to lift filename length limits and to record symbolic user and group names alongside numeric uid/gid values. It was extended again by the POSIX.1-2001 "pax" (Portable Archive Exchange) format, which introduced key-value records for extended attributes such as long filenames, large uid/gid and size values, and sub-second timestamps important for interoperability with filesystems like ZFS and XFS. The GNU tar variant introduced its own extensions for long names and sparse files and later gained support for extended attributes as used on Linux; the handling of macOS resource forks varies between implementations. Standards and specifications are maintained in contexts such as the IEEE Std 1003.1 family and documentation provided by projects like the GNU Project.
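To make the header layout concrete, the following sketch decodes a few fields of a single 512-byte ustar header block and verifies its checksum. The field offsets follow the ustar layout; the helper names (`field`, `parse_header`, `sample_archive`) and the sample member are assumptions for illustration, and only a subset of the header fields is decoded.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


def field(block: bytes, start: int, length: int) -> str:
    """Decode a NUL-terminated text field from the header block."""
    raw = block[start:start + length].split(b"\0", 1)[0]
    return raw.decode("utf-8", "replace").strip()


def parse_header(block: bytes) -> dict:
    """Decode selected ustar fields; numeric fields are octal text."""
    stored = int(field(block, 148, 8), 8)
    # The checksum is the byte sum of the header with the checksum
    # field itself (offsets 148..155) treated as eight spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:BLOCK])
    return {
        "name": field(block, 0, 100),
        "mode": int(field(block, 100, 8), 8),
        "size": int(field(block, 124, 12), 8),
        "typeflag": block[156:157].decode("ascii"),
        "magic": field(block, 257, 6),
        "checksum_ok": stored == computed,
    }


def sample_archive() -> bytes:
    """Build a one-member ustar archive in memory for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        data = b"hello, tar\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    header = parse_header(sample_archive()[:BLOCK])
    print(header)
```

Because the numeric fields are fixed-width octal strings, their width bounds the values a plain ustar header can represent, which is one motivation for the pax key-value records.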

Implementations and platforms

Implementations exist in the GNU Project, the BSD family (including FreeBSD, OpenBSD, NetBSD), and commercial Unix systems such as IBM AIX and Oracle Solaris. Ports are available for Microsoft Windows via projects such as Cygwin and GnuWin32, and tar is also available inside environments like Windows Subsystem for Linux. Package management ecosystems, including RPM (used by Red Hat together with the YUM and DNF front ends) and dpkg (used by Debian and Ubuntu), routinely handle tar archives in source packages and distribution tarballs. Build systems like Autotools, CMake, and GNU Make commonly produce tar archives for source code distribution to services like SourceForge and GitHub.

Security and limitations

tar archives can carry member names and metadata (absolute paths, ".." components, symbolic or hard links) that, when extracted carelessly, overwrite files outside the intended directory on systems such as Linux or macOS. This has led to advisories from organizations like the CERT Coordination Center and prompted safety features in implementations like GNU tar, which by default strips leading slashes from member names and skips members whose names contain a ".." component. Format limitations include fixed-width header fields: in the ustar format, numeric fields are octal strings, which caps uid/gid values at 2097151 and member sizes at just under 8 GiB unless pax or GNU extensions are used. This affects portability across systems with divergent uid/gid schemes, as seen in environments managed by LDAP or Active Directory. Sparse-file representations also require compatible readers, and such incompatibilities have impacted backup interoperability between vendors like EMC Corporation and open-source solutions. Compressed archives can additionally act as decompression bombs, in which a small file expands to an enormous size and causes denial-of-service conditions during extraction. Best practices promoted by security teams at Red Hat and Ubuntu recommend validating archives, using sandboxed extractors, and preferring signed distributions verified with OpenPGP and the package-signing infrastructures used by the Debian Project and the GNU Project.
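The archive validation recommended above can be sketched as a pre-extraction check that flags members whose paths would escape the destination directory. This is a conservative illustration, not the policy of GNU tar or any other implementation: the helper names (`is_within`, `unsafe_members`, `build_hostile_archive`) are hypothetical, and the sketch rejects all links and special files outright. Recent Python versions also ship built-in extraction filters in `tarfile`, which may be preferable where available.

```python
import io
import os
import tarfile
import tempfile


def is_within(base: str, target: str) -> bool:
    """True if target resolves to base itself or to a path beneath it."""
    base = os.path.realpath(base)
    target = os.path.realpath(target)
    return os.path.commonpath([base, target]) == base


def unsafe_members(archive: tarfile.TarFile, dest: str) -> list[str]:
    """Return names of members that this sketch would refuse to extract."""
    problems = []
    for member in archive.getmembers():
        target = os.path.join(dest, member.name)
        if os.path.isabs(member.name) or not is_within(dest, target):
            problems.append(member.name)  # absolute path or ".." escape
        elif member.issym() or member.islnk():
            problems.append(member.name)  # links rejected in this sketch
        elif not (member.isfile() or member.isdir()):
            problems.append(member.name)  # devices, FIFOs, and the like
    return problems


def build_hostile_archive() -> bytes:
    """Build an in-memory archive mixing safe and unsafe members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name in ("ok/file.txt", "../evil.txt", "/abs.txt"):
            data = b"x"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
        link = tarfile.TarInfo("ok/link")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc"
        archive.addfile(link)
    return buf.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        with tarfile.open(fileobj=io.BytesIO(build_hostile_archive())) as archive:
            print(unsafe_members(archive, dest))
```

A caller would extract only when the returned list is empty. The check does not address decompression bombs, which need separate limits on total extracted size and member count.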

Category:Unix software