| tar | |
|---|---|
| Name | tar |
| Title | tar |
| Developer | Various (notably GNU Project, Bell Labs) |
| Released | 1979 |
| Latest release version | varies by implementation |
| Programming language | C |
| Operating system | Unix, Linux, macOS, FreeBSD, Microsoft Windows |
| License | Various (including GNU General Public License) |
Tar is a computer program and file format for archiving multiple files into a single file while preserving filesystem metadata. It was developed in the late 1970s for the Unix operating system and later adopted across Linux, BSD derivatives, and other platforms. The tool is widely used in software distribution, backup, and system administration workflows, and exists in multiple implementations, including the original Bell Labs utility and GNU tar from the GNU Project.
The utility originated at Bell Labs for early Unix releases to consolidate files for tape-based backups and distribution. Its name is short for "tape archive": when it appeared around 1979, it wrote archives sequentially to the magnetic tape devices of that era. Over the ensuing decades, tar evolved alongside shifts in storage media, from magnetic tape to optical media and disk, while remaining integral to Unix-like ecosystems and influencing packaging systems such as those used by Debian, Red Hat, and Slackware. The GNU Project produced the widely deployed GNU tar implementation, and standards efforts culminated in the POSIX ustar format and the later POSIX.1-2001 pax extensions, which improve interoperability between different systems and archive consumers.
Tar’s principal purpose is to collect many filesystem objects (regular files, directories, symbolic links, device nodes, and other special files) into a single archive stream that preserves metadata such as ownership and permissions. It writes archives sequentially, which suited the block- and tape-oriented devices historically used in System V and early BSD administration. Tar archives can be fed to compression filters to produce compressed distributable artifacts, as used by projects like Apache HTTP Server, language ecosystems (for example, Python source distributions), and operating system installers for FreeBSD or OpenBSD.
The archive format stores a header per entry containing name, mode, owner and group identifiers, size, modification time, and checksum. Headers and file data are laid out in 512-byte blocks. Early implementations limited member names to 100 bytes, a restriction eased by the ustar (Unix Standard TAR) extension standardized by POSIX, which added a separate prefix field for longer paths. The pax format, standardized in POSIX.1-2001, introduced extended headers to support arbitrarily long filenames, additional metadata, and UTF-8 encoded names, and it is supported by modern tools such as GNU tar. Variant formats are used by specific packaging systems, and many implementations provide compatibility modes to read legacy archives produced by older Unix utilities.
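The header layout described above can be illustrated with a minimal sketch that decodes one header block by hand. It assumes the conventional ustar field offsets (name at 0, mode at 100, uid at 108, gid at 116, size at 124, mtime at 136, checksum at 148, magic at 257) and uses Python's standard `tarfile` module only to produce a sample archive in memory. The helper name `parse_header` and the member name `docs/hello.txt` are illustrative, not part of any tar implementation.

```python
import io
import tarfile

BLOCK_SIZE = 512  # headers and file data are padded to 512-byte blocks


def parse_header(block: bytes) -> dict:
    """Decode the classic fields of one 512-byte tar header block."""

    def text(start: int, length: int) -> str:
        # Text fields are NUL-terminated (or fill the whole field).
        return block[start:start + length].split(b"\0", 1)[0].decode("utf-8")

    def octal(start: int, length: int) -> int:
        # Numeric fields are ASCII octal digits padded with NULs or spaces.
        raw = block[start:start + length].strip(b" \0")
        return int(raw.decode("ascii"), 8) if raw else 0

    # The checksum is the sum of all header bytes, with the checksum field
    # itself (offsets 148..155) counted as eight ASCII spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:BLOCK_SIZE])

    return {
        "name": text(0, 100),
        "mode": octal(100, 8),
        "uid": octal(108, 8),
        "gid": octal(116, 8),
        "size": octal(124, 12),
        "mtime": octal(136, 12),
        "checksum_ok": octal(148, 8) == computed,
        "magic": text(257, 6),
    }


# Build a one-member ustar archive in memory to have something to parse.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as archive:
    payload = b"hello, tar\n"
    info = tarfile.TarInfo(name="docs/hello.txt")
    info.size = len(payload)
    info.mode = 0o644
    archive.addfile(info, io.BytesIO(payload))

header = parse_header(buf.getvalue()[:BLOCK_SIZE])
print(header)
```

The sketch ignores the ustar prefix field, pax extended headers, and GNU-specific extensions, so it is only suitable for short names in simple archives.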
Typical command-line options allow creating, listing, extracting, and concatenating archives. Common flags in mainstream implementations include -c to create an archive, -x to extract, -t to list contents, and -f to name the archive file or device. GNU tar augmented the traditional options with long-form equivalents such as --create, --extract, --file, and --verbose. Administrators combine tar invocations with compression filters like gzip, bzip2, or xz via pipelines or built-in compression switches, and tar is also used in batch operations orchestrated by automation tools such as Ansible and Puppet.
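As a rough illustration of the create, list, and extract operations, the following sketch performs the same three steps with Python's standard `tarfile` module rather than the command-line tool. The directory and file names (`project`, `README`, `main.c`, `restored`) are made up for the example, and the shell commands in the comments are approximate equivalents, not exact translations.

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as work_dir:
    work = Path(work_dir)

    # Hypothetical source tree to archive.
    source = work / "project"
    source.mkdir()
    (source / "README").write_text("example\n")
    (source / "main.c").write_text("int main(void) { return 0; }\n")

    archive_path = work / "project.tar"

    # Roughly `tar -c -f project.tar project`: create an archive.
    with tarfile.open(archive_path, "w") as archive:
        archive.add(source, arcname="project")

    # Roughly `tar -t -f project.tar`: list member names.
    with tarfile.open(archive_path, "r") as archive:
        listed = sorted(archive.getnames())

    # Roughly `tar -x -f project.tar -C restored`: extract into a directory.
    restored = work / "restored"
    restored.mkdir()
    # Newer Python versions accept an extraction filter; use it when present.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r") as archive:
        archive.extractall(restored, **extract_kwargs)

    restored_text = (restored / "project" / "README").read_text()

print(listed)
print(restored_text, end="")
```

Everything happens inside a temporary directory, so the sketch leaves no files behind.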
Tar itself is an archiver, not a compressor; integration with compressors yields compressed archives commonly referred to by compound extensions. Workflows frequently pair tar with gzip (producing .tar.gz), bzip2 (.tar.bz2), xz (.tar.xz), or zstd (.tar.zst) to reduce storage footprint and network transfer time. Platform packaging systems also rely on tar: Debian's dpkg stores package contents as tar archives inside .deb files, and RPM source packages commonly bundle upstream tarballs. Backup workflows combine tar with tools like rsync for incremental distribution, and container ecosystems (for example, Docker images) may use tar streams during image layer assembly and export.
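The split between archiving and compressing can be seen in a small sketch that writes the same member through several compression filters and compares the results. It assumes Python's standard `tarfile` module with `gzip`, `bz2`, and `lzma` support available. zstd is omitted because support for it depends on the Python version. The payload and the member name `app.log` are invented for the example.

```python
import io
import tarfile

# Highly repetitive data, so every compressor should shrink it noticeably.
payload = b"repetitive log line\n" * 5000


def build_archive(mode: str) -> bytes:
    """Write one member into an in-memory archive using the given tarfile mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as archive:
        info = tarfile.TarInfo(name="app.log")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# The tar layer is identical in each case; only the outer filter changes.
sizes = {
    "tar": len(build_archive("w")),
    "tar.gz": len(build_archive("w:gz")),
    "tar.bz2": len(build_archive("w:bz2")),
    "tar.xz": len(build_archive("w:xz")),
}
for suffix, size in sizes.items():
    print(f".{suffix}: {size} bytes")

# Mode "r:*" lets tarfile detect the compression when reading back.
with tarfile.open(fileobj=io.BytesIO(build_archive("w:xz")), mode="r:*") as archive:
    roundtrip = archive.extractfile("app.log").read()
```

Exact sizes depend on the compressor versions and settings, so the sketch prints them rather than assuming particular values.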
Notable implementations include GNU tar from the GNU Project, the original tar from Bell Labs, later POSIX-compliant utilities found in NetBSD and FreeBSD, and Windows ports provided by projects like Cygwin and MSYS2. Commercial and embedded systems sometimes supply custom implementations optimized for constrained environments or specific tape controllers. Variants extend functionality with features such as incremental snapshots, sparse file handling, and archive verification; examples are present in system utilities distributed with AIX, Solaris, and other enterprise UNIX derivatives.
Tar archives can contain absolute paths, path traversal entries, and special device nodes that pose security risks when extracted with elevated privileges. Archive extraction on systems like Linux and macOS should be performed with caution, using flags that restrict permissions or transform member names to avoid overwriting critical files. Historically, format limitations included filename length caps and platform-specific metadata mismatches; pax and later extensions mitigated many issues, but interoperability pitfalls remain between implementations. Malformed or malicious archives can exploit parser bugs in particular tar programs, leading to vulnerabilities tracked and fixed by maintainers in projects such as the GNU Project and downstream distributions including Debian and Red Hat Enterprise Linux.
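One way to act on these risks is to inspect member metadata before extracting anything. The sketch below is a simplified illustration using Python's standard `tarfile` module; the helper names `unsafe_reason` and `make_member` and the sample member names are hypothetical. It flags absolute paths, `..` components, device nodes, and links, which is a deliberately conservative policy and not a complete defense.

```python
import io
import tarfile


def unsafe_reason(member: tarfile.TarInfo):
    """Return a short reason if the member looks risky, else None."""
    if member.name.startswith("/"):
        return "absolute path"
    if ".." in member.name.split("/"):
        return "path traversal"
    if member.isdev():
        return "device node or FIFO"
    if member.issym() or member.islnk():
        return "link"
    return None


def make_member(name, type_=tarfile.REGTYPE, linkname=""):
    """Build an empty archive member of the given type (test data only)."""
    info = tarfile.TarInfo(name=name)
    info.type = type_
    info.linkname = linkname
    return info


# Assemble a deliberately hostile archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as archive:
    archive.addfile(make_member("docs/readme.txt"))
    archive.addfile(make_member("/etc/cron.d/job"))
    archive.addfile(make_member("../outside.txt"))
    archive.addfile(make_member("dev/console", tarfile.CHRTYPE))
    archive.addfile(make_member("docs/link", tarfile.SYMTYPE, "/etc/passwd"))

# Inspect every member without extracting any of them.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as archive:
    report = {m.name: unsafe_reason(m) for m in archive.getmembers()}

for name, reason in report.items():
    print(f"{name}: {reason or 'ok'}")
```

The check does not handle backslash separators or links that are harmless in context. Recent Python releases also provide built-in extraction filters (for example `filter="data"` on `extractall`) that apply comparable checks, and these are generally preferable to a hand-written policy.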