Generated by GPT-5-mini

| tar (computing) | |
|---|---|
| Name | tar |
| Author | AT&T Bell Laboratories |
| Developer | GNU Project, POSIX working groups |
| Released | 1979 |
| Operating system | Unix-like |
| Genre | Archiver |
| License | Various (GNU GPL for GNU tar) |

tar is a computer program for collecting many files into a single archive file, often called a tarball, and for extracting files from such archives. The name derives from "tape archive". It originated on early Unix systems and became a de facto standard for software distribution on BSD and System V derivatives; the format was later formalized by POSIX and reimplemented by the GNU Project and many vendors. tar archives are widely used on Linux, macOS, and embedded Unix-like systems for backup, distribution, and packaging of software releases.
tar writes files one after another into a single archive while preserving filesystem metadata such as permissions, ownership, timestamps, and directory structure. The format was designed for the tape storage devices used at AT&T Bell Labs and was later adapted for disk and network transfer. Because tar itself does not compress data, archives are commonly combined with compression utilities such as gzip, bzip2, xz, or compress to form the compressed distributions used by projects such as Debian, Red Hat, Homebrew, and CPAN.
tar traces its origins to Unix research at Bell Labs in the late 1970s, with contributions from developers associated with the original AT&T Unix team; it first shipped with Version 7 Unix in 1979. The utility was carried into vendor systems such as those from Sequent and SunOS, and its archive format was standardized in POSIX.1-1988. Over time, different implementations emerged, including GNU tar, maintained by the GNU Project in collaboration with the Free Software Foundation. Other notable implementers include vendor teams at IBM, HP, and Oracle, as well as the NetBSD and OpenBSD projects.
A tar archive stores a sequence of file entries, each represented by a 512-byte header block followed by the file data padded to a 512-byte boundary; the end of the archive is marked by two blocks of zero bytes. The header contains metadata fields for filename, mode, owner uid/gid, file size, mtime, checksum, typeflag, and linkname, with numeric fields stored as octal ASCII text, and may include UStar or POSIX.1-2001 (pax) extensions to support longer names and additional attributes. The format supports special entries for directories, symbolic links, device nodes, and sparse files, the last of which are often relied on by backup tools. Long filename and metadata extensions are defined in the UStar, pax, and GNU tar formats; interoperability relies on conventions adopted by standards bodies such as IEEE and The Open Group.
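
The header layout described above can be illustrated with a minimal Python sketch that decodes one 512-byte header block by hand. The field offsets follow the classic/ustar layout; the helper names (`parse_header`, `build_demo_archive`) and the sample file are hypothetical, and the sketch assumes plain octal fields, so it does not handle pax extended headers or GNU base-256 numbers.

```python
import io
import tarfile

BLOCK_SIZE = 512  # headers and file data are laid out in 512-byte blocks


def parse_header(block):
    """Decode the main fields of a single 512-byte tar header block."""

    def text(raw):
        # Text fields are NUL-terminated unless they fill the whole field.
        return raw.split(b"\0", 1)[0].decode("utf-8", "replace")

    def octal(raw):
        # Numeric fields hold octal digits padded with spaces or NULs.
        digits = raw.strip(b" \0").decode("ascii")
        return int(digits, 8) if digits else 0

    # The checksum is the sum of all header bytes, with the 8-byte
    # checksum field itself counted as if it were filled with spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:BLOCK_SIZE])

    return {
        "name": text(block[0:100]),
        "mode": octal(block[100:108]),
        "uid": octal(block[108:116]),
        "gid": octal(block[116:124]),
        "size": octal(block[124:136]),
        "mtime": octal(block[136:148]),
        "checksum_ok": octal(block[148:156]) == computed,
        "typeflag": block[156:157],
        "linkname": text(block[157:257]),
        "magic": text(block[257:263]),
    }


def build_demo_archive():
    """Build a one-file ustar archive in memory (illustrative data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        payload = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    archive = build_demo_archive()
    header = parse_header(archive[:BLOCK_SIZE])
    # The file data starts in the block right after the header.
    data = archive[BLOCK_SIZE:BLOCK_SIZE + header["size"]]
    print(header["name"], header["size"], header["checksum_ok"], data)
```

In practice a library such as Python's `tarfile` or libarchive would do this parsing; the manual version is only meant to make the block structure visible.
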
Common tar operations include creating (c), listing (t), extracting (x), and appending to (r) an archive, typically invoked with further options and an archive filename. Flags found across implementations include -c, -x, -t, -f to name the archive file, -v for verbose output, -z to filter through gzip, -j for bzip2, and -J for xz; GNU tar adds options such as --transform and --exclude patterns, which are useful in Autoconf and CMake build systems. Integration with package management and build infrastructures such as RPM, dpkg, Autotools, and GNU Make often requires precise option handling and compatibility with POSIX utilities like sed and awk.
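
The same four operations can be sketched programmatically with Python's standard `tarfile` module, which may help when a build script cannot rely on a particular tar implementation being installed. The function name `demo_operations` and the file names are hypothetical; the comments give the rough command-line equivalent of each step.

```python
import os
import tarfile
import tempfile


def demo_operations(workdir):
    """Create, append to, list, and extract an archive inside workdir."""
    src = os.path.join(workdir, "project")
    os.makedirs(src)
    with open(os.path.join(src, "README"), "w") as fh:
        fh.write("hello\n")

    archive = os.path.join(workdir, "project.tar")

    # Create: roughly `tar -cf project.tar project/`
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname="project")

    # Append: roughly `tar -rf project.tar ...`; mode "a" only works
    # on uncompressed archives.
    news = os.path.join(workdir, "NEWS")
    with open(news, "w") as fh:
        fh.write("first release\n")
    with tarfile.open(archive, "a") as tar:
        tar.add(news, arcname="project/NEWS")

    # List: roughly `tar -tf project.tar`
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

    # Extract: roughly `tar -xf project.tar -C out/`. The "data" filter
    # is passed only where this Python version provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    out = os.path.join(workdir, "out")
    with tarfile.open(archive, "r") as tar:
        tar.extractall(out, **extra)

    # Create with gzip compression: roughly `tar -czf project.tar.gz project/`
    with tarfile.open(archive + ".gz", "w:gz") as tar:
        tar.add(src, arcname="project")

    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_operations(tmp))
```
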
Multiple implementations exist: GNU tar, distributed by the GNU Project; BSD tar variants in FreeBSD, NetBSD, and OpenBSD; platform-specific versions from Solaris and AIX; and libarchive-based tools used in macOS and pkgsrc. Compatibility challenges arise from differences in the handling of extended attributes (xattrs), access control lists (ACLs), sparse files, and SELinux contexts as implemented by Red Hat Enterprise Linux, CentOS, and SUSE Linux Enterprise Server. Many projects rely on GNU tar's behavior; container ecosystems such as Docker and orchestration platforms like Kubernetes use tar for image layer construction and export/import workflows.
tar archives can contain entries with absolute paths or parent-directory references ("..") that may lead to path traversal during extraction, a class of vulnerability that affects unpackers which do not validate member names. Implementations and packaging tools provide safeguards: GNU tar, for example, strips leading slashes from member names and rejects names containing ".." on extraction unless --absolute-names is given, options such as --strip-components control where extracted paths land, and sandboxed extraction environments are used in OpenJDK build farms and continuous integration systems such as Jenkins and GitLab CI/CD. Other limitations include the lack of built-in compression, which leads to reliance on external compressors, and format constraints that complicate preserving extended metadata across disparate filesystems such as NTFS and FAT32. Security hardening guidance from organizations such as Mozilla and Debian recommends checksum verification, detached signatures made with GnuPG, and the reproducible build techniques championed by the Reproducible Builds project.
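
The path traversal risk described above can be checked before extraction. The following Python sketch flags members whose name, or whose link target, would resolve outside the destination directory. The helper names (`unsafe_members`, `build_demo_archive`) and the sample entries are hypothetical, and the check assumes POSIX-style paths; it is an illustration, not a complete defense. Recent Python versions also ship extraction filters such as `tarfile.data_filter`, which cover more cases.

```python
import io
import os
import tarfile


def _escapes(root, path):
    """Return True if path, resolved relative to root, lands outside root."""
    resolved = os.path.realpath(os.path.join(root, path))
    return os.path.commonpath([root, resolved]) != root


def unsafe_members(tar, dest):
    """List member names that would escape dest when extracted."""
    root = os.path.realpath(dest)
    flagged = []
    for member in tar.getmembers():
        bad = _escapes(root, member.name)
        if member.issym():
            # Symlink targets are relative to the link's own directory.
            link_dir = os.path.dirname(member.name)
            bad = bad or _escapes(root, os.path.join(link_dir, member.linkname))
        elif member.islnk():
            # Hard link targets are names relative to the archive root.
            bad = bad or _escapes(root, member.linkname)
        if bad:
            flagged.append(member.name)
    return flagged


def build_demo_archive():
    """Build an in-memory archive with one safe and three hostile entries."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in ("ok/file.txt", "../evil.txt", "/abs.txt"):
            payload = b"data"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        link = tarfile.TarInfo("link")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc"
        tar.addfile(link)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with tarfile.open(fileobj=build_demo_archive(), mode="r") as tar:
        print(unsafe_members(tar, "extract-here"))
```

A caller would typically refuse to extract at all if the returned list is non-empty, rather than silently skipping the flagged entries.
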
Typical examples:

- Creating an archive of a source tree, as used by Autoconf and Automake projects: tar -czf project.tar.gz project/
- Extracting a compressed archive: tar -xzf project.tar.gz --strip-components=1
- Appending files for incremental backup combined with rsync: tar -rf backup.tar incremental/ && gzip backup.tar

Packaging for distribution: many GNU packages provide a compressed tarball for each release, and container images use tar streams for layer import/export in Docker and OCI workflows. Build pipelines in Travis CI and CircleCI routinely use tar with compression filters to cache artifacts and deploy release archives to mirrors hosted by GitHub, SourceForge, and Launchpad.
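
The effect of --strip-components=1 in the extraction example above can be approximated in Python by renaming members before they are extracted. This is a rough sketch under stated assumptions: the helper names (`strip_components`, `extract_stripped`, `build_demo_archive`) and the sample files are hypothetical, and link targets are left untouched, so it should not be read as a faithful reimplementation of the command-line option.

```python
import io
import os
import tarfile
import tempfile


def strip_components(members, count):
    """Yield members with the first `count` path components removed."""
    for member in members:
        parts = member.name.split("/")
        if len(parts) <= count:
            continue  # nothing would be left of the name, so skip the entry
        member.name = "/".join(parts[count:])
        yield member


def build_demo_archive():
    """Build an in-memory archive with a single top-level directory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        top = tarfile.TarInfo("project")
        top.type = tarfile.DIRTYPE
        top.mode = 0o755
        tar.addfile(top)
        files = (
            ("project/README", b"docs\n"),
            ("project/src/main.c", b"int main(void) { return 0; }\n"),
        )
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def extract_stripped(fileobj, dest, count=1):
    """Extract an archive into dest, dropping leading path components."""
    # The "data" filter is passed only where this Python version provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        members = strip_components(tar.getmembers(), count)
        tar.extractall(dest, members=members, **extra)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        extract_stripped(build_demo_archive(), dest)
        print(sorted(os.listdir(dest)))
```
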
Category:Archivers