
ZIP (file format)

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: ZIP
Extension: .zip
MIME type: application/zip
Uniform type: com.pkware.zip-archive
Magic number: PK\x03\x04
Developer: Phil Katz, PKWARE
Released: 14 February 1989
Genre: Archive format, data compression
Standard: APPNOTE from PKWARE

The ZIP file format is a widely used archive format that supports lossless data compression and stores one or more files within a single container. Originally created by Phil Katz for the PKZIP utility, it has become a ubiquitous standard for bundling and compressing digital data. Its design places a central directory at the end of the archive, enabling software to list the contents without reading the entire file. The format's specification is maintained and published by PKWARE in a document known as the APPNOTE.

History

The format's development was a direct response to the legal and technical limitations surrounding the earlier ARC (file format) utility created by System Enhancement Associates (SEA). After a legal dispute with SEA over his ARC-compatible PKARC utility, Phil Katz and his company PKWARE released the first version of PKZIP in 1989. The format quickly gained popularity on BBS systems and within the MS-DOS community, helped by its competitive compression ratios and speed relative to contemporaries such as LHARC. A pivotal moment in its proliferation was its integration into the Windows operating system as "compressed folders", first offered in the Microsoft Plus! 98 add-on pack for Windows 98. The format's position was further cemented when Sun Microsystems made it a standard part of the Java Platform with the `java.util.zip` package.

Technical design

A ZIP file is structured as a sequence of file entries, each consisting of a local file header followed by that file's data, which may be stored compressed or uncompressed. The format places a central directory at the end of the archive, which serves as a table of contents containing metadata such as filenames and the offset of each local file header. Because a reader can locate any member from the central directory alone, the design allows efficient random access to individual files. Compression is applied per file, most commonly with the DEFLATE method, and the specification also defines method codes for others such as BZIP2 and LZMA. Each local file header begins with the signature `PK\x03\x04` (the initials of Phil Katz followed by two bytes), which identifies the file type to utilities such as Info-ZIP's `zip` and `unzip` tools or the 7-Zip archiver.
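The layout described above can be inspected directly. The following is a minimal sketch in Python, assuming a small single-disk archive without ZIP64 extensions and without an archive comment; the helper names `build_sample_archive` and `read_end_of_central_directory` are hypothetical and chosen only for illustration. It builds an archive in memory, finds the end-of-central-directory record by its `PK\x05\x06` signature, and reads the entry count and central directory offset from it.

```python
import io
import struct
import zipfile


def build_sample_archive() -> bytes:
    """Create a small two-member ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("hello.txt", "hello " * 100)
        zf.writestr("docs/readme.txt", "readme")
    return buf.getvalue()


def read_end_of_central_directory(data: bytes) -> dict:
    """Locate and decode the end-of-central-directory (EOCD) record.

    Assumption: single-disk, non-ZIP64 archive with no trailing comment
    that could itself contain the EOCD signature.
    """
    pos = data.rfind(b"PK\x05\x06")  # search backwards from the end
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # Fixed 22-byte layout: signature, two disk numbers, two entry counts,
    # central directory size, central directory offset, comment length.
    fields = struct.unpack("<4sHHHHIIH", data[pos:pos + 22])
    return {
        "eocd_offset": pos,
        "entries": fields[4],
        "cd_size": fields[5],
        "cd_offset": fields[6],
    }


archive = build_sample_archive()
info = read_end_of_central_directory(archive)

# The archive opens with a local file header; the central directory
# starts with a central directory file header.
local_signature = archive[:4]
central_signature = archive[info["cd_offset"]:info["cd_offset"] + 4]
print(info, local_signature, central_signature)
```

A reader that only needs a listing can stop after parsing the central directory, which is why listing a large archive does not require scanning the compressed data.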

Software support

Native support for the format is extensive across all major operating systems. Microsoft built basic ZIP handling ("compressed folders") into the Windows shell with Windows Me and has included it in Windows Explorer since Windows XP. On macOS, the bundled Archive Utility provides built-in functionality. In the Unix-like ecosystem, Info-ZIP's `zip` and `unzip` utilities are commonplace, while the KDE and GNOME desktop environments include graphical front-ends. Major software applications, including Microsoft Office, use the format for their document packages (e.g., DOCX). The Libarchive library provides cross-platform support for numerous formats including ZIP.

Security

While convenient, the ZIP format has been associated with several security problems over its history. One class of attacks involves archive bombs, such as the 42.zip file, a small archive of nested ZIP files that expands to petabytes of data when fully decompressed. The "zip slip" vulnerability, publicized by the security firm Snyk in 2018, involves path traversal through specially crafted entry names (for example, names containing `../`) and has affected extraction code in many projects, including the Apache Software Foundation's Apache Ant. Furthermore, the traditional ZIP encryption scheme (ZipCrypto) is considered weak and susceptible to known-plaintext attacks. In response, stronger AES-based encryption was added to the format, in variants specified by WinZip and by PKWARE, and is now supported by tools such as WinZip and 7-Zip. Security firms such as Kaspersky Lab routinely detect malware distributed within ZIP archives.
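The path traversal and archive bomb risks described above are usually mitigated by inspecting the archive before extracting anything. The following is a minimal sketch in Python, not a complete defence: the function names `is_safe_member` and `check_archive`, the destination directory, and the size budget are all hypothetical, and the size check trusts the uncompressed sizes declared in the central directory, which a hostile archive can misreport.

```python
import io
import os
import zipfile


def is_safe_member(dest_dir: str, member_name: str) -> bool:
    """Return True if extracting member_name would stay inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    target = os.path.realpath(os.path.join(dest_root, member_name))
    # The resolved target must have the destination as its common prefix.
    return os.path.commonpath([dest_root, target]) == dest_root


def check_archive(data: bytes, dest_dir: str, max_total_bytes: int) -> list:
    """Return the member names if all checks pass; raise ValueError otherwise."""
    names = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if not is_safe_member(dest_dir, info.filename):
                raise ValueError(f"path traversal attempt: {info.filename!r}")
            total += info.file_size  # declared uncompressed size
            if total > max_total_bytes:
                raise ValueError("declared uncompressed size exceeds budget")
            names.append(info.filename)
    return names


def make_archive(member_names) -> bytes:
    """Build a small in-memory archive for demonstration purposes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in member_names:
            zf.writestr(name, b"x" * 10)
    return buf.getvalue()


benign = make_archive(["a.txt", "dir/b.txt"])
hostile = make_archive(["../evil.txt"])

print(check_archive(benign, "out", max_total_bytes=1000))
try:
    check_archive(hostile, "out", max_total_bytes=1000)
except ValueError as exc:
    print("rejected:", exc)
```

A stricter implementation would also cap the bytes actually written during extraction, since declared sizes are only a first filter.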

Related formats

Several formats are directly derived from or closely related to the ZIP specification. The JAR (file format) used by the Java Platform is essentially a ZIP archive with a specific manifest structure. Similarly, Microsoft's Office Open XML formats (DOCX, XLSX, PPTX) are ZIP containers holding XML and media files. The OpenDocument standard, used by Apache OpenOffice and LibreOffice, also employs a ZIP-based container. Other archive formats that compete with or complement ZIP include the open 7z format from 7-Zip, the older TAR (file format), and the proprietary RAR (file format) created by Eugene Roshal.
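Because these derived formats are ordinary ZIP containers, generic ZIP tooling can open them. The following is a minimal sketch in Python of the JAR case: it writes a JAR-like archive with a `META-INF/MANIFEST.MF` entry and reads the manifest back with the standard `zipfile` module. The helper names and the `com/example/Hello.class` entry are hypothetical, the class bytes are a placeholder rather than a real class file, and the manifest parser ignores continuation lines for brevity.

```python
import io
import zipfile


def build_jar_like_archive() -> bytes:
    """Write a ZIP archive laid out like a minimal JAR."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n\r\n")
        # Placeholder bytes only; this is not a valid class file.
        zf.writestr("com/example/Hello.class", b"\xca\xfe\xba\xbe")
    return buf.getvalue()


def read_manifest(data: bytes) -> dict:
    """Parse simple 'Key: value' headers from the manifest entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        text = zf.read("META-INF/MANIFEST.MF").decode("utf-8")
    headers = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            headers[key] = value
    return headers


jar_bytes = build_jar_like_archive()
is_zip = zipfile.is_zipfile(io.BytesIO(jar_bytes))
manifest = read_manifest(jar_bytes)
print(is_zip, manifest)
```

The same approach applies to DOCX or OpenDocument files, where the entries of interest are XML parts rather than a manifest of this form.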
