cut (Unix)

Name	cut
Developer	AT&T Bell Laboratories; GNU Project; BSD projects
Operating system	Unix; Unix-like; Linux; BSD; macOS; Solaris
Genre	Command
License	Various (proprietary; permissive; copyleft)

Contents

History
Synopsis
Options and Usage
Examples
Implementation and Portability
Related Utilities and Comparisons

cut is a command-line utility that extracts sections from each line of its input on Unix and Unix-like systems. It is commonly used in pipelines and shell scripts alongside programs such as sed, awk, sort, uniq, and tr. The utility first appeared in AT&T System III UNIX, was later standardized in POSIX, and is implemented by projects such as the GNU Project and the BSD variants.

History

cut traces its ancestry to the text-processing tools developed at Bell Labs alongside the Unix operating system, which was itself influenced by concepts from Multics; the command first appeared in AT&T System III UNIX. Early Unix toolchains combined utilities such as ed, grep, and sed into pipelines used by researchers at AT&T Bell Laboratories and at universities such as Princeton University, MIT, and Stanford University, and cut filled the role of column and field extraction within them. cut later became part of the X/Open and POSIX specifications, which were shaped by organizations including IEEE and The Open Group, and implementations spread through System V derivatives, the BSD systems, GNU coreutils, and proprietary systems such as Solaris from Sun Microsystems. Over time, the FreeBSD, NetBSD, and OpenBSD projects refined portability and locale handling. Academic and engineering texts from ACM and IEEE often mention cut in discussions of the Unix philosophy alongside tools such as awk and make.

Synopsis

cut reads from standard input or from named files and writes the selected bytes, characters, or fields of each line to standard output, so it fits naturally into pipelines run from shells such as bash, zsh, ksh, and tcsh. It supports three selection modes (byte, character, and field), which interact with locale-aware encodings such as UTF-8 and with legacy encodings used on systems from vendors such as IBM and DEC. The utility accepts the options defined by POSIX as well as extensions present in the GNU and BSD implementations, and it is taught as a scripting tool in books from O'Reilly Media and in university courses at institutions such as Carnegie Mellon University and the University of California, Berkeley.
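The three selection modes and the two input sources can be sketched as follows. The sample record and variable names are hypothetical and chosen only for illustration; only the POSIX options -b, -c, -d, and -f are used.

```shell
# Hypothetical sample record; the contents are illustrative only.
record='2024-01-15,alice,login'

# Field mode: split on a delimiter (-d) and keep the second field (-f).
field=$(printf '%s\n' "$record" | cut -d, -f2)

# Character mode: keep characters 1 through 4 (the year).
chars=$(printf '%s\n' "$record" | cut -c1-4)

# Byte mode: keep bytes 6 through 7 (the month). For pure ASCII input,
# bytes and characters coincide.
bytes=$(printf '%s\n' "$record" | cut -b6-7)

# The same selection works on a named file instead of standard input.
tmp=$(mktemp)
printf '%s\n' "$record" > "$tmp"
from_file=$(cut -d, -f3 "$tmp")
rm -f "$tmp"

printf '%s %s %s %s\n' "$field" "$chars" "$bytes" "$from_file"
```

Reading from a pipe and reading from a file produce the same result; the choice is usually dictated by where the data already lives.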

Options and Usage

The options common to all conforming implementations are -d (set the field delimiter, which defaults to the tab character), -f (select fields by a list of numbers or ranges), -b (select bytes), -c (select characters), and -s (suppress lines that contain no delimiter when selecting fields). These options are documented in the manuals of the GNU Project, FreeBSD, and NetBSD and in the POSIX standard published by IEEE. Users combine cut with redirection and with pipelines involving find, xargs, grep, and perl for tasks such as simple CSV manipulation, processing logs from servers such as Apache HTTP Server and Nginx, and data transformation in toolchains used at organizations such as Google and Facebook. Scripting patterns that invoke cut appear in tutorials on Stack Overflow and GitHub and in educational material from the Linux Foundation and Coursera.
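A minimal sketch of how -d, -f, and -s interact, assuming a small hypothetical input in which one line has no delimiter:

```shell
# Hypothetical input: two delimited lines and one line with no delimiter.
input='a:b:c:d
no delimiter here
w:x:y:z'

# Without -s, a line that contains no delimiter is passed through unchanged.
all=$(printf '%s\n' "$input" | cut -d: -f1,3)

# With -s, a line that contains no delimiter is suppressed.
only_delimited=$(printf '%s\n' "$input" | cut -d: -s -f1,3)

# A list may contain ranges; "2-" means field 2 through the end of the line.
tail_fields=$(printf '%s\n' "$input" | cut -d: -s -f2-)

printf '%s\n---\n%s\n---\n%s\n' "$all" "$only_delimited" "$tail_fields"
```

Selected fields are written in their input order and joined with the input delimiter, so `-f1,3` on `a:b:c:d` yields `a:c`.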

Examples

A common example cuts by delimiter to extract the third field from a colon-separated password file such as /etc/passwd on Unix System V or BSD-derived systems, mirroring examples in books from Prentice Hall and resources from USENIX. Other examples extract character ranges from fixed-width records, such as those produced by IBM mainframes or by legacy tools used at NASA and NOAA. Typical pipelines combine cut with sort and uniq to aggregate data, or use cut to prefilter columns before more complex transformations in awk. Community-maintained examples appear in repositories and wikis hosted by GitHub, Stack Overflow, Debian, Red Hat, and Arch Linux.
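The examples described above might look like the following sketch. The passwd-style records and the fixed-width layout are hypothetical, invented for illustration rather than read from a real system.

```shell
# Hypothetical passwd-style records; not read from a real /etc/passwd.
passwd='root:x:0:0:root:/root:/bin/sh
alice:x:1001:1001:Alice:/home/alice:/bin/bash
bob:x:1002:1002:Bob:/home/bob:/bin/bash'

# Third colon-separated field (the numeric user ID in passwd format).
uids=$(printf '%s\n' "$passwd" | cut -d: -f3)

# Count how many accounts use each login shell (field 7).
shell_counts=$(printf '%s\n' "$passwd" | cut -d: -f7 | sort | uniq -c)

# Hypothetical fixed-width record: columns 1-8 hold a date,
# columns 9-12 hold a station code.
fixed='20240115KJFK 0123'
station=$(printf '%s\n' "$fixed" | cut -c9-12)

printf '%s\n' "$uids" "$shell_counts" "$station"
```

The sort step matters in the aggregation pipeline because uniq only collapses adjacent identical lines.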

Implementation and Portability

Implementations of cut include the one in the GNU Project's coreutils package, those shipped with BSD systems such as FreeBSD, NetBSD, and OpenBSD, and those in proprietary systems from Sun Microsystems and, historically, AT&T. They differ in their behavior with multibyte encodings such as UTF-8, in their handling of incomplete multibyte sequences, and in how they parse lists and ranges; locale support in the underlying C library, such as glibc or musl, also influences results. Portability notes appear in the specifications published by The Open Group and in portability guides used by projects such as Debian and Gentoo. Performance across file systems such as ext4, XFS, ZFS, and UFS can be relevant in high-throughput environments at companies including Twitter, Amazon Web Services, and Netflix.
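The multibyte issue can be illustrated with a short sketch. It assumes a hypothetical UTF-8 encoded word in which one character occupies two bytes. Byte mode behaves the same everywhere, while the result of character mode depends on the implementation and the active locale, so no output is claimed for it here.

```shell
# "héllo" with é written explicitly as its two UTF-8 bytes (octal 303 251).
word=$(printf 'h\303\251llo')

# Byte mode counts bytes: bytes 1-3 cover "h" plus both bytes of "é".
first3bytes=$(printf '%s\n' "$word" | cut -b1-3)

# Bytes 1-2 stop in the middle of "é", leaving an incomplete sequence.
# Count the output bytes (including the trailing newline) instead of printing.
two_byte_count=$(printf '%s\n' "$word" | cut -b1-2 | wc -c | tr -d ' ')

# Character mode: a locale-aware cut in a UTF-8 locale returns "hé", whereas
# an implementation that treats -c as bytes returns "h" plus half of "é".
c_mode_count=$(printf '%s\n' "$word" | cut -c1-2 | wc -c | tr -d ' ')

printf 'bytes 1-3: %s\n' "$first3bytes"
printf 'bytes 1-2 output size: %s\n' "$two_byte_count"
printf 'chars 1-2 output size: %s\n' "$c_mode_count"
```

Scripts that must run unchanged on several systems are on safer ground with -f and a single-byte delimiter than with -c on multibyte text.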

Related Utilities and Comparisons

cut is conceptually related to other column-oriented and field-processing tools such as awk, sed, tr, paste, and join. Spreadsheet tools such as LibreOffice Calc and Microsoft Excel perform analogous column operations in graphical environments, while languages and frameworks such as Python, Perl, R, pandas, and Apache Spark provide programmatic alternatives. In keeping with the Unix philosophy, cut complements the other pipeline components in toolchains used by organizations such as NASA and CERN and by academic labs at Caltech and MIT Lincoln Laboratory to create reproducible workflows.
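One practical difference between cut and awk is the treatment of repeated delimiters. The sketch below uses a hypothetical line with a run of spaces: cut treats every single delimiter as a field boundary, awk's default splitting collapses runs of whitespace, and tr -s can squeeze the run before cut sees it.

```shell
# Hypothetical line with three spaces between the first two words.
line='alpha   beta gamma'

# cut sees an empty field between each pair of adjacent spaces,
# so field 2 is the empty string.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk splits on runs of whitespace by default, so $2 is "beta".
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

# Squeezing repeated spaces with tr -s first lets cut find "beta" as field 2.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

printf 'cut: [%s]\nawk: [%s]\ntr+cut: [%s]\n' "$with_cut" "$with_awk" "$squeezed"
```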

