| AWK | |
|---|---|
| Name | AWK |
| Paradigm | Scripting language, Data-driven programming |
| Designer | Alfred Aho, Peter Weinberger, Brian Kernighan |
| Developer | Bell Labs |
| Released | 1977 |
| Typing | Dynamic typing, Weak typing |
| Influenced | Perl, Tcl |
AWK is a domain-specific programming language designed for text processing and data extraction, created at Bell Labs in the 1970s. It is a standard feature of most Unix-like operating systems and is known for its concise syntax for pattern scanning and processing. The name is derived from the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
AWK was developed as part of the Unix operating system's toolkit, following the success of tools like grep and sed. Its design centers on processing structured, text-based data, such as log files or the output of other commands, using a pattern-action paradigm. This makes it well suited to rapid prototyping, report generation, and transforming data formats in pipelines. The original implementation shipped with Seventh Edition Unix, and the language was later standardized as part of POSIX.
An AWK program consists of a series of pattern-action statements written as `pattern { action }`. Input is read one record at a time (by default, one line), and each record is automatically split into fields accessible as `$1`, `$2`, and so on, with `$0` holding the whole record. A statement with no pattern applies its action to every record, and a pattern with no action prints the matching record. Key syntactic elements include regular expressions for pattern matching, C-style control flow with `if`, `while`, and `for`, and associative arrays for data storage. The special patterns `BEGIN` and `END` run their actions before the first record is read and after the last, respectively.
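The structure described above can be sketched as a small shell function wrapping an AWK program. The function name `count_failures` and the two-column `user status` input format are assumptions made for illustration only.

```sh
# count_failures: hypothetical helper; reads "user status" lines on stdin.
count_failures() {
  awk '
    BEGIN { print "failures per user:" }   # runs once, before any input is read
    $2 == "fail" { fails[$1]++ }           # pattern selects records; action counts in an associative array
    END {                                  # runs once, after the last record
      for (user in fails)
        print user, fails[user]
    }
  '
}

printf 'alice ok\nbob fail\nalice fail\nbob fail\n' | count_failures
```

The order in which `for (user in fails)` visits keys is unspecified, so the per-user lines may appear in any order; pipe the result through sort if a stable order matters.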
The language provides numerous built-in variables that control execution, such as `FS` (input field separator), `OFS` (output field separator), `NR` (number of records read so far), and `NF` (number of fields in the current record). Built-in functions include arithmetic functions like `sqrt()` and `rand()` and string functions such as `sub()`, `gsub()`, and `split()`, while the `printf` statement produces formatted output. These features reduce the need for external tools for common tasks, allowing succinct one-liners directly on the command line.
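As a minimal sketch of how these variables and functions fit together, the following converts colon-separated records to comma-separated output. The function name `paths_to_csv` and the three-field `name:x:path` input layout are assumptions, not part of any standard tool.

```sh
# paths_to_csv: hypothetical helper; reads "name:x:path" lines on stdin.
paths_to_csv() {
  awk '
    BEGIN { FS = ":"; OFS = "," }      # split input on colons, join output with commas
    NF >= 3 {                          # skip records with fewer than three fields
      gsub(/ /, "_", $1)               # replace every space in field 1
      n = split($3, parts, "/")        # split field 3 on "/"; n is the number of pieces
      print NR, $1, parts[n]           # print joins its arguments with OFS
      kept++
    }
    END { printf "%d of %d records kept\n", kept, NR }
  '
}

printf 'john smith:x:/home/john\nroot:x:/root\nbad line\n' | paths_to_csv
```

With the sample input, the third line has only one field under `FS = ":"`, so the `NF >= 3` pattern filters it out while `NR` still counts it.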
Typical applications include parsing simple CSV files, summarizing system administration logs, and generating formatted reports from raw data. For instance, to print the first field of each line containing "error" in a file, one might use: `awk '/error/ { print $1 }' logfile`. Another classic example is summing a column of numbers: `awk '{ sum += $1 } END { print sum }' data.txt`. Because it combines readily with other Unix utilities like sort and uniq in shell scripts, it is a common building block for data munging and ETL processes.
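A pipeline combining AWK with sort and uniq might look like the sketch below, which counts how often each host logs an error. The function name `error_hosts` and the `host level message` log layout are assumptions made for illustration.

```sh
# error_hosts: hypothetical helper; reads "host level message..." lines on stdin.
error_hosts() {
  awk '/error/ { print $1 }' |    # keep field 1 of lines matching "error"
    sort |                        # bring identical hosts together
    uniq -c |                     # prefix each distinct host with its count
    sort -k1,1nr -k2,2 |          # highest count first, then by host name
    awk '{ print $2, $1 }'        # normalize to "host count"
}

printf 'web1 error disk full\nweb2 error timeout\nweb1 info started\nweb1 error disk full\ndb1 info ok\n' | error_hosts
```

The final AWK stage is there only to strip the variable padding that `uniq -c` puts before each count. The same tally could be kept in an associative array inside a single AWK program; the pipeline form is shown because it mirrors how the tools are usually chained.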
Several enhanced implementations exist. GNU Awk (gawk) is the most prominent, adding features such as network access and time functions, with XML parsing available through an extension. Other notable variants include nawk (new awk), which became widely available with System V Release 3.1, and mawk, a fast, lightweight implementation by Mike Brennan. The language has also influenced the design of later scripting tools, most notably Perl.