Most awk variables are available for you to use for your own
purposes; they never change except when your program assigns values to
them, and never affect anything except when your program examines them.
However, a few variables in awk have special built-in meanings.
Some of them awk examines automatically, so that they enable you
to tell awk how to do certain things. Others are set
automatically by awk, so that they carry information from the
internal workings of awk to your program.
This chapter documents all the built-in variables of gawk. Most
of them are also documented in the chapters describing their areas of
activity.
Built-in Variables that Control awk
This is an alphabetical list of the variables which you can change to
control how awk does certain things. Those variables that are
specific to gawk are marked with an asterisk, `*'.
CONVFMT
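This string controls the conversion of numbers to strings. It works by
being passed, in effect, as the first argument to the sprintf function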
(see section Built-in Functions for String Manipulation).
Its default value is "%.6g".
CONVFMT was introduced by the POSIX standard.
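As a small illustration (a sketch, not part of the original text), the following sets CONVFMT and then forces a number-to-string conversion by concatenating the number with the null string. The names x, s, and out are chosen only for this example.

```shell
# Concatenation converts the number to a string using CONVFMT.
out=$(awk 'BEGIN { CONVFMT = "%.2f"; x = 3.14159; s = x ""; print s }')
echo "$out"    # prints 3.14
```

Integer values are converted as integers, so CONVFMT only matters for numbers with a fractional part.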
FIELDWIDTHS *
FIELDWIDTHS is a space-separated list of field widths that tells gawk
how to split input with fixed, columnar boundaries. It is an
experimental feature. Assigning to FIELDWIDTHS
overrides the use of FS for field splitting.
See section Reading Fixed-width Data, for more information.
If gawk is in compatibility mode
(see section Command Line Options), then FIELDWIDTHS
has no special meaning, and field splitting operations are done based
exclusively on the value of FS.
FS
FS is the input field separator
(see section Specifying How Fields are Separated).
The value is a single-character string or a multi-character regular
expression that matches the separations between fields in an input
record. If the value is the null string (""), then each
character in the record becomes a separate field.
The default value is " ", a string consisting of a single
space. As a special exception, this value means that any
sequence of spaces, tabs, and/or newlines is a single separator.(8) It also causes
spaces, tabs, and newlines at the beginning and end of a record to be ignored.
You can set the value of FS on the command line using the
`-F' option:
awk -F, 'program' input-files
If gawk is using FIELDWIDTHS for field-splitting,
assigning a value to FS will cause gawk to return to
the normal, FS-based, field splitting. An easy way to do this
is to simply say `FS = FS', perhaps with an explanatory comment.
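The two most common cases can be sketched as follows. This is an illustrative example only; the input strings and the shell variable names default_out and comma_out are made up for the demonstration.

```shell
# Default FS (" "): runs of spaces and tabs separate fields, and
# leading and trailing whitespace is ignored.
default_out=$(printf '  a   b\tc  \n' | awk '{ print NF ":" $1 }')
echo "$default_out"    # prints 3:a

# -F, sets FS to a comma before any input is read.
comma_out=$(printf 'a,b,c\n' | awk -F, '{ print $2 }')
echo "$comma_out"      # prints b
```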
IGNORECASE *
If IGNORECASE is non-zero or non-null, then all string comparisons,
and all regular expression matching are case-independent. Thus, regexp
matching with `~' and `!~', and the gensub,
gsub, index, match, split and sub
functions, record termination with RS, and field splitting with
FS all ignore case when doing their particular regexp operations.
The value of IGNORECASE does not affect array subscripting.
See section Case-sensitivity in Matching.
If gawk is in compatibility mode
(see section Command Line Options),
then IGNORECASE has no special meaning, and string
and regexp operations are always case-sensitive.
OFMT
This string controls the conversion of numbers to strings for printing with the
print statement. It works by being passed, in
effect, as the first argument to the sprintf function
(see section Built-in Functions for String Manipulation).
Its default value is "%.6g". Earlier versions of awk
also used OFMT to specify the format for converting numbers to
strings in general expressions; this is now done by CONVFMT.
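A minimal sketch of the effect of OFMT on print, using values chosen only for illustration:

```shell
# OFMT applies to numbers printed by print; integers are unaffected.
out=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 42 }')
echo "$out"    # prints 3.14 42
```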
OFS
This is the output field separator. It is output between the fields printed by a
print statement. Its
default value is " ", a string consisting of a single space.
ORS
This is the output record separator. It is output at the end of every
print statement. Its default value is "\n".
(See section Output Separators.)
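To see OFS and ORS working together, consider this short sketch. The separator values "-" and ";\n" are arbitrary choices for the example.

```shell
# The commas in the print statement are replaced by OFS on output,
# and ORS is appended after the last field.
out=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $2, $3 }')
echo "$out"    # prints a-b-c;
```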
RS
This is awk's input record separator. Its default value is a string
containing a single newline character, which means that an input record
consists of a single line of text.
It can also be the null string, in which case records are separated by
runs of blank lines, or a regexp, in which case records are separated by
matches of the regexp in the input text.
(See section How Input is Split into Records.)
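The null-string case is the one that most often surprises people, so here is a brief sketch of it. The sample input is invented for the example; the regexp form of RS is not shown because not every awk supports it.

```shell
# With RS = "", each run of blank lines ends a record, and newlines
# inside a record act as field separators.
out=$(printf 'a\nb\n\n\nc\nd\n' | awk 'BEGIN { RS = "" } { print NR ": " $1 " " $2 }')
echo "$out"
# prints:
# 1: a b
# 2: c d
```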
SUBSEP
SUBSEP is the subscript separator. It has the default value of
"\034", and is used to separate the parts of the indices of a
multi-dimensional array. Thus, the expression foo["A", "B"]
really accesses foo["A\034B"]
(see section Multi-dimensional Arrays).
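The equivalence can be checked directly. The following sketch stores one element with a two-part subscript and then reaches it through the combined index; the names k, n, and parts are used only for this example.

```shell
out=$(awk 'BEGIN {
    foo["A", "B"] = 1
    # The same element, addressed by its real (single) index.
    if (("A" SUBSEP "B") in foo)
        print "found"
    # split recovers the parts of each stored index.
    for (k in foo) {
        n = split(k, parts, SUBSEP)
        print n, parts[1], parts[2]
    }
}')
echo "$out"
# prints:
# found
# 2 A B
```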
This is an alphabetical list of the variables that are set
automatically by awk on certain occasions in order to provide
information to your program. Those variables that are specific to
gawk are marked with an asterisk, `*'.
ARGC
ARGV
The command-line arguments available to awk programs are stored in
an array called ARGV. ARGC is the number of command-line
arguments present. See section Other Command Line Arguments.
Unlike most awk arrays,
ARGV is indexed from zero to ARGC - 1. For example:
$ awk 'BEGIN {
> for (i = 0; i < ARGC; i++)
> print ARGV[i]
> }' inventory-shipped BBS-list
-| awk
-| inventory-shipped
-| BBS-list
In this example, ARGV[0] contains "awk", ARGV[1]
contains "inventory-shipped", and ARGV[2] contains
"BBS-list". The value of ARGC is three, one more than the
index of the last element in ARGV, since the elements are numbered
from zero.
The names ARGC and ARGV, as well as the convention of indexing
the array from zero to ARGC - 1, are derived from the C language's
method of accessing command line arguments.
See section Using ARGC and ARGV, for information
about how awk uses these variables.
ARGIND *
ARGIND is the index in ARGV of the current file being processed.
Every time gawk opens a new data file for processing, it sets
ARGIND to the index in ARGV of the file name.
When gawk is processing the input files, it is always
true that `FILENAME == ARGV[ARGIND]'.
This variable is useful in file processing; it allows you to tell how far
along you are in the list of data files, and to distinguish between
successive instances of the same filename on the command line.
While you can change the value of ARGIND within your awk
program, gawk will automatically set it to a new value when the
next file is opened.
This variable is a gawk extension. In other awk implementations,
or if gawk is in compatibility mode
(see section Command Line Options),
it is not special.
ENVIRON
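ENVIRON is an associative array that contains the values of the environment.
The array indices are the environment variable names; the values are the
values of the particular environment variables. For example,
ENVIRON["HOME"] might be `/home/arnold'. Changing this array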
does not affect the environment passed on to any programs that
awk may spawn via redirection or the system function.
(In a future version of gawk, it may do so.)
Some operating systems may not have environment variables.
On such systems, the ENVIRON array is empty (except for
ENVIRON["AWKPATH"]).
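As a quick sketch, the following passes a variable into awk through the environment and reads it back from ENVIRON. The variable name GREETING is hypothetical and exists only for this example.

```shell
# Set GREETING in the environment of this one awk invocation.
out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$out"    # prints hello
```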
ERRNO *
If a system error occurs either doing a redirection for getline,
during a read for getline, or during a close operation,
then ERRNO will contain a string describing the error.
This variable is a gawk extension. In other awk implementations,
or if gawk is in compatibility mode
(see section Command Line Options),
it is not special.
FILENAME
FILENAME is the name of the file that awk is currently reading.
When no data files are listed on the command line, awk reads
from the standard input, and FILENAME is set to "-".
FILENAME is changed each time a new file is read
(see section Reading Input Files).
Inside a BEGIN rule, the value of FILENAME is
"", since there are no input files being processed
yet.(9) (d.c.)
FNR
FNR is the current record number in the current file. FNR is
incremented each time a new record is read
(see section Explicit Input with getline). It is reinitialized
to zero each time a new input file is started.
NF
NF is the number of fields in the current input record.
NF is set each time a new record is read, when a new field is
created, or when $0 changes (see section Examining Fields).
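The second case, creating a new field, can be illustrated with a short sketch. The input line is made up for the example.

```shell
# Assigning to $5 creates fields 4 and 5, so NF grows from 3 to 5
# and $0 is rebuilt with OFS between every field.
out=$(echo 'a b c' | awk '{ print NF; $5 = "e"; print NF; print $0 }')
echo "$out"
# prints:
# 3
# 5
# a b c  e
```

The two spaces before the final field surround the empty fourth field.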
NR
NR is the number of input records awk has processed since
the beginning of the program's execution
(see section How Input is Split into Records).
NR is set each time a new record is read.
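The difference between NR and FNR only shows up with more than one input file. The sketch below assumes a scratch directory created with mktemp; the file names f1 and f2 are hypothetical.

```shell
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/f1"
printf 'z\n' > "$dir/f2"

# FNR restarts for each file; NR keeps counting across files.
out=$(awk '{ print FNR, NR }' "$dir/f1" "$dir/f2")
echo "$out"
# prints:
# 1 1
# 2 2
# 1 3

rm -r "$dir"
```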
RLENGTH
RLENGTH is the length of the substring matched by the
match function
(see section Built-in Functions for String Manipulation).
RLENGTH is set by invoking the match function. Its value
is the length of the matched string, or -1 if no match was found.
RSTART
RSTART is the start-index in characters of the substring matched by the
match function
(see section Built-in Functions for String Manipulation).
RSTART is set by invoking the match function. Its value
is the position of the string where the matched substring starts, or zero
if no match was found.
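RSTART and RLENGTH are usually used together, often as arguments to substr. A minimal sketch, with a sample string chosen for the example:

```shell
out=$(awk 'BEGIN {
    # A successful match: "oo" starts at position 2 and has length 2.
    if (match("foobar", /o+/))
        print RSTART, RLENGTH, substr("foobar", RSTART, RLENGTH)
    # A failed match sets RSTART to 0 and RLENGTH to -1.
    match("foobar", /z/)
    print RSTART, RLENGTH
}')
echo "$out"
# prints:
# 2 2 oo
# 0 -1
```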
RT *
RT is set each time a record is read. It contains the input text
that matched the text denoted by RS, the record separator.
This variable is a gawk extension. In other awk implementations,
or if gawk is in compatibility mode
(see section Command Line Options),
it is not special.
A side note about NR and FNR.
awk simply increments both of these variables
each time it reads a record, instead of setting them to the absolute
value of the number of records read. This means that your program can
change these variables, and their new values will be incremented for
each record (d.c.). For example:
$ echo '1
> 2
> 3
> 4' | awk 'NR == 2 { NR = 17 }
> { print NR }'
-| 1
-| 17
-| 18
-| 19
Before FNR was added to the awk language
(see section Major Changes between V7 and SVR3.1),
many awk programs used this feature to track the number of
records in a file by resetting NR to zero when FILENAME
changed.
Using ARGC and ARGV
In section Built-in Variables that Convey Information,
you saw this program describing the information contained in ARGC
and ARGV:
$ awk 'BEGIN {
> for (i = 0; i < ARGC; i++)
> print ARGV[i]
> }' inventory-shipped BBS-list
-| awk
-| inventory-shipped
-| BBS-list
In this example, ARGV[0] contains "awk", ARGV[1]
contains "inventory-shipped", and ARGV[2] contains
"BBS-list".
Notice that the awk program is not entered in ARGV. The
other special command line options, with their arguments, are also not
entered. But variable assignments on the command line are
treated as arguments, and do show up in the ARGV array.
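For instance, the following sketch lists the arguments starting at index one, so that the result does not depend on the name under which awk was invoked. The assignment n=1 and the file name data.txt are hypothetical; because the program has only a BEGIN rule, the file is never opened.

```shell
out=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' n=1 data.txt)
echo "$out"
# prints:
# 1 n=1
# 2 data.txt
```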
Your program can alter ARGC and the elements of ARGV.
Each time awk reaches the end of an input file, it uses the next
element of ARGV as the name of the next input file. By storing a
different string there, your program can change which files are read.
You can use "-" to represent the standard input. By storing
additional elements and incrementing ARGC you can cause
additional files to be read.
If you decrease the value of ARGC, that eliminates input files
from the end of the list. By recording the old value of ARGC
elsewhere, your program can treat the eliminated arguments as
something other than file names.
To eliminate a file from the middle of the list, store the null string
("") into ARGV in place of the file's name. As a
special feature, awk ignores file names that have been
replaced with the null string.
You may also use the delete statement to remove elements from
ARGV (see section The delete Statement).
All of these actions are typically done from the BEGIN rule,
before actual processing of the input begins.
See section Splitting a Large File Into Pieces, and see
section Duplicating Output Into Multiple Files, for an example
of each way of removing elements from ARGV.
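Two of the techniques described above, storing the null string and decreasing ARGC, can be combined in a short sketch. It assumes a scratch directory created with mktemp; the file names f1, f2, and f3 are hypothetical.

```shell
dir=$(mktemp -d)
printf 'first\n' > "$dir/f1"
printf 'second\n' > "$dir/f2"
printf 'third\n' > "$dir/f3"

# Blank out the middle file, then drop the last one by lowering ARGC.
out=$(awk 'BEGIN { ARGV[2] = ""; ARGC-- } { print }' "$dir/f1" "$dir/f2" "$dir/f3")
echo "$out"    # prints first

rm -r "$dir"
```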
The following fragment processes ARGV in order to examine, and
then remove, command line options.
BEGIN {
    for (i = 1; i < ARGC; i++) {
        if (ARGV[i] == "-v")
            verbose = 1
        else if (ARGV[i] == "-d")
            debug = 1
        else if (ARGV[i] ~ /^-./) {    # any other option
            e = sprintf("%s: unrecognized option -- %c",
                        ARGV[0], substr(ARGV[i], 2, 1))
            print e > "/dev/stderr"
        } else
            break                      # first non-option argument
        delete ARGV[i]
    }
}
To actually get the options into the awk program, you have to
end the awk options with `--', and then supply your options,
like so:
awk -f myprog -- -v -d file1 file2 ...
This is not necessary in gawk: Unless `--posix' has been
specified, gawk silently puts any unrecognized options into
ARGV for the awk program to deal with.
As soon as it
sees an unknown option, gawk stops looking for other options it might
otherwise recognize. The above example with gawk would be:
gawk -f myprog -d -v file1 file2 ...
Since `-d' is not a valid gawk option, the following `-v'
is passed on to the awk program.