Contents:
The getline Function
The close() Function
The system() Function
A Menu-Based Command Generator
Directing Output to Files and Pipes
Generating Columnar Reports
Debugging
Limitations
Invoking awk Using the #! Syntax
This chapter is proof that not everything has its place. Some things just don't seem to fit, no matter how you organize them. This chapter is a collection of such things. It is tempting to label it "Advanced Topics," as if to explain its organization (or lack thereof), but some readers might feel they need to make more progress before reading it. We have therefore called it "The Bottom Drawer," thinking of the organization of a chest of drawers, with underwear, socks, and other day-to-day things in the top drawers and heavier things that are less frequently used, like sweaters, in the bottom drawers. All of it is equally accessible, but you have to bend over to get things in the bottom drawer. It requires a little more effort to get something, that's all.
In this chapter we cover a number of topics, including the following:
The getline function
The system() function
Directing output to files and pipes
Debugging awk scripts
The getline function is used to read another line of input. Not only can getline read from the regular input data stream, it can also handle input from files and pipes.
The getline function is similar to awk's next statement. While both cause the next input line to be read, the next statement passes control back to the top of the script. The getline function gets the next line without changing control in the script. Possible return values are:
1     If it was able to read a line.
0     If it encounters the end-of-file.
-1    If it encounters an error.
NOTE: Although getline is called a function and it does return a value, its syntax resembles a statement. Do not write getline(); its syntax does not permit parentheses.
In the previous chapter, we used a manual page source file as an example. The -man macros typically place the text argument on the next line. Although the macro is the pattern that you use to find the line, it is actually the next line that you process. For instance, to extract the name of the command from the manpage, the following example matches the heading "Name," reads the next line, and prints the first field of it:
# getline.awk -- test getline function
/^\.SH "?Name"?/ {
    getline     # get next line
    print $1    # print $1 of new line.
}
The pattern matches any line with ".SH" followed by "Name," which might be enclosed in quotes. Once this line is matched, we use getline to read the next input line. When the new line is read, getline assigns it to $0 and parses it into fields. The system variables NF, NR, and FNR are also set. Thus, the new line becomes the current line, and we are able to refer to "$1" and retrieve the first field. Note that the previous line is no longer available as $0. However, if necessary, you can assign the line read by getline to a variable and avoid changing $0, as we'll see shortly.
Here's an example that shows how the previous script works, printing out the first field of the line following ".SH Name."
$ awk -f getline.awk test
XSubImage
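To try this without a real manual page, the following sketch builds a tiny stand-in input file and runs the same procedure against it. The file contents and the shell variable names (tmp, result) are made up for illustration; only the awk procedure comes from the example above.

```shell
# Build a small stand-in for a manpage source file (contents are hypothetical).
tmp=$(mktemp)
printf '%s\n' '.SH "Name"' 'XSubImage - create a subimage' '.SH "Syntax"' > "$tmp"

# Same logic as getline.awk: match the heading, then read and use the next line.
result=$(awk '/^\.SH "?Name"?/ {
    getline          # $0 now holds the line that follows the heading
    print $1         # first field of that new line
}' "$tmp")

echo "$result"
rm -f "$tmp"
```

The heading line itself is never printed, because by the time print runs, $0 has already been replaced by the line after it.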
The sorter.awk program that we demonstrated at the end of Chapter 9, Functions, could have used getline to read all the lines after the heading "Related Commands." We can test the return value of getline in a while loop to read a number of lines from the input. The following procedure replaces the first two procedures in the sorter program:
# Match "Related Commands" and collect them
/^\.SH "?Related Commands"?/ {
    print
    while (getline > 0)
        commandList = commandList $0
}
The expression "getline > 0" will be true as long as getline successfully reads an input line. When it gets to the end-of-file, getline returns 0 and the loop is exited.
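As a rough illustration of the loop's behavior, the sketch below feeds a few made-up lines to a procedure of the same shape. The input lines, the sep variable, and the END rule are assumptions added here so that the collected result can be inspected; they are not part of the sorter program.

```shell
# Hypothetical input: one ordinary line, the heading, then two command names.
collected=$(printf '%s\n' 'Some text' '.SH "Related Commands"' 'XCreateImage' 'XGetImage' |
awk '/^\.SH "?Related Commands"?/ {
    # Keep reading until getline returns 0 (end-of-file) or -1 (error).
    while (getline > 0) {
        commandList = commandList sep $0
        sep = " "        # separator used from the second item onward
    }
}
END { print commandList }')

echo "$collected"
```

Because the loop consumes every remaining input line, no other pattern-matching rule sees those lines.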
Besides reading from the regular input stream, the getline function allows you to read input from a file or a pipe. For instance, the following statement reads the next line from the file data:
getline < "data"
Although the filename can be supplied through a variable, it is typically specified as a string constant, which must be enclosed in quotes. The symbol "<" is the same as the shell's input redirection symbol and will not be interpreted as the "less than" symbol. We can use a while loop to read all the lines from a file, testing for an end-of-file to exit the loop. The following example opens the file data and prints all of its lines:
while ( (getline < "data") > 0 ) print
(We parenthesize to avoid confusion; the "<" is a redirection, while the ">" is a comparison of the return value.) The input can also come from standard input. You can use getline following a prompt for the user to enter information:
BEGIN { printf "Enter your name: "
    getline < "-"
    print
}
This sample code prints the prompt "Enter your name:" (printf is used because we don't want a newline after the prompt), and then calls getline to gather the user's response.[1] The response is assigned to $0, and the print statement outputs that value.
[1] At least at one time, SGI versions of nawk did not support the use of "-" with getline to read from standard input. Caveat emptor.
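Testing the return value against 0 matters most when reading from a named file, because a file that cannot be opened makes getline return -1, which a bare while (getline < file) would treat as true. The sketch below, using made-up file names passed in with -v, counts the lines of an existing file and then shows the return value for a missing one.

```shell
# Create a two-line data file; the second file name is assumed not to exist.
data=$(mktemp)
printf '%s\n' 'alpha' 'beta' > "$data"

report=$(awk -v file="$data" -v missing="$data.no-such-file" 'BEGIN {
    # Parentheses separate the redirection (<) from the comparison (>).
    while ((getline < file) > 0)
        n++
    status = (getline < missing)    # expected to be -1: file cannot be opened
    print n, status
}')

echo "$report"
rm -f "$data"
```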
The getline function allows you to assign the input record to a variable. The name of the variable is supplied as an argument. Thus, the following statement reads the next line of input into the variable input:
getline input
Assigning the input to a variable does not affect the current input line; that is, $0 is not affected. The new input line is not split into fields, and thus the variable NF is also unaffected. It does increment the record counters, NR and FNR.
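A short sketch can confirm these effects. The two input lines are invented for the purpose; the procedure reads the second line into the variable input while the first line is still the current record.

```shell
out=$(printf '%s\n' 'one two three' 'four five' |
awk 'NR == 1 {
    getline input        # read line 2 into the variable; $0 and NF stay as they were
    print $0 "|" NF "|" input "|" NR
}')

echo "$out"
```

The current record and its field count still describe the first line, while NR has advanced to 2.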
The previous example demonstrated how to prompt the user. That example could be written as follows, assigning the user's response to the variable name.
BEGIN { printf "Enter your name: "
    getline name < "-"
    print name
}
Study the syntax for assigning the input data to a variable because it is a common mistake to instead write:
name = getline # wrong
which assigns the return value of getline to the variable name.
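To see the mistake in action, this sketch pipes a single made-up line into awk and prints what ends up in name and in $0.

```shell
out=$(printf '%s\n' 'Dale' | awk 'BEGIN {
    name = getline       # wrong: name receives the return value, not the line
    print "name=" name ", line=" $0
}')

echo "$out"
```

The line itself lands in $0, and name holds only the success code 1.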
You can execute a command and pipe the output into getline. For example, look at the following expression:
"who am i" | getline
That expression sets $0 to the output of the who am i command, for example:
dale ttyC3 Jul 18 13:37
The line is parsed into fields and the system variable NF is set. Similarly, you can assign the result to a variable:
"who am i" | getline me
By assigning the output to a variable, you avoid setting $0 and NF, but the line is not split into fields.
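If you need individual fields from a line that was read into a variable, one option is to split it yourself. In this sketch an echo command stands in for who am i so that the output is predictable; the variable names cmd, me, and part are chosen here for illustration.

```shell
out=$(awk 'BEGIN {
    cmd = "echo dale ttyC3 Jul 18 13:37"    # stand-in for: who am i
    cmd | getline me                        # whole line goes into me, unsplit
    n = split(me, part, " ")                # split it explicitly on whitespace
    print part[1], n
}')

echo "$out"
```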
The following script is a fairly simple example of piping the output of a command to getline. It uses the output from the who am i command to get the user's name. It then looks up the name in /etc/passwd, printing out the fifth field of that file, the user's full name:
awk '# getname - print users fullname from /etc/passwd
BEGIN { "who am i" | getline
    name = $1
    FS = ":"
}
$1 == name { print $5 }
' /etc/passwd
The command is executed from the BEGIN procedure, and it provides us with the name of the user that will be used to find the user's entry in /etc/passwd. As explained above, who am i outputs a single line, which getline assigns to $0. $1, the first field of that output, is then assigned to name.
The field separator is set to a colon (:) to allow us to access individual fields in entries in the /etc/passwd file. Notice that FS is set after getline or else the parsing of the command's output would be affected.
Finally, the main procedure is designed to test that the first field matches name. If it does, the fifth field of the entry is printed. For instance, when Dale runs this script, it prints "Dale Dougherty."
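The same lookup can be tried against a small file in /etc/passwd format. Everything in this sketch other than the overall structure is an assumption: the two entries are invented, an echo command stands in for who am i, and the comparison uses == so that a shorter login name such as da is not treated as a regular expression matching part of dale.

```shell
# Hypothetical passwd-style entries; "da" is a prefix of "dale" on purpose.
passwd=$(mktemp)
printf '%s\n' 'da:x:100:100:Dan Archer:/home/da:/bin/sh' \
              'dale:x:101:100:Dale Dougherty:/home/dale:/bin/sh' > "$passwd"

fullname=$(awk 'BEGIN {
    "echo dale ttyC3 Jul 18 13:37" | getline    # stand-in for: who am i
    name = $1                                   # parsed with the default FS
    FS = ":"                                    # takes effect for later records
}
$1 == name { print $5 }' "$passwd")

echo "$fullname"
rm -f "$passwd"
```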
When the output of a command is piped to getline and it contains multiple lines, getline reads a line at a time. The first time getline is called it reads the first line of output. If you call it again, it reads the second line. To read all the lines of output, you must set up a loop that executes getline until there is no more output. For instance, the following example uses a while loop to read each line of output and assign it to the next element of the array, who_out:
while (("who" | getline) > 0)
    who_out[++i] = $0
Each time the getline function is called, it reads the next line of output. The who command, however, is executed only once.
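Here is a self-contained variation on that loop. Since the output of who differs from system to system, three echo commands stand in for it; the loop tests for a return value greater than 0 so that an error (-1) also ends it.

```shell
out=$(awk 'BEGIN {
    cmd = "echo dale; echo mike; echo arnold"   # stand-in for: who
    while ((cmd | getline) > 0)                 # one line per call, same running command
        who_out[++i] = $0
    print i, who_out[1], who_out[i]
}')

echo "$out"
```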
The next example looks for "@date" in a document and replaces it with today's date:
# subdate.awk -- replace @date with todays date
/@date/ {
    "date +'%a., %h %d, %Y'" | getline today
    gsub(/@date/, today)
}
{ print }
The date command, using its formatting options,[2] provides the date and getline assigns it to the variable today. The gsub() function replaces each instance of "@date" with today's date.
[2] Older versions of date, particularly the one on SunOS 4.1.x systems, don't support formatting options; on those systems you have to use /usr/5bin/date. Check your local documentation.
This script might be used to insert the date in a form letter:
To: Peabody
From: Sherman
Date: @date

I am writing you on @date to remind you about our special offer.
All lines of the input file would be passed through as is, except that on lines containing "@date", each occurrence of "@date" is replaced with today's date:
$ awk -f subdate.awk subdate.test
To: Peabody
From: Sherman
Date: Sun., May 05, 1996

I am writing you on Sun., May 05, 1996 to remind you about our special offer.
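One possible variation, sketched here rather than taken from the script above, is to read the date once in a BEGIN procedure and reuse the variable on every matching line. To keep the result reproducible, an echo of a fixed string stands in for the date command, and the two input lines are a shortened, made-up version of the form letter.

```shell
out=$(printf '%s\n' 'Date: @date' 'I am writing you on @date.' |
awk 'BEGIN {
    "echo Sun., May 05, 1996" | getline today   # stand-in for the date command
}
/@date/ { gsub(/@date/, today) }                # replace every occurrence on the line
{ print }')

echo "$out"
```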