sed & awk


9. Functions

Contents:
Arithmetic Functions
String Functions
Writing Your Own Functions

A function is a self-contained computation that accepts a number of arguments as input and returns some value. Awk has a number of built-in functions in two groups: arithmetic and string functions. Awk also provides user-defined functions, which allow you to expand upon the built-in functions by writing your own.

9.1 Arithmetic Functions

Nine of the built-in functions can be classified as arithmetic functions. Most of them take a numeric argument and return a numeric value. Table 9.1 summarizes these arithmetic functions.

Table 9.1: awk's Built-In Arithmetic Functions

Awk Function    Description
cos(x)          Returns cosine of x (x is in radians).
exp(x)          Returns e to the power x.
int(x)          Returns truncated value of x.
log(x)          Returns natural logarithm (base-e) of x.
sin(x)          Returns sine of x (x is in radians).
sqrt(x)         Returns square root of x.
atan2(y,x)      Returns arctangent of y/x in the range -π to π.
rand()          Returns pseudo-random number r, where 0 <= r < 1.
srand(x)        Establishes new seed for rand(). If no seed is specified,
                uses time of day. Returns the old seed.

9.1.1 Trigonometric Functions

The trigonometric functions cos() and sin() work the same way, taking a single argument that is the size of an angle in radians and returning the cosine or sine of that angle. (To convert from degrees to radians, multiply the number by π/180.) The trigonometric function atan2() takes two arguments, y and x, and returns the arctangent of y/x, using the signs of both arguments to determine the correct quadrant. The expression

atan2(0, -1)

produces π.
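As a quick check of the trigonometric functions, here is a small shell function that converts degrees to radians inside awk, deriving π from atan2(0, -1) rather than typing the constant. The name sincos_deg is just for illustration.

```shell
# Hypothetical helper: print sine and cosine of an angle given in degrees.
sincos_deg() {
    awk -v deg="$1" 'BEGIN {
        pi  = atan2(0, -1)        # pi, via atan2(0, -1)
        rad = deg * pi / 180      # degrees -> radians
        printf("sin=%.6f cos=%.6f\n", sin(rad), cos(rad))
    }'
}

sincos_deg 90    # prints sin=1.000000 cos=0.000000
sincos_deg 60    # prints sin=0.866025 cos=0.500000
```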

The function exp() uses the natural exponential, which is also known as base-e exponentiation. The expression

exp(1)

returns approximately 2.71828, the base of the natural logarithms, referred to as e. Thus, exp(x) is e to the x-th power.

The log() function is the inverse of exp(): it returns the natural logarithm of x. The sqrt() function takes a single argument and returns its (positive) square root.
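These relationships are easy to check from the shell. The sketch below builds a base-10 logarithm out of log() (log10 here is our own illustrative shell function, not an awk built-in) and shows exp() and log() undoing each other.

```shell
# Hypothetical helper: base-10 logarithm from the natural log, log10(x) = log(x) / log(10).
log10() {
    awk -v x="$1" 'BEGIN { printf("%.6f\n", log(x) / log(10)) }'
}

# Hypothetical helper: exp() and log() are inverses, so this prints x (up to rounding).
roundtrip() {
    awk -v x="$1" 'BEGIN { printf("%.6f\n", log(exp(x))) }'
}

awk 'BEGIN { printf("%.5f\n", exp(1)) }'   # prints 2.71828
log10 1000                                  # prints 3.000000
roundtrip 2.5                               # prints 2.500000
```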

9.1.2 Integer Function

The int() function truncates a numeric value by removing digits to the right of the decimal point. Look at the following two statements:

print 100/3
print int(100/3)

The output from these statements is shown below:

33.3333
33

The int() function simply truncates; it does not round up or down. (Use the printf format "%.0f" to perform rounding.)[1]

[1] The way printf does rounding is discussed in Appendix B, Quick Reference for awk.
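To see the difference between truncating and rounding, the sketch below prints both for the same value. Note that int() truncates toward zero, so a negative value moves up toward zero rather than down. trunc_round is a hypothetical helper name.

```shell
# Hypothetical helper: show int() truncation next to printf "%.0f" rounding.
trunc_round() {
    awk -v x="$1" 'BEGIN { printf("int=%d round=%.0f\n", int(x), x) }'
}

trunc_round 33.7     # prints int=33 round=34
trunc_round -33.7    # prints int=-33 round=-34
```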

9.1.3 Random Number Generation

The rand() function generates a pseudo-random floating-point number between 0 and 1. The srand() function sets the seed or starting point for random number generation. If srand() is called without an argument, it uses the time of day to generate the seed. With an argument x, srand() uses x as the seed.
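Seeding with an explicit value is what makes results repeatable. In this sketch (seeded is an illustrative name), two separate awk runs given the same seed print the same pair of numbers; the actual values depend on your awk implementation, so none are shown here.

```shell
# Hypothetical helper: seed with a fixed value, then print two random numbers.
seeded() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)                              # explicit seed instead of time of day
        printf("%.6f %.6f\n", rand(), rand())
    }'
}

seeded 42
seeded 42    # same two numbers as the line above
```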

If you don't call srand() at all, awk acts as if srand() had been called with a constant argument before your program started, causing you to get the same starting point every time you run your program. This is useful if you want reproducible behavior for testing, but inappropriate if you really do want your program to behave differently every time. Look at the following script:

# rand.awk -- test random number generation
BEGIN {
	print rand()
	print rand()
	srand()
	print rand()
	print rand()
}

We print the result of the rand() function twice, and then call the srand() function before printing the result of the rand() function two more times. Let's run the script.

$ awk -f rand.awk
0.513871
0.175726
0.760277
0.263863

Four random numbers are generated. Now look what happens when we run the program again:

$ awk -f rand.awk
0.513871
0.175726
0.787988
0.305033

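The first two "random" numbers are identical to those generated in the previous run of the program, while the last two are different. The last two differ because srand() reseeded the generator from the time of day, which had changed between the two runs. (In many implementations the seed is the time in seconds, so two runs started within the same second can still produce the same numbers.)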

The srand() function returns the previous seed, that is, the seed the generator was using before the call. You can record this value to keep track of a sequence of random numbers and re-run it if needed.
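Here is one way to put that return value to work, sketched as a hypothetical quickrand helper. With no argument it seeds from the time of day and captures that seed with a second srand() call; either way it restarts the generator from a known seed and prints the seed along with the numbers, so a run can be reproduced later by passing the printed seed back in.

```shell
# Hypothetical helper: print the seed with the numbers it produced so a run can be replayed.
# With no argument, seed from the time of day; with an argument, reuse that seed.
quickrand() {
    awk -v seed="$1" 'BEGIN {
        if (seed == "") {
            srand()          # seed from the time of day
            seed = srand()   # returns the previous seed, i.e. the one just set
                             # (this call reseeds from the clock again; we reset below)
        }
        srand(seed)          # start the generator from a known seed
        printf("seed=%s %.6f %.6f\n", seed, rand(), rand())
    }'
}

quickrand      # a time-based seed followed by two numbers
quickrand 7    # same output every time it is run with 7
```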

9.1.4 Pick 'em

To show how to use rand(), we'll look at a script that implements a "quick-pick" for a lottery game. This script, named lotto, picks x numbers from a series of numbers 1 to y. Two arguments can be supplied on the command line: how many numbers to pick (the default is 6) and the highest number in the series (the default is 30). Using the default values for x and y, the script generates six unique random numbers between 1 and 30, printed from lowest to highest for readability. Before looking at the script itself, let's run the program:

$ lotto
Pick 6 of 30
9 13 25 28 29 30
$ lotto 7 35
Pick 7 of 35
1 6 9 16 20 22 27

The first example uses the default values to print six random numbers from 1 to 30. The second example prints seven random numbers out of 35.

The full lotto script is fairly complicated, so before looking at the entire script, let's look at a smaller script that generates a single random number in a series:

awk -v TOPNUM=$1 '
# pick1 - pick one random number out of y 
# main routine
BEGIN {
# seed random number using time of day 
	srand() 
# get a random number
	select = 1 + int(rand() * TOPNUM)
# print pick
	print select
}'

The shell script expects a single argument from the command line and this is passed into the program as "TOPNUM=$1," using the -v option. All the action happens in the BEGIN procedure. Since there are no other statements in the program, awk exits when the BEGIN procedure is done.

The main routine first calls the srand() function to seed the random number generator. Then we get a random number by calling the rand() function:

select = 1 + int(rand() * TOPNUM)

It might be helpful to see this expression broken up so each part of it is obvious.

Statement                     Result
print r = rand()              0.467315
print r * TOPNUM              14.0195
print int(r * TOPNUM)         14
print 1 + int(r * TOPNUM)     15

Because rand() returns a number greater than or equal to 0 and less than 1, multiplying it by TOPNUM gives a number from 0 up to, but not including, TOPNUM. Truncating that number with int() removes the fractional part, leaving an integer from 0 to TOPNUM-1. Adding 1 shifts the range to 1 through TOPNUM; without it, the script could pick 0 but never TOPNUM. In this example, the random number that is generated is 15. You could use this program to pick a single number from any range that starts at 1, such as 1 to 100:
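The same arithmetic extends to a range that doesn't start at 1. The sketch below (pickrange is a hypothetical helper, not one of the book's scripts) counts how many values the range holds, scales rand() by that count, truncates, and adds the low end.

```shell
# Hypothetical helper: pick one random integer from LO to HI, inclusive.
pickrange() {
    awk -v LO="$1" -v HI="$2" 'BEGIN {
        srand()                                   # seed from the time of day
        # int(rand() * n) is 0..n-1 for n = HI-LO+1, so adding LO gives LO..HI
        print LO + int(rand() * (HI - LO + 1))
    }'
}

pickrange 10 20    # some integer from 10 to 20
pickrange 5 5      # always 5
```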

$ pick1 100
83

The lotto script must "pick one" multiple times. Basically, we need to set up a for loop to execute the rand() function as many times as needed. One of the reasons this is difficult is that we have to worry about duplicates. In other words, it is possible for a number to be picked again; therefore we have to keep track of the numbers already picked.

Here's the lotto script:

awk -v NUM=$1 -v TOPNUM=$2 '
# lotto - pick x random numbers out of y 
# main routine
BEGIN {
# test command line args; NUM = $1, how many numbers to pick 
# 	              TOPNUM = $2, last number in series
	if (NUM <= 0) 
		NUM = 6
	if (TOPNUM <= 0) 
		TOPNUM = 30
# print "Pick x of y"
	printf("Pick %d of %d\n", NUM, TOPNUM) 
# seed random number using time and date; do this once
	srand() 
# loop until we have NUM selections
	for (j = 1; j <= NUM; ++j) {
		# loop to find a not-yet-seen selection
		do {
			select = 1 + int(rand() * TOPNUM)
		} while (select in pick)
		pick[select] = select
	}
# loop through array and print picks.
	for (j in pick) 
		printf("%s ", pick[j])
	printf("\n")
}'

Unlike the previous program, this one looks for two command-line arguments, indicating x numbers out of y. The main routine looks to see if these numbers were supplied and if not, assigns default values.

There is only one array, pick, for holding the random numbers that are selected. Each number is guaranteed to be in the desired range (1 to TOPNUM), because the result of rand() (a value from 0 up to, but not including, 1) is multiplied by TOPNUM, truncated, and then incremented by 1. The heart of the script is a loop that runs NUM times to assign NUM elements to the pick array.

To get a new non-duplicate random number, we use an inner do loop that generates selections and tests whether each one is already in the pick array. (Using the in operator is much faster than looping through the array comparing subscripts.) As long as (select in pick) is true, that number has already been picked, so the selection is a duplicate and the loop generates another one. Once (select in pick) is false, we assign select to the element of the pick array indexed by select itself. From then on, any in test for that number returns true, so the do loop will reject it if it comes up again.
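One case the loop does not handle: if NUM is larger than TOPNUM, there are never enough distinct numbers, so the do loop eventually spins forever. A minimal guard, assuming we simply want to report the problem and exit with a nonzero status, could look like this (lotto_check is an illustrative name; the rest mirrors the lotto script).

```shell
# Hypothetical variant of lotto that rejects impossible requests.
lotto_check() {
    awk -v NUM="$1" -v TOPNUM="$2" 'BEGIN {
        if (NUM <= 0)
            NUM = 6
        if (TOPNUM <= 0)
            TOPNUM = 30
        # With NUM > TOPNUM the do loop below could never finish,
        # because every candidate would already be in pick.
        if (NUM > TOPNUM) {
            printf("cannot pick %d of %d\n", NUM, TOPNUM) > "/dev/stderr"
            exit 1
        }
        srand()
        for (j = 1; j <= NUM; ++j) {
            do {
                select = 1 + int(rand() * TOPNUM)
            } while (select in pick)
            pick[select] = select
        }
        for (j in pick)
            printf("%s ", pick[j])
        printf("\n")
    }'
}

lotto_check 7 35                     # seven distinct numbers
lotto_check 40 30 || echo rejected   # error message, then "rejected"
```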

Finally, the program loops through the pick array and prints the elements. This version of the lotto script leaves one thing out. See if you can tell what it is if we run it again:

$ lotto 7 35
Pick 7 of 35
5 21 9 30 29 20 2

That's right, the numbers are not sorted. We'll defer showing the code for the sort routine until we discuss user-defined functions. While it's not necessary to have written the sorting code as a function, it makes a lot of sense. One reason is that you can tackle a more generalized problem and retain the solution for use in other programs. Later on, we will write a function that sorts the elements of an array.

Note that the pick array isn't ready for sorting: its indices are the picked numbers themselves, not the consecutive integers 1 through NUM that a sorting loop would step through. We would have to set up a separate, numerically indexed array for our sort function:

# create a numerically indexed array for sorting
i = 1
for (j in pick)
	sortedpick[i++] = pick[j]
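Until we get to the sort function, here is one way the sortedpick array could be put in order, sketched as a simple insertion sort written inline rather than as a function. This is an assumption for illustration, not the routine developed later; sorted_lotto is a hypothetical name.

```shell
# Hypothetical variant of lotto that prints its picks in ascending order.
sorted_lotto() {
    awk -v NUM="$1" -v TOPNUM="$2" 'BEGIN {
        srand()
        for (j = 1; j <= NUM; ++j) {
            do {
                select = 1 + int(rand() * TOPNUM)
            } while (select in pick)
            pick[select] = select
        }
        # copy into a numerically indexed array, as in the text
        i = 1
        for (j in pick)
            sortedpick[i++] = pick[j]
        n = i - 1
        # insertion sort: sortedpick[1..k-1] is already in order
        for (k = 2; k <= n; k++) {
            v = sortedpick[k]
            m = k - 1
            while (m > 0 && sortedpick[m] > v) {
                sortedpick[m + 1] = sortedpick[m]   # shift larger values right
                m--
            }
            sortedpick[m + 1] = v
        }
        for (k = 1; k <= n; k++)
            printf("%s%s", sortedpick[k], (k < n ? " " : "\n"))
    }'
}

sorted_lotto 7 35    # seven distinct numbers, lowest first
```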

The lotto program is set up to do everything in the BEGIN block. No input is processed. You could, however, revise this script to read a list of names from a file and for each name generate a "quick-pick."
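A sketch of that revision, assuming one name per input line: the BEGIN procedure seeds the generator once, and the main procedure clears pick and makes a fresh quick-pick for every line it reads. picks_for_names is a hypothetical helper name.

```shell
# Hypothetical variant: one quick-pick per input line (e.g., a list of names).
picks_for_names() {
    awk -v NUM="${1:-6}" -v TOPNUM="${2:-30}" '
    BEGIN { srand() }                     # seed once, before any input is read
    {
        for (k in pick)                   # clear the previous line'\''s picks
            delete pick[k]
        line = $0 ":"
        for (j = 1; j <= NUM; ++j) {
            do {
                select = 1 + int(rand() * TOPNUM)
            } while (select in pick)
            pick[select] = select
            line = line " " select
        }
        print line
    }'
}

printf 'alice\nbob\n' | picks_for_names 6 30
```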

