Data structures

Roger Bailey <rb@doc.ic.ac.uk>
  1. Tuples and lists
  2. Functions that define lists
  3. Data structures as parameters
  4. Pattern-matching
  5. Simplifying expressions
  6. Review

Tuples and lists

Practical programs need data structures and Hope has two standard kinds already built in. The simplest kind corresponds to a Pascal record. We can bind a fixed number of objects of any type together into a structure called a tuple, for example:

( 2, 3 )

or

( 'a', true )

are tuples of type num # num and char # truval respectively. We use tuples when we want a function to define more than one value. Here's one that defines the time of day given the number of seconds since midnight:

dec time24 : num -> num # num # num ;
--- time24 ( s ) <= ( s div 3600,
                 s mod 3600 div 60,
                 s mod 3600 mod 60  ) ;

div is the built-in integer division function and mod gives the remainder after integer division. If we type an application of time24 at the terminal, the result tuple and its type will be printed on the screen in the usual way:

time24 ( 45756 ) ;
( 12,42,36 ) : ( num # num # num )

The second standard data type, called a list, corresponds roughly to a one-dimensional array in Pascal. It can contain any number of objects (including none at all) but they must all be the same type. We can write expressions in our programs that represent lists, such as:

[ 1, 2, 3 ]

which is of type list ( num ). There are two standard functions for defining lists. The infix operator :: (read as cons) defines a list in terms of a single object and list containing the same type of object, so

10 :: [ 20, 30, 40 ]

defines the list:

[ 10, 20, 30, 40 ]

Don't think of :: as adding 10 to the front of [ 20, 30, 40 ]. It really defines a new list [ 10, 20, 30, 40 ] in terms of two other objects without changing their meaning, rather in the same way that 1 + 3 defines a new value of 4 without changing the meaning of 1 or 3.

The other standard list function is nil, which defines a list with no elements in it. We can represent every list by an expression consisting of applications of :: and nil. When we write an expression like:

[ a + 1, b - 2, c * d ]

it's considered to be a shorthand way of writing:

a + 1 :: ( b - 2 :: ( c * d :: nil ) )

There's also a shorthand way of writing lists of characters. The following three expressions are all equivalent:

"cat"
[ 'c', 'a', 't' ]
'c' :: ( 'a' :: ( 't' :: nil ) )

When the result of a Hope program is a list, it's always printed out in the concise bracketed notation; if it's a list of characters, it's printed in quotes.

Every data type in Hope is defined by a set of primitive functions like :: and nil. They're called constructor functions, and aren't defined by recursion equations. When we defined a tuple, we were actually using a standard constructor called , (read as comma). Later on we'll see how constructors are defined for other types of data.

Functions that define lists

If we wanted to write a Pascal program to print the first n natural numbers in descending order we'd probably write a loop that printed one value out on each iteration, such as:

for i := n downto 1 do write ( i ) ;

In Hope we write one expression that defines all the values at once, rather like we did for mult:

dec nats : num -> list ( num ) ;
--- nats ( n ) <= if n = 0 then nil else n :: nats ( n-1 ) ;

nil is useful for writing the base case of a recursive function that defines a list. If we try the function at the terminal by typing:

nats ( 10 ) ;
[ 10,9,8,7,6,5,4,3,2,1 ] : list ( num )

we can see that the numbers are in descending order because that's the way we arranged them in the list, and not because they were defined in that order. The values in the expression defining the list are treated as though they were all generated at the same time. On the ALICE machine they actually are generated at the same time.

To get the results of a Hope program in the right order, we must put them in the right place in the final data structure. If we want the list of the numbers n through 1 in the opposite order we can't write the definition as:

...  else nats ( n-1 ) :: n ;

because the argument types for :: are the wrong way round. We need to use a another built-in operation <> (read as append) that concatenates two lists. The definition will then look like this:

--- nats ( n ) <= if n = 0 then nil else nats ( n-1 ) <> [ n ] ;

We put n in brackets to make it into a (single-item) list because <> expects both its arguments to be lists. We could also have written ( n :: nil ) instead of [ n ].

Data structures as parameters

Suppose we have a list of integers and we want to write a function to add up all its elements. The declaration will look like this:

dec sumlist : list ( num ) -> num ;

We need to refer to the individual elements of the actual parameter in the equations defining sumlist. We do this using an equation whose left-hand side looks like this:

--- sumlist ( x :: y ) ...

This is an expression involving list constructors and corresponds to an actual parameter that's a list. x and y are formal parameters, but they now name individual parts of the actual parameter value. In an application of sumlist like:

sumlist ( [ 1, 2, 3 ] )

the actual parameter will be dismantled so that x names the value 1 and y names the value [ 2, 3 ]. The complete equation will be:

--- sumlist ( x :: y ) <= x + sumlist ( y ) ;

Notice there's no base case test. As we might expect, it's the empty list, but we can't test for it directly in the equation because there's no formal parameter that refers to the whole list. In fact, if we write the application:

sumlist ( nil )

we'll get an error message because we can't dismantle nil to find the values of x and y. We must cover this case separately using a second recursion equation:

--- sumlist ( nil ) <= 0 ;

The two equations can be given in either order. When sumlist is applied, the actual parameter is examined to see which constructor function was used to define it. If the actual parameter is a non-empty list, the first equation is used, because non-empty lists are defined using the :: constructor. The first number in the list gets named x and the remaining list y. If the actual parameter is the empty list, the second equation is be used because empty lists are defined using the constructor nil.

Pattern-matching

An expression composed of constructors appearing on the left-hand side of a recursion equation is called a pattern. Selecting the right recursion equation and dismantling the actual parameter to name its parts is called pattern-matching. When you write a function, you must give a recursion equation for each possible constructor defining the argument type.

Sometimes we don't need to dismantle the actual parameter, and we can use a formal parameter in the pattern that matches the whole object, irrespective of what constructors were used to define it. As an example, let's see how we could define our own function to concatenate two lists like the built-in operation <>:

infix cat : 4 ;
dec cat : list( num ) # list( num ) -> list ( num ) ;
--- ( h :: t ) cat l <= h :: ( t cat l ) ;
--- nil cat l <= l ;

The first list parameter is matched by the pattern ( h :: t ) so that its first item (the head) and the remaining list (the tail) can be referred to separately on the right-hand side. The second recursion equation covers the case when the first list is empty. The second list parameter is matched by the pattern l whether it's empty or not.

As well as writing enough recursion equations to satisfy all the parameter constructors, we must also be careful not to write sets of equations where more than one pattern might match the actual parameters, because that would be ambiguous.

We can write patterns to match arguments that are tuples in the same way using the tuple constructor ,. When we wrote mult ( x, y ) you probably thought the parentheses and the comma were something to do with the function application. In fact, we were constructing a tuple and the parentheses were only needed because , has a low priority. Hope treats all functions as having only one argument. This can be a tuple when you want the effect of several arguments. Without parentheses,

mult x, y

would be interpreted as:

( mult ( x ), y )

A recursion equation with the left-hand side:

--- mult ( x, y ) <= ...

is just a pattern-match on a tuple. The first item in the tuple gets named x and the second one y.

We can also use pattern-matching on num parameters. These are defined by two constructors called succ and 0. succ defines a number in terms of the next lower one. 0 has no arguments and defines the value zero. Surely 0 is a value, not a function? Well, we're already used to thinking of function applications as another way of writing values, so it's quite consistent to think of 0 as a function application. Here's a version of mult that uses pattern-matching to identify the base case:

infix mult : 8 ;
dec mult : num # num -> num ;
--- x mult 0          <= 0 ;
--- x mult succ ( y ) <= ( x mult y ) + x ;

We can read succ ( y ) as the successor of some number that we'll call y. Instead of naming the actual parameter y like we did in the original version of mult, we're naming its predecessor.

Simplifying expressions

In Pascal programs we can simplify complex expressions by removing common sub-expressions and evaluating them separately. Instead of:

writeln ( ( x + y ) * ( x + y ) ) ;

we would probably write:

z := x + y ; writeln ( z * z ) ;

which is clearer and more efficient. Hope programs consist only of expressions and it's even more important to simplify them. We do this by using a qualified expression:

let z == x + y in z * z ;

This looks like an assignment, but it isn't. == is read as is defined as and z is local to the expression following the in. If we write something like:

let z == z + 1 in z * z ;

we're actually introducing a new variable z to use in the sub-expression z * z. It hides the original one in the sub-expression z + 1.

There's a second form of qualified expression for people who like to use variables first and define their meanings later. It looks like this:

z * z where z == x + y ;

The result of the qualified expression is the same whether we define it using let or where. x + y is evaluated first, and its value is used in the main expression.

The qualifying expression will often be a function application that defines a data structure. If we want to name part of the structure we can use a pattern on the left-hand side of the == symbol:

dec time12 : num -> num # num ;
--- time12 ( s ) <= ( if h > 12 then h-12 else h, m ) where
               ( h, m, s ) == time24 ( s ) ;

We'll use this construction most often when we write recursive functions that define tuples. Here's a typical example. Suppose we want to form a string of words from a sentence. For simplicity a word is taken to be any sequence of characters, and words are separated in the sentence by any number of blanks. The sentence and a single word will be of type list ( char ) and the final sequence of words a list ( list ( char ) ).

It's fairly straightforward to obtain the first word. Here's a function that does it:

dec firsttry : list ( char ) -> list ( char ) ;
--- firsttry ( nil )    <= nil ;
--- firsttry ( c :: s ) <= if c = ' '
                        then nil
                        else c :: firsttry ( s ) ;

One of the nice features of Hope is that we can type in and print out any kind of value, so it's easy to check out the individual functions of our program separately. If we test firsttry we'll see:

firsttry ( "You may hunt it with forks and Hope" ) ;
"You" : list ( char )

But there's a problem here because we're going to need the rest of the sentence if we're to find the remaining words. We must arrange that that the function returns the remaining list as well as the first word. This is where tuples come in:

dec firstword : list ( char ) -> list ( char ) # list ( char ) ;
--- firstword ( nil )    <= ( nil, nil ) ;
--- firstword ( c :: s ) <= if c = ' '
                         then ( nil, s )
                         else ( ( c :: w, r ) where
                                ( w, r ) == firstword ( s ) ) ;

The qualified expression is parenthesised so it only applies to the expression after the else, otherwise we'll evaluate firstword recursively as long as the sentence is non-empty, even if it starts with a blank. This version of the function produces:

firstword ( "Hope springs eternal ..." ) ;
( "Hope","springs eternal ..." ) : ( list ( char ) # list ( char ) )

We can use this to define a function to split the sentence into a list of its individual words:

dec wordlist : list ( char ) -> list ( list ( char ) ) ;
--- wordlist ( nil )    <= nil ;
--- wordlist ( c :: s ) <= if c = ' '
                        then wordlist( s )
                        else ( w :: wordlist ( r ) where
                             ( w, r ) == firstword ( c :: s ) ) ;

which we can test by typing an application at the terminal:

wordlist ( "  While there's life there's Hope  " ) ;
[ "While","there's","life","there's","Hope" ] : list ( list ( char ) )

Review

So far we've concentrated on features of Hope that have something in common with traditional languages such as Pascal, but without many of their limitations, such as fixed-size data structures. We've also been introduced to the functional style of programming where programs are no longer recipes for action, but just definitions of data objects.

Now we'll introduce features of Hope that lift it onto a much higher level of expressive power, and let us write programs that are not only extremely powerful and concise, but that can be checked for correctness at compile-time and mechanically transformed into more efficient versions.