Making functions more powerful

Roger Bailey <rb@doc.ic.ac.uk>
  1. Introducing polymorphic functions
  2. Defining your own data types
  3. Making data more abstract

Introducing polymorphic functions

The Hope compiler can spot many common kinds of error by checking the types of all objects in expressions. This is harder than checking at run-time, but more efficient and saves the embarrassment of discovering an error at run-time in a rarely-executed branch of the air traffic control system we just wrote.

However, strict type-checking can be a nuisance if we want to perform some operation that doesn't depend on the type of the data. Try writing a Pascal procedure to reverse an array of either 10 integers or 10 characters and you'll see what I mean.

Hope avoids this kind of restriction by allowing a function to operate on more than one type of object. We've already used the standard constructors :: and nil to define a list ( num ), a list ( char ) and a list ( list ( char ) ). The standard equality function = will compare any two objects of the same type. Functions with this property are called polymorphic. Pascal's built-in functions abs and sqr and operations like > and = are polymorphic in a primitive kind of way.

We can define our own polymorphic functions in Hope. The function cat we defined above will concatenate lists of numbers, but we can use it for lists containing any type of object. To do this we first declare a kind of universal type called a type variable. We use this in the declaration of cat where it stands for any actual type:

typevar alpha ;
infix cat : 8 ;
dec cat : list ( alpha ) # list ( alpha ) -> list ( alpha ) ;

This says that cat has two parameters that are lists and defines a list, but it doesn't say what kind of object is in the list. However, alpha always stands for the same type throughout a given declaration, so all the lists must contain the same type of object. The expressions:

[ 1,2,3 ]  cat  [ 4,5,6 ]

and

"123"  cat  "456"

are correctly-typed applications of cat and define a list ( num ) and a list ( char ) respectively, while the expression:

[ 1,2,3 ]  cat  "456"

isn't because alpha can't be interpreted as two different types. The interpretation of a type variable is local to a declaration so it can have different interpretations in other declarations without confusion.

Of course it only makes sense for a function to be polymorphic as long as the equations defining it don't make any assumptions about types. In the case of cat the definition uses only :: and nil, which are polymorphic themselves. However, a function like sumlist uses + and can only be used with lists of numbers as parameters.

Defining your own data types

Tuples and lists are quite powerful, but for more sophisticated applications, we'll need to define our own types. User-defined types make programs clearer and help the type-checker to help the programmer. We introduce a new data type in a data declaration:

data vague == yes ++ no ++ maybe ;

data is a reserved word and vague is the name of the new type. == is read as is defined as and ++ is read as or. yes, no and maybe are the names for the constructor functions of the new type. We can now write function definitions that use these constructors in patterns:

dec evade : vague -> vague ;
--- evade ( yes )   <= maybe ;
--- evade ( maybe ) <= no ;

The constructors can be parameterised with any type of object, including the type that's being defined. We can define types like lists, whose objects are of unlimited size using this kind of recursive definition. As an example, here's a user-defined binary tree that can contain numbers as its leaves:

data tree == empty ++ tip ( num ) ++ node ( tree # tree ) ;

There are three constructors. empty has no parameters and defines a tree with nothing in it. tip defines a tree in terms of a single num, and node defines a tree in terms of two other trees. Here's a typical tree:

                                 .
                                / \
                               /   \
                           ___/     \___
                  ________/             \________
                 /       /               \       \
                /       /\               /\       \
               1       /  \             /  \       \
                      /    \           /    \       5
                     2      3         .      4

Here's an example of a function that manipulates trees. It returns the sum of all the numbers contained in one:

dec sumtree : tree -> num ;
--- sumtree ( empty )         <= 0 ;
--- sumtree ( tip ( n ) )     <= n ;
--- sumtree ( node ( l, r ) ) <= sumtree ( l ) + sumtree ( r ) ;

Unfortunately there's no shorthand for writing tree constants like there is for list constants, so we've got to write them out the long way using constructors. If we want to use sumtree to add up all the numbers in the example tree above, we must type in the expression:

sumtree ( node ( node ( tip ( 1 ),
                   node ( tip ( 2 ),
                          tip ( 3 ) ) ),
            node ( node ( empty,
                          tip ( 4 ) ),
                   tip ( 5 ) ) ) ) ;

This isn't really a drawback, because programs that manipulate complex data structures like trees will generally define them using other functions. However, it's very useful to be able to type any kind of constant data structure at the terminal when we're checking out an individual function like sumtree. When we want to test a Pascal program piecemeal, we usually have to write elaborate test harnesses or stubs to generate test data.

Making data more abstract

The identifier list isn't really a Hope data type. It's called a type constructor and must be parameterised with an actual type before it represents one. We did this every time we declared a list ( num ) or a list ( char ). The parameter can also be a user-defined type, as with a list ( tree ) or even a type variable as in list ( alpha ), which defines a polymorphic data type. Constructing new data types like this is a compile-time operation by the way, and you shouldn't confuse it with constructing new data values, which is a run-time operation.

You can define your own polymorphic data types. Here's a version of the binary tree we defined earlier that can have any type of value in its leaves:

data tree ( alpha ) == empty ++
                  tip ( alpha ) ++
                  node ( tree ( alpha ) # tree ( alpha ) ) ;

Once again, alpha is taken to be the same type throughout one instance of a tree. If it's a number, then all references to tree ( alpha ) are taken as references to tree ( num ).

We can define polymorphic functions that operate on trees containing any type of object, because tree constructors are now polymorphic. Here's a function to flatten a binary tree into a list of the same type of object:

dec flatten : tree ( alpha ) -> list ( alpha ) ;
--- flatten ( empty )         <= nil ;
--- flatten ( tip ( x ) )     <= x :: nil ;
--- flatten ( node ( x, y ) ) <= flatten ( x ) <> flatten ( y ) ;

We can demonstrate it on various kinds of tree:

flatten( node ( tip ( 1 ), node ( tip ( 2 ), tip ( 3 ) ) ) ) ;
[ 1, 2, 3 ] : list ( num )
flatten( node ( tip ( "one" ),
           node ( tip ( "two" ),
                  tip ( "three" ) ) ) ) ;
[ "one","two","three" ] : list ( list ( char ) )
flatten( node ( tip ( tip ( 'a' ) ),
           node ( tip ( empty ),
                  tip ( node ( tip ( 'c' ),
                               empty ) ) ) ) ) ;
     [ tip ( 'a' ),empty,node ( tip ( 'c' ), empty) ] : list ( tree ( char ) )

Notice how the type-checker may need to go through several levels of data types to deduce the type of the result.