Functions as data

Roger Bailey <rb@doc.ic.ac.uk>
  1. Even more concise programs
  2. Common patterns of recursion
  3. Anonymous functions
  4. Functions that create new functions

Even more concise programs

The importance of polymorphic types and functions is that they let us write shorter, clearer programs. It's similar to the way Pascal procedures let us use the same code to operate on different data values, but much more powerful. We can write a single Hope function to reverse a list of numbers or characters, where we'd need to write two identical Pascal subroutines to reverse an array of integers and an array of characters.

We can use polymorphic functions whenever we're concerned only with the arrangement of the objects in a data structure and not with their values. Sometimes, we'll want to apply some function to the primitive data items in the structure. Here's a function that uses a second function square to define a list (num) whose elements are the squares of another list (num):

     dec square : num -> num ;
     --- square ( n ) <= n * n ;
     dec squarelist : list ( num ) -> list ( num ) ;
     --- squarelist ( nil )    <= nil ;
     --- squarelist ( n :: l ) <= square ( n ) :: squarelist ( l ) ;

Every time we write a function to process every element of a list, we'll write something almost identical to squarelist. Here's a function to define a list of factorials:

     dec fact : num -> num ;
     --- fact ( 0 )          <= 1 ;
     --- fact ( succ ( n ) ) <= succ ( n ) * fact ( n ) ;
     dec factlist : list ( num ) -> list ( num ) ;
     --- factlist ( nil )    <= nil ;
     --- factlist ( n :: l ) <= fact ( n ) :: factlist ( l ) ;

factlist has exactly the same shape as squarelist, it just applies fact instead of square and then applies itself recursively. Values that differ between applications are usually supplied as actual parameters. Hope treats functions as data objects, so we can do this in a perfectly natural way. A function that can take another function as an actual parameter is called a higher-order function. When we declare it we must give the type of formal parameter standing for the function in the usual way. The declaration of fact tells us that it's:

     num -> num

Read this as a function mapping numbers to numbers. Now let's see how we can use this idea to write factlist and squarelist as a single higher-order function. The new function needs two parameters, the original list, and the function that's applied inside it. Its declaration will be:

     dec alllist : list ( num ) # ( num -> num ) -> list ( num ) ;

The shape of alllist will be the same as factlist and squarelist, but the function we apply to each element of the list will be the formal parameter:

     --- alllist ( nil, f )    <= nil ;
     --- alllist ( n :: l, f ) <= f ( n ) :: alllist ( l, f ) ;

We use alllist like this:

     alllist ( [ 2,4,6 ], square ) ;
     [ 4,16,36 ] : list ( num )
     alllist ( [ 2,4,6 ], fact ) ;
     [ 2,24,720 ] : list ( num )

Notice that there's no argument list after square or fact in the application of alllist, so this construction won't be confused with functional composition. fact( 3 ) represents a function application, but fact by itself represents the unevaluated function.

Higher-order functions can also be polymorphic. We can use this idea to write a more powerful version of alllist that will apply an arbitrary function to every element of a list of objects of arbitrary type. This version of the function is usually known as map:

     typevar alpha, beta ;
     dec map : list ( alpha ) # ( alpha -> beta ) -> list ( beta ) ;
     --- map ( nil, f ) <= nil ;
     --- map ( n :: l, f ) <= f ( n ) :: map ( l, f ) ;

The definition now uses two type variables alpha and beta. Each one represents the same actual type throughout one instance of map, but the two types can be different. This means we can use any function that maps alphas to betas to generate a list of betas from any list of alphas.

The actual types aren't restricted to scalars, which makes map rather more powerful than we might realise at first sight. Suppose we've got a suitably polymorphic function that finds the length of a list:

     typevar gamma ;
     dec length : list ( gamma ) -> num ;
     --- length ( nil )    <= 0 ;
     --- length ( n :: l ) <= 1 + length ( l ) ;
     length ( [ 2,4,6,8 ] ) + length ( "cat" ) ;
     7 : num

We can use map to apply length to every element of a list of words defined by wordlist:

     map ( wordlist ( "The form remains, the function never dies" ), length ) ;
     [ 3,4,8,3,8,5,4 ] : list ( num )

In this example alpha is taken to be a list ( char ) and beta to be a num, so the type of the function must be ( list ( char ) -> num ). length fits the bill if gamma is taken to be a character.

Common patterns of recursion

map is powerful because it sums up a pattern of recursion that turns up frequently in Hope programs. We can see another common pattern in the function length used above. Here's another example of the same pattern:

     dec sum : list ( num ) -> num ;
     --- sum ( nil )    <= 0 ;
     --- sum ( n :: l ) <= n + sum ( l ) ;

The underlying pattern consists of processing each element in the list and accumulating a single value that forms the result. In sum, each element contributes its value to the final result. In length the contribution is always 1 irrespective of the type or value of the element, but the pattern is identical. Functions that display this pattern are of type:

     ( list ( alpha ) -> beta )

In the function definition, the equation for a non-empty list parameter will specify an operation whose result is a beta. This is + in the case of length and sum. One argument of the operation will be a list element and the other will be defined by a recursive call, so the type of the operation needs to be:

     ( alpha # beta -> beta )

This operation differs between applications, so it will be supplied as a parameter. Finally we need a parameter of type beta to specify the base case result. The final version of the function is usually known as reduce. In the following definition, the symbol ! introduces a comment, which finishes with another ! or with a newline:

     dec reduce :   list ( alpha ) #            ! the input list          !
                    ( alpha # beta -> beta ) #  ! the reduction operation !
                    beta                        ! the base case result    !
                ->  beta ;                      ! the result type         !
     --- reduce ( nil, f, b )    <= b ;
     --- reduce ( n :: l, f, b ) <= f ( n, reduce ( l, f, b ) ) ;

To use reduce as a replacement for sum we'll need to supply the standard function + as an actual parameter. We can do this if we prefix it with the symbol nonop to tell the compiler we don't want to use it as an infix operator:

     reduce ( [ 1,2,3 ], nonop +, 0 ) ;
     6 : num

When we use reduce as a replacement for length, we're not interested in the first argument of the reduction operation because we always add 1 whatever the list element is. This function ignores its first argument:

     dec addone : alpha # num -> num ;
     --- addone ( _ , n ) <= n + 1 ;

We use _ to represent any argument we don't want to refer to.

     reduce ( "a map they could all understand", addone, 0 ) ;
     31 : num

Like map, reduce is much more powerful than it first appears because the reduction function needn't define a scalar. Here's a candidate that inserts an object into an ordered list of the same kind of object:

     dec insert : alpha # list ( alpha ) -> list ( alpha ) ;
     --- insert ( i, nil )    <= i :: nil ;
     --- insert ( i, h :: t ) <= if i < h
                              then i :: ( h :: t )
                              else h :: insert ( i, t ) ;

Actually this isn't strictly polymorphic as its declaration suggests, because it uses the built-in function <, which is only defined over numbers and characters, but it shows the kind of thing we can do. When we use it to reduce a list of characters:

     reduce ( "All sorts and conditions of men", insert, nil ) ;
     "     Aacddefiillmnnnnoooorssstt" : list ( char )

we can see that it actually sorts them. The sorting method (insertion sort) isn't very efficient, but the example shows something of the power of higher-order functions and of reduce in particular. It's even possible to use reduce to get the effect of map, but that's left as an exercise for the reader as they say.

Of course map and reduce only work on list ( alpha ) and we'll need to provide different versions for our own structured data types. This is the preferred style of Hope programming, because it makes programs largely independent of the shape of the data structures they use. Here's an alternative kind of binary tree that holds data at its nodes rather than its tips, and a reduce function for it:

     data tree ( alpha ) == empty ++
                       node ( tree ( alpha ) # alpha # tree ( alpha ) ) ;
     dec redtree : tree ( alpha ) #
              ( alpha # beta -> beta ) #
              beta -> beta ;
     --- redtree ( empty, f, b )            <= b ;
     --- redtree ( node ( l, v, r ), f, b ) <=
              redtree ( l, f, f ( v, redtree ( r, f, b ) ) ) ;

We can use this kind of tree to define a more efficient sort. An ordered binary tree has the property that all the objects in its left subtree logically precede the node object and all those in its right subtree are equal to the node object or logically succeed it. We can build an ordered tree if we have a function to insert new objects into an already-ordered tree, such as:

     dec instree : alpha # tree ( alpha ) -> tree ( alpha ) ;
     --- instree ( i, empty ) <= node ( empty, i, empty ) ;
     --- instree ( i, node ( l, v, r ) ) <=
              if i < v
               then node ( instree ( i, l ), v, r )
               else node ( l, v, instree ( i, r ) ) ;

We can sort a list by converting its elements into an ordered tree using instree, then flattening the tree back into a list. This is very easy to specify using the two kinds of reduction we've defined:

     dec sort : list ( alpha ) -> list ( alpha ) ;
     --- sort ( l ) <= redtree( reduce ( l, instree, empty ), nonop ::, nil ) ;
     sort ( "Mad dogs and Englishmen" ) ;
     "   EMaadddegghilmnnnoss" : list ( char )

Anonymous functions

When we used map and reduce, we had to define extra functions like fact and square to pass in as parameters. This is a nuisance if we don't need them anywhere else in the program and especially if they're trivial, like sum or addone. For on-the-spot use in cases like this, we can use an anonymous function called a lambda-expression. Here's a lambda-expression corresponding to sum:

     lambda ( x, y ) => x + y

The symbol lambda introduces the function and x and y are its formal parameters. The expression x + y is the function definition. The part after lambda is just a recursion equation without the -- and with => instead of <=. Here's another lambda-expression used as the actual parameter of reduce:

     reduce ( [ "toe","tac","tic" ], lambda ( a, b ) => b <> a, nil ) ;
     "tictactoe" : list ( char )

There can be more than one recursion equation in the lambda-expression. They're separated from each other by the symbol | and pattern-matching is used to select the appropriate one. Here's an example that uses pattern-matching in a lambda-expression to avoid division by zero when the function it defines is executed:

     map ( [ 1,0,2,0,3 ], lambda ( 0 )          => 0
                          | ( succ ( n ) ) => 100 div succ ( n ) ) ;
     [ 100,0,50,0,33 ] : list ( num )

Functions that create new functions

As we've seen, Hope functions possess full rights and can be passed as actual parameters like any data object. It'll be no surprise to learn that we can return a function as the result of another function. The result can be a named function or an anonymous function defined by a lambda-expression. Here's a simple example:

     dec makestep : num -> ( num -> num ) ;
     --- makestep ( i ) <= lambda ( x ) => i + x ;
     makestep ( 3 ) ;
     lambda ( x ) => 3 + x : num -> num

As we can see from trying makestep, its result is an anonymous function that adds a fixed quantity to its single argument. The size of the increment was specified as an actual parameter to makestep when the new function was created, and has become bound in to its definition. If we try the new function, we'll see that it really does add 3 to its actual parameter:

     makestep ( 3 ) ( 10 ) ;
     13 : num

There are two applications here. First we apply makestep to 3, then the resulting anonymous function is applied to 10. Finally, here's a function that has functions as both actual parameter and result:

     dec twice : ( alpha -> alpha ) -> ( alpha -> alpha ) ;
     --- twice ( f ) <= lambda ( x ) => f ( f ( x ) ) ;

twice defines a new function that has a single argument and some other function f bound into its definition. The new function has the same type as f. We can see its effect using a simple function like square:

     twice ( square ) ;
     lambda ( x ) => square( square ( x ) ) : num -> num
     twice ( square ) ( 3 ) ;
     81 : num

The new function applies the bound-in function to its argument twice. We can even bind in twice itself, generating a new function that behaves like twice except that the function eventually bound in will be applied four times:

     twice ( twice ) ;
     lambda ( x ) => twice ( twice ( x ) ) :
          ( alpha -> alpha ) -> ( alpha -> alpha )
     twice ( twice ) ( square ) ( 3 ) ;
     43046721 : num