islet.el -- Introduction to Syntax of Lisplet Evaluator for Teaching

This was written as an odd-moments activity by John Sturdy at Harlequin Ltd (john@cb1.com); the copyright belongs to Harlequin. You may pass copies around, so long as you keep this copyright notice intact.

This file was originally written as a heavily-commented lisp file; although you may find this marked-up version easier for reading, I suggest you get a copy of the plain lisp file to experiment with.

It is written to run on GNUemacs, although it should work on almost any lisp system. The top of the plain file includes instructions for loading it into GNUemacs.

Lesson 1

This is the first of a series of files to help you to learn about programming language theory and practice. It uses a dialect of the language Lisp (the name is short for LISt Processor). It assumes that you can use GNUemacs, or a similar editing system. In particular, you will need to be able to move the cursor around the file, and use the command M-C-x (ie Meta-Control-X, which on some terminals is Escape Control-X), which makes the Lisp system read the piece of Lisp that the cursor is on.

The first thing to note is than any text to the right of a semicolon on each line is ignored by Lisp, and so can be used for comments about what the program is doing and how.

As you will notice in reading this file, there are conventions for how many semicolons to put: one for a comment to the right of some Lisp; two for a comment aligned with outline of the Lisp formatting;; three for a comment in the margin;;; and four for a header in comments in the margin;;;; These are only conventions; they make no difference to the Lisp system, but Emacs repositions comments according to these conventions if given the chance, and it helps other people to read your programs.

The next thing to note is that Lisp text is free-format; line endings do not matter, so long as they do not come in the middle of a name. The only meaning line endings have is that they end comments.

Lisp programming systems are usually interactive, working as a dialogue with the user. The form of dialogue used in this teaching system is that to send a piece of Lisp to the system, you place the cursor anywhere within that piece of text, and type M-C-x. The result is displayed at the bottom of the Emacs screen. In an uncustomized Emacs, this area is only one line high, and is cleared when you type something else. The file show-eval.el, provided with these teaching files, modifies your emacs to show the results in a second window on your Emacs screen.

The basic unit of Lisp text is called an S-expression (for Symbolic Expression), or just and expression for short. Roughly speaking, in terms of natural languages, an expression could be a word, phrase, sentence, or paragraph.

The process by which Lisp replies to the expressions you give it is called evaluation, and you say that an expression ``evaluates to'' its result (which is also an expression).

The simplest kind of expression is a number. A number evaluates to itself. Get the Lisp system to evaluate the following number (do this by typing meta-control-x, which means either holding down meta and control and pressing x, or, if you don't have a meta key, pressing Escape then control-x):

Another simple kind of expression is a string. Try this one:

"A piece of string."

More complex expressions can be built up from the simple ones (and also from complex ones). A compound expression is a list of expressions, written with round brackets (parentheses) at the ends of it. The first thing in the list (we call this the `operator' of the expression) indicates some action to take, and the rest of the list indicates what to do it to. For example, to add 3 and 4, we evaluate (go on, try it -- always try the examples from here on, even if not prompted to like this):

(+ 3 4)

We can build more complex expressions by combining complex expressions just as we combine simple ones. These are evaluated starting from the innermost ones in the nesting and working outwards. For example, this:

(* (+ 3 4) 5)

will first evaluate (+ 3 4) giving 7, then evaluate the outer expression, which is now the same as

(* 7
5)

, giving 35.

We can store any lisp value in a `variable', which is a named thing for holding lisp values. A name can contain almost anything other than spaces, and must not start with a number. Valid variable names include + - my-variable your-variable-1 your-variable-2 etc. There is a kind of expression for storing a value into a variable, using the operator setq. For example, this:

(setq my-variable 12)

stores the value 12 in my-variable.

To use the value in a variable, we use the variable name itself as an expression in its own right:

my-variable

and of course we can use this as part of another expression:

(+ my-variable 2)

As well as numbers, strings and names, lisp can handle lists of values as another kind of value (hence its name, a contraction of LISt Processor). We write the lists as series of values in brackets. The lists must be `quoted' by preceding them with a single quote mark -- otherwise they would be evaluated!

For example the expression

'(+ 3 4)

evaluates to a list of which the first element is +, the second element is 3, and the third element is 4; whereas

(+ 3 4)

evaluates to the number 7 ... since it is a Lisp expression (or program) to add 3 and 4. This is one of the things that gives Lisp so much flexibility and power: programs and data structures are made of the same basic stuff! A function is just a list; you can put functions together on the fly with list construction operators, and then run (evaluate) them; or for that matter, take apart functions using the list destructuring operators.

The quote mechanism is one of Lisp's few pieces of "syntactic sugar" (syntactical devices that are not strictly necessary); the form 'a is actually a more convenient way of entering (quote a) , which you will sometimes see in results printed back at you by the Lisp system.

The quote operator, as see above, is one of Lisp's "special forms". All Lisp operators other than special forms evaluate all their arguments before the operator itself does its work. The special form quote for example returns its one argument unevaluated.

Other special forms use unevaluated arguments, or specific control of evaluation of arguments, too. One we have already seen is setq. Observe that if it were an ordinary operator, it would evaluate the name of the variable it is to set, thus looking up its old value and using that as the name. In fact there is an operator that does that; it is called set. To use it to set a variable, you must quote the name of the variable, for example

(set 'my-variable 3)

which is equivalent to

(set (quote my-variable) 3)

(setq my-variable 3)

Instead of the quote mechanism, we can use the operator list to create lists, for example

(list '+ 3 4)

makes the same list as the '(+ 3 4) above. Note that we now have to quote the +, to stop it being evaluated as a variable name -- it is a valid variable name, but we have not established any variable of that name.

We can take lists apart using the operators car and cdr. (These names come from the early days of Lisp, when they stood for Contents of Address Register and Contents of Decrement Register -- references to registers of the machine on which Lisp was first implemented.) car returns the head (first element) of a list, and cdr returns the tail (the remaining elements). For an example, evaluate:

(car '(+ 3 4))

and

(cdr '(+ 3 4))

Lists are constructed from building blocks called `conses' or `cons cells' -- the name is an abbreviation of `construct'. Each cons is a pair of memory locations, the first one storing the car of that list building block, and the second one storing the cdr.

We can draw the list shown above, using boxes to represent memory locations, like this:

+---+---+    +---+---+    +---+---+
| * | *-+--->| 3 | *-+--->| 4 |nil|
+-+-+---+    +---+---+    +---+---+
  |
  |
  V
,---.
|'+ |
`---'

Note that each location can contain either a number, or a pointer to another location, or nil, a special value which marks the end of a list, and is also used for the boolean value `false'.

A cons by itself is not printed as a normal list, but with a `.' separating the two parts -- unless the cdr is nil, in which case no `.' and no cdr are printed. Thus, the value

+---+---+
| 1 | 2 |
+---+---+

is printed as (1 . 2), and

+---+---+
| 1 |nil|
+---+---+

as (1) -- so if we make a cons of which the cdr is another list, we are back to the ordinary printed form of lists, so

+---+---+   +---+---+
| 1 | *-+-->| 2 |nil|
+---+---+   +---+---+

is printed as (1 2)

We can make a new cons either as part of a list, or with the operator cons, which takes two arguments, the first of which becomes the car of the result, and the second of which becomes its cdr. Evaluate these expressions to see how lists are constructed:

(cons 1 2)

(cons 1 nil)

(cons 1 (cons 2 nil))

(cons 1 (cons 2 3))

(cons 1 (cons 2 (cons 3 nil)))

It is possible to build a great variety of data structures, such as trees, using conses. One common one is the association list, or a-list, which is a list of pairs, used as a lookup table. For example, a list associating 5 numbers with their squares would look like this:

+---+---+    +---+---+    +---+---+    +---+---+    +---+---+
| * | *-+--->| * | *-+--->| * | *-+--->| * | *-+--->| * |nil|
+-+-+---+    +-+-+---+    +-+-+---+    +-+-+---+    +-+-+---+
  |            |            |            |            |
  |            |            |            |            |
  V            V            V            V            V
+---+---+    +---+---+    +---+---+    +---+---+    +---+---+
| 1 | 1 |    | 2 | 4 |    | 3 | 9 |    | 4 | 16|    | 5 | 25|
+-+-+---+    +-+-+---+    +-+-+---+    +-+-+---+    +-+-+---+

which prints as ((1 . 1) (2 . 4) (3 . 9) (4 . 16) (5 . 25))

These lists can be constructed and examined using the normal list and cons operators:

(setq my-alist (list (cons 1 1)
                     (cons 2 4)
                     (cons 3 9)
                     (cons 4 16)
                     (cons 5 25)))

Before trying each of the following expressions, try to work out what it will evaluate to:

(car my-alist)

(cdr my-alist)

(car (car my-alist))

(cdr (car my-alist))

(car (car (cdr (cdr my-alist))))

(cdr (car (cdr (cdr my-alist))))

A function for using an a-list as a lookup table is provided as part of Lisp. It is called assoc, and it takes two arguments, a key to look up, and an a-list to use as a table. It returns the pair the holds that key, rather than the value paired with the key. If no pair with that key is found, it returns nil. (It returns the pair holding the key, rather than the value, so that a lookup result value of nil can be distinguished from the key not being found.)

(assoc 1 my-alist)

(assoc 4 my-alist)

(assoc 6 my-alist)

(assoc 9 my-alist)

In one style of Lisp programming (generally regarded as the purer form), lists are never modified once created. However, functions are provided to do this if you require. They allow the car and cdr of an existing cons to be modified, by rplaca and rplacd respectively.

(setq v (list 1 2))

(rplaca v 3)

(rplacd v 4)

A function to apply another function to each element of a list is built into Lisp. It is called mapcar, because it maps a function along each successive car of the list. It takes two arguments, a function taking one argument, and a list; it returns a list of the results. We will use the built-in add-one operation, 1+, as a convenient function of one argument:

(mapcar '1+ '(1 2 3 5 7 11 13 17 19))

mapcar evidently must use some means of calling a function that has been passed in as an ordinary lisp value. The means is the built-in function funcall, which takes an arbitrary number of arguments: the first is the function to call, and the remainder are the arguments to give to the called function.

(funcall '1+ 3)

(funcall '+ 3 5)

There is a similar function called apply, which takes just two arguments: the function to call, and a list containing the arguments to give it:

(apply '+ '(3 7))

Actually, apply takes an arbitrary number of arguments: all but the last one are handled as for funcall, and the last one is made into a list of further arguments for the called function:

(apply '+ 1 2 '(3 4))

Typically you would use funcall where you are calling a function from a known set of functions in a particular situation -- perhaps where you have picked a function from a table of functions all of which take the same arguments -- and use apply where you are doing general function manipulation, for example inside a programming language implementation -- even in the Lisp evaluator itself!

There is another one in this group, similar to apply, which it is more natural to use in some places; it is called eval, and it takes one argument, a piece of Lisp to evaluate. The result of eval is the result of the piece of Lisp it evaluates, of course. As you've been trying the examples in this file, the Lisp system has been passing them to eval. In the next article, we will write two implementations of eval, both of them in Lisp -- these are called meta-circular evaluators; the first very exploratory and rather cumbersome, and the second more concise in the starkness of its simplicity and the grandeur of its power. (Hmm, delusions again?)

Try the following evaluations, thinking first about what each one will evaluate to before trying it on the Lisp system:

(eval '(+ 3 4))

(eval '(eval '(+ 3 4)))

(eval (eval '(+ 3 4)))

(eval ''(+ 3 4))

(eval (eval ''(+ 3 4)))

(eval '(eval ''(+ 3 4)))

So far all the functions we have seen are built into the Lisp system. Lisp makes it possible to add functions, using the special form defun. defun takes three arguments:

the name of the function to define
its argument list
its definition body, as a lisp expression.

So for example, we can define a function to return the sum the squares of its arguments as

(defun sum-squares (a b)
  (+ (* a a) (* b b)))

and we can use it thus:

(sum-squares 2 3)

(sum-squares (+ 2 3) (sum-squares 3 4))

Note that the arguments to the function you define are always evaluated for you. Therefore, you cannot define things like if this way, because there is no way of stopping it evaluating all its arguments if it is defined using defun. A way of defining things that do not evaluate their arguments will be introduced later in these notes.

Just as there is a function like funcall (called apply) that lets you pass on a series of arguments as a list, there is a facility in defun to let you receive several arguments bundled into a single list. This is the symbol &rest that may appear in an argument list in defun. It means that the argument named after it is to take any remaining arguments passed at call time. For example, we could define a function like this:

(defun thing (a b &rest c)
  (if a b c))

in which the first two arguments with which it is called are passed in as a and b, and any remaining ones are bundled up as c. For example, try evaluating each of:

(thing t 1 2 3 4 5)

(thing nil 1 2 3 4 5)

In more complex function definitions, we will need to make local variables, which we do with the special form let. Here is a contrived small example, returning its one argument raised to the fourth power.

(defun fourth-power (a)
  (let (
        (a-squared (* a a))
       )
    (* a-squared a-squared)))

A let form has two arguments: a list of new variable bindings, and a body, in which the bindings are valid. Outside the body, the bindings made in the binding list no longer exist. You could read it as: Let a-squared be a*a in ..... ;

It is possible to make more than one binding in a let form:

(defun sum-fourth-powers (a b)
  (let (
        (a-squared (* a a))
        (b-squared (* b b))
        )
    (+
     (* a-squared a-squared)
     (* b-squared b-squared))))

(sum-fourth-powers 2 3)

At this point, we can tie function calls, local variable definitions and inline functions together. If you have learnt ML, you will have come across the fn construct, and if you have encountered lambda calculus, you will have seen the same thing as lambda. There is a lambda construct in Lisp, introduced by the symbol lambda. A lambda form has three parts: the symbol lambda, a function argument list, and a function body. Perhaps the simplest way to explain it is by example:

(funcall (lambda (a b)   (+ a b))
	 2 3)

A let construct actually defines an in-line function, supplies its arguments, and calls it... function arguments and local variables are really the same thing. Here is how we could re-phrase one of the examples above in terms of lambda:

(defun sum-fourth-powers-using-lambda (a b)
  (funcall (lambda (a-squared b-squared)
	     (+
	      (* a-squared a-squared)
	      (* b-squared b-squared)))
	   (* a a)
	   (* b b)))

(sum-fourth-powers-using-lambda 2 3)

Comparing this with the earlier version of the function should make it apparent how a little fine re-arrangement of the most primitive structure can benefit readability for everyday purposes.

In a let binding list, the bindings are all made at the same time, and so the higher items in the list cannot be referred to in the lower ones -- they are not earlier and later bindings. There is an alternative form, let*, in which the bindings are made in sequence, so, for example, we could write an even more contrived fourth-power function:

(defun fourth-power (a)
  (let* ((a-squared (* a a))
	 (a-fourth (* a-squared a-squared)))
    a-fourth))

It may seem as though let* will always be more useful; however there is a particular use for let. The variables declared in a let or let* within a function may be visible to functions it calls; one of these functions may need to re-bind some variables, referring to definitions made by an outer call to that function or another one. For example,

(let ((a b)
      (b a))
  (f a b))

does make sense; the binding (b a) sets b not to the value of a made in the preceding line, but to whatever value of a was around before the start of that let.

Within the body of a let or let*, you can have many expressions, which are evaluated in turn. The result of the whole let construct is the result of the last body expression. This is called an "implicit progn", progn meaning "program of n sections returning the value of the nth of them". If you need this functionality without introducing any variables, you don't have to do a let with an empty binding list; there is a progn operator:

(progn
   (set 'a 'b)
   (set a 3)
   (1+ b))

A defun body is also an implicit progn.

Lisp provides some logical operators and the constants for TRUE, which is represented by the constant t, and FALSE which is represented by nil (which is also the list terminator, see above). The logical operators are and, or, not.

One way of producing truth values is comparing values:

(= 3 (+ 1 2))

(<= 4 5)

(string= "abc" "abc")

There are several kinds of equality test, such as = (for numbers only) and string= (for strings only). equal is the most general such comparison. If it is given a list, it compares the elements of the list:

(equal '(a (1 . 2)) '(a (1 . 2)))

eq compares for lists being the very same object in memory, rather than for the same content:

(eq '(a (1 . 2)) '(a (1 . 2)))

(setq thing '(a (1 . 2)))

(eq thing thing)

Another way of producing truth values is use of the type predicates, the names of which conventionally end in p (for predicate):

(integerp 1)

(stringp "one")

(integerp "one")

(stringp 1)

(symbolp 'one)

(symbolp 1)

(consp '(a b c))

Truth values can be used in two forms of conditional special form, if and cond. if has two or three arguments: a condition expression, a `then' expression to evaluate if the condition holds, and optional an `else' expression for if it does not hold. (nil is used for the else expression if no else expression is given.)

(setq pp '(a . a))

(if (eq (car pp) (cdr pp))
    "Paired with itself"
  "Not paired with itself")

Note that were if to be an ordinary function rather than a special form, the last two arguments would have to be quoted to prevent unwanted evaluations:

(if-as-op (!= a 0) '(/ b a) 1)

A more general conditional is cond, which provides a chained if ... then ... else if ... then ..... structure. It takes as arguments an arbitrary number of two-element lists, in each of which the first element is a condition expression, and the second a body expression to evaluate if the associated condition is true. cond tries each of its condition-body pairs in turn from the top down; as soon as it finds a condition that is true, it evaluates the corresponding body, and returns. It is common to start the last argument with t, which, always being true, provides an explicit default case for the whole cond form.

(defun type-name (thing)
  (cond
   ((eq thing nil)
    "It is nil")
   ((eq thing t)
    "It is t")
   ((consp thing)
    "It is a cons")
   ((integerp thing)
    "It is a number")
   ((stringp thing)
    "It is a string")
   ((symbolp thing)
    "It is a symbol")
   (t
    "It is some kind of thing that I do not understand")))

(type-name 1)

(type-name "one")

(type-name '(1 1))

(type-name 'type-name)

We have now covered enough of lisp to construct a simple lisp evaluator in lisp. Such an evaluator is called a meta-circular interpreter. Working ones way through a meta-circular interpreter for lisp is an excellent way to get an in-depth understanding of how the language works, and since lisp is such a simple programming language, is a good introduction to the insides of programming language systems in general.

In general, I would say every computer science graduate should have written at least one programming language implemenation by the time they graduate. Read on, and find that that's not as ambitious a suggestion as it may sound!

[Teaching]

John C. G. Sturdy

Last modified: Sun Jun 10 21:39:56 GMT Daylight Time 2007