This was written as an odd-moments activity by John Sturdy at Harlequin Ltd (john@cb1.com); the copyright belongs to Harlequin. You may pass copies around, so long as you keep this copyright notice intact.
This file was originally written as a heavily-commented lisp file; although you may find this marked-up version easier for reading, I suggest you get a copy of the plain lisp file to experiment with.
It is written to run on GNUemacs, although it should work on almost any lisp system. The top of the plain file includes instructions for loading it into GNUemacs.
This is the first of a series of files to help you to learn about programming language theory and practice. It uses a dialect of the language Lisp (the name is short for LISt Processor). It assumes that you can use GNUemacs, or a similar editing system. In particular, you will need to be able to move the cursor around the file, and use the command M-C-x (ie Meta-Control-X, which on some terminals is Escape Control-X), which makes the Lisp system read the piece of Lisp that the cursor is on.
The first thing to note is than any text to the right of a semicolon on each line is ignored by Lisp, and so can be used for comments about what the program is doing and how.
As you will notice in reading this file, there are conventions
for how many semicolons to put: one for a comment to the right of some
Lisp;
two for a comment aligned with outline of the Lisp
formatting;;
three for a comment in the
margin;;;
and four for a header in comments in the
margin;;;;
These are only conventions; they make no
difference to the Lisp system, but Emacs repositions comments
according to these conventions if given the chance, and it helps other
people to read your programs.
The next thing to note is that Lisp text is free-format; line endings do not matter, so long as they do not come in the middle of a name. The only meaning line endings have is that they end comments.
Lisp programming systems are usually interactive, working as a dialogue with the user. The form of dialogue used in this teaching system is that to send a piece of Lisp to the system, you place the cursor anywhere within that piece of text, and type M-C-x. The result is displayed at the bottom of the Emacs screen. In an uncustomized Emacs, this area is only one line high, and is cleared when you type something else. The file show-eval.el, provided with these teaching files, modifies your emacs to show the results in a second window on your Emacs screen.
The basic unit of Lisp text is called an S-expression (for Symbolic Expression), or just and expression for short. Roughly speaking, in terms of natural languages, an expression could be a word, phrase, sentence, or paragraph.
The process by which Lisp replies to the expressions you give it is called evaluation, and you say that an expression ``evaluates to'' its result (which is also an expression).
The simplest kind of expression is a number. A number evaluates to itself. Get the Lisp system to evaluate the following number (do this by typing meta-control-x, which means either holding down meta and control and pressing x, or, if you don't have a meta key, pressing Escape then control-x):
42
Another simple kind of expression is a string. Try this one:
"A piece of string."
More complex expressions can be built up from the simple ones (and also from complex ones). A compound expression is a list of expressions, written with round brackets (parentheses) at the ends of it. The first thing in the list (we call this the `operator' of the expression) indicates some action to take, and the rest of the list indicates what to do it to. For example, to add 3 and 4, we evaluate (go on, try it -- always try the examples from here on, even if not prompted to like this):
(+ 3 4)
We can build more complex expressions by combining complex expressions just as we combine simple ones. These are evaluated starting from the innermost ones in the nesting and working outwards. For example, this:
(* (+ 3 4) 5)will first evaluate
(+ 3 4)
giving 7
, then
evaluate the outer expression, which is now the same as (* 7
5)
, giving 35
.
We can store any lisp value in a `variable', which is a named thing
for holding lisp values. A name can contain almost anything other than
spaces, and must not start with a number. Valid variable names
include +
-
my-variable
your-variable-1
your-variable-2
etc. There
is a kind of expression for storing a value into a variable, using the
operator setq
. For example, this:
(setq my-variable 12)stores the value
12
in my-variable
.
To use the value in a variable, we use the variable name itself as an expression in its own right:
my-variableand of course we can use this as part of another expression:
(+ my-variable 2)
As well as numbers, strings and names, lisp can handle lists of values as another kind of value (hence its name, a contraction of LISt Processor). We write the lists as series of values in brackets. The lists must be `quoted' by preceding them with a single quote mark -- otherwise they would be evaluated!
For example the expression
'(+ 3 4)evaluates to a list of which the first element is
+
, the
second element is 3
, and the third element is 4
; whereas
(+ 3 4)evaluates to the number
7
... since it is a Lisp
expression (or program) to add 3
and 4
. This
is one of the things that gives Lisp so much flexibility and power:
programs and data structures are made of the same basic stuff! A
function is just a list; you can put functions together on the fly
with list construction operators, and then run (evaluate) them; or for
that matter, take apart functions using the list destructuring
operators.
The quote mechanism is one of Lisp's few pieces of "syntactic
sugar" (syntactical devices that are not strictly necessary); the
form 'a
is actually a more convenient way of entering (quote a)
,
which you will sometimes see in results printed back at you by the
Lisp system.
The quote
operator, as see above, is one of Lisp's "special
forms". All Lisp operators other than special forms evaluate all
their arguments before the operator itself does its work. The
special form quote
for example returns its one argument
unevaluated.
Other special forms use unevaluated arguments, or specific control
of evaluation of arguments, too. One we have already seen is setq
.
Observe that if it were an ordinary operator, it would evaluate
the name of the variable it is to set, thus looking up its old
value and using that as the name. In fact there is an operator
that does that; it is called set. To use it to set a variable, you
must quote the name of the variable, for example
(set 'my-variable 3)which is equivalent to
(set (quote my-variable) 3)or
(setq my-variable 3)Instead of the quote mechanism, we can use the operator
list
to
create lists, for example
(list '+ 3 4)makes the same list as the
'(+ 3 4)
above. Note that we now
have to quote the +
, to stop it being evaluated as a variable
name -- it is a valid variable name, but we have not
established any variable of that name.
We can take lists apart using the operators car
and cdr
.
(These names come from the early days of Lisp, when they stood
for Contents of Address Register and Contents of Decrement
Register -- references to registers of the machine on which
Lisp was first implemented.) car
returns the head (first
element) of a list, and cdr
returns the tail (the remaining
elements). For an example, evaluate:
(car '(+ 3 4))and
(cdr '(+ 3 4))Lists are constructed from building blocks called `conses' or `cons cells' -- the name is an abbreviation of `construct'. Each cons is a pair of memory locations, the first one storing the car of that list building block, and the second one storing the cdr.
We can draw the list shown above, using boxes to represent memory locations, like this:
+---+---+ +---+---+ +---+---+ | * | *-+--->| 3 | *-+--->| 4 |nil| +-+-+---+ +---+---+ +---+---+ | | V ,---. |'+ | `---'Note that each location can contain either a number, or a pointer to another location, or
nil
, a special value which marks the
end of a list, and is also used for the boolean value `false'.
A cons by itself is not printed as a normal list, but with a
`.
' separating the two parts -- unless the cdr is
nil
, in which case no `.
' and no cdr are
printed. Thus, the value
+---+---+ | 1 | 2 | +---+---+is printed as
(1 . 2)
, and
+---+---+ | 1 |nil| +---+---+as
(1)
-- so if we make a cons of which the cdr is
another list, we are back to the ordinary printed form of lists, so
+---+---+ +---+---+ | 1 | *-+-->| 2 |nil| +---+---+ +---+---+is printed as
(1 2)
We can make a new cons either as part of a list, or with the
operator cons
, which takes two arguments, the first of which becomes
the car of the result, and the second of which becomes its
cdr. Evaluate these expressions to see how lists are constructed:
(cons 1 2) (cons 1 nil) (cons 1 (cons 2 nil)) (cons 1 (cons 2 3)) (cons 1 (cons 2 (cons 3 nil)))It is possible to build a great variety of data structures, such as trees, using conses. One common one is the association list, or a-list, which is a list of pairs, used as a lookup table. For example, a list associating 5 numbers with their squares would look like this:
+---+---+ +---+---+ +---+---+ +---+---+ +---+---+ | * | *-+--->| * | *-+--->| * | *-+--->| * | *-+--->| * |nil| +-+-+---+ +-+-+---+ +-+-+---+ +-+-+---+ +-+-+---+ | | | | | | | | | | V V V V V +---+---+ +---+---+ +---+---+ +---+---+ +---+---+ | 1 | 1 | | 2 | 4 | | 3 | 9 | | 4 | 16| | 5 | 25| +-+-+---+ +-+-+---+ +-+-+---+ +-+-+---+ +-+-+---+which prints as
((1 . 1) (2 . 4) (3 . 9) (4 . 16) (5 . 25))
These lists can be constructed and examined using the normal
list
and cons
operators:
(setq my-alist (list (cons 1 1) (cons 2 4) (cons 3 9) (cons 4 16) (cons 5 25)))Before trying each of the following expressions, try to work out what it will evaluate to:
(car my-alist) (cdr my-alist) (car (car my-alist)) (cdr (car my-alist)) (car (car (cdr (cdr my-alist)))) (cdr (car (cdr (cdr my-alist))))A function for using an a-list as a lookup table is provided as part of Lisp. It is called
assoc
, and it takes two arguments,
a key to look up, and an a-list to use as a table. It returns the
pair the holds that key, rather than the value paired with the key. If
no pair with that key is found, it returns nil
. (It
returns the pair holding the key, rather than the value, so that a
lookup result value of nil
can be distinguished from the
key not being found.)
(assoc 1 my-alist) (assoc 4 my-alist) (assoc 6 my-alist) (assoc 9 my-alist)In one style of Lisp programming (generally regarded as the purer form), lists are never modified once created. However, functions are provided to do this if you require. They allow the car and cdr of an existing cons to be modified, by
rplaca
and
rplacd
respectively.
(setq v (list 1 2)) (rplaca v 3) (rplacd v 4)A function to apply another function to each element of a list is built into Lisp. It is called
mapcar
, because it maps a
function along each successive car of the list. It takes two
arguments, a function taking one argument, and a list; it returns a
list of the results. We will use the built-in add-one operation,
1+
, as a convenient function of one argument:
(mapcar '1+ '(1 2 3 5 7 11 13 17 19))
mapcar
evidently must use some means of calling a function
that has been passed in as an ordinary lisp value. The means
is the built-in function funcall
, which takes an arbitrary
number of arguments: the first is the function to call, and
the remainder are the arguments to give to the called
function.
(funcall '1+ 3) (funcall '+ 3 5)There is a similar function called
apply
, which takes
just two arguments: the function to call, and a list containing the
arguments to give it:
(apply '+ '(3 7))Actually,
apply
takes an arbitrary number of arguments:
all but the last one are handled as for funcall
, and the
last one is made into a list of further arguments for the called
function:
(apply '+ 1 2 '(3 4))Typically you would use
funcall
where you are calling a
function from a known set of functions in a particular situation --
perhaps where you have picked a function from a table of functions all
of which take the same arguments -- and use apply
where
you are doing general function manipulation, for example inside a
programming language implementation -- even in the Lisp evaluator
itself!
There is another one in this group, similar to apply
,
which it is more natural to use in some places; it is called
eval
, and it takes one argument, a piece of Lisp to
evaluate. The result of eval is the result of the piece of Lisp it
evaluates, of course. As you've been trying the examples in this
file, the Lisp system has been passing them to eval
. In
the next article, we will write two implementations of eval, both of
them in Lisp -- these are called meta-circular evaluators; the first
very exploratory and rather cumbersome, and the second more concise in
the starkness of its simplicity and the grandeur of its power. (Hmm,
delusions again?)
Try the following evaluations, thinking first about what each one will evaluate to before trying it on the Lisp system:
(eval '(+ 3 4)) (eval '(eval '(+ 3 4))) (eval (eval '(+ 3 4))) (eval ''(+ 3 4)) (eval (eval ''(+ 3 4))) (eval '(eval ''(+ 3 4)))So far all the functions we have seen are built into the Lisp system. Lisp makes it possible to add functions, using the special form
defun
. defun
takes three arguments:
(defun sum-squares (a b) (+ (* a a) (* b b)))and we can use it thus:
(sum-squares 2 3) (sum-squares (+ 2 3) (sum-squares 3 4))Note that the arguments to the function you define are always evaluated for you. Therefore, you cannot define things like
if
this way, because there is no way of stopping it
evaluating all its arguments if it is defined using
defun
. A way of defining things that do not evaluate
their arguments will be introduced later in these notes.
Just as there is a function like funcall
(called
apply
) that lets you pass on a series of arguments as a
list, there is a facility in defun
to let you receive
several arguments bundled into a single list. This is the symbol
&rest
that may appear in an argument list in
defun
. It means that the argument named after it is to
take any remaining arguments passed at call time. For example, we
could define a function like this:
(defun thing (a b &rest c) (if a b c))in which the first two arguments with which it is called are passed in as
a
and b
, and any remaining ones are
bundled up as c
. For example, try evaluating each of:
(thing t 1 2 3 4 5) (thing nil 1 2 3 4 5)In more complex function definitions, we will need to make local variables, which we do with the special form
let
. Here is
a contrived small example, returning its one argument raised to the
fourth power.
(defun fourth-power (a) (let ( (a-squared (* a a)) ) (* a-squared a-squared)))A
let
form has two arguments: a list of new variable bindings,
and a body, in which the bindings are valid. Outside the body,
the bindings made in the binding list no longer exist. You
could read it as: Let a-squared be a*a in ..... ;
It is possible to make more than one binding in a let
form:
(defun sum-fourth-powers (a b) (let ( (a-squared (* a a)) (b-squared (* b b)) ) (+ (* a-squared a-squared) (* b-squared b-squared)))) (sum-fourth-powers 2 3)At this point, we can tie function calls, local variable definitions and inline functions together. If you have learnt ML, you will have come across the
fn
construct, and if you have encountered
lambda calculus, you will have seen the same thing as
lambda
. There is a lambda construct in Lisp, introduced
by the symbol lambda
. A lambda form has three parts: the
symbol lambda
, a function argument list, and a function
body. Perhaps the simplest way to explain it is by example:
(funcall (lambda (a b) (+ a b)) 2 3)A
let
construct actually defines an in-line function,
supplies its arguments, and calls it... function arguments and local
variables are really the same thing. Here is how we could re-phrase
one of the examples above in terms of lambda
:
(defun sum-fourth-powers-using-lambda (a b) (funcall (lambda (a-squared b-squared) (+ (* a-squared a-squared) (* b-squared b-squared))) (* a a) (* b b))) (sum-fourth-powers-using-lambda 2 3)Comparing this with the earlier version of the function should make it apparent how a little fine re-arrangement of the most primitive structure can benefit readability for everyday purposes.
In a let
binding list, the bindings are all made at the
same time, and so the higher items in the list cannot be referred to
in the lower ones -- they are not earlier and later bindings. There is
an alternative form, let*
, in which the bindings are made
in sequence, so, for example, we could write an even more contrived
fourth-power function:
(defun fourth-power (a) (let* ((a-squared (* a a)) (a-fourth (* a-squared a-squared))) a-fourth))It may seem as though
let*
will always be more useful;
however there is a particular use for let
. The variables
declared in a let or let* within a function may be visible to
functions it calls; one of these functions may need to re-bind some
variables, referring to definitions made by an outer call to that
function or another one. For example,
(let ((a b) (b a)) (f a b))does make sense; the binding
(b a)
sets b
not to the value of a
made in the preceding line, but to
whatever value of a
was around before the start of that
let
.
Within the body of a let
or let*
, you can
have many expressions, which are evaluated in turn. The result of the
whole let
construct is the result of the last body
expression. This is called an "implicit progn", progn meaning "program
of n sections returning the value of the nth of them". If you need
this functionality without introducing any variables, you don't have
to do a let
with an empty binding list; there is a
progn
operator:
(progn (set 'a 'b) (set a 3) (1+ b))A
defun
body is also an implicit progn
.
Lisp provides some logical operators and the constants for TRUE, which
is represented by the constant t
, and FALSE which is
represented by nil
(which is also the list terminator,
see above). The logical operators are and
,
or
, not
.
One way of producing truth values is comparing values:
(= 3 (+ 1 2)) (<= 4 5) (string= "abc" "abc")There are several kinds of equality test, such as
=
(for
numbers only) and string=
(for strings
only). equal
is the most general such comparison. If it
is given a list, it compares the elements of the list:
(equal '(a (1 . 2)) '(a (1 . 2)))
eq
compares for lists being the very same object in
memory, rather than for the same content:
(eq '(a (1 . 2)) '(a (1 . 2))) (setq thing '(a (1 . 2))) (eq thing thing)Another way of producing truth values is use of the type predicates, the names of which conventionally end in
p
(for
predicate):
(integerp 1) (stringp "one") (integerp "one") (stringp 1) (symbolp 'one) (symbolp 1) (consp '(a b c))Truth values can be used in two forms of conditional special form,
if
and cond
. if
has two or
three arguments: a condition expression, a `then' expression to
evaluate if the condition holds, and optional an `else' expression for
if it does not hold. (nil
is used for the else
expression if no else expression is given.)
(setq pp '(a . a)) (if (eq (car pp) (cdr pp)) "Paired with itself" "Not paired with itself")Note that were
if
to be an ordinary function rather than
a special form, the last two arguments would have to be quoted to
prevent unwanted evaluations:
(if-as-op (!= a 0) '(/ b a) 1)A more general conditional is
cond
, which provides a
chained if ... then ... else if ... then ..... structure. It takes as
arguments an arbitrary number of two-element lists, in each of which
the first element is a condition expression, and the second a body
expression to evaluate if the associated condition is
true. cond
tries each of its condition-body pairs in turn
from the top down; as soon as it finds a condition that is true, it
evaluates the corresponding body, and returns. It is common to start
the last argument with t
, which, always being true,
provides an explicit default case for the whole cond
form.
(defun type-name (thing) (cond ((eq thing nil) "It is nil") ((eq thing t) "It is t") ((consp thing) "It is a cons") ((integerp thing) "It is a number") ((stringp thing) "It is a string") ((symbolp thing) "It is a symbol") (t "It is some kind of thing that I do not understand"))) (type-name 1) (type-name "one") (type-name '(1 1)) (type-name 'type-name)
We have now covered enough of lisp to construct a simple lisp evaluator in lisp. Such an evaluator is called a meta-circular interpreter. Working ones way through a meta-circular interpreter for lisp is an excellent way to get an in-depth understanding of how the language works, and since lisp is such a simple programming language, is a good introduction to the insides of programming language systems in general.
In general, I would say every computer science graduate should have written at least one programming language implemenation by the time they graduate. Read on, and find that that's not as ambitious a suggestion as it may sound!
[Teaching] John C. G. Sturdy | Last modified: Sun Jun 10 21:39:56 GMT Daylight Time 2007 |