Lab 8A: carlae
If you are a current student, please Log In for full access to the web site.
Note that this link will take you to an external site (https://oidc.mit.edu) to authenticate, and then you will be redirected back to this page.
Table of Contents
- 1) Timing
- 2) Preparation
- 3) Introduction
- 4) Interpreter Design
- 5) Lexer (Tokenizer)
- 6) Parser
- 7) Evaluator
- 7.1) Evaluator 1: Calculator
- 7.2) Additional Operations
- 7.3) Evaluator 2: Variables
- 7.4) Environments
- 7.5) Evaluator Changes
- 7.6) Evaluator 3: Functions
- 8) Endnotes
- 9) Code Submission
- 10) Checkoff
This lab is part 1 of a two-week lab, which will be completed next week. Due dates for this part are:
- Code submission and concept questions: Monday, 13 November, 10pm
- Checkoff: Friday, 17 November, 5pm
This lab assumes you have Python 3.5 or later installed on your machine.
The following file contains code and other resources as a starting point for this lab: lab8A.zip
Most of your changes should be made to
lab.py, which you will submit
at the end of this lab. Importantly, you should not add any imports
to the file. You may submit portions of the lab late (see the
grading page for more details), but the last day to
submit this lab will be the Friday after the due date.
This lab is worth a total of 4 points. Your score for the lab is based on:
- correctly answering the questions throughout this page (0.5 points)
- passing the test cases from
test.pyunder the time limit (1 points), and
- a brief "checkoff" conversation with a staff member to discuss your code (2.5 points).
Please also review the collaboration policy before continuing. In particular, note that you are not allowed to use any code other than that which you have written yourself, including code you find online.
In this lab, you will implement an interpreter for a dialect of LISP, one of the earliest high-level programming languages (it was invented by John McCarthy at MIT in 1958!). Because this language will in some ways be quite small compared to Python, we'll call it carlae, after Leptotyphlops carlae.
Its syntax is simpler than Python's, and the complete interpreter will fit in a single Python file. However, despite its small size, the language we will implement here will be Turing complete, i.e., in theory, it will be able to solve any possible computational problem (so it would be possible, for example, to write carlae implementations of any of the labs we have done so far in 6.009).
3.1) LISP and carlae
As with most LISP dialects, the syntax of carlae is far simpler than that of Python. All-in-all, we can define the syntax of carlae as follows:
- Numbers (e.g.
1) and symbols (e.g.
x) are called atomic expressions; they cannot be broken into pieces. These are similar to their Python counterparts, except that in carlae, operators such as
>are symbols, too, and are treated the same way as
- Everything else is an S-expression:
an opening round bracket
(, followed by zero or more expressions, followed by a closing round bracket
). The first subexpression determines what it means:
- An S-expression starting with a keyword, e.g.
(if ...), is a special form; the meaning depends on the keyword.
- An S-expression starting with a non-keyword, e.g.
(fn ...), is a function call, where the first element in the expression is the function to be called, and the remaining subexpressions represent the arguments to that function.
- An S-expression starting with a keyword, e.g.
And that's it! The whole syntax is described by the three bullet points above. For example, consider the following definition of a function that computes Fibonacci numbers in Python:
def fib(n): if n <= 1: return n return fib(n-1) + fib(n-2)
We could write an equivalent program in carlae:
(define fib (lambda (n) (if (<= n 1) n (+ (fib (- n 1)) (fib (- n 2))) ) ) )
Using so many parentheses might take some getting used to, but it helps to keep
the language simple and consistent. Some people have joked that LISP stands
for "Lots of Insipid and Silly Parentheses," though some might argue instead
that it stands for "Lisp Is Syntactically Pure."
4) Interpreter Design
Despite its small size, the interpreter for carlae will still be rather complicated. To help manage this complexity, we'll start with a very small language, and we'll gradually add functionality until we have a fully featured language. As with most interpreters, we will think of our carlae interpreter as consisting of three parts:
- A lexer, which takes a string as input and produces a list of tokens, which represent meaningful units in the programming language.
- A parser, which takes the output of the lexer as input and produces a structured representation of the program as its output.
- An evaluator, which takes the output of the parser as input and actually handles running the program.
5) Lexer (Tokenizer)
Our first job is lexing (or tokenizing). In carlae, we'll have exactly
three kinds of tokens: opening round brackets
(, closing round brackets
and everything else (separated by whitespace). Your first task for the lab is
to write a function called
tokenize, which takes a single string representing
a program as its input and outputs a list of tokens. For example, calling
tokenize("(foo (bar 3.14))") should give us the following result:
'foo', '(', 'bar', '3.14', ')', ')']
Unlike in Python, whitespace does not matter, so, for example, the
function should produce exactly the same output for both of the following
(define circle-area (lambda (r) (* 3.14 (* r r)) ) )
(define circle-area (lambda (r) (* 3.14 (* r r))))
tokenize function should also handle comments. A comment in carlae is
prefixed with a semicolon (
;), instead of the octothorpe (
#) used by
Python. If a line contains a semicolon, the
tokenize function should not
consider that semicolon or the characters that follow it on that line to be
part of the input program.
tokenize("(cat (dog (tomato)))")?
;add the numbers 2 and 3 (+ ; this expression 2 ; spans multiple 3 ; lines )
Our next job is parsing the list of tokens into an abstract syntax tree, a
structured representation of the expression to be evaluated. Implement the
parse function in
parse should take a single input (a list of
tokens as produced by
tokenize) and should output a representation of the
- A number is represented as an instance of
- A symbol is represented as a string.
- An S-expression is represented as a list of its parsed subexpressions.
For example, the program above that defined
circle-area should parse as
['define', 'circle-area', ['lambda', ['r'], ['*', 3.14, ['*', 'r', 'r']]]]
Note that the structure of the parser's output is such that a recursive solution is likely the "path of least resistance" (parsing an expression often involves recursively parsing subexpressions).
Your code for
parse should not use Python's built-in
In the case that parentheses are mismatched in the input, the function should
After implementing both
parse, your code should pass the
Test1_Parse suite in
parseon the output from the first concept question above?
parse(['(', '+', '2', '(', '-', '5', '3', ')', '7', '8', ')'])?
"How do you eat a big pizza? One little bite at at time..."
Now that we have the program in a form that is (relatively) easy to work with, we can move on to implementing the evaluator, which will handle actually running the program. This part of the interpreter will get fairly complicated, so we will start small and add in more pieces later. We will make several "false steps" along the way, where we'll need to make modifications to pieces we had implemented earlier.
Because of this, it will be important to save backups of your
after every major modification (or to use a version control system such as
Mercurial or Git), so that if something goes wrong, you can revert to a working
7.1) Evaluator 1: Calculator
We'll hold off on implementing variables, lists, conditionals, and functions
for a little while; for now, we'll start by implementing a small calculator
that can handle the
Note that we have provided a dictionary called
which maps the names
- to functions. Each of these functions takes a
list as an argument and returns the appropriate value.
Define a function
evaluate, which takes as its sole input an expression of
the same form as the parser's output.
evaluate should return the value of
- If the expression is a symbol of a name in
carlae_builtins, it should return the associated object.
- If the expression is a number, it should return that number.
- If the expression is a list (representing an S-expression), each of the elements in the list should be evaluated, and the result of evaluating the first element (a function) should be called with the remaining elements passed in. The overall result of evaluating such a function is the return value of that function call.
- If the expression is a symbol that is not in
carlae_builtins, or is a list whose first element is not a function, it should raise an
evaluate function in
lab.py according to the rules above.
7.1.1) Testing Your Code: REPL
It is kind of a pain to have to type out all of the arguments to
each time we call it.
As such, we'll implement a REPL (a "Read, Evaluate, Print
Loop") for carlae. A REPL has a simple job: it continually prompts the
user for input until they type
QUIT. Until then, it:
- accepts input from the user,
- tokenizes and parses it,
- evaluates it, and
- prints the result.
If an error occurs during any of these steps, an error message should be displayed, and that expression may be ignored, but the program should not exit.
To implement the REPL, we can make use of Python's built-in
input takes an argument representing a prompt to be displayed to the user and
returns the string that they type (it is returned when they hit enter).
The following shows one possible interaction with a REPL, with a particular prompt and output formatting (you are welcome to use whatever formatting you like!):
in> (+ 2 3) out> 5 in> (+ 2 (- 3 4)) out> 1 in> (- 3.14 1.14 1) out> 1.0000000000000004 in> (+ 2 (- 3 4 5)) out> -4 in> QUIT
Implement a REPL for carlae and use it to test your evaluator. The REPL
can/should be one of your main means of testing moving forward; feel free to
try things out using the REPL as you work through the remainder of the lab.
The functionality of your REPL will not be tested automatically; rather, it
will be tested during the checkoff. The REPL should only start when the lab is
run directly, not when
lab.py is imported from another script.
7.2) Additional Operations
Implement two new operations:
*should take arbitrarily-many arguments and should return the product of all its arguments.
/should take arbitrarily many arguments. It should return the result of successively dividing the first argument by the remaining arguments.
After implementing the evaluator and the
/ operations, try them out
in the REPL.
7.3) Evaluator 2: Variables
Next, we will implement our first special form: variable definition using the
A variable definition has the following syntax:
(define NAME EXPR), where
NAME is a symbol and
EXPR is an arbitrary expression. When carlae
define expression, it should associate the name
NAME with the
value that results from evaluating
EXPR. In addition, the
expression should evaluate to the result of evaluating
The following transcript shows an example interaction using the
in> (define pi 3.14) out> 3.14 in> (define radius 2) out> 2 in> (* pi radius radius) out> 12.56 in> QUIT
define differs from the function calls we saw earlier. It is a
special form that does not evaluate the name that follows it; it only
evaluates the expression that follows the name. As such, we will have to treat
it differently from a normal function call.
In addition, in order to think about how to implement
define, we will need to
talk about the notion of environments.
Admitting variable definition into our language means that we need to be, in some sense, more careful with the process by which expressions are evaluated. We will handle the complexity associated with variable definition by maintaining structures called environments. An environment consists of bindings from variable names to values, and possibly a parent environment, from which other bindings are inherited. One can look up a name in an environment, and one can bind names to values in an environment.
The environment is crucial to the evaluation process, because it determines the
context in which an expression should be evaluated. Indeed, one could say that
expressions in a programming language do not, in themselves, have any meaning.
Rather, an expression acquires a meaning only with respect to some environment
in which it is evaluated. Even the interpretation of an expression as
(+ 1 1) depends on an understanding that one is operating
in a context in which + is the symbol for addition. Thus, in our model of
evaluation we will always speak of evaluating an expression with respect to
To describe interactions with the interpreter, we will suppose that there is a
"global" environment, consisting of bindings of the names of built-in functions
and constants to their associated values. For example, the idea that
the symbol for addition is captured by saying that the symbol
+ is bound in
this global environment to the primitive addition procedure we defined above.
One necessary operation on environments is looking up the value to which a given name is bound. To do this, we can follow these steps:
- If the name has a binding in the environment, that value is returned.
- If the name does not have a binding in the environment and the environment has a parent, we look up the name in the parent environment (following these same steps).
- If the name does not have a binding in the environment and the environment
does not have a parent, an
Note that looking up a name in an environment is similar to looking up a key in a dictionary, except that if the key is not found, we continue looking in parent environments until we find the key or we run out of parents to look in.
In order to make variables work properly, you will need to implement the kind of lookup described above in Python. It is up to you to decide how to implement environments and the associated lookups; your implementation will not be tested directly by the automatic checker, but rather will be tested by looking at the end-to-end behavior of your evaluator. Regardless of how you implement environments, you should make sure your environment representation can handle variables with arbitrary names, and you should be prepared to discuss your implementation during the checkoff.
7.4.1) Environments: Example
The following shows an example of an environment structure, where arrows indicate each environment's parent, if any. Here we have four environments, labeled E1, E2, E3, and E4. Both E2 and E3 have E1 as a parent environment, and E4 has E3 as a parent environment. E1 does not have a parent environment.
We say that the values
z are bound in E1. Note that
y are bound in E2;
x is bound in E3, and
a is bound in
E4. Looking up a name, we work our way up the arrows until we find the
name we are looking for. For example, looking up
a in E4 gives us the
1, and looking up
z in E4 gives us
Notice that, as was mentioned above, the evnrionment is crucial for determining the result of an expression.
Also note that the
define keyword always operates directly on the environment
in which it is evaluated. If we were to evaluate
(define y 8) in E4,
this would result in a new binding inside of E4 (without affecting the
Answer the following questions about variable lookup in this updated environment:
7.4.2) Environments: Initial Structure
For purposes of our REPL, we will start by thinking about two main
environments: an environment to hold the built-in values (such as the
function) and a "global" environment where top-level definitions from users'
programs will be bound. For example, running the code below from the REPL
should result in the environment structure shown in the picture that follows:
in> (define x 3) out> 3 in> (define y 4) out> 4
Note that, when we look up any of the built-in variables from the global environment, we end up finding them in the built-ins environment.
7.5) Evaluator Changes
Now, we'll need to add support for variable definition and lookup to our
interpreter by implementing a Python structure for representing an environment
and modifying the
evaluate function so that it handles variables and the
Beyond implementing a representation for environments, we will need to make
four modifications to our
evaluate function. We will need to:
evaluateso that it takes a second (optional) argument: the environment in which the expression should be evaluated. If no environment is passed in, an empty environment should be used, whose parent is an environment containing all the bindings from the
- make sure that
definekeyword properly, evaluating the given expression and storing the result in the environment that was passed to
- modify the way symbols are handled in
evaluate, so that if the symbol exists as a key in the environment (or a parent environment),
evaluatereturns the associated value.
Note that after implementing these changes, you can test your implementation using the examples above in the REPL.
To make automatic checking possible, define a function called
that takes the same arguments as
evaluate but returns a tuple with two
elements: the result of the evaluation and the environment in which the
expression was evaluated. Your code will not pass the tests without this
function. Note that this function should always return the environment in
which the expression was actually evaluated, even if no environment was
explicitly passed to it.
Consider evaluating the following expressions in order from the same REPL (in the same environment). What is the result of evaluating each of these expressions?
(+ 3 2 4 (- 2 7 8))
(define x (+ 2 3))
(define x2 (* x x))
(+ x2 x)
Test this behavior in your REPL to make sure it matches! You may also wish to test some other similar examples in your REPL.
At this point, your code should pass all the tests in the
Test2_Eval suite in
carlae_codedirectory; these files contain the expressions that are evaluated in the associated test case, with one expression on each line. A good exercise if you are failing a test case would be to figure out by hand what each expression should evaluate to and then try them by pasting one line at a time into your REPL. If you are having trouble figuring out why a given expression evaluates the way it does, please don't hesitate to ask for help in office hours or on Piazza!
7.6) Evaluator 3: Functions
So far, we have a pretty nice calculator, but there are a few things missing before we can really call it a programming language. One of those things is user-defined functions.
7.6.1) Defining functions
Currently, the operations we can perform are limited to the functions in the
carlae_builtins dictionary. We can really empower a user of the language by
allowing them to define functions of their own. We will accomplish this via
lambda special form (so called because it is strongly rooted in Church's
lambda expression takes the following form:
(lambda (PARAM1 PARAM2 ...)
EXPR). The result of evaluating such an expression should be an object
representing that function (note that this statement represents a function
definition, not a function call). Importantly, there are a few things we
need to keep track of with regard to functions. We need to store:
- the code representing the body of the function (which, for now, is restricted to a single expression representing the return value)
- the names of the function's parameters
- a pointer to the environment in which the function was defined
Once again, it is up to you to determine how exactly a function should be represented in your interpreter; but it is important that, however it is represented, it stores the information above (and also that you are able to distinguish it from the other syntactic forms we have seen so far).
For example, the result of evaluating
(lambda (x y) (+ x y)) in the global
environment should be an object that stores the following information:
- the function's parameters, in order, are called
- the function's body is the expression
(+ x y).
- the function was defined in the global environment.
7.6.2) Calling Functions
We also need a way to call user-defined functions. When the first element in an S-expression evaluates to a user-defined function, we will need to call the function by taking the following steps (note that these are the same steps that Python takes when calling functions):
- evaluate all of the arguments to the function in the current environment (from which the function is being called)
- make a new environment whose parent is the environment in which the function was defined (this is called lexical scoping)
- in that new environment, bind the function's parameters to the values that are passed to it
- evaluate the body of the function in that new environment
It is worth noting that these steps are very close, if not exactly the same, as the steps Python takes when it calls a function. The readings for 6.s080 contain some additional explanation and examples (see sections 4-8 of Functions and sections 1-2.5 of Functions 2).
Here are two examples of calling functions and using
lambda. The first
associates a name with the function and calls the function using that name,
whereas the second calls the function directly without first giving it a name.
in> (define square (lambda (x) (* x x))) out> function object in> (square 2) out> 4
in> ((lambda (x) (* x x)) 3) out> 9
xin the global environment?
Here is another example of a more complicated function. Note that the result
(foo 3) is a function, which is then called with
2 as an
argument. Note also that the value associated with the name
x when we call
(bar 2) is the
3 from the environment in which that function was defined,
7 from the environment in which it was called.
in> (define x 7) out> 7 in> (define foo (lambda (x) (lambda (y) (+ x y)))) out> function object in> (define bar (foo 3)) out> function object in> (bar 2)
(bar 2)in the box above?
xin the global environment?
yin the global environment?
(bar 2) not evaluate to
7.6.4) Changes to
We will need to make sure that
evaluate handles the
properly, by creating a new function object that stores the names of the
parameters, the expression representing the body of the function, and the
environment in which the function was defined. We also need to modify evaluate
to handle calling user-defined functions.
From a high-level perspective, your evaluator should now work in the following
way, given an expression
erepresents a number, it should evaluate to that number.
erepresents a variable name, it should evaluate to the value associated with that variable in the given environment, or it should raise an
EvaluationErrorif a binding cannot be found according to the rules above.
erepresents a special form (such as
define), it should be evaluated according to the rules for that special form.
eis a compound expression representing a function call. Each of the subexpressions should be evaluated in the given environment, and:
- If the first subexpression is a built-in function, it should be called with the remaining subexpressions as arguments (in order).
- If the first subexpression is a user-defined function, it should be called according to the rules given above.
If we try to call something that is not a function, or if we try to call a
function with the incorrect number of arguments passed in, an
should be raised.
NOTE that in the case of an S-expression that
isn't a special form, you should not need to care how the first element is
specified (whether it is a name, a
lambda, or a call to another function,
etc), so your evaluator should not have conditional behavior based on this.
Evaluating the first element in the S-expression give us the function that is
to be called, regardless of how it is specified (and your evaluator code should
not have conditional logic based on the form of that first element).
After you have made the changes above, try them out in the REPL using the
examples from subsubsection 7.6.3. Once you are reasonably
certain that everything is working, try them with
test.py. At this point, your code
should pass the tests in the
Test3_Func suite in
7.6.5) Easier Function Definitions
Implementing user-defined functions has given a lot of power to our
interpreter! But it is kind of a pain to type them out. Implement a shorter
syntax for function definitions, so that, if the
NAME in a
expression is itself an S-expression, it is implicitly translated to a function
definition before binding. For example:
(define (five) (+ 2 3))should be equivalent to
(define five (lambda () (+ 2 3)))
(define (square x) (* x x))should be equivalent to
(define square (lambda (x) (* x x)))
(define (add2 x y) (+ x y))should be equivalent to
(define add2 (lambda (x y) (+ x y)))
This is nice not only because it is easier to type, but also because it makes the definition of a function more closely mirror the syntax we will use when calling the function.
evaluate function so that it handles this new form. After
implementing this change, try it out in the REPL, and then in
this point, your code should pass all the tests in
At this point, we have a very nice start toward an interpreter for carlae. We have the ability to define variables, and to define and call functions. Note also that recursion and closures come about naturally as a result of the rules we have implemented for calling functions (we don't have to do any additional work to get those features!).
However, we still have some work to do before our language is complete. In particular, next week, we will add support for conditionals and lists.
9) Code Submission
Once you are finished with the code, please come to a tutorial, lab session, or office hour and add yourself to the queue asking for a checkoff. You must be ready to discuss your code and test cases in detail before asking for a checkoff.
You should be prepared to demonstrate your code (which should be well-commented, should avoid repetition, and should make good use of helper functions). In particular, be prepared to discuss:
- your implementation of
- your implementation of environments
- how your environment representation handles arbitrary names
- how you set up and use environments in
- your representation of functions
- how user-defined functions are called (including setting up the proper new environments)
- your implementation of the REPL (including shared environment, exiting on QUIT, staying alive after errors, and correctly evaluating input expressions