I've wanted to build a programming language from scratch for years. Not a compiler for production — just something small enough to understand completely, complex enough to teach me something real. Last January I finally sat down and did it. Two days later, I had Koda: a dynamically typed scripting language with first-class functions, closures, and a REPL.
In this post I'll walk through the architecture of a tree-walking interpreter — the simplest kind — and share the specific things I had to wrestle with that I didn't fully understand from reading about them.
The pipeline: four stages
An interpreter has a clear layered structure. Source text flows through each stage, transformed into a successively higher-level representation:
- Lexer (tokenizer) — converts raw characters into a flat list of tokens
- Parser — converts tokens into an Abstract Syntax Tree (AST)
- Resolver — walks the AST to resolve variable scopes (optional but important for closures)
- Evaluator (interpreter) — walks the AST and produces values
The names vary by tradition, but the structure is universal. Each stage is largely independent, which makes them easy to test and reason about separately.
The lexer: boring but essential
The lexer's job is mechanical: consume characters, emit tokens. A token is just a small struct with a type and a value:
enum TokenType {
    NUMBER, STRING, IDENTIFIER,
    PLUS, MINUS, STAR, SLASH,
    LEFT_PAREN, RIGHT_PAREN,
    LET, FN, IF, ELSE, RETURN,
    EOF
}

record Token(TokenType type, String lexeme, int line) {}
The main challenge is handling things like multi-character tokens (!=, ==, >=), string literals with escape sequences, and ignoring comments. I used a simple cursor approach — keep a start and current index, advance through characters, and emit tokens when you recognize a complete lexeme.
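To make the cursor approach concrete, here's a minimal, self-contained sketch — a hypothetical MiniLexer far simpler than Koda's real lexer — that handles whitespace, multi-digit numbers, single-character operators, and the two-character != via one character of lookahead:

```java
import java.util.ArrayList;
import java.util.List;

// Cursor-based lexer sketch: `start` marks the beginning of the current
// lexeme, `current` is the next character to consume.
public class MiniLexer {
    private final String src;
    private int start = 0;
    private int current = 0;
    private final List<String> tokens = new ArrayList<>();

    MiniLexer(String src) { this.src = src; }

    List<String> scan() {
        while (current < src.length()) {
            start = current;
            char c = src.charAt(current++);
            switch (c) {
                case ' ': break; // skip whitespace
                case '+': case '-': case '*': case '/':
                    tokens.add(String.valueOf(c));
                    break;
                case '!':
                    // Multi-character token: peek ahead before committing.
                    if (current < src.length() && src.charAt(current) == '=') {
                        current++;
                        tokens.add("!=");
                    } else {
                        tokens.add("!");
                    }
                    break;
                default:
                    if (Character.isDigit(c)) {
                        while (current < src.length() && Character.isDigit(src.charAt(current))) current++;
                        tokens.add(src.substring(start, current)); // whole number lexeme
                    }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(new MiniLexer("12 + 34 != 5").scan()); // [12, +, 34, !=, 5]
    }
}
```

The real lexer emits Token records with types and line numbers instead of raw strings, but the start/current dance is the same.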
The parser: recursive descent
Recursive descent is the simplest parsing technique for hand-written parsers. Each grammar rule maps to a function. The grammar for a basic expression language looks something like this:
expression → assignment
assignment → IDENTIFIER "=" assignment | equality
equality → comparison ( ("!=" | "==") comparison )*
comparison → term ( (">" | ">=" | "<" | "<=") term )*
term → factor ( ("-" | "+") factor )*
factor → unary ( ("/" | "*") unary )*
unary → ("!" | "-") unary | call
call → primary ( "(" arguments? ")" )*
primary → NUMBER | STRING | IDENTIFIER | "(" expression ")"
Each precedence level is a function that calls the function for the next tighter-binding level. The magic is that operator precedence falls out of the call stack — you never have to think about it explicitly.
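To see the call-stack trick in action, here's a toy sketch (a hypothetical MiniParser, assuming single-digit numbers and only + and *, evaluating as it parses): two functions, two precedence levels, and * binds tighter purely because factor() sits below term() in the call chain.

```java
// term → factor ( "+" factor )*
// factor → digit ( "*" digit )*
public class MiniParser {
    private final String src;
    private int pos = 0;

    MiniParser(String src) { this.src = src.replace(" ", ""); }

    // Lower precedence: addition. Each operand is a whole factor,
    // so any chain of "*" is consumed before "+" ever sees it.
    int term() {
        int value = factor();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            value += factor();
        }
        return value;
    }

    // Higher precedence: multiplication over single digits.
    int factor() {
        int value = digit();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;
            value *= digit();
        }
        return value;
    }

    int digit() { return src.charAt(pos++) - '0'; }

    public static void main(String[] args) {
        System.out.println(new MiniParser("2 + 3 * 4").term()); // 14, not 20
    }
}
```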
The parser produces AST nodes — instances of classes like BinaryExpr, CallExpr, FnDecl. I used the Visitor pattern to keep the evaluator logic separate from the node classes, which made adding new behavior (like a pretty-printer) trivial.
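Here's a minimal sketch of that Visitor setup, with illustrative names (Expr, Binary, AstPrinter) rather than Koda's actual classes: the node records stay dumb data, and each behavior lives in its own visitor.

```java
interface Expr {
    <R> R accept(Visitor<R> v);
}

interface Visitor<R> {
    R visitLiteral(Literal e);
    R visitBinary(Binary e);
}

// Nodes are plain records; accept() just double-dispatches to the visitor.
record Literal(Object value) implements Expr {
    public <R> R accept(Visitor<R> v) { return v.visitLiteral(this); }
}

record Binary(Expr left, String op, Expr right) implements Expr {
    public <R> R accept(Visitor<R> v) { return v.visitBinary(this); }
}

// A pretty-printer is just another visitor; an evaluator would be a Visitor<Object>.
class AstPrinter implements Visitor<String> {
    public String visitLiteral(Literal e) { return String.valueOf(e.value()); }
    public String visitBinary(Binary e) {
        return "(" + e.op() + " " + e.left().accept(this) + " " + e.right().accept(this) + ")";
    }
}

public class VisitorDemo {
    public static void main(String[] args) {
        Expr expr = new Binary(new Literal(1), "+", new Binary(new Literal(2), "*", new Literal(3)));
        System.out.println(expr.accept(new AstPrinter())); // (+ 1 (* 2 3))
    }
}
```

Adding a new behavior means writing one new visitor class; none of the node classes change.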
The part that actually matters: environments and closures
The evaluator is where things get interesting. The core concept is the environment: a mapping from variable names to values. Every block creates a new environment whose parent is the enclosing one. Variable lookup walks up the chain until it finds a binding or hits the top.
class Environment {
    private final Environment parent;
    private final Map<String, Object> values = new HashMap<>();

    Environment(Environment parent) { this.parent = parent; }

    Object get(String name) {
        if (values.containsKey(name)) return values.get(name);
        if (parent != null) return parent.get(name);
        throw new RuntimeError("Undefined variable '" + name + "'");
    }

    void define(String name, Object value) {
        values.put(name, value);
    }
}
A closure is what happens when a function captures the environment it was defined in, not the one it executes in. This sounds simple, but it took me a few attempts to get right.
When you define a function, you store a reference to the current environment alongside the function's AST. When you call the function, you create a new environment whose parent is the captured environment — not the calling environment. This is the entire mechanism that makes closures work.
class KodaFunction {
    private final FnDecl declaration;
    private final Environment closure; // captured at definition time
    private final Interpreter interpreter;

    KodaFunction(FnDecl declaration, Environment closure, Interpreter interpreter) {
        this.declaration = declaration;
        this.closure = closure;
        this.interpreter = interpreter;
    }

    Object call(List<Object> args) {
        Environment env = new Environment(closure); // parent = closure, not the caller!
        for (int i = 0; i < declaration.params.size(); i++) {
            env.define(declaration.params.get(i), args.get(i));
        }
        return interpreter.executeBlock(declaration.body, env);
    }
}
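To convince yourself that the parent-equals-closure rule is what gives closures persistent state, here's a self-contained simulation of a makeCounter-style closure using a compact stand-in for the Environment class (all names here are illustrative, not Koda's real ones):

```java
import java.util.HashMap;
import java.util.Map;

public class ClosureDemo {
    static class Env {
        final Env parent;
        final Map<String, Object> values = new HashMap<>();
        Env(Env parent) { this.parent = parent; }

        Object get(String name) {
            if (values.containsKey(name)) return values.get(name);
            return parent.get(name); // walk up the chain
        }

        void assign(String name, Object value) {
            if (values.containsKey(name)) { values.put(name, value); return; }
            parent.assign(name, value); // mutate the enclosing binding
        }
    }

    public static void main(String[] args) {
        Env globals = new Env(null);

        // makeCounter()'s body environment, captured by the returned function.
        Env counterEnv = new Env(globals);
        counterEnv.values.put("count", 0);

        // Each call gets a fresh env whose parent is the *captured* env,
        // so every call sees and mutates the same "count" binding.
        for (int i = 0; i < 3; i++) {
            Env callEnv = new Env(counterEnv); // parent = closure, not caller
            int count = (int) callEnv.get("count") + 1;
            callEnv.assign("count", count);
            System.out.println(count);
        }
        // prints 1, 2, 3: state survives between calls
    }
}
```

If the parent were the calling environment instead, each call would fail to find "count" (or worse, find someone else's), which is exactly the bug the next section is about.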
The resolver: fixing a subtle bug
Without a resolver, Koda had a nasty variable resolution bug. Because the interpreter searched the environment chain dynamically at each lookup, a closure could resolve a variable to whatever binding happened to be in the chain at call time rather than the one lexically in scope at definition time: the classic dynamic vs. lexical scoping distinction.
The fix is a resolver pass between parsing and evaluation. It walks the AST once, statically, and records exactly how many environments up each variable reference is. At runtime the evaluator uses that number directly instead of searching. This makes scoping lexical and consistent.
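Here's a sketch of what depth-based lookup buys you, using a simplified getAt on a stand-in Environment (illustrative, not Koda's actual code). The resolver computes each depth once, statically; the evaluator then hops exactly that many parents with no searching:

```java
import java.util.HashMap;
import java.util.Map;

public class ResolverDemo {
    static class Env {
        final Env parent;
        final Map<String, Object> values = new HashMap<>();
        Env(Env parent) { this.parent = parent; }

        // Walk exactly `depth` parents, then read locally: no dynamic search,
        // so the same reference always hits the same binding.
        Object getAt(int depth, String name) {
            Env env = this;
            for (int i = 0; i < depth; i++) env = env.parent;
            return env.values.get(name);
        }
    }

    public static void main(String[] args) {
        Env global = new Env(null);
        global.values.put("x", "global");
        Env fn = new Env(global);
        fn.values.put("x", "local");

        // The resolver would have recorded these depths at resolve time:
        // a reference inside fn's body resolves at depth 0, a reference
        // resolved against the global scope at depth 1.
        System.out.println(fn.getAt(0, "x")); // local
        System.out.println(fn.getAt(1, "x")); // global
    }
}
```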
What I learned
Building Koda was the single most educational programming project I've done. Reading about closures is one thing; debugging a closure that captures the wrong environment at 11pm is another. A few takeaways:
- Recursive descent is genuinely simple. If you can write a grammar, you can write the parser.
- The environment chain is the entire interpreter. Everything else — control flow, functions, closures — is about manipulating environments correctly.
- A resolver pass is not optional if you want sane scoping behavior.
- Writing the whole thing in ~800 lines of Java made the structure crystal clear. There's nowhere to hide complexity at that scale.
If you haven't done it yet, I'd strongly recommend Crafting Interpreters by Robert Nystrom. It's the best programming book I've read in years, and it's free online.