4 Lecture 1 – Introduction to Rebuilding the Universe
About half of the students who take this course have no background in programming languages, software verification, or formal abstract mathematics. That’s okay. One of my goals is to teach those students something new and interesting about a research area that is relevant and possibly interesting to them, but not (yet) accessible.
Therefore, I do not assume any background in any mathematics. Instead, I must rebuild formal mathematics from nothing, and do so in a short amount of time.
This necessitates a certain amount of handwaving... In this section, I begin to make precise a formal system (with some handwaving) that I hope will be accessible to the audience just described. If you find the handwaving annoying, I end the notes with further reading.
4.1 What is math?
Before we begin rebuilding the entire universe of mathematics sufficient to formally define and reason about compilers, we must understand what math is.
In the first, the author describes the stages of mathematical education, roughly describing the point of taking various kinds of math classes. In short, there are three stages: pre-rigorous, rigorous, and post-rigorous. In pre-rigorous math, you are taught to compute in order to build intuition, but are not taught why or much about the underlying formal theory. In rigorous math, you are taught the underlying formal theory. In post-rigorous math, you handwave over the formal theory when you know (or think) it’s boring so you can focus on the big picture.
The bulk of this course exists in the "post-rigorous" phase—
The second post describes two broad-strokes classifications of mathematics. In short, math can either be analytic or synthetic. The distinction between "analytic" and "synthetic" is not a formal distinction, but a fuzzier, human distinction.
Most people are familiar with analytic math. This kind of math teaches you to break a problem down into simple, well-understood parts. In calculus, we break things down into real numbers, functions on real numbers, equations between real numbers, etc. If we want to answer a question, like "how big is the earth", we can define "bigness" and "the earth" as real-numbery things and use calculus.
In synthetic math, we define a new theory, with new axioms, to describe the things we’re studying. We do not break things down into a big theory of things we already understand, but build them up from a small set of axioms.
Most of the work on compiler correctness, and indeed much work in programming
languages, is closer to synthetic mathematics.
We start from two(ish) simple axioms for reasoning and build the entire
universe—
4.2 The Two Axioms
Philosophically, the axioms of a formal system are the things we can all agree make sense, and which we just have to accept on faith (or something), because we have to start somewhere and can’t prove everything makes sense from nothing.
In this class, we will use two(ish) axioms. These are not formal axioms like the axioms of set theory, because I don’t have time to get that precise. These are... intuitive axioms, which I will connect to a formal system that I will never fully define. These two intuitive axioms are all you need to read and begin understanding most of the papers in the compiler correctness literature.
4.2.1 The First Axiom: Implication
The first axiom is the Axiom of Implication. It states that:
I can make judgments of the form "A implies B".
If I have a judgment of the form "A implies B", and a judgment "A", then I can conclude "B".
We can all agree that this seems, intuitively, like a good logical reasoning principle.
This refers to the idea of a judgment: any statement I can make in my formal system. These are judgments in the sense that they decide or judge the property of some symbols you write on the page. The formal system says you can write any symbols you like on the page, but they are only given meaning when some judgment pronounces that they indeed have meaning.
Formally, we write an implication using horizontal bar notation. For example, the following notation means "A implies B".
A
-----
B
We can also write implications with many premises. The following example means "(A and B and C) implies D".
A B C
-----------
D
For the pieces of a judgment, we can use any symbol we like. We can use English letters like "A", "B", or "meow", or numbers like "1", "42", or "-0", or random symbols like "⊢", "+", or "↑". Whatever symbol is there, remember: never assume you know what it means. We are rebuilding the entire universe, so nothing exists except what we’ve defined. The symbols are meaningless except for what the judgment says they mean. If I write 0, it may not be the smallest natural number, the identity element of the addition function, or an integer. It could be a string, or a boolean, or a function. It means only what the judgments say it means. Usually, though, we choose symbols to suggest a connection to an intuitive meaning. If I choose 0 but don’t mean zero, I’m a bad person.
To see an example, we can define the booleans in our universe. Formally, we define a judgment as a list of rules. Each rule introduces a new axiom into the universe that we are defining. The rule is valid if it follows from one of our philosophical Axioms. Formally, we write:
----------- [True]
true : Bool

------------ [False]
false : Bool
This defines a judgment with two rules. By convention, I usually add a label to the right of each rule with a name for the rule. These rules are implications with no premises, so we can read the first rule, [True], as stating "anything implies true : Bool".
Usually, when I define a judgment, I start with a box that defines the shape of the judgment. This helps the reader quickly parse the pieces of the judgment. This judgment’s shape means it expects any symbol, followed by a colon character, followed by the symbol "Bool". For this course, I also give an English pronunciation and name for the judgment, to aid in referring to it. The English reading of this judgment is "any is judged a boolean", or "any is a Bool". Note that while we are suggesting booleans, we have not given these symbols any meaning that actually makes them behave like booleans.
Here, any is a meta-variable, something that is not really part of the formal system but should be interpreted by your brain as a place-holder for something. The any meta-variable stands for literally anything, and should match any symbol.
Now that we have a judgment, we can make statements, and prove that the statement is true or not in our universe. For example, I can prove that true : Bool holds (also pronounced "the _ : Bool judgment holds on (the symbol) true"). It follows easily by the rule [True]; it’s an axiom of our universe. The derivation, the tree of rules to follow, is a tree of height 1 with the rule [True].
Typically, we write derivations as trees of instances of the rules defining the judgment. A derivation looks just like a bunch of rules stacked on top of each other, except they contain no meta-variables. The derivation proving that true : Bool is written as follows.
----------- [True]
true : Bool
On paper, it looks identical to the rule [True].
We can also prove that 0 : Bool does not hold ("the _ : Bool judgment does not hold on (the symbol) 0"): consider all possible cases of the _ : Bool judgment. There are exactly two, and neither allows the symbol 0 on the left-hand side of the colon.
We can also define judgments that refer to previously defined judgments. Below we define a judgment that says the symbol not is a valid thing to write in front of a Bool. Intuitively, we would like this to mean it’s a valid operation on booleans.
any : Bool
--------------- [Not]
not(any) : Bool
This defines a judgment with one rule. That rule is a simple implication, and the premise refers to the previously defined judgment. The rule says not can be written next to a symbol any, if any is judged to be a Bool. For example, the following derivation proves not(true) : Bool.
----------- [True]
true : Bool
---------------- [Not]
not(true) : Bool
This derivation has height 2. The derivation has one subderivation, i.e., a derivation that is a proof that the premise for some rule holds.
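To make "tree", "subderivation", and "height" concrete, here is a small Python sketch. The `Derivation` class and the `height` function are hypothetical names chosen for illustration, and judgments are stored as plain text rather than checked against the rules.

```python
from dataclasses import dataclass, field


@dataclass
class Derivation:
    rule: str         # label of the rule instance, e.g. "Not"
    conclusion: str   # the judgment below the line, kept as plain text
    premises: list = field(default_factory=list)  # one subderivation per premise


def height(derivation):
    """Count the rule instances on the longest path from conclusion to a leaf."""
    return 1 + max((height(p) for p in derivation.premises), default=0)


true_is_bool = Derivation("True", "true : Bool")
not_true_is_bool = Derivation("Not", "not(true) : Bool", [true_is_bool])

print(height(true_is_bool))      # 1
print(height(not_true_is_bool))  # 2
```

The single element of `not_true_is_bool.premises` is the subderivation mentioned above: a complete derivation in its own right that proves the premise of [Not].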
We didn’t have to define the is operation on Bool judgment as a separate judgment. We could have equally defined not(any) : Bool as a rule in the is Bool judgment. However, then we would need a premise that recursively refers to the same judgment being defined. The Axiom of Implication does not allow us to do that; we should only refer to things that are already defined.
4.2.2 The Second Axiom: Induction
The second axiom is the Axiom of Induction. It states that:
- I can define a judgment by cases that recursively refers to itself, so long as something gets smaller in the premise of each case.
- If I have an inductively defined judgment, J, with rules R0, R1, ..., RN, and I want to prove some property P holds on instances of J, then it suffices to prove:
  - If P holds on the recursive subderivations of R0, then P holds for the conclusion of R0
  - If P holds on the recursive subderivations of R1, then P holds for the conclusion of R1
  - ...
  - If P holds on the recursive subderivations of RN, then P holds for the conclusion of RN
It is far from obvious that this is a good, logical reasoning principle, so you’ll have to trust me on this one. Thankfully, it is sufficiently powerful to let us build up the universe.
For example, we can define a single is Bool judgment that judges both boolean values and boolean operations:
----------- [True]
true : Bool

------------ [False]
false : Bool

any : Bool
--------------- [Not]
not(any) : Bool
The shape of the judgment puts no restrictions on the expression any, so
for example, not(true) is a valid any, just as is true.
Note that the rule [Not] refers to the same judgment we’re defining, but
it’s okay since something got smaller—
We can conclude not(true) : Bool, since true is Bool by [True], and therefore not(true) : Bool by [Not]. The derivation looks just like before, except now all the rules come from the same judgment.
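The recursive judgment can be sketched as a recursive Python function. This is an illustration under assumed encodings, not a definition from the notes: the symbols true and false are the strings `"true"` and `"false"`, and not(any) is the pair `("not", any)`. The name `derive_is_bool` is hypothetical.

```python
def derive_is_bool(term):
    """Return a derivation of `term : Bool` as (rule label, subderivations), or None."""
    if term == "true":
        return ("True", [])
    if term == "false":
        return ("False", [])
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "not":
        # [Not]: the premise is about term[1], which is strictly smaller than
        # term, so this recursion always terminates.
        sub = derive_is_bool(term[1])
        if sub is not None:
            return ("Not", [sub])
    # No rule applies, so no derivation exists.
    return None


print(derive_is_bool(("not", "true")))  # ('Not', [('True', [])])
print(derive_is_bool(("not", "0")))     # None
```

The recursive call is only ever made on a piece of the input, which is the "something got smaller" condition from the Axiom of Induction.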
We can also build a judgment that defines the natural numbers.
-------- [Zero]
z : Nat

any : Nat
------------ [Add1]
s any : Nat
Note that in [Add1] we recursively refer to the same judgment being defined.
This is okay, since something got smaller—
Intuitively, this judgment defines the natural numbers, if the symbol z behaves like the number 0, and the symbol s behaves like the function that adds 1 to its argument. We haven’t defined any interpretation of the symbols z or s, so it’s not yet clear that they really do represent anything at all.
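As a rough Python sketch, assume z is encoded as the string `"z"` and s any as the pair `("s", any)`. The function `is_nat` checks the judgment only. The function `count_s` is one possible interpretation, supplied here purely as an assumption; nothing in the judgment itself forces this reading, and both names are hypothetical.

```python
def is_nat(term):
    """Return True when `term : Nat` is derivable from [Zero] and [Add1]."""
    # [Add1]: peel off one leading s at a time; the term shrinks on every step.
    while isinstance(term, tuple) and len(term) == 2 and term[0] == "s":
        term = term[1]
    # [Zero]: the only rule without premises concludes z : Nat.
    return term == "z"


def count_s(term):
    """One assumed interpretation of a Nat: the number of s symbols in it."""
    if not is_nat(term):
        raise ValueError("not derivable as a Nat")
    count = 0
    while term != "z":
        term = term[1]
        count += 1
    return count


two = ("s", ("s", "z"))
print(is_nat(two), count_s(two))  # True 2
```

Keeping the two functions separate reflects the point of the paragraph: the judgment says which symbols are Nats, while any meaning for them has to be defined on top.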
4.2.3 Defining Not-False Things: Rules of Thumb
Since we are rebuilding the entire universe by defining new axioms to reason about, we must take care to define things that are not false. It’s beyond the scope of this course to describe formally how to do this. Instead, I will give you some rules of thumb:
First, above the line, only refer to things that exist. Judgments can only refer to other judgments. Second, the premises should refer to parts of the conclusion, but in a way that something gets smaller.
When we defined the judgment not(any) : Bool, we referred to the judgment any : Bool, which we had previously defined. This obeys rule-of-thumb #1, and is a good sign. We also made something smaller in the rule [Not], by removing the symbol not. This obeys rule-of-thumb #2, and is a good sign.
There are exceptions to these rules, and they may be insufficient, but you’ll need to take something like a set theory course to formally understand why. For example, in rule [Add1] when defining the natural numbers, we refer to the very judgment we’re still defining, thus violating rule-of-thumb #1. However, the Axiom of Induction says this is okay, since something got smaller, so we allow the exception.
4.3 Defining a Language
We now have enough tools to begin defining a language. But what is a language?
Surface syntax
Operations
A way for a human to express instructions to a machine
Java
C
Python
A collection of expressions
Some operations on expressions
Some common properties
4.3.1 Modeling a Collection of Expressions
To model a language, we begin by modeling the expressions. We have only one way to model something: add it to our universe by defining judgments. So we start by defining the is Expression judgment. For brevity, and because it’s what programming language theorists usually do, I write this as ⊢ any.
Our language will be defined to have some numbers, plus, and user-defined functions. We define its expressions by the judgment below.
any : Nat
------------ [E-Nat]
⊢ any

⊢ any1
⊢ any2
-------------------- [E-Plus]
⊢ any1 + any2

--- [E-Var]
⊢ x

⊢ any
--------- [E-Lam]
⊢ λx.any

⊢ any1
⊢ any2
----------- [E-App]
⊢ any1 any2
This defines a judgment with five rules, and refers to the is Nat judgment defined earlier. Anything that is Nat is also an Expression. So are two expressions on either side of the symbol +. So is the symbol x. So are the symbols λx. to the left of an Expression. So are two expressions separated by a space.
In these rules, I put each premise on its own line for clarity.
Those familiar with programming languages might think they recognize this language. But rest assured, you do not, because I have not given these symbols any meaning, so they cannot be related to anything you have seen before. The only meaning they have is that they are Expressions, whatever that means.
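The five rules can be sketched as a Python checker, again under assumed encodings that are not part of the notes: z is `"z"`, s any is `("s", any)`, any1 + any2 is `("+", any1, any2)`, λx.any is `("lam", any)`, and any1 any2 is `("app", any1, any2)`. The function names are hypothetical. The checker only decides whether ⊢ any is derivable; it assigns no meaning to any symbol.

```python
def is_nat(term):
    """Return True when `term : Nat` is derivable ([Zero] and [Add1])."""
    while isinstance(term, tuple) and len(term) == 2 and term[0] == "s":
        term = term[1]
    return term == "z"


def is_expr(term):
    """Return True when `⊢ term` is derivable from the five E- rules."""
    if is_nat(term):                  # [E-Nat]
        return True
    if term == "x":                   # [E-Var]: x is the only variable symbol
        return True
    if isinstance(term, tuple):
        if len(term) == 3 and term[0] == "+":      # [E-Plus]
            return is_expr(term[1]) and is_expr(term[2])
        if len(term) == 2 and term[0] == "lam":    # [E-Lam]
            return is_expr(term[1])
        if len(term) == 3 and term[0] == "app":    # [E-App]
            return is_expr(term[1]) and is_expr(term[2])
    return False


# (λx. x + s z) z, written in the assumed encoding
example = ("app", ("lam", ("+", "x", ("s", "z"))), "z")
print(is_expr(example))  # True
print(is_expr("y"))      # False
```

Every recursive call is made on a component of the input, so each premise is about a smaller term, following the second rule of thumb.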