4 Lecture 1 -- Introduction to Rebuilding the Universe


About half of the students who take this course have no background in programming languages, software verification, or formal abstract mathematics. That’s okay. One of my goals is to teach those students something new and interesting about a research area that is relevant and possibly interesting to them, but not (yet) accessible.

Therefore, I do not assume any background in any mathematics. Instead, I must rebuild formal mathematics from nothing, and do so in a short amount of time.

This necessitates a certain amount of handwaving... In this section, I begin to make precise a formal system (with some handwaving) that I hope will be accessible to the audience just described. If you find the handwaving annoying, I end the notes with further reading.

4.1 What is math?

Before we begin rebuilding the entire universe of mathematics sufficient to formally define and reason about compilers, we must understand what math is.

I strongly recommend the following two articles as related reading:

In the first, the author describes the stages of mathematical education, roughly describing the point of taking various kinds of math classes. In short, there are three stages: pre-rigorous, rigorous, and post-rigorous. In pre-rigorous math, you are taught to compute in order to build intuition, but are not taught why or much about the underlying formal theory. In rigorous math, you are taught the underlying formal theory. In post-rigorous math, you handwave over the formal theory when you know (or, think) it’s boring so you can focus on the big picture.

The bulk of this course exists in the "post-rigorous" phase—we want to have some formal definitions of languages and compilers, but handwave over annoying details so we can focus on what "compiler correctness" even means. Unfortunately, I must assume you have no rigorous math background. So before we can begin with the post-rigorous, I must drag you screaming through the rigorous. Sorry.

The second post describes two broad-strokes classifications of mathematics. In short, math can either be analytic or synthetic. The distinction between "analytic" and "synthetic" is not a formal distinction, but a fuzzier, human distinction.

Most people are familiar with analytic math. This kind of math teaches you to break a problem down into simple, well-understood parts. In calculus, we break things down into real numbers, functions on real numbers, equations between real numbers, etc. If we want to answer a question, like "how big is the earth", we can define "bigness" and "the earth" as real-numbery things and use calculus.

In synthetic math, we define a new theory, with new axioms, to describe the things we’re studying. We do not break things down into a big theory of things we already understand, but build them up from a small set of axioms.

Most of the work on compiler correctness, and indeed much work in programming languages, is closer to synthetic mathematics. We start from two(ish) simple axioms for reasoning and build the entire universe—programming languages, computation, equivalence between programs and expressions, compilation, linking—from these two(ish) axioms.

4.2 The Two Axioms

Philosophically, the axioms of a formal system are the things we can all agree make sense, and which we just have to accept on faith (or something), because we have to start somewhere and can’t prove everything makes sense from nothing.

In this class, we will use two(ish) axioms. These are not formal axioms like the axioms of set theory, because I don’t have time to get that precise. These are... intuitive axioms, which I will connect to a formal system that I will never fully define. These two intuitive axioms are all you need to read and begin understanding most of the papers in the compiler correctness literature.

4.2.1 The First Axiom: Implication

The first axiom is the Axiom of Implication. It states that:

I can make judgments of the form "A implies B".
If I have a judgment of the form "A implies B", and a judgment "A", then I can conclude "B".

We can all agree that this seems, intuitively, like a good logical reasoning principle.

This refers to the idea of a judgment: any statement I can make in my formal system. These are judgments in the sense that they decide or judge the property of some symbols you write on the page. The formal system says you can write any symbols you like on the page, but they are only given meaning when some judgment pronounces that they indeed have meaning.

Formally, we write an implication using horizontal bar notation. For example, the following notation means "A implies B".

A

-----

B

We can also write implications with many premises. The following example means "(A and B and C) implies D".

A B C

-----------

D
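If it helps to see the Axiom of Implication mechanically, here is a minimal Python sketch. It assumes judgments are modeled as plain strings and each rule as a pair of premises and a conclusion; the name close_under_rules is my own and not part of any formal system defined in these notes.

```python
def close_under_rules(judgments, rules):
    """Repeatedly apply the Axiom of Implication.

    judgments: the judgments we already have, modeled as strings.
    rules: (premises, conclusion) pairs, where premises is a tuple
           of judgments written above the line.
    Returns the set of every judgment that can be concluded.
    """
    known = set(judgments)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # If we have every premise, we may conclude the conclusion.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


rules = [
    (("A",), "B"),           # A implies B
    (("A", "B", "C"), "D"),  # (A and B and C) implies D
]

print(sorted(close_under_rules({"A", "C"}, rules)))  # ['A', 'B', 'C', 'D']
print(sorted(close_under_rules({"A"}, rules)))       # ['A', 'B']
```

In the second call, D is never concluded because the premise C is missing.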

For the pieces of a judgment, we can use any symbol we like. We can use English letters like "A", "B", or "meow", or numbers like "1", "42", or "-0", or random symbols like "⊢", "+", or "↑". Whatever symbol is there, remember: never assume you know what it means. Remember, we are rebuilding the entire universe, so nothing exists except what we’ve defined. The symbols are meaningless except for what the judgment says they mean. If I write 0, it may not be the smallest natural number, the identity element for addition, or an integer. It could be a string, or a boolean, or a function. It means only what the judgments say it means. Usually, though, we choose symbols to suggest a connection to an intuitive meaning. If I choose 0 but don’t mean zero, I’m a bad person.

To see an example, we can add the booleans to our universe. Formally, we define a judgment as a list of rules. Each rule introduces a new axiom into the universe that we are defining. The rule is valid if it follows from one of our philosophical Axioms. Formally, we write:

any : Bool    ("any is Bool")

----------- [True]

true : Bool

----------- [False]

false : Bool

This defines a judgment with two rules. By convention, I usually add a label to the right of each rule with a name for the rule. These rules are implications with no premises, so we can read the first rule, [True], as stating "anything implies true : Bool".

Usually, when I define a judgment, I start with a box that defines the shape of the judgment. This helps the reader quickly parse the pieces of the judgment. This judgment’s shape means it expects any symbol, followed by a colon character, followed by the symbol "Bool". For this course, I also give an English pronunciation and name for the judgment, to aid in referring to it. The English reading of this judgment is "any is judged a boolean", or "any is a Bool". Note that while we are suggesting booleans, we have not given these symbols any meaning that actually makes them behave like booleans.

Here, any is a meta-variable, something that is not really part of the formal system but should be interpreted by your brain as a place-holder for something. The any meta-variable stands for literally anything, and should match any symbol.

Now that we have a judgment, we can make statements, and prove that the statement is true or not in our universe. For example, I can prove that true : Bool holds (also pronounced "the _ : Bool judgment holds on (the symbol) true"). It follows easily by the rule [True]; it’s an axiom of our universe. The derivation, the tree of rules to follow, is a tree of height 1 with the rule [True].

Typically, we write derivations as trees of instances of the rules defining the judgment. A derivation looks just like a bunch of rules stacked on top of each other, except they contain no meta-variables. The derivation proving that true : Bool is written as follows.

---------- [ True ]

true : Bool

On paper, it looks identical to the rule [True].

We can also prove that 0 : Bool does not hold ("the _ : Bool judgment does not hold on (the symbol) 0"): consider all possible cases of the _ : Bool judgment. There are exactly two, and neither allows the symbol 0 on the left-hand side of the colon.
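The case analysis above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: symbols are modeled as plain strings, the two rules as a lookup table, and the names RULES and judge_is_bool are hypothetical.

```python
# Each rule of the `any : Bool` judgment has no premises, so the whole
# judgment can be modeled as a table from rule name to the one symbol
# that rule allows on the left of the colon.
RULES = {
    "True": "true",
    "False": "false",
}


def judge_is_bool(symbol):
    """Return the name of the rule that derives `symbol : Bool`, or None."""
    # Consider every rule of the judgment; there are exactly two.
    for rule_name, allowed_symbol in RULES.items():
        if symbol == allowed_symbol:
            return rule_name
    # No rule applies, so the judgment does not hold on this symbol.
    return None


print(judge_is_bool("true"))  # True  (the rule name, not Python's boolean)
print(judge_is_bool("0"))     # None
```

Note that the string "true" here has no more meaning than the symbol on the page: the table says it is judged a Bool, and nothing else.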

We can also define judgments that refer to previously defined judgments. Below we define a judgment that says the symbol not is a valid thing to write in front of a Bool. Intuitively, we would like this to mean it’s a valid operation on booleans.

any1(any2) : Bool    ("any1 is an operation on Bool")

any : Bool

----------- [Not]

not(any) : Bool

This defines a judgment with one rule. That rule is a simple implication, and the premise refers to the previously defined judgment. The rule says not can be written next to a symbol any, if any is judged to be a Bool.

Now we can prove that not(true) : Bool, or intuitively, that not applied to true is a valid Bool. The derivation is the following.

----------- [True]

true : Bool

---------------- [Not]

not(true) : Bool

This derivation has height 2. The derivation has one subderivation, i.e., a derivation that is a proof that the premise for some rule holds.
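To make "tree of rules" and "height" concrete, here is a small Python sketch of derivations as data. The Derivation class and the height function are my own illustrative names, and judgments are modeled as plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Derivation:
    rule: str         # name of the rule used at this step
    conclusion: str   # the judgment written below the line
    # Subderivations proving the premises written above the line.
    premises: List["Derivation"] = field(default_factory=list)


def height(d):
    """Height of a derivation: 1 for a rule with no premises."""
    return 1 + max((height(p) for p in d.premises), default=0)


true_is_bool = Derivation("True", "true : Bool")
not_true_is_bool = Derivation("Not", "not(true) : Bool", [true_is_bool])

print(height(true_is_bool))      # 1
print(height(not_true_is_bool))  # 2
```

The single element of not_true_is_bool.premises is the subderivation mentioned above.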

We didn’t have to define the "is an operation on Bool" judgment as a separate judgment. We could equally have defined not(any) : Bool as a rule in the "is Bool" judgment. However, then we would need a premise that recursively refers to the same judgment being defined. The Axiom of Implication does not allow us to do that; we should only refer to things that are already defined.

4.2.2 The Second Axiom: Induction

The second axiom is the Axiom of Induction. It states that:

I can define a judgment by cases that recursively refers to itself, so long as something gets smaller in the premise of each case.
If I have an inductively defined judgment, J, with rules R0, R1, ..., RN, and I want to prove that some property P holds on instances of J, then it suffices to prove:
- If P holds on the recursive subderivations of R0, then P holds for the conclusion of R0
- If P holds on the recursive subderivations of R1, then P holds for the conclusion of R1
- ...
- If P holds on the recursive subderivations of RN, then P holds for the conclusion of RN

It is far from obvious that this is a good, logical reasoning principle, so you’ll have to trust me on this one. Thankfully, it is sufficiently powerful to let us build up the universe.

For example, we can define a single is Bool judgment that judges both boolean values and boolean operations:

any : Bool    ("any is Bool")

----------- [True]

true : Bool

----------- [False]

false : Bool

any : Bool

----------- [Not]

not(any) : Bool

The shape of the judgment puts no restrictions on the expression any, so for example, not(true) is a valid any, just as is true. Note that the rule [Not] refers to the same judgment we’re defining, but it’s okay since something got smaller—we removed the symbol not from the expression being judged.

We can conclude not(true) : Bool, since true is Bool by [True], and therefore not(true) : Bool by [Not]. The derivation looks just like before, except now all the rules come from the same judgment.
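A recursively defined judgment corresponds naturally to a recursive function. The following Python sketch assumes expressions are modeled as the strings "true" and "false" and as pairs ("not", e); the function name derive_is_bool and the nested-tuple encoding of derivations are my own choices for illustration.

```python
def derive_is_bool(term):
    """Return a derivation of `term : Bool` as nested tuples, or None.

    A derivation is (rule_name, subderivation...).
    """
    if term == "true":
        return ("True",)
    if term == "false":
        return ("False",)
    # [Not]: the premise judges a strictly smaller term, the one left
    # after removing the symbol `not`, so the recursion terminates.
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "not":
        premise = derive_is_bool(term[1])
        if premise is not None:
            return ("Not", premise)
    # No rule applies.
    return None


print(derive_is_bool(("not", "true")))  # ('Not', ('True',))
print(derive_is_bool(("not", "0")))     # None
```

The recursive call is safe for the same reason the rule [Not] is: each call receives a smaller term.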

We can also build a judgment that defines the natural numbers.

any : Nat

-------- [Zero]

z : Nat

any : Nat

------------ [Add1]

s any : Nat

Note that in [Add1] we recursively refer to the same judgment being defined. This is okay, since something got smaller—we removed the symbol s from the expression being judged.

Intuitively, this judgment defines the natural numbers, if the symbol z behaves like the number 0, and the symbol s behaves like the function that adds 1 to its argument. We haven’t defined any interpretation of the symbols z or s, so it’s not yet clear that they really do represent anything at all.
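The second half of the Axiom of Induction, one case per rule, has the same shape as a recursive function with one branch per rule. Here is a hedged Python sketch of that shape for the Nat judgment. It assumes terms are modeled as the string "z" and pairs ("s", t); the names by_induction and derivation_height are hypothetical. Computing a value this way is an analogy for the structure of an inductive proof, not a proof itself.

```python
def by_induction(term, zero_case, add1_case):
    """Structural recursion over `any : Nat`, one case per rule.

    zero_case: the result for the conclusion of [Zero].
    add1_case: turns the result for the premise of [Add1] into the
               result for its conclusion.
    Raises ValueError if `term : Nat` does not hold.
    """
    if term == "z":
        return zero_case
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "s":
        # The recursive call is on the smaller term in the premise.
        return add1_case(by_induction(term[1], zero_case, add1_case))
    raise ValueError(f"{term!r} : Nat does not hold")


def derivation_height(term):
    """Height of the derivation of `term : Nat`."""
    return by_induction(term, 1, lambda premise_height: premise_height + 1)


print(derivation_height("z"))                # 1
print(derivation_height(("s", ("s", "z"))))  # 3
```

Nothing here interprets z or s as numbers; the only thing computed is a fact about the derivation.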

4.2.3 Defining Not-False Things: Rules of Thumb

Since we are rebuilding the entire universe by defining new axioms to reason about, we must take care to define things that are not false. It’s beyond the scope of this course to describe formally how to do this. Instead, I will give you some rules of thumb:

First, above the line, only refer to things that exist. Judgments can only refer to other judgments. Second, the premises should refer to parts of the conclusion, but in a way that something gets smaller.

When we defined the judgment any1(any2) : Bool, we referred to the judgment any : Bool, which we had previously defined. This obeys rule-of-thumb #1, and is a good sign. We also made something smaller in the rule [Not], by removing the symbol not. This obeys rule-of-thumb #2, and is a good sign.

There are exceptions to these rules, and they may be insufficient, but you’ll need to take something like a set theory course to formally understand why. For example, in rule [Add1] when defining the natural numbers, we refer to the very judgment we’re still defining, thus violating rule-of-thumb #1. However, the Axiom of Induction says this is okay, since something got smaller, so we allow the exception.

4.3 Defining a Language

We now have enough tools to begin defining a language. But what is a language?

When polled, most students start describing language features like the following:

Surface syntax
Operations
A way for a human to express instructions to a machine

However, when asked to name languages, they name a few languages with much in common.

Java
C
Python

To me, a language is:

A collection of expressions
Some operations on expressions
Some common properties

4.3.1 Modeling a Collection of Expressions

To model a language, we begin by modeling the expressions. We have only one way to model something: add it to our universe by defining judgments. So we start by defining the is Expression judgment. For brevity, and because it’s what programming language theorists usually do, I write this as ⊢ any.

Our language will be defined to have some numbers, plus, and user-defined functions. We define its expressions by the judgment below.

⊢ any    ("any is an Expression")

any : Nat

------------ [E-Nat]

⊢ any

⊢ any1

⊢ any2

-------------------- [E-Plus]

⊢ any1 + any2

--- [E-Var]

⊢ x

⊢ any

--------- [E-Lam]

⊢ λx.any

⊢ any1

⊢ any2

--------- [E-App]

⊢ any1 any2

This defines a judgment with five rules, and refers to the is Nat judgment defined earlier. Anything that is Nat is also an Expression. So are two expressions on either side of the symbol +. So is the symbol x. So are the symbols λx. written to the left of an Expression. So are two expressions separated by a space.

In these rules, I put each premise on its own line for clarity.

Those familiar with programming languages might think they recognize this language. But rest assured, you do not, because I have not given these symbols any meaning, so they cannot be related to anything you have seen before. The only meaning they have is that they are Expressions, whatever that means.
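As a rough sketch, the five rules can be transcribed into a recursive Python check. The tuple encoding is my own assumption, not part of the notes: ("s", t) for s any, ("+", e1, e2) for any1 + any2, ("lam", e) for λx.any, and ("app", e1, e2) for any1 any2. The function names is_nat and is_expression are likewise hypothetical.

```python
def is_nat(term):
    """The `any : Nat` judgment: rules [Zero] and [Add1]."""
    if term == "z":
        return True
    return (isinstance(term, tuple) and len(term) == 2
            and term[0] == "s" and is_nat(term[1]))


def is_expression(term):
    """The `⊢ any` judgment, with one branch per rule."""
    # [E-Nat]: anything judged a Nat is an Expression.
    if is_nat(term):
        return True
    # [E-Var]: the single symbol x.
    if term == "x":
        return True
    if isinstance(term, tuple):
        # [E-Plus]: two Expressions on either side of +.
        if len(term) == 3 and term[0] == "+":
            return is_expression(term[1]) and is_expression(term[2])
        # [E-Lam]: the symbols λx. in front of an Expression.
        if len(term) == 2 and term[0] == "lam":
            return is_expression(term[1])
        # [E-App]: two Expressions separated by a space.
        if len(term) == 3 and term[0] == "app":
            return is_expression(term[1]) and is_expression(term[2])
    # No rule applies.
    return False


print(is_expression(("lam", ("+", "x", ("s", "z")))))  # True
print(is_expression("y"))                              # False
```

The check says only whether a term is an Expression. It assigns no behavior to +, λx., or application.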
