8 Lecture 4 – Type Systems
In lecture, I covered some additional material about modeling stateful languages. I’m retconning that; it never happened, and I’ll cover it later when it makes more sense.
I also covered type systems and proof by induction concurrently. I’m going to typeset those separately, to make the notes more self-contained.
You can find the Redex model for this lecture note in share/lecture4-code.rkt.
8.1 Properties of Programs
A collection of expressions
Some operations on expressions
Some shared properties
So far, not many. We can reason about individual programs, even write proofs, but we can’t make many predictions. We can’t say what will happen without actually trying to run the program. We can’t make claims about all programs in the language.
We do have some properties in mind, probably. We know, intuitively, that if you only use booleans in the predicate of an if expression, then it will successfully evaluate. We know that if you only ever add natural numbers, then the program will evaluate the addition correctly. But we haven’t formalized that intuition in a judgment, so we can’t prove it.
The way we formalize this is with type systems. A type system is a judgment that encodes what it means for a program to be well behaved, so that we can make formal, provable predictions about all well-typed programs. Let’s design a type system.
8.2 The Language of NatBool Expressions
e ::= z | s e | true | false | if e then e₁ else e₂ | e₁ + e₂
v ::= z | s v | true | false

[ e → e ]

---------------------------- [Step-If-True]
if true then e₁ else e₂ → e₁

----------------------------- [Step-If-False]
if false then e₁ else e₂ → e₂

--------- [Step-Add-Z]
z + e → e

------------------------- [Step-Add-S]
(s e₁) + e₂ → e₁ + (s e₂)

[ e →* e ]

------ [Conv-Refl]
e →* e

e₁ → e₂
-------- [Conv-Step]
e₁ →* e₂

e₁ →* e₂
e₂ →* e₃
-------- [Conv-Trans]
e₁ →* e₃

e₁ →* e₁'
e₂ →* e₂'
e₃ →* e₃'
------------------------------------------------- [Conv-If-Cong]
if e₁ then e₂ else e₃ →* if e₁' then e₂' else e₃'

e₁ →* e₁'
------------- [Conv-S-Cong]
s e₁ →* s e₁'

e₁ →* e₁'
e₂ →* e₂'
-------------------- [Conv-Add-Cong]
e₁ + e₂ →* e₁' + e₂'

[ eval(e) = v ]

e →* v
-----------
eval(e) = v
Note that I’ve taken some liberties with the conversion judgment. It is not easily implemented as a recursive function, and I’ve combined all the congruence rules for each form into a single rule. This is pretty standard. The congruence rules for a form can be written as one rule because the conversion judgment has a reflexivity rule, which lets us trivially satisfy any premise we don’t care about. This allows for a shorter judgment, but it makes the relationship between the rules both non-obvious and important.
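To get a feel for what eval does, here is a rough sketch of an evaluator in plain Racket (not Redex, and not the model from share/lecture4-code.rkt). Since the conversion judgment doesn’t say which rule to apply when, the sketch assumes one particular deterministic strategy: apply a step rule at the top if one matches, otherwise reduce the leftmost sub-expression. The names value?, step, and evaluate, the 'stuck result, and the s-expression encoding of terms are all my own choices for illustration.

```racket
#lang racket

;; v ::= z | s v | true | false
(define (value? e)
  (match e
    ['z #t]
    ['true #t]
    ['false #t]
    [`(s ,v) (value? v)]
    [_ #f]))

;; step : e -> e or #f
;; Performs one reduction under an assumed leftmost strategy,
;; or returns #f when no rule applies.
(define (step e)
  (match e
    [`(if true then ,e1 else ,e2) e1]             ; Step-If-True
    [`(if false then ,e1 else ,e2) e2]            ; Step-If-False
    [`(if ,e1 then ,e2 else ,e3)                  ; reduce the predicate
     (let ([e1* (step e1)])
       (and e1* `(if ,e1* then ,e2 else ,e3)))]
    [`(z + ,e2) e2]                               ; Step-Add-Z
    [`((s ,e1) + ,e2) `(,e1 + (s ,e2))]           ; Step-Add-S
    [`(,e1 + ,e2)                                 ; reduce the left operand
     (let ([e1* (step e1)])
       (and e1* `(,e1* + ,e2)))]
    [`(s ,e1)                                     ; reduce under s
     (let ([e1* (step e1)])
       (and e1* `(s ,e1*)))]
    [_ #f]))

;; evaluate : e -> v or 'stuck
(define (evaluate e)
  (cond
    [(value? e) e]
    [(step e) => evaluate]
    [else 'stuck]))

(evaluate '((s z) + (s z)))                  ; => '(s (s z))
(evaluate '(if (true + z) then z else false)) ; => 'stuck
```

The second call gets stuck because no rule applies to true + z. Ruling out terms like that, before running them, is the job of the type system.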
This language, as specified, allows us to attempt to evaluate nonsense terms, such as (s true) or if (true + z) then z else false. What we would like to say is that there is some class of well-behaved terms, and evaluation will never get stuck on those terms. That class of well-behaved terms is the class of well-typed terms, and we can formalize it by writing down a typing judgment that builds derivations that prove that reduction will succeed. Then, any well-typed term will evaluate to some value. This property is called type safety.
Theorem (Type Safety): If e is well-typed then eval(e) = v
Since type safety depends on the type system and the evaluation judgment, actually there are many statements of type safety. Usually, we also want to allow for non-termination, and raising valid run-time exceptions. This would give us the more general statement:
Theorem (Type Safety, General): If e is well-typed then eval(e) = o
In general, our observations o will include values, nontermination (written Ω), and some set of valid run-time exceptions, such as a division-by-zero exception.
Because our type system tells us how to build proofs that evaluation will succeed, the theorem Type Safety essentially states "if we have a proof that evaluation will succeed, then evaluation will in fact succeed". For this reason, Type Safety is sometimes called Type Soundness, alluding to the fact that the type system is a sound proof system with respect to the evaluation judgment.
You shouldn’t use Type Soundness unless you’re ready to argue with the philosophically inclined.
Note that the language NatBool exists before the type system. We can write down expressions that aren’t well-typed, and we can even try to evaluate them. We build the type system merely for its predictive power.
So enough about why and how we’d like to make predictions. Let’s build a type system.
8.3 A Type System for NatBool
The first thing we need when building a type system is some types. We can define a new judgment (new syntax) for what is a valid type. For NatBool, we have two types: Nat and Bool. The type Nat will, intuitively, be assigned to expressions that compute to values that represent natural numbers. Similarly, Bool will be assigned to expressions that compute to values that represent booleans.
A ::= Nat | Bool
Phew.
To define a type system, we’ll need a rule for each expression in our language. We have 6 expressions, so we expect a judgment with at least 6 rules.
Beyond that, it’s not always easy to know where to start. Introduction and elimination forms can be of help to us again. We should start with introduction forms, as they usually have obvious types. For example, z is meant to represent the natural number 0, so it ought to have type Nat. We write this formally as ⊢ z : Nat. Other introduction forms, such as true and false, are similarly obvious.
[ ⊢ e : A ]

--------- [T-Z]
⊢ z : Nat

------------- [T-True]
⊢ true : Bool

-------------- [T-False]
⊢ false : Bool
For expressions with sub-expressions, such as s e, we have to think a little bit. The question we must ask ourselves is: what must we prove about the sub-expression e to prove that s e will evaluate to a value of the type we want it to be? Since s e is meant to represent a natural number, we want it to be of type Nat. But we would only consider it to represent a Nat if e is also a Nat; s true doesn’t represent anything sensible, for example. We translate this into a formal type rule:
⊢ e : Nat
----------- [T-S]
⊢ s e : Nat
The elimination forms are harder. Elimination forms don’t have an obvious type, but they do eliminate something of an obvious type. We can write their type rules by starting inside out. For plus, e₁ + e₂, we know we can only add two expressions that represent natural numbers, so we know e₁ and e₂ must have type Nat.
⊢ e₁ : Nat
⊢ e₂ : Nat
-------------- [T-Add]
⊢ e₁ + e₂ : ??
To figure out what the type of e₁ + e₂ is, though, we have to think.
We know that we want + to behave like addition, and adding two natural numbers produces a natural number, so the result should have type Nat.
⊢ e₁ : Nat
⊢ e₂ : Nat
--------------- [T-Add]
⊢ e₁ + e₂ : Nat
We do the same thing for if expressions. We know that if will only succeed when it branches on a boolean:
⊢ e₁ : Bool
???
----------------------------- [T-If]
⊢ if e₁ then e₂ else e₃ : ???
Now, what type will the expression evaluate to? Well, that depends on the boolean. We know that it will evaluate to the type of the expression in whichever branch it happens to jump to. Since we don’t know how to write anything as complicated as that, we make a simplifying assumption. We assume both branches have the same type, call it A. If both branches have the type A, then the if expression will evaluate to some value of type A, as long as e₁ is a boolean.
⊢ e₁ : Bool
⊢ e₂ : A
⊢ e₃ : A
--------------------------- [T-If]
⊢ if e₁ then e₂ else e₃ : A
This gives us the final type system:
[ ⊢ e : A ]

--------- [T-Z]
⊢ z : Nat

⊢ e : Nat
----------- [T-S]
⊢ s e : Nat

------------- [T-True]
⊢ true : Bool

-------------- [T-False]
⊢ false : Bool

⊢ e₁ : Bool
⊢ e₂ : A
⊢ e₃ : A
--------------------------- [T-If]
⊢ if e₁ then e₂ else e₃ : A

⊢ e₁ : Nat
⊢ e₂ : Nat
--------------- [T-Add]
⊢ e₁ + e₂ : Nat
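Not every term has a derivation. For example, if we try to derive a type for s true, we get stuck: the only rule that applies to s true needs ⊢ true : Nat as a premise, and no rule lets us conclude that (marked X below).
X
------------
⊢ true : Nat
--------------
⊢ s true : Nat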
If we proved some other facts about our type system, like that every term has a unique type, we could formally prove that this term cannot be well-typed, since true’s unique type is Bool, which is not equal to Nat. We’re not going there in this class.
Well-typed terms, on the other hand, have complete derivations:

--------- T-Z         --------- T-Z
⊢ z : Nat             ⊢ z : Nat
------------- T-S     ------------- T-S
⊢ (s z) : Nat         ⊢ (s z) : Nat
-------------------------------------- T-Add
⊢ (s z) + (s z) : Nat

--------- T-Z
⊢ z : Nat
------------- T-S
⊢ (s z) : Nat
----------------- T-S
⊢ (s (s z)) : Nat
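Because each NatBool expression form has exactly one rule, and the types of the sub-expressions determine the type of the whole, the judgment can be read as a recursive function. Here is a rough sketch in plain Racket. The name typeof, the use of #f to mean "no derivation exists", and the s-expression encoding of terms are assumptions I’m making for illustration; they aren’t part of the Redex model.

```racket
#lang racket

;; typeof : e -> 'Nat, 'Bool, or #f (no derivation exists)
(define (typeof e)
  (match e
    ['z 'Nat]                                     ; T-Z
    ['true 'Bool]                                 ; T-True
    ['false 'Bool]                                ; T-False
    [`(s ,e1)                                     ; T-S
     (and (equal? (typeof e1) 'Nat) 'Nat)]
    [`(,e1 + ,e2)                                 ; T-Add
     (and (equal? (typeof e1) 'Nat)
          (equal? (typeof e2) 'Nat)
          'Nat)]
    [`(if ,e1 then ,e2 else ,e3)                  ; T-If
     (let ([a (typeof e2)])
       (and (equal? (typeof e1) 'Bool)
            a                                     ; e2 must be well-typed
            (equal? a (typeof e3))                ; both branches agree
            a))]
    [_ #f]))

(typeof '((s z) + (s z)))                    ; => 'Nat
(typeof '(s true))                           ; => #f
(typeof '(if (true + z) then z else false))  ; => #f
```

Each successful call corresponds to a derivation like the ones above, and each #f corresponds to a derivation attempt that hits an X.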
8.4 A Type System for λNatBool
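Let’s extend our language, and our type system, with functions. Functions need a new type: A -> B is the type of a function that takes an argument of type A and produces a result of type B.
This gets confusing, since we also write reduction as an arrow. Sorry. PL people are bad at syntax, because they think they’re so good at it.
A,B ::= Nat | Bool | A -> B
Functions λx.e are the introduction form for this type, so we start with the conclusion and ask what we need to know about the body:
??
---------------
⊢ λx.e : A -> B
Intuitively, the body e should have type B, provided the function gets an argument of type A:
⊢ e : B
if it gets an argument of type A
---------------
⊢ λx.e : A -> B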
To make "if it gets an argument of type A" formal, we add a typing environment Γ to the judgment: a list of assumptions about the types of names. The existing rules simply pass Γ along.
Γ ::= · | Γ,x:A

[ Γ ⊢ e : A ]

----------- [T-Z]
Γ ⊢ z : Nat

Γ ⊢ e : Nat
------------- [T-S]
Γ ⊢ s e : Nat

--------------- [T-True]
Γ ⊢ true : Bool

---------------- [T-False]
Γ ⊢ false : Bool

Γ ⊢ e₁ : Bool
Γ ⊢ e₂ : A
Γ ⊢ e₃ : A
----------------------------- [T-If]
Γ ⊢ if e₁ then e₂ else e₃ : A

Γ ⊢ e₁ : Nat
Γ ⊢ e₂ : Nat
----------------- [T-Add]
Γ ⊢ e₁ + e₂ : Nat
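Now the rule for functions can record the assumption about its argument in the environment:
Γ,x:A ⊢ e : B
-----------------
Γ ⊢ λx.e : A -> B
Application is the elimination form for functions, so we work inside out. We know e₁ must be a function:
Γ ⊢ e₁ : A -> B
Γ ⊢ e₂ : ???
---------------
Γ ⊢ e₁ e₂ : ??
The argument e₂ must have the type the function expects, A:
Γ ⊢ e₁ : A -> B
Γ ⊢ e₂ : A
---------------
Γ ⊢ e₁ e₂ : ??
And the result has whatever type the function produces, B:
Γ ⊢ e₁ : A -> B
Γ ⊢ e₂ : A
---------------
Γ ⊢ e₁ e₂ : B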
Fantastic. We can now predict the success of function applications.
Finally, we need a rule for names x. Names are always complicated, and the typing rule is no exception. It behaves like neither an introduction nor an elimination form. Its type is not obvious just from its structure, nor does it eliminate something whose type is obvious. Instead, we have to think. And we think... we have this convenient list of assumptions about variables, so let’s just look up the type in there.
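x : A ∈ Γ
---------
Γ ⊢ x : A
The lookup x : A ∈ Γ is itself a small judgment. The most recent assumption about a name wins; otherwise we skip past assumptions about other names, whatever their type.
[ x : A ∈ Γ ]

---------------
x : A ∈ (Γ,x:A)

x₁ : A ∈ Γ
x₁ != x₂
-----------------
x₁ : A ∈ (Γ,x₂:B)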
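The lookup judgment is simple enough to read directly as a function. Here is a small sketch in plain Racket, assuming we represent Γ as a list of name/type pairs with the most recently added assumption first, and · as the empty list. The names extend and lookup, and the #f result for an unbound name, are hypothetical choices for illustration.

```racket
#lang racket

;; Γ ::= · | Γ,x:A   represented as '() or (cons (cons x A) Γ)
(define (extend Γ x A)
  (cons (cons x A) Γ))

;; lookup : Γ x -> A, or #f if there is no assumption about x
(define (lookup Γ x)
  (match Γ
    ['() #f]                                ; no rule applies to ·
    [(cons (cons y A) more)
     (if (equal? x y)
         A                                  ; x : A ∈ (Γ,x:A)
         (lookup more x))]))                ; x₁ != x₂, keep looking in Γ

(define Γ1 (extend (extend '() 'x 'Nat) 'y 'Bool))
(lookup Γ1 'x)                        ; => 'Nat
(lookup (extend Γ1 'x 'Bool) 'x)      ; => 'Bool, the newer assumption wins
(lookup Γ1 'w)                        ; => #f
```

The x₁ != x₂ premise is what makes the newest assumption shadow older ones: once the first rule applies, the second cannot.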
This gives us the final type system for λNatBool:
[ Γ ⊢ e : A ]

x : A ∈ Γ
--------- [T-Var]
Γ ⊢ x : A

----------- [T-Z]
Γ ⊢ z : Nat

Γ ⊢ e : Nat
------------- [T-S]
Γ ⊢ s e : Nat

--------------- [T-True]
Γ ⊢ true : Bool

---------------- [T-False]
Γ ⊢ false : Bool

Γ ⊢ e₁ : Bool
Γ ⊢ e₂ : A
Γ ⊢ e₃ : A
----------------------------- [T-If]
Γ ⊢ if e₁ then e₂ else e₃ : A

Γ ⊢ e₁ : Nat
Γ ⊢ e₂ : Nat
----------------- [T-Add]
Γ ⊢ e₁ + e₂ : Nat

Γ,x:A ⊢ e : B
-----------------
Γ ⊢ λx.e : A -> B

Γ ⊢ e₁ : A -> B
Γ ⊢ e₂ : A
---------------
Γ ⊢ e₁ e₂ : B
8.5 Type Systems and Type Annotations
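Notice that nothing in the syntax of λx.e says what type the argument has, so the same function can be given more than one type. For example, the identity function has both of these derivations:
x:Nat ⊢ x : Nat
---------------------
· ⊢ λx.x : Nat -> Nat

x:Bool ⊢ x : Bool
-----------------------
· ⊢ λx.x : Bool -> Bool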
This is fine. A type system exists regardless of the type annotations, or whether we think of the language as "typed". In fact, this is even good from the perspective of code reuse.
However, we have to construct a derivation manually: the syntax of programs is not sufficient to determine a type. This means we cannot simply read the judgment as an algorithm that decides whether a given program is well-typed, because the rule for functions would have to guess the argument type.
We can see this more clearly when we try to translate the judgment into Redex. The only way to write the judgment is as a modeless judgment. We cannot mark the type as an output, meaning it can be inferred, because the argument type for functions would need to come out of thin air. This manifests in Redex telling us that a pattern variable is unbound when we try to define the judgment.
> (define-judgment-form L
    #:contract (⊢ Γ e : A)
    #:mode (⊢ I I I O)

    [(∈ x A Γ)
     ---------- "T-Var"
     (⊢ Γ x : A)]

    [---------- "T-Z"
     (⊢ Γ z : Nat)]

    [(⊢ Γ e : Nat)
     ---------- "T-S"
     (⊢ Γ (s e) : Nat)]

    [------------- "T-True"
     (⊢ Γ true : Bool)]

    [-------------- "T-False"
     (⊢ Γ false : Bool)]

    [(⊢ Γ e_1 : Bool)
     (⊢ Γ e_2 : A)
     (⊢ Γ e_3 : A)
     ---------- "T-If"
     (⊢ Γ (if e_1 then e_2 else e_3) : A)]

    [(⊢ Γ e_1 : Nat)
     (⊢ Γ e_2 : Nat)
     ---------- "T-Add"
     (⊢ Γ (e_1 + e_2) : Nat)]

    [(⊢ (Γ x : A) e : B)
     --------------- "T-λ"
     (⊢ Γ (λ x e) : (A -> B))]

    [(⊢ Γ e_1 : (A -> B))
     (⊢ Γ e_2 : A)
     ---------------
     (⊢ Γ (e_1 e_2) : B)])
eval:1:0: define-judgment-form: unbound pattern variable
  in: A
This is why typed languages will usually require an annotation on the argument name. We can add a second syntax for functions with annotated arguments. For these functions, we can easily decide all types. We can see this by defining a moded judgment in which the type is an output; Redex will then happily infer types for us.
> (define-judgment-form L
    #:contract (⊢ Γ e : A)
    #:mode (⊢ I I I O)

    [(∈ x A Γ)
     ---------- "T-Var"
     (⊢ Γ x : A)]

    [---------- "T-Z"
     (⊢ Γ z : Nat)]

    [(⊢ Γ e : Nat)
     ---------- "T-S"
     (⊢ Γ (s e) : Nat)]

    [------------- "T-True"
     (⊢ Γ true : Bool)]

    [-------------- "T-False"
     (⊢ Γ false : Bool)]

    [(⊢ Γ e_1 : Bool)
     (⊢ Γ e_2 : A)
     (⊢ Γ e_3 : A)
     ---------- "T-If"
     (⊢ Γ (if e_1 then e_2 else e_3) : A)]

    [(⊢ Γ e_1 : Nat)
     (⊢ Γ e_2 : Nat)
     ---------- "T-Add"
     (⊢ Γ (e_1 + e_2) : Nat)]

    [(⊢ (Γ x : A) e : B)
     --------------- "T-λ"
     (⊢ Γ (λ (x : A) e) : (A -> B))]

    [(⊢ Γ e_1 : (A -> B))
     (⊢ Γ e_2 : A)
     ---------------
     (⊢ Γ (e_1 e_2) : B)])
> (judgment-holds (⊢ · (λ (x : Nat) x) : A) A)
'((Nat -> Nat))
Unfortunately, now that we have annotations, our functions are less generic. We must define separate identity functions for natural numbers and booleans, even though they behave the same and even evaluate just fine when applied to arguments of the "wrong" type.
> (judgment-holds (⊢ · (λ (x : Nat) x) : A) A)
'((Nat -> Nat))
> (judgment-holds (⊢ · (λ (x : Bool) x) : A) A)
'((Bool -> Bool))
> (judgment-holds (⊢ · ((λ (x : Bool) x) z) : A) A)
'()
> (judgment-holds (⊢ · ((λ (x : Bool) x) true) : A) A)
'(Bool)
> (judgment-holds (eval ((λ (x : Bool) x) true) o) o)
'(true)
> (judgment-holds (eval ((λ (x : Bool) x) z) o) o)
'(z)
> (judgment-holds (eval ((λ (x : Nat) x) true) o) o)
'(true)
> (judgment-holds (eval ((λ (x : Nat) x) z) o) o)
'(z)
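To see why the annotation makes the type an output, here is a rough sketch of the annotated type system as a recursive function in plain Racket, mirroring the moded Redex judgment. It is only an illustration: typeof, lookup, the list-of-pairs representation of Γ, and the #f result for ill-typed terms are my own assumptions, not part of the Redex model.

```racket
#lang racket

;; lookup : Γ x -> A or #f; Γ is a list of (cons x A), newest first
(define (lookup Γ x)
  (match Γ
    ['() #f]
    [(cons (cons y A) more) (if (equal? x y) A (lookup more x))]))

;; typeof : Γ e -> A or #f (no derivation exists)
(define (typeof Γ e)
  (match e
    ['z 'Nat]
    ['true 'Bool]
    ['false 'Bool]
    [(? symbol? x) (lookup Γ x)]                       ; T-Var
    [`(s ,e1) (and (equal? (typeof Γ e1) 'Nat) 'Nat)]  ; T-S
    [`(if ,e1 then ,e2 else ,e3)                       ; T-If
     (let ([a (typeof Γ e2)])
       (and (equal? (typeof Γ e1) 'Bool)
            a
            (equal? a (typeof Γ e3))
            a))]
    [`(λ (,x : ,A) ,body)                              ; T-λ
     ;; The annotation supplies A; without it we would have to guess.
     (let ([B (typeof (cons (cons x A) Γ) body)])
       (and B `(,A -> ,B)))]
    [`(,e1 + ,e2)                                      ; T-Add
     (and (equal? (typeof Γ e1) 'Nat)
          (equal? (typeof Γ e2) 'Nat)
          'Nat)]
    [`(,e1 ,e2)                                        ; application
     (match (typeof Γ e1)
       [`(,A -> ,B) (and (equal? (typeof Γ e2) A) B)]
       [_ #f])]
    [_ #f]))

(typeof '() '(λ (x : Nat) x))           ; => '(Nat -> Nat)
(typeof '() '((λ (x : Bool) x) true))   ; => 'Bool
(typeof '() '((λ (x : Bool) x) z))      ; => #f
```

The results line up with the judgment-holds queries above: a type where Redex returns one, and #f where Redex returns the empty list. If we deleted the annotation from the λ case, there would be nothing to extend Γ with, which is the same problem Redex reported as an unbound pattern variable.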