Untyped Programs Don’t Exist

2018-01-19 :: research

Lately, I’ve been thinking about various (false) dichotomies, such as typed vs untyped programming and type systems vs program logics. In this blog post, I will argue that untyped programs don’t exist (although the statement will turn out to be trivial).

TLDR

All languages are typed, but may use different enforcement mechanisms (static checking, dynamic checking, no checking, or some combination). We should talk about how to use types in programming—e.g. tools for writing and enforcing invariants about programs—instead of talking about types and type checking as properties of languages.

1. TLDR
2. Some Context
3. Definitions
4. Is X a Typed Language?
5. But I Don't Get Type Errors in X!
6. Untyped Programs Don't Exist.
7. Conclusion
8. Related Reading

Some Context

In most of my academic work, I work with “typed” languages. These languages have some nice properties for the metatheorist and compiler writer. Types lend themselves to strong automated reasoning, automatically eliminate large classes of errors, and simplify the job of whoever is reasoning about the programs. The downside is that the programmer must essentially statically prove properties of their program in such a way that a machine can understand the theorem and check the proof.

When I’m hacking, I write in “untyped” languages. I write programs in Racket, scripts in bash, plugins and tools in JavaScript, papers in latex, build systems in Makefile, and so on. These languages lend themselves to experimentation and avoid the overhead of necessarily proving properties of the programs. The downside is that the computer cannot help the programmer, since the programmer has not communicated the invariants about the program in a way the computer can understand.

“But surely”, a type evangelist says, “the very same benefits of types for metatheory help one develop the program in the first place? Why do you hobble yourself by omitting types? Come join us in the land of light! Use types from the start!”

“Dear friend, I couldn’t agree more!”, I reply, "Types are invaluable to developing my programs, but your ‘typed’ language prevent me from writing down my types!’

"Well, certainly there are some limitations of typed languages," the type evangelist concedes, "but this we could also choose to ignore the type system, create the uni-type, and program in the error monad. Now we have the benefits of both worlds!"

"Don’t you see,", I say excitedly, "that is just what I’m doing! My ‘untyped’ languages are, in fact, well-typed. My programs run implicitly in the error monad. What’s more, I am not required to prove it, for it is simply true."

A grave look comes over my interlocutor’s face. "But you forfeit all benefits of static typing. Your errors are reported later, and there are performance implications, and …"

"Exactly.", I interrupt. "We are not arguing about typing, for all programs are well typed. We are arguing about pragmatics."

Definitions

Before I can argue that untyped programs don’t exist, I need some near-formal definitions to work with. I posit the following definitions are reasonable, intuitive definitions about types and programs.

Definition. An expression is a symbol of sequence of symbols given some interpretation.

Example.

5 is an expression, whose interpretation is the number five.
e₁ + e₁ is an expression, whose interpretation is the mathematical addition function applied to expressions e₁ and e₂.
function(): return 5; is an expression, whose interpretation is a mathematical function that when applied to any number of arguments returns the expression 5.

Definition. A type is a statement of the invariants of some expressions.

Example.

a register word is a type describing the kinds of values that fit in an x86 register, such as a collection of 32 bits. A register word supports operations such as:
- move a value of type register word into a register
- move a value of type register word from one register into another
a pointer is a type describing a memory address. It is either uninitialized or a valid memory address. A pointer supports operations such as:
- initialization, giving an uninitialized pointer a value
- dereference, reading the value of the memory address of an initialized pointer
a Nat is a type describing an element of the set of natural numbers. A Nat supports operations such as:
- addition
- multiplication
- subtraction, but only when subtracting a smaller natural number from a larger natural number

Definition. A language is collection of expressions.

Example.

arith is a language containing the following expressions e:
- 0,1,2 …, etc and -1,-2,-3, …, etc, each representing an integer
- e₁ + e₂, where e₁ and e₂ are integers
- e₁ - e₂, where e₁ and e₂ are integers
JavaScript is a language, defined by the ECMAScript standard, and extended by various implementations.

Definition. A program is a collection of expressions from some language.

Example.

5 + 5 is an arith program.
5 is a JavaScript program.

Is X a Typed Language?

Is x86 assembly a typed language?

I say yes.

First, x86 assembly is a language. The language x86 assembly meets our definition of a language: it defines a collection of symbols or sequences of symbols given some interpretation. For example, mov ax, bx is an x86 assembly program that moves the contents of register bx to register ax.

Second, x86 is typed. Each expression in x86 assembly has invariants stated about the expression. For example, x86 defines the type “little endian”, which describes the particular encoding of binary data such as numbers over which operations like addition are defined. The division operation is well typed: as division is only defined when the denominator is non-zero. Attempting to divide by zero cause a type error (a dynamic exception).

I would make the same argument for every other language. C is a typed language. So is JavaScript. And Racket. And Haskell.

But I Don’t Get Type Errors in X!

First, you probably do. Second, when you don’t, that’s a major problem.

Let’s visit x86 for a moment, to see dynamically enforced type errors. Recall that division is not defined when the denominator is zero. The result of division by zero in x86 is defined to be a general-protection exception, error code 0. That is a type error. It’s a type error describing that you attempted to divide by zero, and that this is ill-typed. It is a dynamically enforced type error.

Let’s move to C, in which we can easily see two different kinds of type errors: static and unenforced. The language C includes expressions like x=e, where x is a declared name and e is an expression. The expression x=e raises a static error when x is undeclared; this is a static type error. It is a statically enforced invariant that names must be declared before they are used. Other invariants are not enforced at all, such as notorious undefined behavior. For example, bool b; if(b) { ... }; violates a C invariants, namely that uninitialized scalars are never used. However, C does not attempt to enforce this invariant, either statically or dynamically. The result of this sequence of symbols is undefined in C.

Untyped Programs Don’t Exist

First, a few more definitions based on the above arguments about the languages x86 and C.

Definition. A type error is an error raised during the enforcement of a type, i.e., during the enforcement of an invariant about an expression.

Definition. Undefined behavior is the result of interpreting a non-expression, i.e., a sequence of symbols that have no meaning because some invariant has been violated.

Theorem. Untyped Programs Don’t Exist.

Proof. Recall that programs consist of expressions from a language. Expressions are sequences of symbols that have meaning. But undefined behavior only results from non-expressions. As programs are composed of expressions, a program cannot have undefined behavior. Therefore, all programs obey the invariants required by the expressions in the language. That is, all programs are well typed, and untyped programs don’t exist. QED.

I warned you it was a trivial theorem.

Conclusion

The theorem is trivial, but still useful because it helps us reframe our discussion.

Really, the statement is just a rephrasing of type safety: “well typed programs don’t go wrong”. For type safety, what we show is that programs exhibit only defined behavior. The difference is that, typically, type safety is typically thought of as a property of a language, and in particular, of statically typed languages. We should think about type safety differently: it is a property we must enforce of programs. Enforcing it via static typing of every program in the language is one useful way, but it is not the only way, and we cannot always hope to have type safety of a language.

Instead of arguing about untyped vs typed, a non-existent distinction, we should accept that all programs have invariants that must be obeyed, i.e., all programs are typed. The argument we must have is about the pragmatics of types and type checking.

how can we express types about complex languages like x86 and C
under what situations should we enforce types, i.e., check types
is type checking useful
should we check types statically or dynamically
should we allow the programmer to circumvent types checking
is type checking decidable
should it be

The Meaning of Types - From Intrinsic to Extrinsic Semantics (Reynold 2000)

This paper proves equivalence of an intrinsically typed languages in which meaning is only assigned to well-typed programs and an extrinsically typed language in which programs are first given meaning and can separately be ascribed types and proved to inhabit those types. In the extrinsic semantics, Reynold’s treat all programs as existing in the universal domain, and use embedding-projection pairs essentially as contracts at run-time, since, e.g., only a function can be called. In my mind, this work essentially proves the same theorem as this blog post: even in an when the semantics of programs consider typing as happening “after” semantics, the semantics still require types.

Dynamic Languages are Static Languages (Harper 2011)

This blog post argues that dynamic languages are just straight-jacketed versions of static languages, and therefore they aren’t really a separate class of languages. In many ways, I agree with this blog post. Because “dynamic” languages lack any static enforcement, they can be a hindrance when you do know how to encode the types you want, and they can lead to weird type confusing programming patterns. My favorite example pattern is from Racket, where the value #f is sometimes used at type bool and sometimes at type Maybe A. This can lead to annoying problems with functions like findf over a list of bools. However, I think it ignores some of the pragmatics. For example, while sum types give you incredible expressive power, tagged sums are very annoying to use in many languages that enforce static typing, while very simple to use when you are not required to statically prove a term inhabits a sum.

On Typed, Untyped, and Uni-typed Languages (Tobin-Hochstadt 2014)

This blog post begins to get at some of the same criticisms of Harper’s view, and starts to talk about pragmatics.

What to Know Before Debating Type Systems (Smith 2010)

This blog post, reproduced in 2010 on a perl blog, does a great job of breaking down some false dichotomies and fallacies in discussions about type systems. It into more depth than this article about some distinctions in type systems, when they are meaningful and when they are not, and I pretty much agree with it.

The Calculi of Lambda-v-CS Conversion: A Syntactic Theory of Control and State in Imperative Higher-order Programming Languages (Felleisen 1987)

The abstract and chapter 1 of this dissertation have something to say about syntax and semantics, which I think are very related to the topic of this blog post. In particular, the thoughts on symbolic-syntactic reasoning I think are vital to understanding the trade-offs in different enforcements of typing.

What is Gradual Typing (Siek 2014)

This blog post discusses some trade-offs in static vs dynamic typing, in the context of gradual typing. To me, advancement in gradual typing is crucial in making typing enforcement more pragmatic. However, I disagree with some of the “good points” in this blog post. For example, the point “Dynamic type checking doesn’t get in your way” is a bad point to me; it’s also an argument in favor of no enforcement and undefined behavior. I also find some examples of gradual typing to be great evidence of what is wrong with gradual typing. For example, the program add1(true) at the end of the post should be refuted by a gradual type system, but passes current “plausibility checkers”, even when add1 has static type annotations requiring that its argument be a number.

TLDR

Table of Contents

Some Context

Definitions

Is X a Typed Language?

But I Don’t Get Type Errors in X!

Untyped Programs Don’t Exist

Conclusion

Related Reading

The Meaning of Types - From Intrinsic to Extrinsic Semantics (Reynold 2000)

Dynamic Languages are Static Languages (Harper 2011)

On Typed, Untyped, and Uni-typed Languages (Tobin-Hochstadt 2014)

What to Know Before Debating Type Systems (Smith 2010)

The Calculi of Lambda-v-CS Conversion: A Syntactic Theory of Control and State in Imperative Higher-order Programming Languages (Felleisen 1987)

What is Gradual Typing (Siek 2014)