2 Syntax in Redex

8.10

2 Syntax in Redex

The first part of my language model is defining the collection expressions in my language. This includes the expressions of my language, the types, and the data. Sometimes I encode the meta structures—environments, machine configurations, etc—as syntax. I will also usually encode simple properties of interest in syntax, such as whether a term is a value or not. This is exactly what we do on paper in programming languages research.

After I have defined a syntax, Redex can do the following things for me:

decide whether an s-expression matches a BNF nonterminal
decompose a term via pattern matching
decide α-equivalence of two terms
perform substitution
generate expressions the match a nonterminal

2.1 Syntax TLDR

In short, Redex makes defining and working with syntax damn simple. I use define-language to create BNF grammars. With #:binding-forms (binding forms), I get capture-avoiding substitution (substitute), and α-equivalence (alpha-equivalent?), for free while retaining named identifiers rather than annoying representations such as de Bruijn. Using define-extended-language, I can easily extend existing languages, which can make it easy to modularize grammars; although, I usually avoid it because then I have to remember too many language identifiers. I use redex-match?, redex-match, and redex-let to query whether expressions match nonterminals and to let Redex decompose syntax for me. I use the test functions test-predicate, test-equal, test-results, and the parameter default-equiv, for writing test suites about syntax, and redex-check to do random testing of syntactic properties.

One caveat to be aware of is that grammars with side conditions prevent redex-check from working over the grammar.

There are two common pitfalls to avoid when working with syntax in Redex.

First, be careful about using untagged, arbitrary variables, such as by using variable or variable-not-otherwise-mentioned in your syntax definition. This makes it really easy to create arbitrary variable names, but also easy to make typos be interpreted as variables. This can cause very unexpected failures. Apparently valid matches will not fail to match and metafunctions can have undefined behavior. One way to help avoid this problem is to tag variables, either in the grammar or with a symbolic prefix using variable-prefix.

Second, avoid unicode subscripts, and be careful with TeX input mode. Unicode subscripts are different than a nonterminal followed by an underscore, i.e., any₁ is a symbol while any_1 is a pattern variable matching an any pattern. This is made worse since TeX input mode will transparently replace the second expression with the first. DrRacket has better LaTeX input that avoids this problem by default.

2.2 Defining Syntax

In Redex, we start defining the syntax of a language with the define-language form. Below is the syntax for the lanugage BoxyL, the simply-typed λ-calculus with the box modality. As a convention, I usually end my language names with the letter "L".

Examples:

> (require redex/reduction-semantics)
> (define-language BoxyL
    (e ::= x natural (+ e e) (cons e e) (car e) (cdr e) (λ (x : A) e) (e e)
           (box e) (let ((box y) e) e))
    (A B ::= Nat (A × B) (A → B) (□ A))
    (x y ::= variable-not-otherwise-mentioned)
    (v ::= natural (box e) (λ (x : A) e) (cons v v))

    #:binding-forms
    (λ (x : A) e #:refers-to x)
    (let ((box y) e_1) e_2 #:refers-to y))

I usually start my models by requiring redex/reduction-semantics instead of redex. The latter loads GUI stuff I rarely use, and sometimes don’t even install.

The define-language form takes a language identifier, which is used by some functions to specify which grammar and binding specification to use, and BNF-esque grammar, in s-expression notation. After the grammar, I give a binding specification, from which Redex infers definitions of substitution and α-equivalence.

This language defines expression using meta-variable e, which includes variables x, natural numbers, cons pairs, λ, application, the box introduction form, and a pattern matching form for box elimination. It also defines their types, as meta-variables A and B. Note that while productions need to be s-expressions, they need not be in prefix notation. Variables, x and y, are any symbol not used as a keyword or literal elsewhere in the grammar, variable-not-otherwise-mentioned.

This is essentially equivalent to the grammar from the Coq model in Preface—Why Redex, but also fixes a representation of variables, and defines the metafunction substitute and the Racket alpha-equivalent? for BoxyL.

2.3 Pattern Matching

Redex has a sophisticated formal pattern language, but in essence any symbols not recognized as a BNF symbol is treated as a keyword. When I write (car e) in the grammar, this is understood to mean the literal symbol car followed by some expression e is also an expression. The pattern language also has some built-in nonterminals, such as natural which indicates a natural number literal such as 5.

We can ask the Redex pattern matcher, via redex-match?, whether terms match some nonterminal from the grammar.

Examples:

> (define eg1 (term (box 5)))
> (define eg2 (term (cdr 5)))
> (redex-match? BoxyL e eg1)
#t
> (redex-match? BoxyL v eg1)
#t
> (redex-match? BoxyL v eg2)
#f

This is helpful particularly with large complex languages, or low-level languages with detailed machine configurations. In these kinds of languages, its common to restrict syntax to achieve various properties, and may be non-obvious whether a term is valid.

Note that keywords and literal symbols from the grammar are not variables, but anything else is.

Examples:

> (redex-match? BoxyL x (term car))
#f
> (redex-match? BoxyL x (term λ))
#f
> (redex-match? BoxyL x (term :))
#f
> (redex-match? BoxyL x (term kar))
#t
> (redex-match? BoxyL x (term lambda))
#t

One of the important built-in patterns that I use frequently is any, which matches any s-expression.

Examples:

> (redex-match? BoxyL any (term car))
#t
> (redex-match? BoxyL (any_1 any_1) (term (car car)))
#t
> (redex-match? BoxyL any (term kar))
#t
> (redex-match? BoxyL any (term :))
#t

The any pattern can be useful for defining generic metafunctions, grammars, and judgments, or to partially specify contracts on Redex definitions. In large developments, I sometimes create a base language with generic operations on syntax, using any.

When in Racket, we use term to inject syntax into Redex. term acts like quasiquote, and even supports unquote, so we can use Racket to compute terms through templating. Most quoted s-expressions are also valid Redex terms, which can be useful if we want to move between s-expressions in Racket and terms in Redex.

Examples:

> (define eg3 (term (cdr ,(+ 2 3))))
> eg3
'(cdr 5)
> (redex-match? BoxyL e eg3)
#t
> (define eg4 '(cdr 5))
> (redex-match? BoxyL e eg4)
#t

Redex can also decompose a term by pattern matching.

Examples:

> (redex-match BoxyL (e_1 e_2) (term ((λ (x : Nat) x) 5)))
(list (match (list (bind 'e_1 '(λ (x : Nat) x)) (bind 'e_2 5))))
> (redex-match BoxyL (box e_1) eg1)
(list (match (list (bind 'e_1 5))))

The structure returned by redex-match is a little annoying to look at. Redex supports non-deterministic matching, so we get back the set of all possible ways to match. The elements of the set are matches, which contain lists of bindings of the pattern variable to the term.

I don’t often use redex-match directly for decomposing terms, because redex-let has a much better interface. But redex-let requires a deterministic single match.

Example:

> (redex-let BoxyL ([(e_1 e_2) (term ((λ (x : Nat) x) 5))])
(displayln (term e_1))
(displayln (term e_2)))
(λ (x : Nat) x)
5

2.4 Extending Syntax and Syntax with Holes (Contexts)

define-extended-language allows extending a base language with new nonterminals, or extending existing nonterminals from the base language with new productions. For example, I could extend expressions with booleans literals as follows.

Examples:

> (define-extended-language BoxyBoolL BoxyL
(e ::= .... boolean))
> (redex-match? BoxyL e (term #t))
#f
> (redex-match? BoxyL e (term (car #t)))
#f
> (redex-match? BoxyBoolL e (term #t))
#t
> (redex-match? BoxyBoolL e (term (car #t)))
#t

This defines a new language, BoxyBoolL, which extends BoxyL with an extra production in the nonterminal e. To extend a nonterminal, I use ...., which means "all the current productions of this nonterminal", followed by the new productions.

We can also add new nonterminals by extending a language.

Example:

> (define-extended-language BoxyEvalL BoxyL
(E ::= hole (E e) (v E) (cons v ... E e ...) (+ v ... E e ...) (car E) (cdr E)
(let ((box x) E) e)))

This defines a new language, BoxyEvalL, which extends BoxyL with a call-by-value evaluation context. Redex syntax also support contexts, that is, programs with a hole. I use this for evaluation and program contexts. In this language, we define the context E by listing the usual evaluation contexts for call-by-value. For operators with many arguments, we can easily specify left-to-right evaluation order using non-deterministic ellipses patterns.

After specifying a context, we can easily decompose a term into its evaluation context and its redex.

Examples:

> (redex-match BoxyEvalL (in-hole E v) (term (car (cons (+ 1 2) 2))))
(list
(match (list (bind 'E '(car (cons (+ hole 2) 2))) (bind 'v 1)))
(match (list (bind 'E '(car (cons (+ 1 hole) 2))) (bind 'v 2))))
> (redex-match BoxyEvalL (in-hole E e) (term (car (cons (+ 1 2) 2))))
(list
(match (list (bind 'E hole) (bind 'e '(car (cons (+ 1 2) 2)))))
(match (list (bind 'E '(car hole)) (bind 'e '(cons (+ 1 2) 2))))
(match (list (bind 'E '(car (cons (+ hole 2) 2))) (bind 'e 1)))
(match (list (bind 'E '(car (cons (+ 1 hole) 2))) (bind 'e 2)))
(match (list (bind 'E '(car (cons hole 2))) (bind 'e '(+ 1 2)))))
> (redex-let BoxyEvalL ([(in-hole E (+ 1 2)) (term (car (cons (+ 1 2) 2)))])
(displayln (term E))
(displayln (term (in-hole E 3))))
(car (cons #<hole> 2))
(car (cons 3 2))

2.5 Binding, Substitution, and α-equivalence

Probably the winning feature in Redex for me is automagic handling of binding. After 11 lines of code, binding is completely solved. We have capture-avoiding substitution and α-equivalence for free using named identifiers.

Examples:

> (default-language BoxyL)
> (alpha-equivalent? (term (λ (x : Nat) e))  (term (λ (x : Nat) e)))
#t
> (alpha-equivalent? (term (λ (x : Nat) x))  (term (λ (y : Nat) y)))
#t
> (alpha-equivalent? (term (λ (x : Nat) y))  (term (λ (y : Nat) y)))
#f
> (term (substitute (λ (y : Nat) (x e2)) x (+ y 5)))
'(λ (y«0» : Nat) ((+ y 5) e2))

In this tutorial, I give only the simplest possible use of binding forms: a single, lexical name. But Redex’s support for binding is extremely sophisticated. It can support n-ary lexical binding with shadowing, pattern-matching style binding, module-style binding. See binding forms in the Redex docs for many examples of complex binding support.

2.6 Random Generation of Syntax

Finally, we can generate terms from the grammar. This is helpful for random testing of meta-theory, and generating examples.

Examples:

> (define an-e1 (generate-term BoxyL e 10))
> (define an-e2 (generate-term BoxyL e 10))
> an-e1
'(cdr (car (λ (k : (□ Nat)) (i 0))))
> an-e2
'(box (f 0))

generate-term takes a language name, a generation specification, and a size. It can generate terms based on lots of Redex definitions, although I almost always generate terms from grammars and judgments.

2.7 Testing Syntax

Once we’re done querying, we can move to testing. Redex features a unit testing library. Tests are great for detecting breakages as you start tinkering with your model, and can also report more information than the mostly-boolean query functions.

Redex, unfortunately, doesn’t have a test function for patterns. Instead, we have to use test-predicate and redex-match?.

Examples:

> (default-language BoxyL)
> (test-predicate (redex-match? BoxyL e) (term (car (+ 1 2))))
> (test-predicate (redex-match? BoxyL x) (term kar))
> (test-predicate (redex-match? BoxyL x) (term car))
FAILED :46.0
#<procedure:eval:46:0> does not hold for: 'car
> (test-results)
1 test failed (out of 3 total).

This pattern is hard to abstract over since redex-match? expects a language identifier.

For testing equality of terms, we can use test-equal.

Examples:

> (test-equal (term (λ (x : Nat) y)) (term (λ (y : Nat) y))
              #:equiv alpha-equivalent?)
FAILED :48.0
  actual: '(λ (x : Nat) y)
expected: '(λ (y : Nat) y)
> (default-equiv alpha-equivalent?)
> (test-equal (term (λ (x : Nat) x)) (term (λ (y : Nat) y)))
> (test-equal (term (substitute (λ (y : Nat) (x e2)) x (+ y 5)))
              (term (λ (z : Nat) ((+ y 5) e2))))
> (test-results)
1 test failed (out of 3 total).

test-equal takes two expressions, and optionally, an equivalence predicate. The equivalence defaults to the value of the parameter default-equiv.

We can use redex-check to do random testing of syntactic properties. For example, all values ought to be expressions.

Example:

> (redex-check
   BoxyL
   v
   (redex-match? BoxyL e (term v))
   #:attempts 1000)
redex-check: no counterexamples in 1000 attempts

When generating syntax, redex-check takes a language identifier, a pattern, a predicate, and (optionally), the number of attempts.

2.8 Caveat: Syntax with Side Conditions

The Redex pattern language is very expressive, and one quite nice feature that is hard to find in the documentation is using side-conditions in grammar. I use this occasionally, particularly when modeling machine languages. For example, below I define a syntax with 64-bit integers.

Examples:

> (define-language Int64L
(int64 ::= (side-condition integer_1
(<= (expt -2 31) (term integer_1) (sub1 (expt 2 31))))))
> (redex-match? Int64L int64 4)
#t
> (redex-match? Int64L int64 -5)
#t
> (redex-match? Int64L int64 (expt 2 31))
#f

The side-condition takes a pattern and a Racket predicate which can refer to the pattern variables. You can use a side-condition pattern anywhere in a pattern.

Unfortunately, using side-conditions in patterns does not work with redex-check. You can still use generate-term, though.

2.9 Pitfall: All Symbols are Variables

In a couple of the examples above, I rely on an anti-pattern. I pun between a real expression and an expression schema, by punning between variables and meta-variables. For example, consider the Redex term ((λ (x : Nat) e_body) e_arg). This is only valid because I used variable-not-otherwise-mentioned, which makes all symbols valid variables. In BoxyL, this expression is valid, but in an unintuitive way. The subterms e_body and e_arg are actually variables, although we are pretending they are meta-variables.

Examples:

> (redex-match? BoxyL x (term e_body))
#t
> (redex-match? BoxyL x (term e_arg))
#t

This can let us treat an expression as an expression schema. It is a very tempting way to create examples. Instead of being very precise about the entire term, you can create readable meta-examples only specifying parts of the term and pretending this applies to all terms of this pattern. But relying on this behavior can really cause problems if you’re not careful.

For example, consider the very similar term ((λ (x : A) e_body) e_arg). Ah, surely this meta-example represents any valid application... but it isn’t even a valid expression.

Examples:

> (redex-match? BoxyL e (term ((λ (x : A) e_body) e_arg)))
#f
> (redex-match? BoxyL A (term A))
#f

The problem is that A, isn’t a valid type, and we did not allow types to have variables in the grammar.

Worse, when a term doesn’t match a language nonterminal, metafunctions such as substitute can exhibit undefined behavior.

Examples:

> (term (substitute (λ (y : A) (x e2)) x (+ y 5)))
'(λ (y : A) ((+ y 5) e2))
> (alpha-equivalent? (term (λ (x : A) x)) (term (λ (y : A) y)))
#f

One should avoid this pattern, and be aware of it because you can do it accidently. For example, if we use the wrong letter or wrong case, we can easily create an invalid term, or a valid term that isn’t what we expected.

Examples:

> (redex-match? BoxyL e (term (substitute (λ (y : Nat) (x e2)) x (+ y 5))))
#t
> (redex-match? BoxyL e (term (λ (y : nat) (x e2))))
#f
> (redex-match? BoxyL e (term (kar (cons e_1 e_2))))
#t

The root of the problem is that we used variable-not-otherwise-mentioned and didn’t tag variables, so any symbol is a valid expression. This is hard to avoid in Redex unless you exclusively use variable-prefix. Be careful when using the variable specifications that give you infinite unstructured variables. It’s handy, but like in any scripting language, that usefulness comes at a cost. It will cause you problems. It has caused me problems. It bit me twice while writing this tutorial.

Even when you manage to avoid this, term simply quotes any symbol, so typos can result in valid terms, although the term may not match a nonterminal. If you use contracts, which I strongly encourage and will use in the rest of this tutorial, this is less of a problem since the invalid quoted term will hopefully not match a nonterminal.

2.10 Pitfall: Subscripts and Unicode

Redex makes it very easy to mix unicode into your formal syntax. This is handy, since the code looks closer to the paper presentation. Unfortunatley, it can cause problems if you’re using unicode subscripts or TeX input mode in emacs.

The underscore, _, has a special meaning when attached to a nonterminal in a Redex pattern: it creates a distinct pattern variable matching the preceeding nonterminal. You might be tempted to use a unicode subscript instead, but that doesn’t work.

Examples:

> (redex-match BoxyL (e₁ e₂) (term ((λ (x : Nat) x) 5)))
#f
> (redex-match BoxyL (e_1 e_2) (term ((λ (x : Nat) x) 5)))
(list (match (list (bind 'e_1 '(λ (x : Nat) x)) (bind 'e_2 5))))

This can be a particular problem if you’re using TeX input mode in emacs. When in TeX input mode, typing _ causes the input mode to being typing a subscript unicode symbol, creating a literal symbol rather than a pattern variable. This result will be a mysterious failure.

I’m not sure if unicode subscripts should be treated the same by Redex or not, but for now, I recommend never ever using unicode subscripts in Redex. I usually use TeX input mode, and have a hotkey for turning it on and off, so I can type underscores.

← Preface—Why RedexReduction and Evaluation in Redex →

contents ← prev up next →

1	Preface—Why Redex
2	Syntax in Redex
3	Reduction and Evaluation in Redex
4	Judgments in Redex

2.1	Syntax TLDR
2.2	Defining Syntax
2.3	Pattern Matching
2.4	Extending Syntax and Syntax with Holes (Contexts)
2.5	Binding, Substitution, and α-equivalence
2.6	Random Generation of Syntax
2.7	Testing Syntax
2.8	Caveat: Syntax with Side Conditions
2.9	Pitfall: All Symbols are Variables
2.10	Pitfall: Subscripts and Unicode