Syntax Micros

8.16

Syntax Micros🔗

wilbowma

An implementation of micros on top of macros.

Macros syntax? into syntax?, both representing arbitrary source terms. Roughly, their type is Source -> Source. They are very flexible and can be useful for language extenisble and shallow-embedding DSL implementation. Unfortunately, as their output is arbitrary source terms (possibly including more macro definitions), it is difficult to stop macro expansion, for example because you want to stop at a fixed intermediate representation (IR).

Micros instead expand to a well-defined IR (Krishnamurthi 2001), not necessarily represented as syntax objects. That is, micros have type Source -> IR. Micro expansion is easy to stop, and simplifies writing arbitrary processing over the IR outside of the macro expander.

This library implements micros as a pattern over macros. Micro languages can be embedded in Racket by first elaborating from syntax to an IR, then preprocessing, then continuing elaboration into Racket. This is similar to syntax-spec-v3 (Ballantyne et al. 2024); however, syntax-spec-v3 provides a fixed core language definition, represented as syntax objects. Micros provide an extensible core language definition, whose representation is arbitrary and generated by the micro definition. (syntax-spec-v3 provides lot of other features as well, such as macros scoped to a particular DSLs.)

1 Tutorial: STLC🔗

Using micros, we can easily implement hybrid embedding of DSLs into Racket. Compared to pure macro embeddings, this allows us to use traditional AST representation instead of syntax objects, and use traditional traversal techniques, instead of relying on the macro expander.

In this tutorial we’ll implement an embedding of the simply-typed lambda calculus. Our micros expand to a struct representation, giving us a traditional efficient AST representation. We can then easily implement the traditional bidirectional typechecker, followed by a compilation back into Racket as syntax objects. This approach requires fewer new abstractions compared to the macro embedding approach of turnstile (Chang et al. 2017), although with less extensibility, although some extensibility could be achieved using define-generics to implement the type checker.

We’ll start by defining two micro languages: one for terms, and one for types. Conceptually, we need one micro language for each non-terminal. Each micro define an extensible parser or elaborator from the source language into the core IR.

Examples:

> (require syntax/micros (for-syntax racket syntax/parse))
> (define-micro-language stlc-types)
> (define-micro-language stlc-terms)

The types are simple, so we can use define-simple-micro, which generates the micro and its struct representation. We’ll define two types: natural numbers and functions.

Examples:

> (define-simple-micro Nat stlc-types)
> (define-simple-micro (-> A B) stlc-types)

To implement equality, we could pass an implementation of gen:equal+hash, or we can define a type equality function manually. We’ll do the latter for simplicity. We must make the definition at phase 1, since the IR is a compile time IR happening in between macro expansion.

Examples:

> (require (for-syntax racket/match))
> (define-for-syntax (type-equal? A B)
   (match* (A B)
    [((Nat) (Nat))
     #t]
    [((-> A1 B2) (-> A2 B2))
     (and (type-equal? A1 A2) (type-equal? B1 B2))]
    [(_ _) #f]))

Since we know the micro IR representation, we simply pattern match on the structs. The micros will take care of fully expanding to this representation; we will almost never need to deal with syntax.

Next we can start defining terms. We’ll embed Racket naturals using a define-embedding-micro, and define functions and applications.

Natural numbers are embedded directly from Racket. The micro matches against the expanded form of a Racket natural number (quote n), where n is a literal.

Example:

> (define-embedding-micro stlc-nat stlc-terms
   (syntax-parser
    ['n:exact-nonnegative-integer
     (stlc-nat (syntax-e #'n))])
   #:struct (n))

Variables will never be written directly, so we bind their micro to an error. Instead, Racket identifiers will expand into the variable syntax node directly from the binding form for the variable. We see that in the lambda definition: we pass an environment to the expander that will expand the bound variable into the IR representation of a variable. We also see that we use different expanders to expand the type and term languages separately.

Examples:

> (define-micro stlc-var stlc-terms
   (lambda (stx) (error "Impossible"))
   #:struct (name type))
> (define-micro stlc-lambda stlc-terms
   (syntax-parser
    #:datum-literals (:)
    [(_ (x:id : A) e)
     (define A- (expand-stlc-types #'A))
     (stlc-lambda
      #'x
      (expand-stlc-types #'A)
      (expand-stlc-terms #'e (list (cons #'x (stlc-var #'x A-)))))])
    #:struct (x type body))
> (define-micro stlc-app stlc-terms
   (syntax-parser
    [(_ e1 e2)
     (stlc-app (expand-stlc-terms #'e1) (expand-stlc-terms #'e2))])
    #:struct (rator rand))

Finally, we add a type annotation form.

Example:

> (define-micro :: stlc-terms
   (syntax-parser
    [(_ e A)
     (:: (expand-stlc-terms #'e) (expand-stlc-types #'A))])
   #:struct (term type))

Now that the entire language is defined, we can implement a standard bi-directional type checker by matching against the entire syntax tree. We could have also decided to implement the type checker using define-generics, so the type checker itself is extensible. We define two functions: type-check and type-synth.

Examples:

> (define-for-syntax (type-check e A)
   (match* (e A)
    [((stlc-lambda x type body)
      (-> A B))
     (and (type-equal? A type)
          (type-check body B))]
    [(_ _) (type-equal? A (type-synth e))]))
> (define-for-syntax (type-synth e)
   (match e
    [(stlc-nat _) (Nat)]
    [(stlc-var _ A) A]
    [(stlc-app rator rand)
     (match (type-synth rator)
      [(-> A B)
       (unless (type-check rand A)
        (error "type error"))
       B])]
    [(:: e A)
     (unless (type-check e A)
      (error "type error"))
     A]))

Finally, if we wish to embed these terms in Racket, we need to define a compiler from stlc-terms to Racket, as syntax objects, and an embedding macros to act as the interface.

Examples:

> (define-for-syntax (stlc-compile e)
   (match e
    [(stlc-var e _) e]
    [(stlc-nat e) #`'#,e]
    [(stlc-lambda x A body)
     #`(lambda (#,x) #,(stlc-compile body))]
    [(stlc-app rator rand)
     #`(#,(stlc-compile rator) #,(stlc-compile rand))]
    [(:: e _)
     (stlc-compile e)]))
> (define-syntax embed-stlc
   (syntax-parser
    [(_ e)
     (define e- (expand-stlc-terms #'e))
     (define A (type-synth e-))
     (printf "~a has type ~a~n" (syntax->datum #'e) A)
     (stlc-compile e-)]))
> (embed-stlc
   (stlc-app (:: (stlc-lambda (x : Nat) x) (-> Nat Nat)) 5))
(stlc-app (:: (stlc-lambda (x : Nat) x) (-> Nat Nat)) 5) has type #<Nat>
5

We could continue this exercise, wrapping the result into a hash-lang. See examples/stlc-hashlang.rkt in the repository for an extended example.

2 Reference🔗

(require syntax/micros)

package: syntax-micros-lib

syntax
(define-micro-language name-id)

name : identifier?

Defines three phase-1 bindings:

name-id as a micro language. This is also referred to as the vocabulary (Krishnamurthi 2001).
name-id-term, an extensible syntax class for recognizing name-id terms.
expand-name-id, a function from arbitrary syntax to name-id terms that fully expands all macros and micros.

These functions will raise an error until micros are defined for name-id.

Examples:

> (require syntax/micros (for-syntax racket/base syntax/parse))
> (define-micro-language foo-lang)
> (begin-for-syntax
   (displayln (expand-foo-lang #'(void))))
eval:3:0: foo-lang-term: Did not match expanded term
  in: (#%app void)
> (define-micro my-foo foo-lang
   (syntax-parser
    [(_ n:nat) `(nat ,(syntax-e #'n))]))
> (define-for-syntax (compile-foo e)
   #`'#,(expand-foo-lang e))
> (define-syntax embed-foo
   (syntax-parser
    [(_ e) (compile-foo #'e)]))
> (embed-foo (my-foo 5))
'(nat 5)

syntax
(define-micro kw-id lang-id micro-def maybe-struct-spec)

maybe-struct-spec =
| struct-spec

   kw-id : identifier?
   lang-id : identifier?
   micro-def : (-> syntax? IR)

Defines a micro in the transformer environment for the micro language lang-id. The transformer micro-def should return an IR representation of the term in lang-id, but the representation is arbitrary.

For convenience, a struct definition may be attached and will be defined in the scope of the micro definition, as it may be common to use structs to represent IR terms. If specified, struct-spec should be syntax that can be passed to struct, starting with the option super-id. The struct will also be named kw-id, which will be defined at phase 1.

Examples:

> (require syntax/micros (for-syntax racket/base syntax/parse))
> (define-micro-language arith-lang)
> (define-micro + arith-lang
   (syntax-parser
    [(_ e1 e2)
     (make-+ (expand-arith-lang #'e1) (expand-arith-lang #'e2))])
   #:struct (e1 e2) #:transparent #:extra-constructor-name make-+)
> (define-micro nat arith-lang
   (syntax-parser
    [(_ e:exact-nonnegative-integer)
     (nat (syntax-e #'e))])
   #:struct (e) #:transparent)
> (begin-for-syntax
   (displayln (expand-arith-lang #'(+ (nat 5) (nat 6))))
   (displayln (+-e1 (expand-arith-lang #'(+ (nat 5) (nat 6))))))
#(struct:+ #(struct:nat 5) #(struct:nat 6))
#(struct:nat 5)

syntax
(define-embedding-micro kw-id lang-id micro-def maybe-struct-spec)

maybe-struct-spec =
| struct-spec

   kw-id : identifier?
   lang-id : identifier?
   micro-def : (-> syntax? IR)

Similar to define-micro, but defines an embedding of some arbitrary source term that has no new corresponding keyword associated with it. This is useful for embedding raw #%datum, for example. The micro is called with an arbitrary syntax object that did not match any other identifier in the micro language, and should raise a syntax error if the syntax object is not the expected form for this embedding micros. (This is the default behaviour of a syntax transformer with an incomplete pattern match, so embedding micro definitions do not need to handle unexpected syntax explicitly.)

Examples:

> (require syntax/micros (for-syntax racket/base syntax/parse))
> (define-micro-language arith-lang)
> (define-micro + arith-lang
   (syntax-parser
    [(_ e1 e2)
     (make-+ (expand-arith-lang #'e1) (expand-arith-lang #'e2))])
   #:struct (e1 e2) #:transparent #:extra-constructor-name make-+)
> (define-embedding-micro nat arith-lang
   (syntax-parser
    #:literals (quote)
    ; Matches a fully expanded Racket natural
    ['e:exact-nonnegative-integer
     (make-nat (syntax-e #'e))])
   #:struct (e) #:transparent #:extra-constructor-name make-nat)
> (begin-for-syntax
   (displayln (expand-arith-lang #'(+ 5 6))))
#(struct:+ #(struct:nat 5) #(struct:nat 6))

syntax
(define-simple-micro (kw-id field-id ...) lang-id maybe-struct-options ...)

maybe-struct-options =
| struct-options

Defines a micro that expands into a struct of the same form and name. Equivalent to

(define-micro kw-id lang-id

(syntax-parser

[(_ field-id ...)

(kw-id field ...)])

#:struct (field ...) maybe-struct-options ...)

procedure
(expand-micro stx binds) → syntax?
stx : syntax?
binds : (list/c (cons/c identifier? any/c))

A phase 1 function that is probably not accessed directly, but is wrapped by the definition provided by define-micro-language.

Expands stx fully to some micro language. The returns syntax object is arbitrary, but has the micro IR attached using syntax/mule. binds should specify any free identifiers in scope, and what to expand them into. Their expansion should be a IR in the same micro language.

Bibliography🔗

Michael Ballantyne, Mitch Gamburg, and Jason Hemann. Compiled, Extensible, Multi-language DSLs (Functional Pearl). Proceedings of the ACM on Programming Languages 8(ICFP), 2024. doi:10.1145/3674627

Stephen Chang, Alex Knauth, and Ben Greenman. Type Systems As Macros. In Proc. popl, 2017. doi:10.1145/3093333.3009886

Shriram Krishnamurthi. Linguistic reuse. PhD dissertation, Rice University, 2001. https://hdl.handle.net/1911/17993