8.16
Syntax Micros🔗
An implementation of micros on top of macros.
Macros syntax? into syntax?, both representing arbitrary source terms.
Roughly, their type is Source -> Source.
They are very flexible and can be useful for language extenisble and shallow-embedding
DSL implementation.
Unfortunately, as their output is arbitrary source terms (possibly
including more macro definitions), it is difficult to stop macro expansion, for
example because you want to stop at a fixed intermediate representation (IR).
Micros instead expand to a well-defined IR (Krishnamurthi 2001), not necessarily represented as syntax objects.
That is, micros have type Source -> IR.
Micro expansion is easy to stop, and simplifies writing arbitrary processing
over the IR outside of the macro expander.
This library implements micros as a pattern over macros.
Micro languages can be embedded in Racket by first elaborating from
syntax to an IR, then preprocessing, then continuing elaboration into Racket.
This is similar to syntax-spec-v3 (Ballantyne et al. 2024);
however, syntax-spec-v3 provides a fixed core language
definition, represented as syntax objects.
Micros provide an extensible core language definition, whose
representation is arbitrary and generated by the micro definition.
(syntax-spec-v3 provides lot of other features as well, such as macros
scoped to a particular DSLs.)
1 Tutorial: STLC🔗
Using micros, we can easily implement hybrid embedding of DSLs into Racket.
Compared to pure macro embeddings, this allows us to use traditional AST
representation instead of syntax objects, and use traditional traversal
techniques, instead of relying on the macro expander.
In this tutorial we’ll implement an embedding of the simply-typed lambda calculus.
Our micros expand to a struct representation, giving us a traditional efficient AST representation.
We can then easily implement the traditional bidirectional typechecker,
followed by a compilation back into Racket as syntax objects.
This approach requires fewer new abstractions compared to the macro embedding
approach of turnstile (Chang et al. 2017), although
with less extensibility, although some extensibility could be achieved using
define-generics to implement the type checker.
We’ll start by defining two micro languages: one for terms, and one for types.
Conceptually, we need one micro language for each non-terminal.
Each micro define an extensible parser or elaborator from the source language into the core IR.
Examples:
The types are simple, so we can use define-simple-micro, which
generates the micro and its struct representation.
We’ll define two types: natural numbers and functions.
Examples:
To implement equality, we could pass an implementation of
gen:equal+hash, or we can define a type equality function manually.
We’ll do the latter for simplicity.
We must make the definition at phase 1, since the IR is a compile time IR
happening in between macro expansion.
Since we know the micro IR representation, we simply pattern match on the
structs.
The micros will take care of fully expanding to this representation; we will
almost never need to deal with syntax.
Next we can start defining terms.
We’ll embed Racket naturals using a define-embedding-micro, and define
functions and applications.
Natural numbers are embedded directly from Racket.
The micro matches against the expanded form of a Racket natural number (quote n), where n is a literal.
Variables will never be written directly, so we bind their
micro to an error.
Instead, Racket identifiers will expand into the variable syntax node directly
from the binding form for the variable.
We see that in the lambda definition: we pass an environment to the expander
that will expand the bound variable into the IR representation of a variable.
We also see that we use different expanders to expand the type and term languages separately.
Examples:
|
| > (define-micro stlc-lambda stlc-terms | | (syntax-parser | | #:datum-literals (:) | | [(_ (x:id : A) e) | | (define A- (expand-stlc-types #'A)) | | (stlc-lambda | | #'x | | (expand-stlc-types #'A) | | (expand-stlc-terms #'e (list (cons #'x (stlc-var #'x A-)))))]) | | #:struct (x type body)) |
|
| > (define-micro stlc-app stlc-terms | | (syntax-parser | | [(_ e1 e2) | | (stlc-app (expand-stlc-terms #'e1) (expand-stlc-terms #'e2))]) | | #:struct (rator rand)) |
|
Finally, we add a type annotation form.
Example:
| > (define-micro :: stlc-terms | | (syntax-parser | | [(_ e A) | | (:: (expand-stlc-terms #'e) (expand-stlc-types #'A))]) | | #:struct (term type)) |
|
Now that the entire language is defined, we can implement a standard
bi-directional type checker by matching against the entire syntax tree.
We could have also decided to implement the type checker using
define-generics, so the type checker itself is extensible.
We define two functions: type-check and type-synth.
Examples:
| > (define-for-syntax (type-check e A) | | (match* (e A) | | [((stlc-lambda x type body) | | (-> A B)) | | (and (type-equal? A type) | | (type-check body B))] | | [(_ _) (type-equal? A (type-synth e))])) |
|
| > (define-for-syntax (type-synth e) | | (match e | | [(stlc-nat _) (Nat)] | | [(stlc-var _ A) A] | | [(stlc-app rator rand) | | (match (type-synth rator) | | [(-> A B) | | (unless (type-check rand A) | | (error "type error")) | | B])] | | [(:: e A) | | (unless (type-check e A) | | (error "type error")) | | A])) |
|
Finally, if we wish to embed these terms in Racket, we need to define a
compiler from stlc-terms to Racket, as syntax objects, and an embedding macros
to act as the interface.
Examples:
| > (define-for-syntax (stlc-compile e) | | (match e | | [(stlc-var e _) e] | | [(stlc-nat e) #`'#,e] | | [(stlc-lambda x A body) | | #`(lambda (#,x) #,(stlc-compile body))] | | [(stlc-app rator rand) | | #`(#,(stlc-compile rator) #,(stlc-compile rand))] | | [(:: e _) | | (stlc-compile e)])) |
|
|
| > (embed-stlc | | (stlc-app (:: (stlc-lambda (x : Nat) x) (-> Nat Nat)) 5)) |
|
(stlc-app (:: (stlc-lambda (x : Nat) x) (-> Nat Nat)) 5) has type #<Nat> |
5 |
We could continue this exercise, wrapping the result into a hash-lang.
See examples/stlc-hashlang.rkt in the repository for an extended example.
2 Reference🔗
Defines three phase-1 bindings:
name-id as a micro language. This is also referred to as the vocabulary (Krishnamurthi 2001).
name-id-term, an extensible syntax class for recognizing name-id terms.
expand-name-id, a function from arbitrary syntax to name-id terms that fully expands all macros and micros.
These functions will raise an error until micros are defined for name-id.
Examples:
| > (require syntax/micros (for-syntax racket/base syntax/parse)) |
| > (define-micro-language foo-lang) |
|
eval:3:0: foo-lang-term: Did not match expanded term |
in: (#%app void) |
|
|
|
| > (embed-foo (my-foo 5)) |
'(nat 5) |
(define-micro kw-id lang-id micro-def maybe-struct-spec)
|
| |
| maybe-struct-spec | | = | | | | | | | | | struct-spec |
|
| |
|
Defines a
micro in the transformer environment for the
micro
language lang-id.
The transformer
micro-def should return an IR representation of the
term in
lang-id, but the representation is arbitrary.
For convenience, a struct definition may be attached and will be defined in the
scope of the micro definition, as it may be common to use structs to represent
IR terms.
If specified, struct-spec should be syntax that can be passed to
struct, starting with the option super-id.
The struct will also be named kw-id, which will be defined at phase 1.
Examples:
| > (require syntax/micros (for-syntax racket/base syntax/parse)) |
| > (define-micro-language arith-lang) |
| > (define-micro + arith-lang | | (syntax-parser | | [(_ e1 e2) | | (make-+ (expand-arith-lang #'e1) (expand-arith-lang #'e2))]) | | #:struct (e1 e2) #:transparent #:extra-constructor-name make-+) |
|
| > (define-micro nat arith-lang | | (syntax-parser | | [(_ e:exact-nonnegative-integer) | | (nat (syntax-e #'e))]) | | #:struct (e) #:transparent) |
|
|
#(struct:+ #(struct:nat 5) #(struct:nat 6)) | #(struct:nat 5) |
|
Similar to
define-micro, but defines an embedding of some arbitrary
source term that has no new corresponding keyword associated with it.
This is useful for embedding raw
#%datum, for example.
The
micro is called with an arbitrary syntax object that did not match
any other identifier in the
micro language, and should raise a syntax
error if the syntax object is not the expected form for this embedding micros.
(This is the default behaviour of a syntax transformer with an incomplete
pattern match, so embedding micro definitions do not need to handle unexpected
syntax explicitly.)
Examples:
| > (require syntax/micros (for-syntax racket/base syntax/parse)) |
| > (define-micro-language arith-lang) |
| > (define-micro + arith-lang | | (syntax-parser | | [(_ e1 e2) | | (make-+ (expand-arith-lang #'e1) (expand-arith-lang #'e2))]) | | #:struct (e1 e2) #:transparent #:extra-constructor-name make-+) |
|
| > (define-embedding-micro nat arith-lang | | (syntax-parser | | #:literals (quote) | | ; Matches a fully expanded Racket natural | | ['e:exact-nonnegative-integer | | (make-nat (syntax-e #'e))]) | | #:struct (e) #:transparent #:extra-constructor-name make-nat) |
|
|
#(struct:+ #(struct:nat 5) #(struct:nat 6)) |
(define-simple-micro (kw-id field-id ...) lang-id maybe-struct-options ...)
|
| |
| maybe-struct-options | | = | | | | | | | | | struct-options |
|
Defines a
micro that expands into a struct of the same form and name.
Equivalent to
| (define-micro kw-id lang-id |
| (syntax-parser |
| [(_ field-id ...) |
| (kw-id field ...)]) |
| #:struct (field ...) maybe-struct-options ...) |
A phase 1 function that is probably not accessed directly, but is wrapped by the definition provided by
define-micro-language.
Expands stx fully to some micro language.
The returns syntax object is arbitrary, but has the micro IR attached using syntax/mule.
binds should specify any free identifiers in scope, and what to expand them into.
Their expansion should be a IR in the same micro language.
Bibliography🔗
Michael Ballantyne, Mitch Gamburg, and Jason Hemann. Compiled, Extensible, Multi-language DSLs (Functional Pearl). Proceedings of the ACM on Programming Languages 8(ICFP), 2024. doi:10.1145/3674627 |
Stephen Chang, Alex Knauth, and Ben Greenman. Type Systems As Macros. In Proc. popl, 2017. doi:10.1145/3093333.3009886 |
Shriram Krishnamurthi. Linguistic reuse. PhD dissertation, Rice University, 2001. https://hdl.handle.net/1911/17993 |