This is an interactive demonstration of the ICFP 2025 paper Multi-Stage Programming with Splice Variables by Tsung-Ju Chiang and Ningning Xie.
What is multi-stage programming? It's a technique where programs generate other programs. Instead of writing generic code that handles all cases at runtime, you generate specialized, optimized code tailored to specific situations.
For example, instead of a power function that uses a loop, you could generate specialized code like
x * x * x * x * x
directly. This eliminates runtime overhead and creates highly optimized code.
Our approach introduces splice variables — a new way to make code generation predictable and safe. This technique provides precise control over the generation process and seamlessly scales to advanced features like code pattern matching and rewriting. The type system automatically tracks variable dependencies, ensuring that generated code is always well-formed, properly scoped, and type-checks correctly.
The code editor below lets you experiment with the language described in the paper. Try the examples as you read through the tutorial, or experiment with your own code! For more details on the type system's inner workings and implementation, please refer to the paper.
The language syntax resembles OCaml or (to a lesser extent) Haskell.
It includes basic types like int
and bool
,
arithmetic operators (+
, -
, *
, /
),
comparison (==
), conditionals (if then else
),
functions, and let bindings.
Type annotations are written as e : type
for expression e
.
The first key idea is quotations. When you write <1 + 2>
,
instead of computing 3
, this creates a piece of code that represents the expression 1 + 2
.
Try it:
The angle brackets < >
tell the system "don't evaluate this yet, just treat it as data."
Think of it like putting code in quotes to talk about the code itself rather than running it.
Quotations can even be nested inside other quotations to create code that generates code:
Now comes the interesting part: how can quoted code be used inside other quoted code? Consider having some code stored in a quotation, and wanting to use it as part of a larger piece of code.
The let$
construct "unwraps" a piece of quoted code and allows its use inside other quotations:
What happened here? The quoted code <1>
was unwrapped into a variable called x
,
then x
was used twice inside a new quotation. The result is code that adds 1 + 1
.
The variable x
is called a splice variable because it "splices" the unwrapped code into the new quotation.
More interestingly, code that contains variables can also be unwrapped by declaring variable dependencies:
Here, s
is a splice variable that depends on an integer variable x
.
The square brackets [x : int]
say "this code uses a variable called x
of type int
."
When we use s
, it automatically captures the variable x
from the surrounding context.
Instead of capturing the same variable, you can also explicitly provide a value for the variable:
Here, we tell the system to substitute x
with 42
when generating the code.
The result is code that adds 42 + 1
.
Why do we need this special let$
syntax instead of just using regular variables?
There are two fundamental reasons why splice variables are necessary for safe code generation.
The first reason is stage tracking: In our system, each variable has a stage level that determines
how deeply nested in quotations it can be used. Regular variables exist at stage 0 (the current evaluation context),
but let$
creates splice variables at stage 1, which means they can be used inside one level of quotations
to represent code fragments.
For example, with nested quotations like <<x>>
(code that generates code that uses x
),
you need variables at different stages: stage 0 for the outermost context, stage 1 for inside the first quotation,
and stage 2 for inside the nested quotation. This stage tracking ensures that variables are only used in appropriate contexts.
Let's see this in action:
let$ s [x@1 : bool; y@2 : int] = <if x then <y + 1> else <y + 2>> in <fun x : int -> s[y=42]>
In this example, a splice variable s
is created that depends on a boolean variable x
at stage 1
and an integer variable y
at stage 2. The code inside s
uses these variables to generate
code that conditionally generates either y + 1
or y + 2
.
When s[y=42]
is used inside a function, it substitutes y
with 42
,
generating code that evaluates to either 42 + 1
or 42 + 2
based on the value of x
.
This demonstrates how stage tracking works in practice: x
is used inside the first quotation,
y
is used inside the nested quotation, and s
brings code into quotations.
The type system ensures each variable is only used at the quotation level it belongs to.
The second reason is dependency tracking: When you create a piece of code like <x + 1>
, it contains a reference to variable x
.
This code only makes sense if there's actually a variable x
available when the code runs.
The challenge is keeping track of these dependencies as code gets moved around, put inside functions, or extracted from functions.
Consider what happens when code with variables is used in different contexts:
Here, s
depends on x
, and we use it inside a function that has a parameter named x
.
The system automatically captures the function's x
parameter to satisfy the dependency,
so we get a function that adds 1 to its input.
But here, the function parameter is named y
, not x
. Since s
needs a variable x
that doesn't exist in this context, the system reports an error. This prevents you from accidentally
generating code with undefined variables.
To fix this, you explicitly provide the missing dependency:
You can provide any valid expression for the dependency:
This dependency tracking is also relevant when pattern matching on code.
(We'll see how code pattern matching works later in the tutorial.)
If you extract the body from <fun x -> x + 1>
, that extracted code
now depends on the variable x
from the function. Our type system
automatically tracks these dependencies, ensuring all generated code has correct scoping.
Our type system automatically tracks both stage levels and variable dependencies, ensuring that all generated code is well-formed, properly scoped, and type-checks correctly. This makes code generation predictable and safe.
Let's see how these ideas work together in a practical example. We'll build a function that creates optimized power functions — instead of using loops, we'll generate code that does direct multiplication.
let rec power : int code -> int -> int code = fun x -> fun n -> if n == 0 then <1> else let$ s1 = x in let$ s2 = power x (n - 1) in <s1 * s2> in power <2> 3
This function takes some quoted code x
and a number n
, then builds code that
multiplies x
by itself n
times. Try it with power <2> 3
to see how it generates 2 * 2 * 2 * 1
.
What if we want to create a power function that can work with different inputs? This is where dependency variables come in. They let us create code templates with placeholders.
let rec power : int code -> int -> int code = fun x -> fun n -> if n == 0 then <1> else let$ s1 = x in let$ s2 = power x (n - 1) in <s1 * s2> in let$ power5 [x : int] = power <x> 5 in <fun x : int -> power5>
Here, power5
is a splice variable that depends on x
. The square brackets [x : int]
say "this code fragment uses a variable called x
that will be provided later." When we use power5
inside the function, it automatically gets the x
from the function parameter.
The result is a function that directly multiplies its input five times — no loops, no recursion at runtime!
Our language also supports pattern matching on code, which allows you to inspect and transform code structures. This is useful for optimizations and code analysis.
let swap_add : int code -> int code = fun e -> match$ e with | (x + y) -> <y + x> | _ -> e in swap_add <1 + 2>
This function takes a piece of code and, if it's an addition, swaps the operands. The pattern (x + y)
matches any addition where x
and y
become pattern variables that can be used in the result.
Pattern matching becomes more tricky when dealing with code that contains variables. This is where the variable dependency tracking we discussed earlier becomes crucial. Our system automatically handles the binding structure:
Here, body
captures the function body x + 1
and remembers that it uses a variable x
.
We can then substitute x
with 42
to get 42 + 1
.
The system knows that body
depends on x
and handles the substitution correctly.
Code generation often produces unoptimized expressions with redundant operations.
For example, when generating mathematical code, you might end up with expressions like
x * 0
or 1 * y
that could be simplified.
Code rewriting lets you automatically clean up these patterns.
The rewrite
operator applies transformation rules to simplify generated code:
<42 + (0 * 5)> rewrite (0 * z) -> <0> rewrite (z + 0) -> <z>
This applies two simplification rules to the expression 42 + (0 * 5)
:
first it replaces 0 * 5
with 0
, then it replaces 42 + 0
with 42
.
Each rewrite rule says "whenever you see this pattern anywhere in the code, replace it with that expression."
The key difference from match$
is that rewrite
searches through the entire
code structure and applies transformations wherever patterns match, not just at the top level.
This makes it perfect for cleanup operations that need to work throughout a complex expression.
Again, we use splice variables to handle code patterns that contain variable bindings. The system automatically tracks variable dependencies, so you can write transformations that manipulate binding structures safely:
<(fun y : int -> y * 2) ((fun x : int -> x + 1) 5)> rewrite ((fun arg : int -> body : int) expr) -> <body [arg=expr]>
This rewrite rule performs beta reduction — it finds function applications and substitutes
the argument into the function body. The pattern (fun arg -> body) expr
matches
any function being applied to an argument. The system knows that body
depends on
the variable arg
, so when we write body [arg=expr]
, it correctly
substitutes all occurrences of arg
in the body with the provided expression.
This kind of transformation would be error-prone to write manually, but our dependency tracking system handles all the variable scoping automatically, ensuring the result is always well-formed.
Most programming languages try to prevent accidental variable capture — when a macro or function accidentally uses a variable from the surrounding context, or defines a variable that conflicts with the context where the macro is expanded. This can lead to bugs that are hard to track down. This property is called hygiene, and it's generally a good thing. But sometimes you actually want to define a variable that can be used within the arguments given to a function.
Our language supports unhygienic functions — functions whose arguments can use variables that are not defined in the current scope but are provided by the function itself. This allows you to write code generation patterns that intentionally introduce or capture variables.
The simplest example is a function that expects its argument to use a specific variable:
let with_x_42 : int code [x : int] -> int code = fun c -> c[x=42] in with_x_42 <x + 1>
Here, with_x_42
is a function that takes a piece of code that depends on a variable x
and substitutes x
with 42
. The type signature int code [x : int] -> int code
tells us that the function accepts code that may refer to a variable x
and produces regular code.
When we call with_x_42 <x + 1>
, the system allows the argument to reference x
even though x
isn't defined in the current scope — because the function promises to provide it.
This becomes especially useful when creating functions that act like new binding constructs.
A classic example is an anaphoric conditional — a conditional that binds the test expression
to a variable it
that can be used in both branches:
let aif : bool code -> int code [it : bool] -> int code [it : bool] -> int code = fun cond -> fun then_branch -> fun else_branch -> let$ s1 = cond in let$ s2 [it : bool] = then_branch in let$ s3 [it : bool] = else_branch in <let it = s1 in if it then s2 else s3> in aif <true> <42> <43>
This aif
function takes three arguments: a condition, a then-branch, and an else-branch.
The key insight is that both branches can refer to a variable it
that will be bound to
the result of evaluating the condition. The type signatures int code [it : bool]
tell
the type system that these arguments are allowed to use this variable.
The implementation uses splice variables to construct the final code: it evaluates the condition once,
binds it to it
, and then uses it
in the appropriate branch. This avoids
re-evaluating the condition and gives both branches access to its value.
You can also declare normal variables (not splice variables) with dependencies:
let x : int code [y : int] = <y + 1> in x[y=2]
Here, x
is a normal variable that depends on y
. Unlike splice variables created
with let$
, normal variables stay at the same stage level — they don't get "unwrapped" for
use in quotations. The variable x
holds the code value <y + 1>
, and when
we write x[y=2]
, it substitutes y
with 2
to produce <2 + 1>
.
Most of the time, let$
and let
provide convenient ways to create variables
with dependencies. But sometimes you need more control over when and how dependencies are introduced.
The wrap
construct lets you create values with dependencies without immediately binding
them to variables.
Most importantly, let$
and let
only allow declaring dependencies for variables,
while wrap
allows you to declare dependencies for values.
This enables you to declare dependencies for the output of a function,
or store a value with dependencies in a data structure.
Think of wrap
as creating a value that uses some symbolic variables.
It creates a value of type type <| [dependencies]
without binding it to a variable,
where type
is the type of the value and [dependencies]
are the declared dependencies.
wrap [x : int] <x + 1>
This wraps a int code
value that depends on the variable x
. The result has type
int code <| [x : int]
— this is called a wrap type. The <|
notation (<
followed by |
)
indicates that this is a wrapped value that, when unwrapped, will require the dependencies
to be provided.
Wrap types are useful when you want to store a value with dependencies in data structures
or pass it around without immediately providing the dependencies.
To use a wrapped value, you can unwrap it into a variable using let wrap
:
let wrap_s : int code <| [x : int] = wrap [x : int] <x + 1> in let wrap s : int code [x : int] = wrap_s in s[x=42]
Here, the first line creates a wrapped value without binding it to any variable. The second line
uses let wrap
to turn the wrapped value into a regular variable s
that
depends on x
. Finally, we provide the dependency x=42
to get the concrete
code <42 + 1>
.