Multi-Stage Programming with Splice Variables

What is multi-stage programming? It's a technique where programs generate other programs. Instead of writing generic code that handles all cases at runtime, you generate specialized, optimized code tailored to specific situations.

For example, instead of a power function that uses a loop, you could generate specialized code like x * x * x * x * x directly. This eliminates runtime overhead and creates highly optimized code.

Our approach introduces splice variables — a new way to make code generation predictable and safe. This technique provides precise control over the generation process and seamlessly scales to advanced features like code pattern matching and rewriting. The type system automatically tracks variable dependencies, ensuring that generated code is always well-formed, properly scoped, and type-checks correctly.

Playground

The code editor below lets you experiment with the language described in the paper. Try the examples as you read through the tutorial, or experiment with your own code! For more details on the type system's inner workings and implementation, please refer to the paper.

Tutorial

The Basics

The language syntax resembles OCaml or (to a lesser extent) Haskell. It includes basic types like int and bool, arithmetic operators (+, -, *, /), comparison (==), conditionals (if then else), functions, and let bindings. Type annotations are written as e : type for expression e.

Quotations: Capturing Code as Data

The first key idea is quotations. When you write <1 + 2>, instead of computing 3, this creates a piece of code that represents the expression 1 + 2.

The angle brackets < > tell the system "don't evaluate this yet, just treat it as data." Think of it like putting code in quotes to talk about the code itself rather than running it.

Quotations can even be nested inside other quotations to create code that generates code:

Splice Variables: Injecting Code into Code

Now comes the interesting part: how can quoted code be used inside other quoted code? Consider having some code stored in a quotation, and wanting to use it as part of a larger piece of code.

The let$ construct "unwraps" a piece of quoted code and allows its use inside other quotations:

let$ x = <1> in <x + x>

What happened here? The quoted code <1> was unwrapped into a variable called x, then x was used twice inside a new quotation. The result is code that adds 1 + 1.

The variable x is called a splice variable because it "splices" the unwrapped code into the new quotation.

More interestingly, code that contains variables can also be unwrapped by declaring variable dependencies:

let$ s [x : int] = <x + 1> in <fun x : int -> s>

Here, s is a splice variable that depends on an integer variable x. The square brackets [x : int] say "this code uses a variable called x of type int." When we use s, it automatically captures the variable x from the surrounding context.

Instead of capturing the same variable, you can also explicitly provide a value for the variable:

let$ s [x : int] = <x + 1> in <s [x=42]>

Here, we tell the system to substitute x with 42 when generating the code. The result is code that adds 42 + 1.

Why Splice Variables Are Necessary

Why do we need this special let$ syntax instead of just using regular variables? There are two fundamental reasons why splice variables are necessary for safe code generation.

Stage Tracking

The first reason is stage tracking: In our system, each variable has a stage level that determines how deeply nested in quotations it can be used. Regular variables exist at stage 0 (the current evaluation context), but let$ creates splice variables at stage 1, which means they can be used inside one level of quotations to represent code fragments.

For example, with nested quotations like <<x>> (code that generates code that uses x), you need variables at different stages: stage 0 for the outermost context, stage 1 for inside the first quotation, and stage 2 for inside the nested quotation. This stage tracking ensures that variables are only used in appropriate contexts.

let$ s [x@1 : bool; y@2 : int] = <if x then <y + 1> else <y + 2>> in
<fun x : int -> s[y=42]>

In this example, a splice variable s is created that depends on a boolean variable x at stage 1 and an integer variable y at stage 2. The code inside s uses these variables to generate code that conditionally generates either y + 1 or y + 2. When s[y=42] is used inside a function, it substitutes y with 42, generating code that evaluates to either 42 + 1 or 42 + 2 based on the value of x.

This demonstrates how stage tracking works in practice: x is used inside the first quotation, y is used inside the nested quotation, and s brings code into quotations. The type system ensures each variable is only used at the quotation level it belongs to.

Dependency Tracking

The second reason is dependency tracking: When you create a piece of code like <x + 1>, it contains a reference to variable x. This code only makes sense if there's actually a variable x available when the code runs. The challenge is keeping track of these dependencies as code gets moved around, put inside functions, or extracted from functions.

let$ s [x : int] = <x + 1> in <fun x : int -> s>

Here, s depends on x, and we use it inside a function that has a parameter named x. The system automatically captures the function's x parameter to satisfy the dependency, so we get a function that adds 1 to its input.

let$ s [x : int] = <x + 1> in <fun y : int -> s>

But here, the function parameter is named y, not x. Since s needs a variable x that doesn't exist in this context, the system reports an error. This prevents you from accidentally generating code with undefined variables.

let$ s [x : int] = <x + 1> in <fun y : int -> s[x=y]>

let$ s [x : int] = <x + 1> in <fun y : int -> s[x=y * y]>

This dependency tracking is also relevant when pattern matching on code. (We'll see how code pattern matching works later in the tutorial.) If you extract the body from <fun x -> x + 1>, that extracted code now depends on the variable x from the function. Our type system automatically tracks these dependencies, ensuring all generated code has correct scoping.

Our type system automatically tracks both stage levels and variable dependencies, ensuring that all generated code is well-formed, properly scoped, and type-checks correctly. This makes code generation predictable and safe.

A Practical Example: The Power Function

Let's see how these ideas work together in a practical example. We'll build a function that creates optimized power functions — instead of using loops, we'll generate code that does direct multiplication.

let rec power : int code -> int -> int code = fun x -> fun n ->
  if n == 0 then <1>
  else
    let$ s1 = x in
    let$ s2 = power x (n - 1) in
    <s1 * s2>
in power <2> 3

This function takes some quoted code x and a number n, then builds code that multiplies x by itself n times. Try it with power <2> 3 to see how it generates 2 * 2 * 2 * 1.

Making Code Reusable

What if we want to create a power function that can work with different inputs? This is where dependency variables come in. They let us create code templates with placeholders.

let rec power : int code -> int -> int code = fun x -> fun n ->
  if n == 0 then <1>
  else
    let$ s1 = x in
    let$ s2 = power x (n - 1) in
    <s1 * s2> in
let$ power5 [x : int] = power <x> 5 in
  <fun x : int -> power5>

Here, power5 is a splice variable that depends on x. The square brackets [x : int] say "this code fragment uses a variable called x that will be provided later." When we use power5 inside the function, it automatically gets the x from the function parameter.

The result is a function that directly multiplies its input five times — no loops, no recursion at runtime!

Code Pattern Matching

Our language also supports pattern matching on code, which allows you to inspect and transform code structures. This is useful for optimizations and code analysis.

let swap_add : int code -> int code = fun e ->
  match$ e with
    | (x + y) -> <y + x>
    | _ -> e
in
  swap_add <1 + 2>

This function takes a piece of code and, if it's an addition, swaps the operands. The pattern (x + y) matches any addition where x and y become pattern variables that can be used in the result.

Working with Functions

Pattern matching becomes more tricky when dealing with code that contains variables. This is where the variable dependency tracking we discussed earlier becomes crucial. Our system automatically handles the binding structure:

match$ <fun x : int -> x + 1> with | (fun x -> body) -> <body [x=42]>

Here, body captures the function body x + 1 and remembers that it uses a variable x. We can then substitute x with 42 to get 42 + 1. The system knows that body depends on x and handles the substitution correctly.

Code generation often produces unoptimized expressions with redundant operations. For example, when generating mathematical code, you might end up with expressions like x * 0 or 1 * y that could be simplified. Code rewriting lets you automatically clean up these patterns.

<42 + (0 * 5)>
  rewrite (0 * z) -> <0>
  rewrite (z + 0) -> <z>

This applies two simplification rules to the expression 42 + (0 * 5): first it replaces 0 * 5 with 0, then it replaces 42 + 0 with 42. Each rewrite rule says "whenever you see this pattern anywhere in the code, replace it with that expression."

The key difference from match$ is that rewrite searches through the entire code structure and applies transformations wherever patterns match, not just at the top level. This makes it perfect for cleanup operations that need to work throughout a complex expression.

Again, we use splice variables to handle code patterns that contain variable bindings. The system automatically tracks variable dependencies, so you can write transformations that manipulate binding structures safely:

<(fun y : int -> y * 2) ((fun x : int -> x + 1) 5)>
  rewrite ((fun arg : int -> body : int) expr) -> <body [arg=expr]>

This rewrite rule performs beta reduction — it finds function applications and substitutes the argument into the function body. The pattern (fun arg -> body) expr matches any function being applied to an argument. The system knows that body depends on the variable arg, so when we write body [arg=expr], it correctly substitutes all occurrences of arg in the body with the provided expression.

This kind of transformation would be error-prone to write manually, but our dependency tracking system handles all the variable scoping automatically, ensuring the result is always well-formed.

Advanced Features

Unhygienic Functions: Intentional Variable Capture

Most programming languages try to prevent accidental variable capture — when a macro or function accidentally uses a variable from the surrounding context, or defines a variable that conflicts with the context where the macro is expanded. This can lead to bugs that are hard to track down. This property is called hygiene, and it's generally a good thing. But sometimes you actually want to define a variable that can be used within the arguments given to a function.

Our language supports unhygienic functions — functions whose arguments can use variables that are not defined in the current scope but are provided by the function itself. This allows you to write code generation patterns that intentionally introduce or capture variables.

The simplest example is a function that expects its argument to use a specific variable:

let with_x_42 : int code [x : int] -> int code = fun c -> c[x=42] in
  with_x_42 <x + 1>

Here, with_x_42 is a function that takes a piece of code that depends on a variable x and substitutes x with 42. The type signature int code [x : int] -> int code tells us that the function accepts code that may refer to a variable x and produces regular code. When we call with_x_42 <x + 1>, the system allows the argument to reference x even though x isn't defined in the current scope — because the function promises to provide it.

This becomes especially useful when creating functions that act like new binding constructs. A classic example is an anaphoric conditional — a conditional that binds the test expression to a variable it that can be used in both branches:

let aif : bool code -> int code [it : bool] -> int code [it : bool] -> int code = 
  fun cond -> fun then_branch -> fun else_branch ->
    let$ s1 = cond in
    let$ s2 [it : bool] = then_branch in
    let$ s3 [it : bool] = else_branch in
    <let it = s1 in if it then s2 else s3>
in
  aif <true> <42> <43>

This aif function takes three arguments: a condition, a then-branch, and an else-branch. The key insight is that both branches can refer to a variable it that will be bound to the result of evaluating the condition. The type signatures int code [it : bool] tell the type system that these arguments are allowed to use this variable.

The implementation uses splice variables to construct the final code: it evaluates the condition once, binds it to it, and then uses it in the appropriate branch. This avoids re-evaluating the condition and gives both branches access to its value.

let x : int code [y : int] = <y + 1> in
  x[y=2]

Here, x is a normal variable that depends on y. Unlike splice variables created with let$, normal variables stay at the same stage level — they don't get "unwrapped" for use in quotations. The variable x holds the code value <y + 1>, and when we write x[y=2], it substitutes y with 2 to produce <2 + 1>.

Wrap Types

Most of the time, let$ and let provide convenient ways to create variables with dependencies. But sometimes you need more control over when and how dependencies are introduced. The wrap construct lets you create values with dependencies without immediately binding them to variables.

Most importantly, let$ and let only allow declaring dependencies for variables, while wrap allows you to declare dependencies for values. This enables you to declare dependencies for the output of a function, or store a value with dependencies in a data structure.

Think of wrap as creating a value that uses some symbolic variables. It creates a value of type type <| [dependencies] without binding it to a variable, where type is the type of the value and [dependencies] are the declared dependencies.

wrap [x : int] <x + 1>

This wraps a int code value that depends on the variable x. The result has type int code <| [x : int] — this is called a wrap type. The <| notation (< followed by |) indicates that this is a wrapped value that, when unwrapped, will require the dependencies to be provided. Wrap types are useful when you want to store a value with dependencies in data structures or pass it around without immediately providing the dependencies.

let wrap_s : int code <| [x : int] = wrap [x : int] <x + 1> in
let wrap s : int code [x : int] = wrap_s in
  s[x=42]

Here, the first line creates a wrapped value without binding it to any variable. The second line uses let wrap to turn the wrapped value into a regular variable s that depends on x. Finally, we provide the dependency x=42 to get the concrete code <42 + 1>.