Architecture

IRx is organized as a small compiler pipeline with a deliberate boundary between semantic meaning and backend-specific lowering. The goal is to keep the codebase easy to extend without letting semantic rules slowly drift into code generation.

Design Goals

The current architecture is shaped by a few practical goals:

  • Keep parsing, semantic analysis, and code generation as distinct phases.
  • Make semantic analysis the authority for meaning and program validity.
  • Keep backend packages focused on emission, not interpretation.
  • Preserve method-based multiple dispatch for visitor-driven lowering.
  • Use package structure to communicate architecture instead of large utility modules or generic helpers/ folders.

Pipeline Overview

IRx currently follows this high-level flow:

ASTx parser output -> semantic analysis -> resolved semantic sidecars -> backend code generation

The parser produces raw ASTx nodes. Those nodes are still close to surface syntax and may not yet have enough information for direct lowering. The semantic-analysis phase walks that tree, resolves symbols and types, validates program rules, and attaches a structured node.semantic sidecar to the nodes that backend code needs.

By the time a backend starts lowering, it should not need to infer meaning from raw syntax or re-run language validation from scratch.
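
The flow above can be sketched with a toy model. Only the node.semantic attribute name comes from the description above; the node classes, the Semantic dataclass, and the lower function are hypothetical stand-ins for illustration, not IRx's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Semantic:
    """Hypothetical sidecar: resolved facts that codegen may rely on."""
    resolved_type: str


@dataclass
class IntLiteral:
    """Toy stand-in for a raw parser node; `semantic` is empty after parsing."""
    value: int
    semantic: Optional[Semantic] = None


@dataclass
class Add:
    lhs: IntLiteral
    rhs: IntLiteral
    semantic: Optional[Semantic] = None


def analyze(node):
    """Attach a sidecar to every node and return the same root object."""
    if isinstance(node, Add):
        analyze(node.lhs)
        analyze(node.rhs)
    node.semantic = Semantic(resolved_type="int32")
    return node


def lower(node) -> str:
    """Toy backend: reads resolved sidecar data and never re-derives types."""
    if node.semantic is None:
        raise RuntimeError("lowering requires an analyzed tree")
    ty = node.semantic.resolved_type
    if isinstance(node, IntLiteral):
        return f"{ty} {node.value}"
    return f"add {ty} ({lower(node.lhs)}), ({lower(node.rhs)})"


tree = Add(IntLiteral(1), IntLiteral(2))
ir_text = lower(analyze(tree))
```

In this sketch the backend refuses to lower a tree that has not been analyzed, which is one way to make the phase boundary visible at runtime.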

Semantic Analysis

The semantic-analysis package lives in src/irx/analysis/ and is intentionally independent from LLVM or llvmlite.

It is responsible for:

  • symbol resolution
  • lexical scope tracking
  • mutability and assignment validation
  • function and return validation
  • loop-control legality such as break and continue
  • expression typing and promotion policy
  • operator normalization
  • semantic flag normalization such as unsigned and fast-math intent
  • diagnostics collection and semantic error reporting

The public entry points are:

  • irx.analysis.analyze(node)
  • irx.analysis.analyze_module(module)

These entry points return the same AST root after attaching semantic sidecars. If semantic validation fails, analysis raises SemanticError before codegen begins.
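
A minimal sketch of that contract, assuming a toy tree with hypothetical Loop and Break nodes. The SemanticError class here is a local stand-in so the example runs on its own; it is not the class exported by irx.analysis, and the diagnostic text is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class SemanticError(Exception):
    """Local stand-in; carries every diagnostic collected during analysis."""

    def __init__(self, diagnostics: List[str]) -> None:
        super().__init__("; ".join(diagnostics))
        self.diagnostics = diagnostics


@dataclass
class Break:
    semantic: Optional[dict] = None


@dataclass
class Loop:
    body: list = field(default_factory=list)
    semantic: Optional[dict] = None


def analyze(root):
    """Validate loop-control legality, then return the same root object."""
    diagnostics: List[str] = []

    def walk(node, loop_depth: int) -> None:
        if isinstance(node, Break):
            if loop_depth == 0:
                diagnostics.append("'break' outside of a loop")
            node.semantic = {"loop_depth": loop_depth}
        elif isinstance(node, Loop):
            node.semantic = {"loop_depth": loop_depth}
            for child in node.body:
                walk(child, loop_depth + 1)

    walk(root, 0)
    if diagnostics:
        # Fail before any backend sees the tree.
        raise SemanticError(diagnostics)
    return root


valid = Loop(body=[Break()])
same_root = analyze(valid)

try:
    analyze(Break())
    rejected = False
except SemanticError as err:
    rejected = True
    messages = err.diagnostics
```

Collecting diagnostics during the walk and raising once at the end lets a single run report every problem it found instead of stopping at the first.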

Why sidecars instead of a separate HIR?

For the current size of IRx, attaching explicit semantic sidecars to AST nodes is the lightest approach that still creates a clean boundary. It gives codegen resolved information without introducing a second full tree structure before it is needed.

If the language grows to the point where a true HIR becomes useful, the current phase split still leaves room for that evolution.

Shared Visitor Foundation

IRx also has a shared visitor layer in src/irx/base/visitors/.

It currently provides:

  • BaseVisitorProtocol: the minimal typing contract shared by visitor-style classes
  • BaseVisitor: a concrete Plum-dispatch scaffold with explicit NotImplementedError defaults for the current ASTx node surface

This keeps typing and runtime behavior separate:

  • protocols define what visitor-like objects must expose
  • the concrete base class defines what happens for unsupported nodes

In practice:

  • SemanticAnalyzer inherits BaseVisitor
  • BuilderVisitor inherits BaseVisitor
  • backend-specific protocols such as llvmliteir.VisitorProtocol extend BaseVisitorProtocol
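
The split between typing contract and runtime scaffold can be sketched as follows. The class names mirror the ones listed above, but every body is a hypothetical simplification: IRx uses Plum dispatch, while this sketch substitutes a name-based lookup so it runs with only the standard library. The node classes are invented for illustration.

```python
from typing import Any, Protocol


class BaseVisitorProtocol(Protocol):
    """Typing contract only: what a visitor-like object must expose."""

    def visit(self, node: Any) -> Any: ...


class BaseVisitor:
    """Runtime scaffold: every unsupported node fails the same way."""

    def visit(self, node: Any) -> Any:
        # Name-based lookup stands in for Plum dispatch in this sketch.
        handler = getattr(self, f"visit_{type(node).__name__}", None)
        if handler is None:
            raise NotImplementedError(
                f"{type(self).__name__} does not support {type(node).__name__}"
            )
        return handler(node)


class IntLiteral:  # hypothetical node type
    def __init__(self, value: int) -> None:
        self.value = value


class WhileStmt:  # hypothetical node type that neither visitor handles
    pass


class SemanticAnalyzer(BaseVisitor):
    def visit_IntLiteral(self, node: IntLiteral) -> str:
        return "int32"


class BuilderVisitor(BaseVisitor):
    def visit_IntLiteral(self, node: IntLiteral) -> str:
        return f"i32 {node.value}"


def run(visitor: BaseVisitorProtocol, node: Any) -> Any:
    """Depends on the protocol, not on any concrete visitor class."""
    return visitor.visit(node)


results = [run(v, IntLiteral(7)) for v in (SemanticAnalyzer(), BuilderVisitor())]

errors = []
for v in (SemanticAnalyzer(), BuilderVisitor()):
    try:
        run(v, WhileStmt())
    except NotImplementedError as err:
        errors.append(str(err))
```

Both visitors inherit the same failure path, so an unsupported node produces the same kind of error whether it is hit during analysis or during lowering.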

Backend Architecture

Each backend should live in its own package under src/irx/builders/. The package path identifies the backend, while the classes inside the package use short generic names.

For example, src/irx/builders/llvmliteir/ exposes:

  • Builder
  • Visitor
  • VisitorProtocol
  • optional VisitorCore as a module-private implementation class

This naming convention matters for future backends. A contributor adding a new backend should not need to invent unique class prefixes when the package path already provides the context.

llvmliteir Package Layout

The LLVM backend is split into first-class modules instead of one monolithic builder:

  • src/irx/base/visitors/ (outside this package): shared visitor protocol and runtime scaffold
  • facade.py: public backend entry points
  • core.py: shared mutable lowering state and backend lifecycle
  • protocols.py: typing contract used by mixins and runtime features
  • types.py, casting.py, vector.py, strings.py, runtime.py: shared IR infrastructure
  • visitors/: concern-grouped visit(...) overloads

Foundational modules stay at the package root because they are architectural components, not incidental helpers.

Why visit(...) Remains the Public Lowering Boundary

The codegen layer continues to use method-based Plum multiple dispatch:

  • visit(self, node: ...)

This remains the only public dispatch boundary for backend lowering. IRx does not use a free-function dispatch registry or a second public API like lower(...) or build_node(...).

That choice keeps backend code readable and local:

  • AST-family-specific lowering remains attached to the visitor class.
  • Mixins can group overloads by concern without changing the public surface.
  • Shared lowering state stays on the visitor instance instead of moving into a registry-driven design.
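
As a rough illustration of method-based dispatch with instance-local state, the sketch below uses functools.singledispatchmethod from the standard library in place of Plum's dispatch decorator. That is a substitution made so the example is self-contained: singledispatchmethod dispatches on one argument only, and the node classes and emitted text are hypothetical.

```python
from dataclasses import dataclass
from functools import singledispatchmethod
from typing import List


@dataclass
class IntLiteral:
    value: int


@dataclass
class Negate:
    operand: IntLiteral


class Visitor:
    """Toy lowering visitor; emitted lines live on the instance."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    @singledispatchmethod
    def visit(self, node) -> str:
        # Fallback for node types without a registered overload.
        raise NotImplementedError(type(node).__name__)

    @visit.register(IntLiteral)
    def _(self, node: IntLiteral) -> str:
        return str(node.value)

    @visit.register(Negate)
    def _(self, node: Negate) -> str:
        operand = self.visit(node.operand)
        name = f"%t{len(self.lines)}"
        self.lines.append(f"{name} = sub i32 0, {operand}")
        return name


first, second = Visitor(), Visitor()
result = first.visit(Negate(IntLiteral(5)))
```

Callers only ever invoke visit, and each visitor instance keeps its own emitted lines, so two lowerings cannot interfere through shared registry state.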

Core Class and Protocol

VisitorProtocol and VisitorCore serve different purposes:

  • VisitorProtocol defines the stable interface that mixins and runtime feature declarations depend on for typing, building on BaseVisitorProtocol.
  • VisitorCore is the concrete implementation center that owns mutable state, module setup, helper methods, and backend lifecycle.

VisitorCore is still internal to the backend package. IRx marks module-level internal helpers and internal implementation classes with private (imported via from public import private) when a clear non-underscored name reads better than an underscore-prefixed one. That keeps internal names readable without making them part of the intended public surface.

The protocol is not a replacement for the core class. It exists so backend subsystems can depend on a narrow contract instead of the full concrete type.
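
The relationship can be sketched as follows. The class names match the ones above, but the members (emit, finalize, lines) and the declare_puts helper are hypothetical and chosen only to show a subsystem depending on the narrow contract rather than on the concrete type.

```python
from typing import List, Protocol


class VisitorProtocol(Protocol):
    """Narrow contract: only what subsystems are allowed to touch."""

    def emit(self, line: str) -> None: ...


class VisitorCore:
    """Concrete center: owns mutable state and lifecycle (toy version)."""

    def __init__(self, module_name: str) -> None:
        self.module_name = module_name
        self.lines: List[str] = []
        self._finalized = False

    def emit(self, line: str) -> None:
        if self._finalized:
            raise RuntimeError("module already finalized")
        self.lines.append(line)

    def finalize(self) -> str:
        self._finalized = True
        return "\n".join([f"; module {self.module_name}", *self.lines])


def declare_puts(visitor: VisitorProtocol) -> None:
    """Hypothetical runtime feature, typed against the protocol only."""
    visitor.emit("declare i32 @puts(i8*)")


core = VisitorCore("demo")
declare_puts(core)  # VisitorCore satisfies the protocol structurally
text = core.finalize()
```

Because declare_puts sees only emit, the core can change how it stores state or manages its lifecycle without touching the feature code.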

Visitor Mixins

The final backend visitor is composed from concern-specific mixins plus the shared core. Each mixin should contain:

  • @dispatch def visit(self, node: ...) overloads for one concern
  • a small number of private helpers local to that concern

Examples of concern boundaries include:

  • literals
  • variables
  • unary and binary operators
  • control flow
  • functions
  • runtime or domain-specific lowering

This keeps dispatch organization aligned with language structure while still sharing one lowering state object.
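
A minimal sketch of that composition, under the same caveat as before: IRx attaches Plum-dispatched visit overloads to each mixin, whereas this self-contained version uses a name-based lookup in the core as a stand-in. The mixin names, node classes, and the fresh helper are hypothetical.

```python
from typing import Any, List


class IntLiteral:
    def __init__(self, value: int) -> None:
        self.value = value


class Add:
    def __init__(self, lhs: Any, rhs: Any) -> None:
        self.lhs, self.rhs = lhs, rhs


class VisitorCore:
    """Shared lowering state plus the single public visit entry point."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def visit(self, node: Any) -> str:
        # Name-based lookup stands in for Plum dispatch overloads.
        handler = getattr(self, f"_visit_{type(node).__name__}", None)
        if handler is None:
            raise NotImplementedError(type(node).__name__)
        return handler(node)

    def fresh(self) -> str:
        """Return a temporary name derived from the shared state."""
        return f"%t{len(self.lines)}"


class LiteralMixin:
    """Concern: literals."""

    def _visit_IntLiteral(self, node: IntLiteral) -> str:
        return str(node.value)


class BinaryOpMixin:
    """Concern: binary operators."""

    def _visit_Add(self, node: Add) -> str:
        lhs, rhs = self.visit(node.lhs), self.visit(node.rhs)
        name = self.fresh()
        self.lines.append(f"{name} = add i32 {lhs}, {rhs}")
        return name


class Visitor(LiteralMixin, BinaryOpMixin, VisitorCore):
    """Final visitor: mixins contribute handlers, the core holds state."""


visitor = Visitor()
result = visitor.visit(Add(Add(IntLiteral(1), IntLiteral(2)), IntLiteral(3)))
```

Each mixin handles one family of nodes, yet all of them write into the same lines list on the one visitor instance, which is the sharing described above.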

Contributor Guidelines

When extending IRx, these rules help preserve the architecture:

  • Put semantic meaning and validation in analysis/, not in a backend.
  • Let codegen consume normalized semantic information instead of re-deriving it.
  • Keep shared visitor dispatch defaults in src/irx/base/visitors/ so semantic and backend visitors fail consistently for unsupported ASTx nodes.
  • Add new backend-wide infrastructure at the package root, not under helpers/.
  • Keep mutable lowering state instance-local.
  • Prefer explicit code over clever abstractions.
  • Use the package name, not class prefixes, to identify the backend.

When To Add A New Backend

If IRx gains another backend, it should follow the same broad shape:

  • a package under src/irx/builders/
  • a public Builder
  • a public Visitor
  • a VisitorProtocol if mixins or runtime hooks need typed access
  • an optional module-private VisitorCore for shared state and infrastructure

That keeps backend packages consistent for both contributors and users of the library.