Queue System

Every decision an agent makes — every inference call, storage write, and delegation — flows through a message queue as a typed message. This is what makes time travel possible: because the queue is the single communication layer, the platform can record and replay any interaction without agent cooperation.

Why Queues?

Queues decouple producers from consumers. An agent doesn't know whether inference runs locally via Ollama or remotely via OpenRouter — it sends a typed message and gets a typed response. This abstraction lets VlinderCLI scale from a single process to a distributed cluster without modifying agent code.

More importantly, queues make the platform's rewind and fork capabilities possible. Because every interaction is a discrete message, the platform can intercept, record, and replay any of them.

Message Types

The protocol defines six typed messages:

| Message | Direction | Purpose |
| --- | --- | --- |
| InvokeMessage | Harness → Runtime | Submit user input to an agent |
| RequestMessage | Runtime → Service | Agent calls a service (inference, storage, etc.) |
| ResponseMessage | Service → Runtime | Service returns result to agent |
| CompleteMessage | Runtime → Harness | Agent finishes (includes result and state hash) |
| DelegateMessage | Agent → Agent | Fleet delegation: route a sub-task to another agent |
| ForkMessage | CLI → Platform | Create a timeline branch in the DAG |

Each message carries enough dimensions to be routed without ambiguity — see Routing Keys below.
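As a rough illustration, the six message kinds and their directions could be modelled as a plain enum. This is a minimal sketch, not VlinderCLI's actual type: `MessageKind`, `direction`, and `ALL_KINDS` are hypothetical names, and real messages would also carry payloads and routing dimensions.

```rust
/// Hypothetical sketch of the six message kinds from the table above.
/// The name `MessageKind` is an assumption, not the platform's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageKind {
    Invoke,
    Request,
    Response,
    Complete,
    Delegate,
    Fork,
}

/// Every kind, in the order the table lists them.
pub const ALL_KINDS: [MessageKind; 6] = [
    MessageKind::Invoke,
    MessageKind::Request,
    MessageKind::Response,
    MessageKind::Complete,
    MessageKind::Delegate,
    MessageKind::Fork,
];

impl MessageKind {
    /// (sender, receiver) pair, copied from the "Direction" column.
    pub fn direction(self) -> (&'static str, &'static str) {
        match self {
            MessageKind::Invoke => ("Harness", "Runtime"),
            MessageKind::Request => ("Runtime", "Service"),
            MessageKind::Response => ("Service", "Runtime"),
            MessageKind::Complete => ("Runtime", "Harness"),
            MessageKind::Delegate => ("Agent", "Agent"),
            MessageKind::Fork => ("CLI", "Platform"),
        }
    }
}
```

Because the `match` is exhaustive, adding a seventh kind would fail to compile until its direction is declared.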

Message Flow

A basic invocation — user input, one inference call, completion:

sequenceDiagram
    participant H as Harness
    participant Q as Queue
    participant R as Runtime
    participant S as Service Worker

    H->>Q: InvokeMessage
    Q->>R: InvokeMessage
    R->>Q: RequestMessage (infer)
    Q->>S: RequestMessage
    S->>Q: ResponseMessage
    Q->>R: ResponseMessage
    R->>Q: CompleteMessage
    Q->>H: CompleteMessage

A fleet delegation adds agent-to-agent messaging:

sequenceDiagram
    participant E as Entry Agent
    participant Q as Queue
    participant T as Target Agent

    E->>Q: DelegateMessage
    Q->>T: DelegateMessage
    T->>Q: DelegateReply (CompleteMessage)
    Q->>E: DelegateReply

Routing Keys

Messages are routed using RoutingKey — a structural enum that encodes the message direction and all routing dimensions. Collision-freedom is structural: two routing keys are equal if and only if every dimension matches.

Each variant carries the dimensions needed for unambiguous delivery:

| Variant | Dimensions | Description |
| --- | --- | --- |
| Invoke | timeline, submission, harness, runtime, agent | User input → agent |
| Complete | timeline, submission, agent, harness | Agent result → harness |
| Request | timeline, submission, agent, service, operation, sequence | Agent → service worker |
| Response | timeline, submission, service, agent, operation, sequence | Service worker → agent |
| Delegate | timeline, submission, caller, target | Agent → agent |
| DelegateReply | timeline, submission, caller, target, nonce | Agent → agent (reply) |
| Fork | timeline, submission, agent_name | CLI → platform |

Every request variant deterministically maps to its reply variant: Invoke → Complete, Request → Response, Delegate → DelegateReply. Reply variants have no further replies, so hops are one level deep.

The DelegateReply variant includes a nonce to distinguish multiple delegations to the same target within one submission.
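The structural-equality and request-to-reply properties can be sketched with a derived `PartialEq` on an enum. The variant and field names below follow the dimensions table; everything else is an assumption made for illustration: the field types (`String`, `u64`), the `reply` method, passing the nonce as a parameter, treating `Fork` as having no reply, and the `sample_request` helper with its placeholder values.

```rust
/// Hypothetical sketch of a routing key. Deriving `PartialEq`/`Eq`/`Hash`
/// means two keys are equal only if the variant and every field match.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum RoutingKey {
    Invoke { timeline: String, submission: String, harness: String, runtime: String, agent: String },
    Complete { timeline: String, submission: String, agent: String, harness: String },
    Request { timeline: String, submission: String, agent: String, service: String, operation: String, sequence: u64 },
    Response { timeline: String, submission: String, service: String, agent: String, operation: String, sequence: u64 },
    Delegate { timeline: String, submission: String, caller: String, target: String },
    DelegateReply { timeline: String, submission: String, caller: String, target: String, nonce: String },
    Fork { timeline: String, submission: String, agent_name: String },
}

impl RoutingKey {
    /// Maps a request-side key to its reply-side key.
    /// `nonce` is only used for Delegate (how it is really supplied is an
    /// assumption). Reply variants and Fork return None in this sketch.
    pub fn reply(&self, nonce: &str) -> Option<RoutingKey> {
        match self.clone() {
            RoutingKey::Invoke { timeline, submission, harness, agent, .. } => {
                Some(RoutingKey::Complete { timeline, submission, agent, harness })
            }
            RoutingKey::Request { timeline, submission, agent, service, operation, sequence } => {
                Some(RoutingKey::Response { timeline, submission, service, agent, operation, sequence })
            }
            RoutingKey::Delegate { timeline, submission, caller, target } => {
                Some(RoutingKey::DelegateReply { timeline, submission, caller, target, nonce: nonce.to_string() })
            }
            _ => None,
        }
    }
}

/// Hypothetical helper: a Request key with placeholder dimension values.
pub fn sample_request(sequence: u64) -> RoutingKey {
    RoutingKey::Request {
        timeline: "main".into(),
        submission: "sub1".into(),
        agent: "echo".into(),
        service: "infer".into(),
        operation: "run".into(),
        sequence,
    }
}
```

In this sketch, `sample_request(1)` and `sample_request(2)` compare unequal because one dimension differs, and calling `reply` on a `Response` yields `None`, which mirrors the one-level-deep rule.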

NATS Subject Mapping

Routing keys translate to NATS subjects with a consistent structure:

vlinder.{timeline}.{submission}.{type}.{...dimensions}

| Type | NATS Subject |
| --- | --- |
| Invoke | vlinder.{timeline}.{submission}.invoke.{harness}.{runtime}.{agent} |
| Complete | vlinder.{timeline}.{submission}.complete.{agent}.{harness} |
| Request | vlinder.{timeline}.{submission}.req.{agent}.{svc}.{backend}.{op}.{seq} |
| Response | vlinder.{timeline}.{submission}.res.{svc}.{backend}.{agent}.{op}.{seq} |
| Delegate | vlinder.{timeline}.{submission}.delegate.{caller}.{target} |
| DelegateReply | vlinder.{timeline}.{submission}.delegate-reply.{caller}.{target}.{nonce} |
| Fork | vlinder.{timeline}.{submission}.fork.{agent_name} |
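To make the Request and Response rows concrete, here is a minimal sketch that formats both subjects from one set of dimensions. `ServiceCall`, `is_valid_token`, and the example values are hypothetical. The token check reflects the general NATS rule that `.` separates tokens and `*` and `>` are wildcards; whether VlinderCLI validates tokens this way is an assumption.

```rust
/// True if `token` can be used as a single NATS subject token:
/// non-empty, no whitespace, no separator (`.`), no wildcard (`*`, `>`).
pub fn is_valid_token(token: &str) -> bool {
    !token.is_empty()
        && !token
            .chars()
            .any(|c| c == '.' || c == '*' || c == '>' || c.is_whitespace())
}

/// Hypothetical bundle of the dimensions shared by a request and its response.
pub struct ServiceCall<'a> {
    pub timeline: &'a str,
    pub submission: &'a str,
    pub agent: &'a str,
    pub svc: &'a str,
    pub backend: &'a str,
    pub op: &'a str,
    pub seq: u64,
}

impl ServiceCall<'_> {
    fn tokens_ok(&self) -> bool {
        [self.timeline, self.submission, self.agent, self.svc, self.backend, self.op]
            .iter()
            .all(|t| is_valid_token(t))
    }

    /// vlinder.{timeline}.{submission}.req.{agent}.{svc}.{backend}.{op}.{seq}
    pub fn request_subject(&self) -> Option<String> {
        if !self.tokens_ok() {
            return None;
        }
        let Self { timeline, submission, agent, svc, backend, op, seq } = self;
        Some(format!("vlinder.{timeline}.{submission}.req.{agent}.{svc}.{backend}.{op}.{seq}"))
    }

    /// vlinder.{timeline}.{submission}.res.{svc}.{backend}.{agent}.{op}.{seq}
    pub fn response_subject(&self) -> Option<String> {
        if !self.tokens_ok() {
            return None;
        }
        let Self { timeline, submission, agent, svc, backend, op, seq } = self;
        Some(format!("vlinder.{timeline}.{submission}.res.{svc}.{backend}.{agent}.{op}.{seq}"))
    }
}
```

Note that the response subject is not the request subject with `req` swapped for `res`: the service and backend tokens move ahead of the agent, as in the table.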

Workers subscribe to wildcard patterns that match their role. For example, an inference-ollama worker subscribes to vlinder.*.*.req.*.infer.ollama.> to receive all Ollama inference requests regardless of timeline, submission, or agent.
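The wildcard behaviour can be checked with a small matcher that follows NATS subject semantics: `*` matches exactly one token, and a trailing `>` matches one or more remaining tokens. This is an illustrative re-implementation, not the NATS client's code. The function name `subject_matches` and the example dimension values are hypothetical, and the sketch assumes the pattern is well-formed (`>` appears only as the last token).

```rust
/// Returns true if `subject` matches `pattern` under NATS wildcard rules.
/// Assumes `>` only appears as the final token of `pattern`.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut p = pattern.split('.');
    let mut s = subject.split('.');
    loop {
        match (p.next(), s.next()) {
            // `>` needs at least one remaining subject token, then matches the rest.
            (Some(">"), Some(_)) => return true,
            // `*` matches exactly one token, whatever it is.
            (Some("*"), Some(_)) => continue,
            // Literal tokens must be identical.
            (Some(a), Some(b)) if a == b => continue,
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // Length mismatch or differing literal.
            _ => return false,
        }
    }
}
```

With the pattern `vlinder.*.*.req.*.infer.ollama.>`, a subject such as `vlinder.main.sub1.req.echo.infer.ollama.run.0` matches, while the same subject with `openrouter` in the backend position does not.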

Request-Reply Facades

The MessageQueue trait provides convenience methods that combine send + receive into blocking calls:

| Facade | Sends | Blocks Until |
| --- | --- | --- |
| run_agent | InvokeMessage | CompleteMessage |
| call_service | RequestMessage | ResponseMessage |

These are used by the harness and sidecar to simplify the common request-reply pattern.
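The facade idea, one blocking call built from a send and a receive, can be sketched as a default method on a trait. The signatures here are assumptions: the real `MessageQueue` trait presumably works with typed messages and routing keys rather than plain strings, and `ChannelQueue` and `connect_to_toy_worker` are hypothetical stand-ins that use in-process channels instead of NATS.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical, heavily simplified queue interface.
pub trait MessageQueue {
    fn send(&self, msg: String);
    /// Blocks until a message arrives.
    fn receive(&self) -> String;

    /// Facade: send a request, then block until the response comes back.
    fn call_service(&self, request: String) -> String {
        self.send(request);
        self.receive()
    }
}

/// In-process stand-in for a queue connection, backed by std channels.
pub struct ChannelQueue {
    tx: Sender<String>,
    rx: Receiver<String>,
}

impl MessageQueue for ChannelQueue {
    fn send(&self, msg: String) {
        self.tx.send(msg).expect("worker hung up");
    }

    fn receive(&self) -> String {
        self.rx.recv().expect("worker hung up")
    }
}

/// Spawns a toy service worker that upper-cases each request it receives.
pub fn connect_to_toy_worker() -> ChannelQueue {
    let (req_tx, req_rx) = channel::<String>();
    let (res_tx, res_rx) = channel::<String>();
    thread::spawn(move || {
        // The loop ends when the request sender is dropped.
        for req in req_rx {
            if res_tx.send(req.to_uppercase()).is_err() {
                break;
            }
        }
    });
    ChannelQueue { tx: req_tx, rx: res_rx }
}
```

A caller only sees `queue.call_service(request)`; the send, the wait, and the pairing of request with response stay behind the trait.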

Queue Backend

VlinderCLI uses NATS with JetStream for message durability.

[queue]
backend = "nats"
nats_url = "nats://localhost:4222"
nats_creds = "~/.nats.creds"  # optional, for authenticated connections

Scaling

Because all communication flows through the queue, scaling is straightforward: add more workers for the bottleneck service. Doubling the number of Ollama inference workers can roughly double inference throughput with no code changes, as long as the hardware behind them is not already saturated. The routing key structure ensures messages reach the correct worker type without conflicts.
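The scaling pattern described here is usually called competing consumers: several identical workers pull from one stream, and each message is handled by exactly one of them. NATS offers this through queue groups and shared JetStream consumers; whether VlinderCLI uses either is not stated here, so the following is only an in-process sketch of the idea. `run_pool` and the squaring "work" are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `workers` threads that compete for `jobs`.
/// Returns (worker_id, result) pairs; each job is processed exactly once.
pub fn run_pool(workers: usize, jobs: Vec<u64>) -> Vec<(usize, u64)> {
    let (job_tx, job_rx) = channel::<u64>();
    // All workers share one receiver, so each job goes to a single worker.
    let job_rx: Arc<Mutex<Receiver<u64>>> = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = channel::<(usize, u64)>();

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // before the job is processed.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(n) => done_tx.send((id, n * n)).unwrap(),
                    Err(_) => break, // sender dropped: no more jobs
                }
            })
        })
        .collect();

    for j in jobs {
        job_tx.send(j).unwrap();
    }
    drop(job_tx); // signal end of input
    drop(done_tx); // keep only the workers' clones alive
    for h in handles {
        h.join().unwrap();
    }
    done_rx.iter().collect()
}
```

Changing the `workers` argument is the only thing needed to add capacity; the producer side is untouched, which is the property the section relies on.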

See Also