Queue System
Every decision an agent makes — every inference call, storage write, and delegation — flows through a message queue as a typed message. This is what makes time travel possible: because the queue is the single communication layer, the platform can record and replay any interaction without agent cooperation.
Why Queues?
Queues decouple producers from consumers. An agent doesn't know whether inference runs locally via Ollama or remotely via OpenRouter — it sends a typed message and gets a typed response. This abstraction lets VlinderCLI scale from a single process to a distributed cluster without modifying agent code.
More importantly, queues make the platform's rewind and fork capabilities possible. Because every interaction is a discrete message, the platform can intercept, record, and replay any of them.
Message Types
The protocol defines six typed messages:
| Message | Direction | Purpose |
|---|---|---|
| InvokeMessage | Harness → Runtime | Submit user input to an agent |
| RequestMessage | Runtime → Service | Agent calls a service (inference, storage, etc.) |
| ResponseMessage | Service → Runtime | Service returns result to agent |
| CompleteMessage | Runtime → Harness | Agent finishes (includes result and state hash) |
| DelegateMessage | Agent → Agent | Fleet delegation: route a sub-task to another agent |
| ForkMessage | CLI → Platform | Create a timeline branch in the DAG |
Each message carries enough dimensions to be routed without ambiguity — see Routing Keys below.
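
The table above can be read as a single sum type. The following is a minimal sketch, not the platform's actual definitions: the `Endpoint` enum and every payload field (`input`, `payload`, `state_hash`, and so on) are assumptions made for illustration, and only the six message names and their directions come from the table.

```rust
/// Endpoints named in the Direction column (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endpoint {
    Harness,
    Runtime,
    Service,
    Agent,
    Cli,
    Platform,
}

/// The six protocol messages. Payload fields are illustrative assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Message {
    Invoke { input: String },
    Request { service: String, payload: String },
    Response { payload: String },
    Complete { result: String, state_hash: String },
    Delegate { target_agent: String, task: String },
    Fork { agent_name: String },
}

impl Message {
    /// Returns (sender, receiver) for each message type, as listed in the table.
    fn direction(&self) -> (Endpoint, Endpoint) {
        match self {
            Message::Invoke { .. } => (Endpoint::Harness, Endpoint::Runtime),
            Message::Request { .. } => (Endpoint::Runtime, Endpoint::Service),
            Message::Response { .. } => (Endpoint::Service, Endpoint::Runtime),
            Message::Complete { .. } => (Endpoint::Runtime, Endpoint::Harness),
            Message::Delegate { .. } => (Endpoint::Agent, Endpoint::Agent),
            Message::Fork { .. } => (Endpoint::Cli, Endpoint::Platform),
        }
    }
}
```

Because the `match` is exhaustive, adding a seventh message type would fail to compile until its direction is declared.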
Message Flow
A basic invocation — user input, one inference call, completion:
```mermaid
sequenceDiagram
    participant H as Harness
    participant Q as Queue
    participant R as Runtime
    participant S as Service Worker
    H->>Q: InvokeMessage
    Q->>R: InvokeMessage
    R->>Q: RequestMessage (infer)
    Q->>S: RequestMessage
    S->>Q: ResponseMessage
    Q->>R: ResponseMessage
    R->>Q: CompleteMessage
    Q->>H: CompleteMessage
```

A fleet delegation adds agent-to-agent messaging:
```mermaid
sequenceDiagram
    participant E as Entry Agent
    participant Q as Queue
    participant T as Target Agent
    E->>Q: DelegateMessage
    Q->>T: DelegateMessage
    T->>Q: DelegateReply (CompleteMessage)
    Q->>E: DelegateReply
```

Routing Keys
Messages are routed using RoutingKey — a structural enum that encodes the message direction and all routing dimensions. Collision-freedom is structural: two routing keys are equal if and only if every dimension matches.
Each variant carries the dimensions needed for unambiguous delivery:
| Variant | Dimensions | Description |
|---|---|---|
| Invoke | timeline, submission, harness, runtime, agent | User input → agent |
| Complete | timeline, submission, agent, harness | Agent result → harness |
| Request | timeline, submission, agent, service, operation, sequence | Agent → service worker |
| Response | timeline, submission, service, agent, operation, sequence | Service worker → agent |
| Delegate | timeline, submission, caller, target | Agent → agent |
| DelegateReply | timeline, submission, caller, target, nonce | Agent → agent (reply) |
| Fork | timeline, submission, agent_name | CLI → platform |
Every request variant deterministically maps to its reply variant. Invoke → Complete, Request → Response, Delegate → DelegateReply. Reply variants have no further replies — hops are one level deep.
The DelegateReply variant includes a nonce to distinguish multiple delegations to the same target within one submission.
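
To make "collision-freedom is structural" concrete, here is a minimal sketch of how such a key might look in Rust. It rests on several assumptions: the `Scope` grouping, the field types, and the helper names `reply_variant` and `distinct_keys` are hypothetical, and treating `Fork` as having no reply is a guess based on its absence from the mapping above. Deriving `PartialEq`, `Eq`, and `Hash` gives exactly the stated property: two keys compare equal only when the variant and every dimension match.

```rust
use std::collections::HashSet;

/// Dimensions shared by every variant (hypothetical grouping).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Scope {
    timeline: String,
    submission: String,
}

/// One variant per row of the table; field types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum RoutingKey {
    Invoke { scope: Scope, harness: String, runtime: String, agent: String },
    Complete { scope: Scope, agent: String, harness: String },
    Request { scope: Scope, agent: String, service: String, operation: String, sequence: u64 },
    Response { scope: Scope, service: String, agent: String, operation: String, sequence: u64 },
    Delegate { scope: Scope, caller: String, target: String },
    DelegateReply { scope: Scope, caller: String, target: String, nonce: u64 },
    Fork { scope: Scope, agent_name: String },
}

impl RoutingKey {
    /// Name of the reply variant for a request variant.
    /// Reply variants return None; so does Fork, by assumption.
    fn reply_variant(&self) -> Option<&'static str> {
        match self {
            RoutingKey::Invoke { .. } => Some("Complete"),
            RoutingKey::Request { .. } => Some("Response"),
            RoutingKey::Delegate { .. } => Some("DelegateReply"),
            _ => None,
        }
    }
}

/// Counts structurally distinct keys using the derived Eq and Hash.
fn distinct_keys(keys: &[RoutingKey]) -> usize {
    keys.iter().collect::<HashSet<&RoutingKey>>().len()
}
```

Two `DelegateReply` keys with the same caller and target but different nonces count as two distinct keys, which is what lets several delegations to one target coexist within a submission.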
NATS Subject Mapping
Routing keys translate to NATS subjects with a consistent structure:
| Type | NATS Subject |
|---|---|
| Invoke | vlinder.{timeline}.{submission}.invoke.{harness}.{runtime}.{agent} |
| Complete | vlinder.{timeline}.{submission}.complete.{agent}.{harness} |
| Request | vlinder.{timeline}.{submission}.req.{agent}.{svc}.{backend}.{op}.{seq} |
| Response | vlinder.{timeline}.{submission}.res.{svc}.{backend}.{agent}.{op}.{seq} |
| Delegate | vlinder.{timeline}.{submission}.delegate.{caller}.{target} |
| DelegateReply | vlinder.{timeline}.{submission}.delegate-reply.{caller}.{target}.{nonce} |
| Fork | vlinder.{timeline}.{submission}.fork.{agent_name} |
Workers subscribe to wildcard patterns that match their role. For example, an inference-ollama worker subscribes to vlinder.*.*.req.*.infer.ollama.> to receive all Ollama inference requests regardless of timeline, submission, or agent.
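
The wildcard subscription can be checked by hand with a small matcher. This sketch reimplements NATS subject matching in plain Rust for illustration only (in a real deployment the NATS server does the matching): `*` matches exactly one token and `>` matches one or more trailing tokens. The function names `request_subject` and `subject_matches`, and the sample values, are hypothetical.

```rust
/// Builds a Request subject following the pattern in the table above.
/// Assumes no dimension value contains a '.' character.
fn request_subject(
    timeline: &str,
    submission: &str,
    agent: &str,
    svc: &str,
    backend: &str,
    op: &str,
    seq: u64,
) -> String {
    format!(
        "vlinder.{}.{}.req.{}.{}.{}.{}.{}",
        timeline, submission, agent, svc, backend, op, seq
    )
}

/// Returns true if `subject` matches `pattern` under NATS wildcard rules:
/// `*` matches exactly one token, `>` matches one or more trailing tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` is only valid as the final pattern token.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(p), Some(s)) if p == s => continue,
            // Both sides exhausted together: full match.
            (None, None) => return true,
            // Length mismatch or literal token mismatch.
            _ => return false,
        }
    }
}
```

With these helpers, `request_subject("main", "sub1", "agent-a", "infer", "ollama", "chat", 3)` yields a subject that matches `vlinder.*.*.req.*.infer.ollama.>` but not the same pattern with `openrouter` in place of `ollama`.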
Request-Reply Facades
The MessageQueue trait provides convenience methods that combine send + receive into blocking calls:
| Facade | Sends | Blocks Until |
|---|---|---|
| run_agent | InvokeMessage | CompleteMessage |
| call_service | RequestMessage | ResponseMessage |
These are used by the harness and sidecar to simplify the common request-reply pattern.
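
A facade of this kind can be written as a default trait method layered over primitive send and receive operations. The sketch below is an assumption-heavy illustration, not the real `MessageQueue` trait: the method signatures, the trimmed-down `Message` enum, and the `EchoQueue` stand-in are all hypothetical, and the stand-in returns `None` on an empty queue where a real backend would block.

```rust
use std::collections::VecDeque;

/// Trimmed-down message type for this sketch (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Request { sequence: u64, payload: String },
    Response { sequence: u64, payload: String },
}

/// Hypothetical shape of the queue trait: two primitives plus one facade.
trait MessageQueue {
    fn send(&mut self, msg: Message);
    /// Next message for the caller; None means nothing more will arrive.
    fn receive(&mut self) -> Option<Message>;

    /// Facade: send a request, then wait for the response whose
    /// sequence number matches. Non-matching messages are skipped.
    fn call_service(&mut self, sequence: u64, payload: &str) -> Option<String> {
        self.send(Message::Request { sequence, payload: payload.to_string() });
        while let Some(msg) = self.receive() {
            if let Message::Response { sequence: seq, payload } = msg {
                if seq == sequence {
                    return Some(payload);
                }
            }
        }
        None
    }
}

/// In-process stand-in that answers every request with an echo.
struct EchoQueue {
    inbox: VecDeque<Message>,
}

impl MessageQueue for EchoQueue {
    fn send(&mut self, msg: Message) {
        if let Message::Request { sequence, payload } = msg {
            let reply = format!("echo: {}", payload);
            self.inbox.push_back(Message::Response { sequence, payload: reply });
        }
    }

    fn receive(&mut self) -> Option<Message> {
        self.inbox.pop_front()
    }
}
```

Keeping the facade as a default method means any backend that implements the two primitives gets the request-reply behavior for free.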
Queue Backend
VlinderCLI uses NATS with JetStream for message durability.
```toml
[queue]
backend = "nats"
nats_url = "nats://localhost:4222"
nats_creds = "~/.nats.creds" # optional, for authenticated connections
```
Scaling
Because all communication flows through the queue, scaling is straightforward: add more workers for the bottleneck service. Adding Ollama inference workers raises inference throughput with no code changes, as long as the backend they call has spare capacity. The routing key structure ensures messages reach the correct worker type without conflicts.
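
The idea of adding identical workers behind one queue can be sketched in-process. This example uses a standard-library channel shared by several threads as a stand-in for the broker. It does not show how VlinderCLI or NATS distributes work among workers, which the text above does not specify. The function `run_workers` and the integer jobs are hypothetical.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Runs `workers` threads that pull jobs from one shared queue.
/// Returns (worker_id, job) pairs in completion order.
fn run_workers(workers: usize, jobs: Vec<u32>) -> Vec<(usize, u32)> {
    let (job_tx, job_rx) = mpsc::channel::<u32>();
    let (done_tx, done_rx) = mpsc::channel::<(usize, u32)>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // Hold the lock only while taking one job off the queue.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(j) => done_tx.send((id, j)).unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();

    for j in jobs {
        job_tx.send(j).unwrap();
    }
    drop(job_tx); // close the queue so workers exit once it is empty
    drop(done_tx); // leave only the workers' clones alive

    for h in handles {
        h.join().unwrap();
    }
    done_rx.iter().collect()
}
```

Changing the `workers` argument changes how the jobs are spread across threads, but each job is still handled exactly once and the producer code stays the same.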
See Also
- Architecture: component overview and worker types
- Domain Model: RoutingKey, ServiceBackend, and MessageQueue trait
- Distributed Deployment: multi-node setup