World Model Stack

Building the systems layer between models and reliable agency.

The next frontier is not just better conversation. It is infrastructure that lets models maintain state, retrieve memory, route intelligently, use tools, and remain inspectable while acting over time.

•world-model applications

•agent runtime

•state + memory

•retrieval + context

•simulation / prediction

•tool + environment interface

•model routing + local/cloud inference

•observability + evaluation

Stack Map

The world-model infrastructure stack

This page defines the category through concrete layers and ties each layer back to work already published on this site.

Layer 1

World-model applications

Applications that need durable context, tool use, and long-horizon behavior instead of a single prompt-response loop.

The public expression here is still emerging, but the supporting layers below are already visible in the codebase.

Layer 2

Agent runtime

Role-aware execution, topology, health, and operational control for agent work.

subagent-fleet provides the cleanest example: one fleet topology generating routes, agent definitions, warmup flows, and dashboard state.

Layer 3

State + memory layer

Durable context about goals, preferences, observations, and prior work across sessions.

awesome-agentic-memory maps the broader category, while embenx pushes toward practical temporal and agentic memory primitives.

Layer 4

Retrieval + context layer

Search, filtering, reranking, and context assembly without backend lock-in.

embenx unifies retrieval across 15+ backends and adds hybrid search, metadata filtering, and reranking hooks.
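To make the filter, hybrid-score, rerank sequence concrete, here is a minimal sketch in plain Python. It is not the embenx API: `Doc`, `hybrid_search`, the `alpha` blend weight, and the `rerank` hook are all hypothetical names chosen for illustration, and the keyword score is a simple token-overlap stand-in for a real lexical ranker.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, Optional


@dataclass
class Doc:
    """Hypothetical record: text, a dense vector, and free-form metadata."""
    text: str
    vector: list
    meta: dict = field(default_factory=dict)


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query tokens that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(
    docs: list,
    query: str,
    query_vec: list,
    where: Optional[dict] = None,
    alpha: float = 0.5,
    rerank: Optional[Callable] = None,
    k: int = 3,
) -> list:
    """Metadata filter, then blended vector + keyword score, then optional rerank."""
    where = where or {}
    # 1. Metadata filtering: keep docs whose metadata matches every condition.
    candidates = [
        d for d in docs
        if all(d.meta.get(key) == value for key, value in where.items())
    ]
    # 2. Hybrid scoring: alpha weights the dense score, (1 - alpha) the keyword score.
    ranked = sorted(
        candidates,
        key=lambda d: alpha * cosine(query_vec, d.vector)
        + (1 - alpha) * keyword_score(query, d.text),
        reverse=True,
    )
    top = ranked[:k]
    # 3. Reranking hook: any callable taking (query, docs) and returning docs.
    return rerank(query, top) if rerank else top


docs = [
    Doc("fleet routing notes", [1.0, 0.0], {"kind": "note"}),
    Doc("memory retrieval guide", [0.0, 1.0], {"kind": "guide"}),
    Doc("retrieval eval log", [0.6, 0.8], {"kind": "note"}),
]
notes_only = hybrid_search(docs, "retrieval guide", [0.0, 1.0], where={"kind": "note"})
```

Keeping the three stages as separate steps is what makes the backend swappable in a sketch like this: only the candidate source and the scoring functions would need to change.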

Layer 5

Simulation / prediction layer

The ability to test futures, compare actions, or retrieve state-action trajectories before committing.

The strongest signal today is directional: the embenx roadmap includes trajectory retrieval for world models, but this layer is still being built out.
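The idea of comparing actions before committing can be sketched without any real world model. The example below is a toy under stated assumptions: `choose_action`, the `simulate` transition function, the action table, and the scoring rule are all hypothetical and do not come from embenx or any other project named on this page.

```python
from typing import Callable, Iterable

State = dict  # toy state: {"budget": float, "progress": float}


def choose_action(
    state: State,
    actions: Iterable[str],
    simulate: Callable[[State, str], State],
    score: Callable[[State], float],
):
    """Roll each candidate action forward one step and keep the best-scoring one."""
    best_action, best_value = None, float("-inf")
    for action in actions:
        # Simulate on a copy so the real state is untouched until we commit.
        predicted = simulate(dict(state), action)
        value = score(predicted)
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


# Hypothetical action effects: (budget cost, progress gained).
EFFECTS = {"cheap_tool": (1.0, 2.0), "expensive_tool": (8.0, 5.0), "wait": (0.0, 0.0)}


def simulate(state: State, action: str) -> State:
    """Toy transition model: subtract cost from budget, add gain to progress."""
    cost, gain = EFFECTS[action]
    state["budget"] -= cost
    state["progress"] += gain
    return state


def score(state: State) -> float:
    """Toy objective: value progress, and value remaining budget at half weight."""
    return state["progress"] + 0.5 * state["budget"]


start = {"budget": 10.0, "progress": 0.0}
action, value = choose_action(start, EFFECTS, simulate, score)
```

A trajectory-retrieval approach would replace the hand-written `simulate` with outcomes looked up from stored state-action histories; the selection loop itself would stay the same.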

Layer 6

Tool + environment interface

The surface where models connect to MCP tools, code interfaces, and external systems.

embenx ships an MCP server, awesome-agentic-memory tracks MCP-native memory servers, and subagent-fleet generates assistant-facing agent interfaces.

Layer 7

Model routing + local/cloud inference

Choosing the right model and machine for the job rather than treating inference as one generic endpoint.

subagent-fleet sits directly here with LiteLLM routing across local Ollama nodes and role-specific models.
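Role-based routing with fallback can be illustrated in a few lines. This sketch deliberately avoids the LiteLLM and Ollama APIs: the `Route` type, the `ROUTES` table, the role names, and the model and node names are placeholders invented for illustration, not subagent-fleet's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str  # placeholder model name
    node: str   # placeholder machine or endpoint name


# Ordered preferences per role: first healthy entry wins.
ROUTES = {
    "planner": [Route("large-reasoning-model", "node-a"), Route("cloud-model", "cloud")],
    "coder": [Route("code-model", "node-b"), Route("cloud-model", "cloud")],
    "default": [Route("small-general-model", "node-a"), Route("cloud-model", "cloud")],
}


def pick_route(role: str, healthy_nodes: set) -> Route:
    """Return the first preferred route for the role whose node is healthy."""
    for route in ROUTES.get(role, ROUTES["default"]):
        if route.node in healthy_nodes:
            return route
    raise LookupError(f"no healthy node available for role {role!r}")


primary = pick_route("coder", {"node-a", "node-b", "cloud"})
fallback = pick_route("coder", {"cloud"})
```

The design point is that the caller asks for a role, not an endpoint, so node health and model choice can change without touching agent code.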

Layer 8

Observability + evaluation

Behavior should be inspectable, benchmarkable, and visible over time.

subagent-fleet includes live traces and published evals, while AI Toolkit exposes smaller-scale scoring and prompt-structure heuristics.

Recurring Loop

Observe → Model → Simulate → Act → Evaluate → Update

This loop is the conceptual bridge between the current agent stack and more durable world-model systems. This site keeps returning to it because it makes the research direction legible.
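One pass through the loop can be written out as a toy to show where each stage sits. Everything here is hypothetical: the `Agent` class, the integer environment, and the three candidate actions are stand-ins chosen so the six stages are visible in a few lines, not a description of any system on this site.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    target: int
    belief: int = 0
    history: list = field(default_factory=list)

    def step(self, env: dict) -> int:
        # Observe: read the current environment value.
        observation = env["value"]
        # Model: update the internal belief about the state.
        self.belief = observation
        # Simulate: predict the outcome of each candidate action and
        # pick the one that lands closest to the target.
        candidates = (-1, 0, 1)
        action = min(candidates, key=lambda a: abs(self.target - (self.belief + a)))
        # Act: apply the chosen action to the real environment.
        env["value"] += action
        # Evaluate: measure the remaining distance to the target.
        error = abs(self.target - env["value"])
        # Update: record the step so later behavior can be inspected.
        self.history.append((observation, action, error))
        return error


env = {"value": 0}
agent = Agent(target=3)
errors = [agent.step(env) for _ in range(4)]
```

The `history` list is the piece that connects back to observability: each step leaves a record of what was seen, what was done, and how well it worked.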

Observe + model

Retrieval and memory are the substrate for observation. embenx and awesome-agentic-memory both sit here.

Simulate + act

Runtime design and routing determine how the system turns plans into execution. subagent-fleet is the local control-plane proof point.

Evaluate + update

Behavior has to be visible enough to improve. Published fleet evals and prompt scoring tooling are the current concrete signals in this repo.

Research Agenda

The questions behind the stack

The stack is useful only if it points at real questions. These are the six recurring research tracks suggested by the current body of work.

•How should agents maintain durable state about users, tasks, tools, and environments?

•How should memory be retrieved, compressed, forgotten, and updated over long horizons?

•How can agents test possible actions before acting instead of relying on single-shot generation?

•How should systems route between local models, cloud models, tools, and specialized interfaces?

•How do we evaluate systems that act over time rather than answer one prompt?

•How can serious agent infrastructure remain local-first and inspectable for builders?

Grounding

Where the current proof lives

This stack page is only credible if it points back to real artifacts. These links are the current proof surfaces inside the site.

Primary artifacts

runtime · memory · retrieval · routing · mcp · evals

•subagent-fleet write-up

•embenx guide

•awesome-agentic-memory

•AI Toolkit

Related pages in this slice

Systems reframes projects as research artifacts, while Now captures the active fronts that currently matter most.