Building the systems layer between models and reliable agency.
The next frontier is not just better conversation. It is infrastructure that lets models maintain state, retrieve memory, route intelligently, use tools, and remain inspectable while acting over time.
The world model infrastructure stack
This page defines the category through concrete layers and ties each layer back to work already published on this site.
World-model applications
Applications that need durable context, tool use, and long-horizon behavior instead of a single prompt-response loop.
Public examples at this layer are still emerging, but the supporting layers below are already visible in the codebase.
Agent runtime
Role-aware execution, topology, health, and operational control for agent work.
subagent-fleet provides the cleanest example: one fleet topology generating routes, agent definitions, warmup flows, and dashboard state.
State + memory layer
Durable context about goals, preferences, observations, and prior work across sessions.
awesome-agentic-memory maps the broader category, while embenx pushes toward practical temporal and agentic memory primitives.
Retrieval + context layer
Search, filtering, reranking, and context assembly without backend lock-in.
embenx unifies retrieval across 15+ backends and adds hybrid search, metadata filtering, and reranking hooks.
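To make the retrieval layer concrete, here is a minimal sketch of hybrid search with metadata filtering and an optional reranking hook. It is not the embenx API: `Doc`, `hybrid_search`, `keyword_overlap`, the `alpha` weight, and the `rerank` callable are all hypothetical names chosen for illustration, and the scoring is a plain weighted sum of cosine similarity and keyword overlap.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Doc:
    # Hypothetical record type, not an embenx class.
    doc_id: str
    text: str
    vector: list
    meta: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_overlap(query, text):
    """Fraction of query terms that appear in the text (a crude lexical score)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(docs, query, query_vec, where=None, alpha=0.5, k=3, rerank=None):
    """Filter by metadata, score by a vector/keyword blend, then optionally rerank."""
    where = where or {}
    # Metadata filter: keep only docs whose meta matches every key in `where`.
    candidates = [
        d for d in docs
        if all(d.meta.get(key) == val for key, val in where.items())
    ]
    # Hybrid score: alpha weights the vector side, (1 - alpha) the keyword side.
    ranked = sorted(
        candidates,
        key=lambda d: alpha * cosine(query_vec, d.vector)
        + (1 - alpha) * keyword_overlap(query, d.text),
        reverse=True,
    )
    top = ranked[:k]
    # Reranking hook: any callable that takes (query, docs) and returns docs.
    return rerank(query, top) if rerank else top
```

Because the filter, the scorer, and the reranker are separate steps over plain records, any of them could be swapped for a backend-specific implementation without changing the caller, which is the sense in which this layer avoids backend lock-in.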
Simulation / prediction layer
The ability to test futures, compare actions, or retrieve state-action trajectories before committing.
The evidence here is a direction rather than a shipped feature: the embenx roadmap includes trajectory retrieval for world models, and this layer is still being built out.
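One way to picture retrieving state-action trajectories before committing is a nearest-neighbor lookup over past transitions. The sketch below is an assumption about what such a primitive could look like, not a description of the embenx roadmap item: `Transition`, `nearest_transitions`, and `best_action` are hypothetical names, and the states are small numeric tuples.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Transition:
    # Hypothetical record of one past step: where we were, what we did, how it went.
    state: tuple
    action: str
    reward: float


def nearest_transitions(memory, state, k=3):
    """Return the k stored transitions whose state is closest to `state`."""
    return sorted(memory, key=lambda t: dist(t.state, state))[:k]


def best_action(memory, state, k=3):
    """Pick the action with the highest mean reward among the k nearest transitions."""
    rewards_by_action = {}
    for t in nearest_transitions(memory, state, k):
        rewards_by_action.setdefault(t.action, []).append(t.reward)
    if not rewards_by_action:
        return None  # nothing comparable has been seen yet
    return max(
        rewards_by_action,
        key=lambda a: sum(rewards_by_action[a]) / len(rewards_by_action[a]),
    )
```

The point of the sketch is the order of operations: past outcomes are consulted before an action is taken, so the comparison between candidate actions happens in memory rather than in the live environment.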
Tool + environment interface
The surface where models connect to MCP tools, code interfaces, and external systems.
embenx ships an MCP server, awesome-agentic-memory tracks MCP-native memory servers, and subagent-fleet generates assistant-facing agent interfaces.
Model routing + local/cloud inference
Choosing the right model and machine for the job rather than treating inference as one generic endpoint.
subagent-fleet sits directly here with LiteLLM routing across local Ollama nodes and role-specific models.
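As an illustration of role-aware routing, the sketch below maps agent roles to a model and an endpoint, and falls back to a general endpoint when the preferred node is unhealthy. It does not reproduce subagent-fleet's configuration or call LiteLLM: the role names, model names, hostnames, and the `pick_route` helper are all hypothetical, and the port is simply Ollama's default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str     # model identifier (placeholder names below)
    api_base: str  # endpoint that serves the model


# Hypothetical role table: each role prefers a specific local node.
ROUTES = {
    "planner": Route("ollama/large-model", "http://node-a.local:11434"),
    "coder": Route("ollama/code-model", "http://node-b.local:11434"),
}

# Hypothetical fallback used when a role is unknown or its node is down.
FALLBACK = Route("cloud/general-model", "https://cloud.example.invalid")


def pick_route(role, healthy_bases):
    """Return the role's preferred route if its node is healthy, else the fallback."""
    route = ROUTES.get(role)
    if route is not None and route.api_base in healthy_bases:
        return route
    return FALLBACK
```

Keeping the table as data rather than branching logic means the same topology description could also drive health checks or dashboards, which is the design idea this layer points at.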
Observability + evaluation
Behavior should be inspectable, benchmarkable, and visible over time.
subagent-fleet includes live traces and published evals, while AI Toolkit exposes smaller-scale scoring and prompt-structure heuristics.
Observe → Model → Simulate → Act → Evaluate → Update
This loop is the conceptual bridge between the current agent stack and more durable world-model systems. The site keeps returning to it because it makes the research direction legible.
Observe + model
Retrieval and memory are the substrate for observation. embenx and awesome-agentic-memory both sit here.
Simulate + act
Runtime design and routing determine how the system turns plans into execution. subagent-fleet is the local control-plane proof point.
Evaluate + update
Behavior has to be visible enough to improve. Published fleet evals and prompt scoring tooling are the current concrete signals in this repo.
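The Observe → Model → Simulate → Act → Evaluate → Update loop can be shown end to end with a toy example. Everything here is an assumption made for illustration: a one-dimensional position, three candidate actions, an environment whose true step size is 2, and an agent whose internal model starts out believing the step size is 1. None of it is taken from the projects named above.

```python
def run_loop(target=6.0, steps=5, true_gain=2.0, lr=0.5):
    """Run a toy agent loop and return (final position, learned gain, log)."""
    position = 0.0      # hidden environment state
    gain = 1.0          # the agent's model of how far one action moves it
    actions = (-1, 0, 1)
    log = []

    for _ in range(steps):
        # Observe: read the current state from the environment.
        observed = position

        # Model + Simulate: predict the outcome of each candidate action.
        predictions = {a: observed + gain * a for a in actions}

        # Act: commit to the action whose predicted outcome is closest to the target.
        action = min(actions, key=lambda a: abs(target - predictions[a]))
        position = observed + true_gain * action

        # Evaluate: compare what happened with what the model predicted.
        error = position - predictions[action]

        # Update: nudge the model so the next prediction is closer to reality.
        gain += lr * error * action

        log.append({"observed": observed, "action": action, "error": error})

    return position, gain, log
```

Calling `run_loop()` walks the agent to the target while its step-size estimate moves from 1 toward the true value of 2. The log of per-step errors is the part that makes the behavior inspectable, which is why evaluation and update sit inside the loop rather than after it.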
The questions behind the stack
The stack is useful only if it points at real questions. These are the six recurring research tracks suggested by the current body of work.
Where the current proof lives
This stack page is only credible if it points back to real artifacts. These links are the current proof surfaces inside the site.