Aditya Karnam · World Model Infrastructure Lab
Aditya Karnam - World Model Infrastructure Builder
Building the systems layer for agents that remember, simulate, and act.
I work on the infrastructure between foundation models and real-world agency: state, memory, retrieval, routing, evaluation, local inference, and the runtimes that make agent behavior more reliable over time.
World model infrastructure is the systems layer that lets AI maintain state, retrieve memory, simulate outcomes, route between models, use tools, and interact with environments without collapsing back into a one-shot prompt.
Research Notes
initializing world model stack...
loading memory layer...
attaching retrieval interfaces...
routing local + cloud models...
starting evaluation loop...
status: ready
Primary wedge: Memory + routing + evals
Mode: Research-driven engineering
Bias: Local-first systems
Throughline: Explicit runtime behavior
Research Position
The infrastructure layer I care about
I am less interested in AI as a chat interface and more interested in the systems that make agents durable, inspectable, and composable.
State + memory
Agents need a durable working model of users, goals, tasks, tools, failures, and environments. That means memory should be explicit, updatable, and debuggable.
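One way to picture "explicit, updatable, and debuggable" memory is a store where every write is versioned and the full state can be dumped for inspection. This is only a minimal sketch: `MemoryStore`, `MemoryEntry`, and the `user.goal` key are hypothetical names chosen for illustration, not the API of any project listed on this page.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MemoryEntry:
    key: str
    value: Any
    version: int = 1
    history: list = field(default_factory=list)  # prior (version, value) pairs


class MemoryStore:
    """Hypothetical explicit memory: every write is versioned and inspectable."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def write(self, key: str, value: Any) -> MemoryEntry:
        entry = self._entries.get(key)
        if entry is None:
            entry = MemoryEntry(key=key, value=value)
            self._entries[key] = entry
        else:
            # Keep the old value so the update can be audited later.
            entry.history.append((entry.version, entry.value))
            entry.value = value
            entry.version += 1
        return entry

    def read(self, key: str) -> Any:
        return self._entries[key].value

    def dump(self) -> dict[str, dict]:
        """Return a plain-dict snapshot of all entries for debugging."""
        return {
            k: {"value": e.value, "version": e.version, "history": list(e.history)}
            for k, e in self._entries.items()
        }


store = MemoryStore()
store.write("user.goal", "ship the retrieval layer")
store.write("user.goal", "ship the retrieval layer with evals")
snapshot = store.dump()
```

The design choice worth noting is that an update never silently overwrites: the previous value stays in `history`, so a later debugging session can see what the agent believed and when it changed.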
Retrieval + routing
The right context and the right model are both routing problems. Retrieval, backend abstraction, local inference, and multi-model orchestration are part of the same systems question.
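To make the routing framing concrete, here is a small sketch of model selection as constraint filtering followed by a cost choice. The `Backend` fields, the backend names, and the latency figures are all made-up assumptions for illustration; a real router would likely weigh more signals than these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    local: bool
    max_context_tokens: int
    est_latency_ms: int


def route(prompt_tokens: int, needs_privacy: bool, backends: list[Backend]) -> Backend:
    """Pick the lowest-latency backend that satisfies the hard constraints."""
    candidates = [
        b for b in backends
        # Hard constraints: the prompt must fit, and private requests stay local.
        if b.max_context_tokens >= prompt_tokens and (b.local or not needs_privacy)
    ]
    if not candidates:
        raise ValueError("no backend satisfies the request constraints")
    return min(candidates, key=lambda b: b.est_latency_ms)


# Hypothetical backends with illustrative numbers.
BACKENDS = [
    Backend("local-small", local=True, max_context_tokens=8_000, est_latency_ms=300),
    Backend("cloud-large", local=False, max_context_tokens=128_000, est_latency_ms=900),
]

short_private = route(2_000, needs_privacy=True, backends=BACKENDS)
long_public = route(50_000, needs_privacy=False, backends=BACKENDS)
```

Under these assumptions, a short private request resolves to the local backend and a long public one to the cloud backend, while a long private request fails loudly instead of quietly leaking to the cloud.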
Evaluation + observability
If an agent operates over time, it should be scored over time. I care about traces, failure modes, repeatability, and evals that reflect system behavior rather than one isolated answer.
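Scoring over time can be sketched as a pass rate computed per window of runs, with a simple check for degradation between the first and latest window. The `RunRecord` type, the window size, and the drift tolerance below are illustrative assumptions, not a description of an existing eval harness.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    step: int
    correct: bool


def windowed_pass_rate(runs: list[RunRecord], window: int) -> list[float]:
    """Pass rate over each consecutive, non-overlapping window of runs."""
    rates = []
    for start in range(0, len(runs), window):
        chunk = runs[start:start + window]
        rates.append(mean(1.0 if r.correct else 0.0 for r in chunk))
    return rates


def has_drifted(rates: list[float], tolerance: float) -> bool:
    """Flag drift when the latest window is worse than the first by more than tolerance."""
    return len(rates) >= 2 and (rates[0] - rates[-1]) > tolerance


# Synthetic history: the first six runs pass, the last two fail.
runs = [RunRecord(step=i, correct=(i < 6)) for i in range(8)]
rates = windowed_pass_rate(runs, window=4)
drifted = has_drifted(rates, tolerance=0.25)
```

A single aggregate score over all eight runs would hide the fact that the failures are clustered at the end; the windowed view is what surfaces the drift.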
Operating Loop
Observe -> Model -> Simulate -> Act -> Evaluate -> Update
This is the recurring frame behind the site. It is how I think agent systems move from prompt chains toward world-model behavior.
01 Observe
Capture signals from users, tools, files, environments, and execution traces before acting.
02 Model
Maintain an explicit state of goals, constraints, resources, and prior decisions instead of relying on one prompt window.
03 Simulate
Evaluate routes, tool choices, and likely outcomes before spending tokens, time, or trust.
04 Act
Use runtimes, tools, and model interfaces that make agent behavior legible rather than mysterious.
05 Evaluate
Score outputs over time: correctness, traceability, cost, recovery behavior, and system drift.
06 Update
Write learnings back into memory and routing policy so the system gets better with use.
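The six stages above can be sketched as a single control loop. This is a deliberately toy version: the "environment" is one integer, the "policy" is a step size, and `AgentState` and `run_loop` are hypothetical names. It is meant to show the shape of the loop, not a real agent runtime.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: int            # target value the toy agent tries to reach
    value: int = 0       # current environment value
    step_size: int = 4   # "policy" that the Update stage adjusts
    trace: list = field(default_factory=list)  # (observed, chosen, improved) tuples


def run_loop(state: AgentState, max_iters: int = 20) -> AgentState:
    for _ in range(max_iters):
        observed = state.value                                        # Observe
        gap = state.goal - observed                                   # Model
        if gap == 0:
            break
        candidates = [observed + state.step_size, observed - state.step_size]
        chosen = min(candidates, key=lambda v: abs(state.goal - v))   # Simulate
        state.value = chosen                                          # Act
        improved = abs(state.goal - chosen) < abs(gap)                # Evaluate
        state.trace.append((observed, chosen, improved))
        if not improved and state.step_size > 1:                      # Update
            state.step_size //= 2
    return state


final = run_loop(AgentState(goal=10))
```

In this run the agent overshoots once, the Evaluate stage records that the move did not improve the gap, and the Update stage halves the step size so the next move lands on the goal. The `trace` list is what makes that sequence inspectable after the fact.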
Public Proof
What already exists in public
These artifacts are the clearest public evidence of the direction: memory systems, local routing, inference reliability, workflow tooling, and earlier research.
Local inference + routing
subagent-fleet
A local AI compute control plane for Claude Code-style subagents, Ollama nodes, LiteLLM routing, model warmup, and runtime visibility.
Open artifact
Retrieval + memory
embenx
A unified retrieval layer across vector backends, with temporal memory, filtering, reranking, and an MCP interface for agent use.
Open artifact
Inference reliability
MLX non-determinism
A reproducibility investigation into Apple Silicon LLM inference and the batch-invariance failures that make local evaluation harder than it looks.
Open artifact
Workflow instrumentation
AI Toolkit
Prompt and workflow utilities that translate abstract LLM advice into concrete tooling, grading, and repeatable interfaces.
Open artifact
Memory landscape
awesome-agentic-memory
A curated map of agent memory patterns, systems, and open questions that informs how I think about long-horizon agent state.
Open artifact
Research grounding
ERBGA paper
Earlier published work on reduced-bias genetic algorithms for community detection, which still shapes how I think about search, structure, and system behavior.
Open source
Principles
How I evaluate this space
The site thesis is not that bigger models solve everything. It is that better systems design will decide which AI products actually hold up.
Operating principle
Useful AI systems need inspectable memory, not hidden context glued together by luck.
Operating principle
Model choice is a systems problem. Routing, locality, latency, and failure modes matter as much as raw benchmark scores.
Operating principle
Agent infrastructure should expose state transitions, tool calls, and evaluation traces so behavior can be audited over time.
Operating principle
Local-first capability matters because serious experimentation gets easier when builders can control cost, privacy, and iteration speed.