Aditya Karnam · World Model Infrastructure Lab

Aditya Karnam - World Model Infrastructure Builder

Building the systems layer for agents that remember, simulate, and act.

I work on the infrastructure between foundation models and real-world agency: state, memory, retrieval, routing, evaluation, local inference, and the runtimes that make agent behavior more reliable over time.

World model infrastructure is the systems layer that lets AI maintain state, retrieve memory, simulate outcomes, route between models, use tools, and interact with environments without collapsing back into a one-shot prompt.

initializing world model stack...
loading memory layer...
attaching retrieval interfaces...
routing local + cloud models...
starting evaluation loop...
status: ready

Primary wedge: Memory + routing + evals

Mode: Research-driven engineering

Bias: Local-first systems

Throughline: Explicit runtime behavior

Research Position

The infrastructure layer I care about

I am less interested in AI as a chat interface and more interested in the systems that make agents durable, inspectable, and composable.

State + memory

Agents need a durable working model of users, goals, tasks, tools, failures, and environments. That means memory should be explicit, updatable, and debuggable.
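
As a minimal sketch of what "explicit, updatable, and debuggable" could look like in code: the `MemoryStore` class, its fields, and the example keys below are all hypothetical illustrations, not taken from any project listed on this site. The idea is that every write records the previous value and a reason, so the full state can be dumped and inspected at any point.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical explicit memory: current facts plus an append-only change log."""

    facts: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, key, value, reason):
        # Record the old value and the reason so every change can be traced later.
        self.history.append(
            {"key": key, "old": self.facts.get(key), "new": value, "reason": reason}
        )
        self.facts[key] = value

    def dump(self):
        # Debuggable: the whole state serializes to readable JSON.
        return json.dumps(
            {"facts": self.facts, "history": self.history}, indent=2, sort_keys=True
        )


memory = MemoryStore()
memory.update("goal", "ship v1", reason="stated by user")
memory.update("goal", "ship v2", reason="user revised the goal")
print(memory.dump())
```

Keeping the change log separate from the current facts is one possible design choice; it makes "why does the agent believe this?" answerable by reading the history instead of replaying a prompt.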

Retrieval + routing

The right context and the right model are both routing problems. Retrieval, backend abstraction, local inference, and multi-model orchestration are part of the same systems question.
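
One way to picture model choice as a routing problem is a small rule-based router. This is a sketch under stated assumptions: the `Backend` type, the `route` function, the backend names, and the token limits are invented for illustration and do not describe subagent-fleet, LiteLLM, or any real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one model backend."""

    name: str
    is_local: bool
    max_context_tokens: int


def route(prompt_tokens, needs_privacy, backends):
    """Pick a backend that fits the request, preferring local ones."""
    # Drop backends whose context window is too small for the prompt.
    candidates = [b for b in backends if b.max_context_tokens >= prompt_tokens]
    if needs_privacy:
        # Privacy-sensitive requests may only run on local backends.
        candidates = [b for b in candidates if b.is_local]
    # sorted() is stable and False sorts before True, so local backends come first.
    candidates = sorted(candidates, key=lambda b: not b.is_local)
    if not candidates:
        raise LookupError("no backend satisfies the request")
    return candidates[0]


# Illustrative backends only; names and limits are assumptions.
BACKENDS = [
    Backend("cloud-large", is_local=False, max_context_tokens=128_000),
    Backend("local-small", is_local=True, max_context_tokens=8_000),
]

print(route(2_000, needs_privacy=False, backends=BACKENDS).name)   # local-small
print(route(50_000, needs_privacy=False, backends=BACKENDS).name)  # cloud-large
```

A request that is both large and privacy-sensitive raises `LookupError` here, which surfaces the routing failure explicitly instead of silently falling back to a cloud model.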

Evaluation + observability

If an agent operates over time, it should be scored over time. I care about traces, failure modes, repeatability, and evals that reflect system behavior rather than one isolated answer.
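
Scoring over time rather than per answer can be sketched with a rolling window. The `RollingEval` class and the window size are hypothetical; a real evaluation layer would track richer signals such as cost, recovery behavior, and traces, but the shape is the same: a single result matters less than the trend.

```python
from collections import deque


class RollingEval:
    """Hypothetical evaluator that tracks pass rate over the last `window` runs."""

    def __init__(self, window=5):
        # deque with maxlen discards the oldest result once the window is full.
        self.results = deque(maxlen=window)

    def record(self, passed):
        self.results.append(bool(passed))

    def pass_rate(self):
        # None signals "no data yet" rather than a misleading 0.0.
        if not self.results:
            return None
        return sum(self.results) / len(self.results)


tracker = RollingEval(window=4)
for outcome in [True, True, True, False, False, False]:
    tracker.record(outcome)

# Lifetime pass rate is 50%, but the last four runs show the system degrading.
print(tracker.pass_rate())  # 0.25
```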

Operating Loop

Observe -> Model -> Simulate -> Act -> Evaluate -> Update

This is the recurring frame behind the site. It is how I think agent systems move from prompt chains toward world-model behavior.

Observe

Capture signals from users, tools, files, environments, and execution traces before acting.

Model

Maintain an explicit state of goals, constraints, resources, and prior decisions instead of relying on one prompt window.

Simulate

Evaluate routes, tool choices, and likely outcomes before spending tokens, time, or trust.

Act

Use runtimes, tools, and model interfaces that make agent behavior legible rather than mysterious.

Evaluate

Score outputs over time: correctness, traceability, cost, recovery behavior, and system drift.

Update

Write learnings back into memory and routing policy so the system gets better with use.
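
The six steps above can be sketched as a toy loop. Everything here is an assumption made for illustration: `ToyEnvironment`, its fixed rewards, the action names, and `run_loop` are invented, and the Simulate step is reduced to consulting stored value estimates instead of running a real predictive model.

```python
class ToyEnvironment:
    """Hypothetical environment with a fixed score per action."""

    REWARDS = {"search": 1.0, "guess": 0.2}

    def observe(self):
        return {"pending_tasks": 1}

    def act(self, action):
        return self.REWARDS[action]


def run_loop(environment, actions, steps):
    state = {"history": []}                       # explicit state, not a prompt window
    estimates = {action: 0.0 for action in actions}
    counts = {action: 0 for action in actions}

    for _ in range(steps):
        observation = environment.observe()       # Observe
        state["last_observation"] = observation   # Model

        # Simulate: try each action once, then pick the best predicted outcome.
        untried = [a for a in actions if counts[a] == 0]
        action = untried[0] if untried else max(actions, key=lambda a: estimates[a])

        score = environment.act(action)           # Act, then Evaluate the result

        # Update: fold the score into a running average for that action.
        counts[action] += 1
        estimates[action] += (score - estimates[action]) / counts[action]
        state["history"].append((action, score))

    return state, estimates


state, estimates = run_loop(ToyEnvironment(), ["guess", "search"], steps=5)
print([action for action, _ in state["history"]])
# ['guess', 'search', 'search', 'search', 'search']
```

After one try of each action, the loop settles on the higher-scoring one because the Update step wrote the evaluation back into the estimates that the Simulate step reads.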

Public Proof

What already exists in public

These artifacts are the clearest public evidence of the direction: memory systems, local routing, inference reliability, workflow tooling, and earlier research.

Local inference + routing

subagent-fleet

A local AI compute control plane for Claude Code-style subagents, Ollama nodes, LiteLLM routing, model warmup, and runtime visibility.

Retrieval + memory

embenx

A unified retrieval layer across vector backends, with temporal memory, filtering, reranking, and an MCP interface for agent use.

Inference reliability

MLX non-determinism

A reproducibility investigation into Apple Silicon LLM inference and the batch-invariance failures that make local evaluation harder than it looks.

Workflow instrumentation

AI Toolkit

Prompt and workflow utilities that translate abstract LLM advice into concrete tooling, grading, and repeatable interfaces.

Memory landscape

awesome-agentic-memory

A curated map of agent memory patterns, systems, and open questions that informs how I think about long-horizon agent state.

Research grounding

ERBGA paper

Earlier published work on reduced-bias genetic algorithms for community detection, which still shapes how I think about search, structure, and system behavior.

Principles

How I evaluate this space

The site thesis is not that bigger models solve everything. It is that better systems design will decide which AI products actually hold up.

Operating principle

Useful AI systems need inspectable memory, not hidden context glued together by luck.

Operating principle

Model choice is a systems problem. Routing, locality, latency, and failure modes matter as much as raw benchmark scores.

Operating principle

Agent infrastructure should expose state transitions, tool calls, and evaluation traces so behavior can be audited over time.

Operating principle

Local-first capability matters because serious experimentation gets easier when builders can control cost, privacy, and iteration speed.

Current focus

Right now the strongest threads are local-first agent infrastructure, memory and retrieval abstractions, runtime visibility for coding agents, and evaluation layers that reflect behavior over time instead of just prompt quality in isolation.
