Open LLM Inference Gateway
Every agent.
One gateway.
The endpoint that powers every Agentbot container, the Playground, and the coding agent, open for your keys too. OpenAI-compatible, provider failover, usage you can read. Swap the base URL and ship.
Generate a key
Keys are shown once and stored hashed.
Quickstart
OpenAI-compatible: point any SDK at the gateway and keep your code.
curl https://agentbot.sh/v1/chat/completions \
-H "authorization: Bearer ogw_live_..." \
-H "content-type: application/json" \
-d '{"model": "mimo-v2.5-pro",
"messages": [
{"role": "user", "content": "hello, gateway"}]
}'
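The same call can be made from Python without an SDK. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are illustrative, and the key is a placeholder, as in the curl example above.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://agentbot.sh/v1"  # base URL taken from the curl example


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With an OpenAI SDK, the equivalent is usually to set the client's base URL to https://agentbot.sh/v1 and pass the gateway key as the API key.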
New to the gateway? Read how it works.
One endpoint, many providers
Point any OpenAI-compatible client at /v1 and pass the model. Requests route through Vercel AI Gateway with OpenRouter failover, and provider secrets stay server-side.
Smart routing with model: "auto"
Send model: "auto" and the gateway scores the request and routes it to the cheapest capable model, escalating on failure. The x-gateway-served-model response header names the model that answered.
Manage your own keys
Generate a key per project or environment and revoke it instantly. Keys are shown once and stored as SHA-256 hashes, never raw.
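Storing only a digest means the raw key cannot be read back from storage. The sketch below illustrates that pattern, assuming a hex-encoded SHA-256 digest and a constant-time comparison. The function names and the random token length are assumptions, not the gateway's actual implementation; only the `ogw_live_` prefix and the use of SHA-256 come from this page.

```python
import hashlib
import hmac
import secrets

KEY_PREFIX = "ogw_live_"  # prefix seen in the curl examples


def hash_key(raw_key: str) -> str:
    """Return the hex-encoded SHA-256 digest of a raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_key() -> tuple[str, str]:
    """Return (raw_key, stored_hash); only the hash would be persisted."""
    raw_key = KEY_PREFIX + secrets.token_urlsafe(32)  # token length is an assumption
    return raw_key, hash_key(raw_key)


def verify_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored digest in constant time."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)
```

Because the raw value is never stored, a lost key cannot be recovered, only revoked and replaced.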
Real-time usage, yours and global
Per-user and per-model token usage is recorded on every request. Read your spend in the console, not on next month's invoice.
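Per-user and per-model tracking amounts to a running tally keyed by both. The sketch below is an in-memory illustration only: the `UsageLedger` class and its attribute names are hypothetical, and a real gateway would persist these records. The `prompt_tokens` and `completion_tokens` fields mirror the usage object that OpenAI-compatible responses typically return.

```python
from collections import defaultdict


class UsageLedger:
    """In-memory tally of tokens per user and per model (illustrative only)."""

    def __init__(self) -> None:
        self.by_user = defaultdict(int)
        self.by_model = defaultdict(int)
        self.by_user_model = defaultdict(int)

    def record(self, user_id: str, model: str,
               prompt_tokens: int, completion_tokens: int) -> None:
        """Add one request's token counts to every tally."""
        total = prompt_tokens + completion_tokens
        self.by_user[user_id] += total
        self.by_model[model] += total
        self.by_user_model[(user_id, model)] += total
```

Calling `record` once per request keeps the per-user view and the global per-model view in step.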
Stop picking models. Send model: "auto".
The gateway inspects each request and routes it to the cheapest model expected to produce a good answer, escalating up the ladder on rate limits, 5xx errors, or empty responses before you ever see an error. You're billed at the serving model's rate, and the x-gateway-served-model header names the model that answered.
curl https://agentbot.sh/v1/chat/completions \
-H "authorization: Bearer ogw_live_..." \
-H "content-type: application/json" \
-d '{"model":"auto","route":{"priority":"cost","max_cost_usd":0.01},"messages":[{"role":"user","content":"hello"}]}'
Optional hint: { "priority": "cost" | "balanced" | "quality", "max_cost_usd": 0.01 }. The route object is stripped before the request reaches any provider.
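The escalation described above can be sketched as a loop over a cost-ordered ladder. This is an illustration under stated assumptions, not the gateway's implementation: the model names in `LADDER`, the `call_model` callback, and the return shape are hypothetical. Only the `route` hint, the escalation triggers (rate limits, 5xx, empty responses), and the served-model header come from this page.

```python
from typing import Callable, Optional

# Hypothetical ladder, cheapest first. The real models and their order are not documented here.
LADDER = ["small-model", "medium-model", "large-model"]

ProviderCall = Callable[[str, dict], tuple[int, Optional[str]]]


def strip_route_hint(body: dict) -> tuple[dict, dict]:
    """Split the gateway-only "route" hint from the provider-bound body."""
    provider_body = {k: v for k, v in body.items() if k != "route"}
    return provider_body, body.get("route", {})


def should_escalate(status: int, content: Optional[str]) -> bool:
    """Escalate on rate limits (429), 5xx errors, or an empty successful answer."""
    if status == 429 or 500 <= status <= 599:
        return True
    return status == 200 and not (content or "").strip()


def route_auto(body: dict, call_model: ProviderCall) -> dict:
    """Try each rung in order and return the first usable answer."""
    # How the hint influences scoring is not specified, so this sketch ignores it.
    provider_body, _hint = strip_route_hint(body)
    for model in LADDER:
        status, content = call_model(model, {**provider_body, "model": model})
        if should_escalate(status, content):
            continue  # move up to the next, more capable model
        if status != 200:
            raise RuntimeError(f"non-retryable status {status} from {model}")
        return {"content": content, "headers": {"x-gateway-served-model": model}}
    raise RuntimeError("all models on the ladder failed")
```

The caller only sees an error once every rung has failed, which matches the "before you ever see an error" behavior described above.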
Agent primitives
More than chat. A full agent-execution layer.
One key, one base URL: five OpenAI-style endpoints that the Agentbot stack runs on, open for yours too.
Inference
POST /v1/chat/completions
model:auto smart routing, provider failover, free MiMo.
Fast Apply
POST /v1/apply
Merge lazy code edits into full files with a fast model, with no full rewrites.
Compaction
POST /v1/compact
Compress long conversations so 24/7 agents stay cheap and coherent.
Code Search
POST /v1/search
Rank the few relevant chunks from a corpus so agents skip the grep tax.
Planner
POST /v1/plan
Decompose a goal into specialized subtasks, each routed by model:auto.
A2A
POST /api/agents/:id/a2a
Discover, hire, and pay agents in USDC over JSON-RPC (message/send and tasks/get).
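The A2A endpoint speaks JSON-RPC, so each call is an envelope with `jsonrpc`, `id`, `method`, and `params`. The sketch below builds envelopes for the two methods named above. The envelope follows JSON-RPC 2.0; the `params` shapes are modeled on the public A2A protocol and should be treated as assumptions, as should the helper names. Payment in USDC is not covered here.

```python
import uuid
from typing import Optional


def jsonrpc_request(method: str, params: dict, request_id: Optional[str] = None) -> dict:
    """Wrap a method call in a JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id or str(uuid.uuid4()),
        "method": method,
        "params": params,
    }


def message_send(text: str) -> dict:
    """Build a message/send call with one text part (params shape is an assumption)."""
    message = {
        "role": "user",
        "parts": [{"kind": "text", "text": text}],
        "messageId": str(uuid.uuid4()),
    }
    return jsonrpc_request("message/send", {"message": message})


def tasks_get(task_id: str) -> dict:
    """Build a tasks/get call to look up a task by id (params shape is an assumption)."""
    return jsonrpc_request("tasks/get", {"id": task_id})
```

Either envelope would be sent as the JSON body of a POST to /api/agents/:id/a2a, with the agent's id in place of `:id`.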
Full reference: The Agentbot agent stack
Already in production
You're not the first request through this pipe.
Playground
Every app built in the Playground streams its generation through this gateway: multi-file React apps, live.
Agent containers
Each OpenClaw runtime routes its inference here, with per-user token quotas tracked on every call.
Coding agent
The hosted coding agent runs on the same endpoint and the same failover ladder you just read about.
Next step
Generate your first key.
Sign in, create a key in seconds, and point your client at the gateway. Keys are shown once, stored hashed, revocable in one click.