Open LLM Inference Gateway

Every agent.
One gateway.

The endpoint that powers every Agentbot container, the Playground, and the coding agent is open for your keys too. OpenAI-compatible, with provider failover and usage you can read. Swap the base URL and ship.

your client ──▶ agentbot /v1 ──▶ vercel ai gateway ⇒ failover ⇒ openrouter ──▶ model

Quickstart

OpenAI-compatible: point any SDK at the gateway and keep your code.

curl https://agentbot.sh/v1/chat/completions \
  -H "authorization: Bearer ogw_live_..." \
  -H "content-type: application/json" \
  -d '{
    "model": "mimo-v2.5-pro",
    "messages": [
      {"role": "user", "content": "hello, gateway"}
    ]
  }'
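For clients without an SDK, Python's standard library can make the same call. This is a minimal sketch that assumes the gateway accepts exactly the request shown in the curl example. `build_chat_request` is a hypothetical helper name, and `ogw_live_...` is still a placeholder for your real key. The function only builds the request. The commented lines show how to send it.

```python
import json
import urllib.request

# Base URL taken from the curl example; override it if yours differs.
BASE_URL = "https://agentbot.sh/v1"


def build_chat_request(api_key: str, model: str, messages: list[dict],
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "ogw_live_...",  # placeholder key, as in the curl example
    "mimo-v2.5-pro",
    [{"role": "user", "content": "hello, gateway"}],
)

# To send it and print the reply (OpenAI-style response shape):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

With an OpenAI SDK the equivalent change is only the base URL and the key. The request body stays the same.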

New to the gateway? Read how it works →

One endpoint, many providers

Point any OpenAI-compatible client at /v1 and pass the model. Requests route through Vercel AI Gateway with OpenRouter as the failover, and provider secrets stay server-side.
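Provider failover means trying an ordered list of upstreams and returning the first one that succeeds. The sketch below shows that idea with stand-in providers. It is not the gateway's actual code. `complete_with_failover`, `primary` and `fallback` are hypothetical names, and real upstreams such as Vercel AI Gateway or OpenRouter are not modelled.

```python
from typing import Callable

# Hypothetical provider callable: takes a request body, returns a response
# dict, or raises an exception on failure.
Provider = Callable[[dict], dict]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_failover(body: dict,
                           providers: list[tuple[str, Provider]]) -> tuple[str, dict]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(body)
        except Exception as exc:  # any provider error triggers failover to the next one
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only.
def primary(body: dict) -> dict:
    raise ConnectionError("upstream unavailable")


def fallback(body: dict) -> dict:
    return {"model": body["model"],
            "choices": [{"message": {"role": "assistant", "content": "hi"}}]}


served_by, response = complete_with_failover(
    {"model": "mimo-v2.5-pro", "messages": [{"role": "user", "content": "hello"}]},
    [("primary", primary), ("fallback", fallback)],
)
```

In a gateway, each provider callable would hold its own secret server-side. That is why the client only ever needs the gateway key.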

Smart routing with model: "auto"

Send model: "auto" and the gateway scores the request, routes it to the cheapest capable model, and escalates on failure. The x-gateway-served-model response header names the model that answered.
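The page does not describe how the gateway scores a request. The sketch below is one toy way to "score, then pick the cheapest capable model", under loudly stated assumptions. The keyword and length heuristic, the 1 to 3 capability scale, and the catalog entries (`small-model`, `mid-model`, `large-model` with made-up prices) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    capability: int           # hypothetical scale: 1 (easy) to 3 (hard)
    usd_per_1k_tokens: float  # made-up prices, not real rates


# Hypothetical catalog; not the gateway's real model list.
CATALOG = [
    ModelOption("small-model", 1, 0.0001),
    ModelOption("mid-model", 2, 0.001),
    ModelOption("large-model", 3, 0.01),
]

# Toy signals that a request is harder than a quick chat turn.
HARD_HINTS = ("prove", "refactor", "architecture", "step by step")


def score_request(messages: list[dict]) -> int:
    """Toy difficulty score from 1 to 3 based on prompt length and keywords."""
    text = " ".join(m.get("content", "") for m in messages).lower()
    score = 1
    if len(text) > 2000:
        score += 1
    if any(hint in text for hint in HARD_HINTS):
        score += 1
    return min(score, 3)


def pick_model(messages: list[dict], catalog: list[ModelOption] = CATALOG) -> ModelOption:
    """Return the cheapest model whose capability meets the request's score."""
    needed = score_request(messages)
    capable = [m for m in catalog if m.capability >= needed]
    return min(capable, key=lambda m: m.usd_per_1k_tokens)
```

A production router would likely use a learned classifier rather than keywords. The selection step itself, filtering by capability and then taking the minimum price, stays the same either way.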

Manage your own keys

Generate a key per project or environment and revoke it instantly. Keys are shown once and stored only as SHA-256 hashes, never in raw form.
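"Shown once, stored as a SHA-256 hash" usually follows the pattern below. This is a minimal sketch, not the gateway's code. The `ogw_live_` prefix comes from the examples on this page. The length and alphabet of the random part, and the helper names `generate_key` and `verify_key`, are assumptions.

```python
import hashlib
import hmac
import secrets

KEY_PREFIX = "ogw_live_"  # prefix seen in the examples; the random suffix format is assumed


def generate_key() -> tuple[str, str]:
    """Return (raw_key, sha256_hex). Show raw_key once; persist only the hash."""
    raw = KEY_PREFIX + secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode("utf-8")).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_hash)


raw_key, stored = generate_key()
```

A plain SHA-256 is reasonable here because the keys are long random strings, so brute-forcing the hash is infeasible. Slow password hashes matter mainly for low-entropy, human-chosen secrets. In a scheme like this, revoking a key just means deleting or flagging its stored hash.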

Real-time usage, yours and global

Token usage is recorded per user and per model on every request. Read your spend in the console, not on next month’s invoice.
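Per-user, per-model tracking amounts to summing the OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`) under a (user, model) key. This is a minimal sketch. The record layout, the user names and `other-model` are hypothetical.

```python
from collections import defaultdict

FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def aggregate_usage(records: list[dict]) -> dict[tuple[str, str], dict[str, int]]:
    """Sum request counts and token fields per (user, model) pair."""
    totals: dict = defaultdict(lambda: {"requests": 0, **{f: 0 for f in FIELDS}})
    for record in records:
        row = totals[(record["user"], record["model"])]
        row["requests"] += 1
        for field in FIELDS:
            row[field] += record["usage"].get(field, 0)
    return dict(totals)


# Hypothetical usage records, one per completed request.
records = [
    {"user": "alice", "model": "mimo-v2.5-pro",
     "usage": {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42}},
    {"user": "alice", "model": "mimo-v2.5-pro",
     "usage": {"prompt_tokens": 8, "completion_tokens": 10, "total_tokens": 18}},
    {"user": "bob", "model": "other-model",
     "usage": {"prompt_tokens": 100, "completion_tokens": 50, "total_tokens": 150}},
]
totals = aggregate_usage(records)
```

Global figures come from the same records with the user dropped from the key.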

Stop picking models. Send model: "auto".

The gateway inspects each request and routes it to the cheapest model expected to produce a good answer. On rate limits, 5xx errors, or empty responses it escalates up the model ladder before you ever see an error. You're billed at the serving model's rate, and the x-gateway-served-model header names the model that answered.
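The escalation rule described above can be sketched as a loop over a cheapest-first ladder. This is a sketch under assumptions, not the gateway's implementation. The `(status, content)` call signature and the rung names are hypothetical. Treating other 4xx errors as hard failures is also an assumption: the page only lists rate limits, 5xx errors and empty responses as escalation triggers.

```python
from typing import Callable, Optional

# Hypothetical call signature: (model_name, request_body) -> (http_status, content).
ModelCall = Callable[[str, dict], tuple[int, Optional[str]]]


def classify(status: int, content: Optional[str]) -> str:
    """Map one attempt to 'ok', 'escalate', or 'fail'."""
    if status == 429 or 500 <= status <= 599:
        return "escalate"  # rate limit or server error: try the next rung
    if status >= 400:
        return "fail"      # other client errors would repeat on every rung
    if not (content or "").strip():
        return "escalate"  # empty answer: try a stronger model
    return "ok"


def route_with_escalation(body: dict, ladder: list[str], call: ModelCall) -> tuple[str, str]:
    """Walk the ladder cheapest-first and return (served_model, content)."""
    attempts = []
    for model in ladder:
        status, content = call(model, body)
        outcome = classify(status, content)
        if outcome == "ok":
            return model, content
        if outcome == "fail":
            raise RuntimeError(f"{model} returned {status}")
        attempts.append(f"{model}:{status}")
    raise RuntimeError("every rung failed: " + ", ".join(attempts))


def fake_call(model: str, body: dict) -> tuple[int, Optional[str]]:
    """Stand-in upstream: the cheap rung is rate-limited, the middle one returns blank text."""
    responses = {"small-model": (429, None), "mid-model": (200, "  "),
                 "large-model": (200, "answer")}
    return responses[model]


served_model, answer = route_with_escalation(
    {}, ["small-model", "mid-model", "large-model"], fake_call)
# served_model is the value a header like x-gateway-served-model would report.
```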

curl https://agentbot.sh/v1/chat/completions \
  -H "authorization: Bearer ogw_live_..." \
  -H "content-type: application/json" \
  -d '{"model":"auto","route":{"priority":"cost","max_cost_usd":0.01},"messages":[{"role":"user","content":"hello"}]}'

Optional route hint: { "priority": "cost" | "balanced" | "quality", "max_cost_usd": 0.01 }. The gateway strips it before the request reaches any provider.
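Stripping the hint means validating `route` and forwarding a copy of the body without it. This is a minimal sketch using the field names and allowed values from the hint above. The `"balanced"` default and the helper name `split_route_hint` are assumptions.

```python
import copy

ALLOWED_PRIORITIES = {"cost", "balanced", "quality"}
DEFAULT_PRIORITY = "balanced"  # assumption: the default when no hint is sent is not documented


def split_route_hint(body: dict) -> tuple[dict, dict]:
    """Validate the optional `route` hint and return (body_for_provider, hint)."""
    forwarded = copy.deepcopy(body)  # leave the caller's body untouched
    hint = forwarded.pop("route", None) or {}
    if not isinstance(hint, dict):
        raise ValueError("route must be an object")
    priority = hint.get("priority", DEFAULT_PRIORITY)
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    max_cost = hint.get("max_cost_usd")
    if max_cost is not None and (isinstance(max_cost, bool)
                                 or not isinstance(max_cost, (int, float))
                                 or max_cost <= 0):
        raise ValueError("max_cost_usd must be a positive number")
    return forwarded, {"priority": priority, "max_cost_usd": max_cost}


# The request body from the curl example above.
body = {"model": "auto",
        "route": {"priority": "cost", "max_cost_usd": 0.01},
        "messages": [{"role": "user", "content": "hello"}]}
forwarded, hint = split_route_hint(body)
```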

Agent primitives

More than chat. A full agent-execution layer.

One key, one base URL: five OpenAI-style /v1 endpoints that the Agentbot stack runs on, open for your agents too, plus an A2A endpoint for agent-to-agent work.

Inference

POST /v1/chat/completions

Smart routing with model: "auto", provider failover, and free MiMo models.

Fast Apply

POST /v1/apply

Merge lazy code edits into full files with a fast model, without regenerating the whole file.

Compaction

POST /v1/compact

Compress long conversations so 24/7 agents stay cheap and coherent.
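The page does not show the /v1/compact request format or its algorithm. The sketch below shows one common compaction strategy: keep system messages and the most recent turns, and fold everything older into a single summary message. In practice the summarizer would be a model call. Here it is a pluggable stub, and `compact` and its parameters are hypothetical.

```python
from typing import Callable

# Hypothetical summarizer: takes the messages being folded away, returns summary text.
Summarizer = Callable[[list[dict]], str]


def compact(messages: list[dict], keep_last: int, summarize: Summarizer) -> list[dict]:
    """Keep system messages and the last `keep_last` turns; fold older turns into one summary."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return list(messages)  # nothing old enough to compact
    split = len(rest) - keep_last  # index form avoids the rest[-0:] pitfall
    old, recent = rest[:split], rest[split:]
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarize(old)}
    return system + [summary] + recent


history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(6)
]
compacted = compact(history, keep_last=2, summarize=lambda old: f"{len(old)} earlier messages")
```

The context an always-on agent carries then grows with `keep_last` plus one summary, not with the full history.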

Code Search

POST /v1/search

Rank a corpus and return only the few relevant chunks, so agents skip the grep tax.
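The page does not say how /v1/search ranks chunks. As a plainly swapped-in technique, here is classic BM25 lexical ranking over code chunks. It returns only the top few chunks that share query terms. The tokenizer, the `k1`/`b` defaults, and the helper name `bm25_top_k` are choices made for this sketch.

```python
import math
import re
from collections import Counter

# Identifier-style tokens, so "parse_config" stays a single term.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in TOKEN.findall(text)]


def bm25_top_k(query: str, chunks: list[str], k: int = 3,
               k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Return up to k chunks ranked by BM25 score, dropping chunks with no query terms."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    q_terms = tokenize(query)
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scored.append((score, idx))
    ranked = sorted(scored, key=lambda p: (-p[0], p[1]))
    return [chunks[i] for s, i in ranked[:k] if s > 0]


chunks = [
    "def parse_config(path): ...",
    "def send_email(to, body): ...",
    "class ConfigError(Exception): pass",
]
top = bm25_top_k("parse_config path", chunks, k=1)
```

A hosted search endpoint may well use embeddings or a reranking model instead. BM25 is shown only because it is a small, well-defined baseline.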

Planner

POST /v1/plan

Decompose a goal into specialized subtasks, each routed by model:auto.

A2A

POST /api/agents/:id/a2a

Discover, hire, and pay agents in USDC over JSON-RPC (message/send and tasks/get).
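The following sketch builds the two JSON-RPC 2.0 calls named above. It assumes the endpoint follows the public A2A protocol's request shapes: a message with `role`, text `parts`, and a `messageId` for message/send, and a task `id` for tasks/get. Authentication and USDC payment are not shown. The agent id `agent-123` and the helper names are hypothetical.

```python
import json
import uuid


def a2a_url(agent_id: str, host: str = "https://agentbot.sh") -> str:
    """Endpoint path from this page: POST /api/agents/:id/a2a."""
    return f"{host}/api/agents/{agent_id}/a2a"


def a2a_message_send(text: str, request_id: int = 1) -> dict:
    """JSON-RPC 2.0 request for A2A message/send with a single text part."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {
            "message": {
                "kind": "message",
                "role": "user",
                "parts": [{"kind": "text", "text": text}],
                "messageId": str(uuid.uuid4()),  # unique id per message
            }
        },
    }


def a2a_tasks_get(task_id: str, request_id: int = 2) -> dict:
    """JSON-RPC 2.0 request for A2A tasks/get, polling a task by id."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tasks/get",
            "params": {"id": task_id}}


url = a2a_url("agent-123")  # hypothetical agent id
send_body = json.dumps(a2a_message_send("summarize this repo"))
```

A typical flow POSTs the message/send body to the URL. If the reply is a task rather than a final message, the client polls with tasks/get using the returned task id.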

Full reference: The Agentbot agent stack →

Next step

Generate your first key.

Sign in, create a key in seconds, and point your client at the gateway. Keys are shown once, stored hashed, revocable in one click.

© 2026 Agentbot