Open LLM Inference Gateway

Every agent.
One gateway.

The endpoint that powers every Agentbot container, the Playground, and the coding agent — open for your keys too. OpenAI-compatible, provider failover, usage you can read. Swap the base URL and ship.


your client ──▶ agentbot /v1 ──▶ vercel ai gateway ⇢ failover ⇢ openrouter ──▶ model


Generate a key

Keys are shown once and stored hashed.

Quickstart

OpenAI-compatible — point any SDK at the gateway and keep your code.

curl https://agentbot.sh/v1/chat/completions \
  -H "authorization: Bearer ogw_live_..." \
  -H "content-type: application/json" \
  -d '{
    "model": "mimo-v2.5-pro",
    "messages": [
      {"role": "user", "content": "hello, gateway"}
    ]
  }'
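The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_chat_request and send are hypothetical, and the base URL, key prefix, and model name are copied from the curl example above.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://agentbot.sh/v1"  # base URL taken from the curl example


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs network access and a real key)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder key: substitute one generated in the console before calling send().
    req = build_chat_request("ogw_live_...", "mimo-v2.5-pro", "hello, gateway")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the sketch easy to inspect offline. An existing OpenAI-compatible SDK should work the same way once its base URL points at the gateway.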

New to the gateway? Read how it works →


One endpoint, many providers

Point any OpenAI-compatible client at /v1 and pass the model. Requests route through Vercel AI Gateway with OpenRouter failover — provider secrets stay server-side.
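Provider failover of this kind can be sketched as an ordered list of providers tried in turn. The gateway's real implementation is not shown on this page, so everything here is an assumption for illustration: providers are modelled as plain callables, and the retry conditions (rate limit, 5xx, empty response) are borrowed from the escalation rules described for model: "auto".

```python
from typing import Callable, Sequence, Tuple

# Hypothetical provider shape: takes a payload, returns (http_status, response_text).
Provider = Callable[[dict], Tuple[int, str]]


def is_retryable(status: int, text: str) -> bool:
    """Assumed failover triggers: rate limit (429), any 5xx, or an empty response."""
    return status == 429 or 500 <= status <= 599 or not text.strip()


def call_with_failover(
    providers: Sequence[Tuple[str, Provider]], payload: dict
) -> Tuple[str, int, str]:
    """Try providers in order and return (name, status, text) from the first
    response that is not retryable. Raise if every provider fails."""
    failures = []
    for name, provider in providers:
        status, text = provider(payload)
        if not is_retryable(status, text):
            return name, status, text
        failures.append(f"{name}: status={status}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))
```

With a primary that returns 503 and a backup that returns 200, the caller receives the backup's answer and never sees the first failure.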

Smart routing — model: "auto"

Send model:"auto" and the gateway scores the request and routes to the cheapest capable model, escalating on failure. x-gateway-served-model names who answered.

Manage your own keys

Generate a key per project or environment, revoke instantly. Keys are shown once and stored as SHA-256 hashes — never raw.
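Storing only a SHA-256 hash of each key can be sketched as follows. This is an illustrative assumption about how such a scheme typically works, not the gateway's actual code: the function names are hypothetical, and the ogw_live_ prefix is taken from the curl examples.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

KEY_PREFIX = "ogw_live_"  # prefix seen in the curl examples


def hash_key(raw_key: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_key() -> Tuple[str, str]:
    """Create a random key. The raw value is shown to the user once;
    only the returned hash is persisted."""
    raw = KEY_PREFIX + secrets.token_urlsafe(32)
    return raw, hash_key(raw)


def verify_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time against the stored hash."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)
```

Because the raw key is never stored, a lost key cannot be recovered, only revoked and replaced, which matches the "shown once" behaviour described above.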

Real-time usage, yours and global

Per-user and per-model token tracking written on every request. Read your spend in the console, not on next month’s invoice.
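Per-user and per-model token tracking can be pictured as a ledger keyed by (user, model) that is updated on every request. The class below is a hypothetical in-memory sketch; the gateway's real storage and field names are not described on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UsageLedger:
    """In-memory token counts keyed by (user_id, model)."""

    tokens: Dict[tuple, int] = field(default_factory=lambda: defaultdict(int))

    def record(self, user_id: str, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one request's prompt and completion tokens to the ledger."""
        self.tokens[(user_id, model)] += prompt_tokens + completion_tokens

    def user_total(self, user_id: str) -> int:
        """Total tokens for one user across all models."""
        return sum(n for (uid, _), n in self.tokens.items() if uid == user_id)

    def global_by_model(self) -> Dict[str, int]:
        """Total tokens per model across all users."""
        totals: Dict[str, int] = defaultdict(int)
        for (_, model), n in self.tokens.items():
            totals[model] += n
        return dict(totals)
```

Writing on every request is what makes the "yours and global" views available immediately: both are just different sums over the same records.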

Stop picking models. Send `model: "auto"`.


The gateway inspects each request and routes to the cheapest model expected to produce a good answer, escalating up the ladder on rate limits, 5xx, or empty responses before you ever see an error. You're billed at the serving model's rate, and x-gateway-served-model names who answered.

curl https://agentbot.sh/v1/chat/completions \
  -H "authorization: Bearer ogw_live_..." \
  -H "content-type: application/json" \
  -d '{"model":"auto","route":{"priority":"cost","max_cost_usd":0.01},"messages":[{"role":"user","content":"hello"}]}'

Optional hint: { "priority": "cost" | "balanced" | "quality", "max_cost_usd": 0.01 } — stripped before it reaches any provider.
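The routing hint can be assembled and validated on the client before sending. The sketch below is hypothetical: build_auto_payload, strip_route_hint, and served_model are illustrative names, strip_route_hint only mimics what the gateway is described as doing, and reading x-gateway-served-model as an HTTP response header is an assumption based on its name.

```python
from typing import Optional

ALLOWED_PRIORITIES = {"cost", "balanced", "quality"}  # values listed in the hint above


def build_auto_payload(
    prompt: str,
    priority: Optional[str] = None,
    max_cost_usd: Optional[float] = None,
) -> dict:
    """Build a model: "auto" request body, attaching a route hint only if one is given."""
    payload = {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    route = {}
    if priority is not None:
        if priority not in ALLOWED_PRIORITIES:
            raise ValueError(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}")
        route["priority"] = priority
    if max_cost_usd is not None:
        if max_cost_usd <= 0:
            raise ValueError("max_cost_usd must be positive")
        route["max_cost_usd"] = max_cost_usd
    if route:
        payload["route"] = route
    return payload


def strip_route_hint(payload: dict) -> dict:
    """Return a copy without the gateway-only route field (illustrates the stripping step)."""
    return {key: value for key, value in payload.items() if key != "route"}


def served_model(response_headers: dict) -> Optional[str]:
    """Case-insensitive lookup of x-gateway-served-model (assumed to be a response header)."""
    for name, value in response_headers.items():
        if name.lower() == "x-gateway-served-model":
            return value
    return None
```

Logging the value returned by served_model alongside each request is one way to see which model the router actually picked and what rate applied.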


Agent primitives

More than chat. A full agent-execution layer.

One key, one base URL: five OpenAI-style /v1 endpoints that the Agentbot stack runs on, plus an agent-to-agent (A2A) endpoint, open for yours too.

Inference

POST /v1/chat/completions

model:auto smart routing, provider failover, free MiMo.

Fast Apply

POST /v1/apply

Merge lazy code edits into full files with a fast model — no full rewrites.

Compaction

POST /v1/compact

Compress long conversations so 24/7 agents stay cheap and coherent.

Code Search

POST /v1/search

Rank the few relevant chunks from a corpus — agents skip the grep tax.

Planner

POST /v1/plan

Decompose a goal into specialized subtasks, each routed by model:auto.

A2A

POST /api/agents/:id/a2a

Discover, hire, and pay agents in USDC — JSON-RPC message/send + tasks/get.
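The A2A endpoint speaks JSON-RPC, so a request is a standard JSON-RPC 2.0 envelope around the message/send or tasks/get method. The sketch below only builds the request bodies. The params shapes follow the public A2A protocol as an assumption and are not confirmed by this page, and all helper names are hypothetical.

```python
import uuid
from typing import Optional


def jsonrpc_request(method: str, params: dict, request_id: Optional[str] = None) -> dict:
    """Wrap a method call in a JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id or str(uuid.uuid4()),
        "method": method,
        "params": params,
    }


def message_send(text: str) -> dict:
    """Body for message/send carrying one text part (params shape is an assumption)."""
    message = {
        "role": "user",
        "parts": [{"kind": "text", "text": text}],
        "messageId": str(uuid.uuid4()),
    }
    return jsonrpc_request("message/send", {"message": message})


def tasks_get(task_id: str) -> dict:
    """Body for tasks/get, polling a task by id (params shape is an assumption)."""
    return jsonrpc_request("tasks/get", {"id": task_id})


def a2a_path(agent_id: str) -> str:
    """Path of an agent's A2A endpoint, following POST /api/agents/:id/a2a."""
    return f"/api/agents/{agent_id}/a2a"
```

A typical flow would POST the message_send body to the agent's path, then poll with tasks_get using the task id from the reply. Payment in USDC is outside this sketch.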

Full reference: The Agentbot agent stack →

Already in production

You're not the first request through this pipe: the Playground, Agent containers, and the Coding agent already run on it.

Generate your first key.

Sign in, create a key in seconds, and point your client at the gateway. Keys are shown once, stored hashed, and revocable in one click.
