Get started with OpenRouter’s unified API for hundreds of AI models. Learn how to integrate using OpenAI SDK, direct API calls, or third-party frameworks.
OpenRouter provides a unified API that gives you access to hundreds of AI models through a single endpoint, while automatically handling fallbacks and selecting the most cost-effective options. Get started with just a few lines of code using your preferred SDK or framework.
In the examples below, the OpenRouter-specific headers are optional. Setting them allows your app to appear on the OpenRouter leaderboards.
```python title="Python"
from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
  extra_headers={
    "HTTP-Referer": "<YOUR_SITE_URL>",  # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "<YOUR_SITE_NAME>",  # Optional. Site title for rankings on openrouter.ai.
  },
  model="openai/gpt-4o",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)

print(completion.choices[0].message.content)
```
```typescript title="TypeScript"
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: '<OPENROUTER_API_KEY>',
defaultHeaders: {
'HTTP-Referer': '<YOUR_SITE_URL>', // Optional. Site URL for rankings on openrouter.ai.
'X-Title': '<YOUR_SITE_NAME>', // Optional. Site title for rankings on openrouter.ai.
},
});
async function main() {
const completion = await openai.chat.completions.create({
model: 'openai/gpt-4o',
messages: [
{
role: 'user',
content: 'What is the meaning of life?',
},
],
});
console.log(completion.choices[0].message);
}
main();
```
```python title="Python"
import json

import requests

response = requests.post(
  url="https://openrouter.ai/api/v1/chat/completions",
  headers={
    "Authorization": "Bearer <OPENROUTER_API_KEY>",
    "HTTP-Referer": "<YOUR_SITE_URL>",  # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "<YOUR_SITE_NAME>",  # Optional. Site title for rankings on openrouter.ai.
  },
  data=json.dumps({
    "model": "openai/gpt-4o",  # Optional
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  })
)
```
```typescript title="TypeScript"
fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: 'Bearer <OPENROUTER_API_KEY>',
'HTTP-Referer': '<YOUR_SITE_URL>', // Optional. Site URL for rankings on openrouter.ai.
'X-Title': '<YOUR_SITE_NAME>', // Optional. Site title for rankings on openrouter.ai.
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'openai/gpt-4o',
messages: [
{
role: 'user',
content: 'What is the meaning of life?',
},
],
}),
});
```
curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "openai/gpt-4o",
"messages": [
{
"role": "user",
"content": "What is the meaning of life?"
}
]
}'
The API also supports streaming.
For information about using third-party SDKs and frameworks with OpenRouter, please see our frameworks documentation.
Find answers to commonly asked questions about OpenRouter’s unified API, model access, pricing, and integration.
OpenRouter passes through the pricing of the underlying providers, while pooling their uptime,
so you get the same pricing you'd get from the provider directly, with a
unified API and fallbacks so that you get much better uptime.
Once you have credits you can either use the chat room, or create API keys
and start using the API. You can read our [quickstart](/docs/quickstart)
guide for code samples and more.
When you make a request to OpenRouter, we receive the total number of tokens processed
by the provider. We then calculate the corresponding cost and deduct it from your credits.
You can review your complete usage history in the [Activity tab](https://openrouter.ai/activity).
We pass through the pricing of the underlying providers; there is no markup
on inference pricing (however we do charge a [fee](https://openrouter.ai/terms#_4_-payment) when purchasing credits).
Static variants can only be used with specific models and these are listed in our [models api](https://openrouter.ai/api/v1/models).
1. `:free` - The model is always provided for free and has low rate limits.
2. `:beta` - The model is not moderated by OpenRouter.
3. `:extended` - The model has longer than usual context length.
4. `:thinking` - The model supports reasoning by default.
Dynamic variants can be used on all models and they change the behavior of how the request is routed or used.
1. `:online` - All requests will run a query to extract web results that are attached to the prompt.
2. `:nitro` - Providers will be sorted by throughput rather than the default sort, optimizing for faster response times.
3. `:floor` - Providers will be sorted by price rather than the default sort, prioritizing the most cost-effective options.
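As a concrete illustration, a variant is applied by appending it to the model slug in the request body. The sketch below uses a hypothetical `:free` slug; check the models API for the variants a given model actually offers.

```typescript
// Sketch: variants are just suffixes appended to the model slug.
const body = {
  // Hypothetical free-tier listing (static variant); use ':nitro' or ':floor'
  // on any model slug for the dynamic throughput / price sorts.
  model: 'meta-llama/llama-3.1-8b-instruct:free',
  messages: [{ role: 'user', content: 'Hello' }],
};

await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(body),
});
```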
1. Cookie-based authentication for the web interface and chatroom
2. API keys (passed as Bearer tokens) for accessing the completions API and other core endpoints
3. [Provisioning API keys](/docs/features/provisioning-api-keys) for programmatically managing API keys through the key management endpoints
Please see our Terms of Service and Privacy Policy.
We have an opt-in [setting](https://openrouter.ai/settings/privacy) that lets users log their prompts and completions in exchange for a 1% discount on usage costs.
If you specify [provider routing](/docs/features/provider-routing) in your request, but none of the providers
match the level of privacy specified in your account settings, you will get an error and your request will not complete.
If you paid using crypto, please reach out to us on [Discord](https://discord.gg/fVyRaUDgxW)
and we will look into it.
We also provide a [credits api](/docs/api-reference/get-credits) that has
live information about the balance and remaining credits for the account.
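For example, a minimal balance check against the credits API might look like the sketch below; the `total_credits` and `total_usage` fields shown here are assumptions, so consult the credits API reference for the authoritative response schema.

```typescript
// Sketch: query the credits endpoint with a regular API key.
const response = await fetch('https://openrouter.ai/api/v1/credits', {
  headers: { Authorization: 'Bearer <OPENROUTER_API_KEY>' },
});

const { data } = await response.json();
// Assumed field names; see the credits API reference for the exact schema.
console.log(`Purchased: ${data.total_credits}, used: ${data.total_usage}`);
```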
Note that unused credits will be lost and cannot be reclaimed if you delete and later recreate your account.
Learn about OpenRouter’s guiding principles and mission. Understand our commitment to price optimization, standardized APIs, and high availability in AI model deployment.
OpenRouter helps developers source and optimize AI usage. We believe the future is multi-model and multi-provider.
Price and Performance. OpenRouter scouts for the best prices, the lowest latencies, and the highest throughput across dozens of providers, and lets you choose how to prioritize them.
Standardized API. No need to change code when switching between models or providers. You can even let your users choose and pay for their own.
Real-World Insights. Be the first to take advantage of new models. See real-world data of how often models are used for different purposes. Keep up to date in our Discord channel.
Consolidated Billing. Simple and transparent billing, regardless of how many providers you use.
Higher Availability. Fallback providers, and automatic, smart routing means your requests still work even when providers go down.
Higher Rate Limits. OpenRouter works directly with providers to provide better rate limits and more throughput.
Access over 300 AI models through OpenRouter’s unified API. Browse available models, compare capabilities, and integrate with your preferred provider.
OpenRouter strives to provide access to every potentially useful text-based AI model. We currently support over 300 model endpoints.
If there are models or providers you are interested in that OpenRouter doesn’t have, please tell us about them in our Discord channel.
Explore and browse 300+ models and providers on our website, or with our API.
If you’re interested in working with OpenRouter, you can learn more on our providers page.
Route requests dynamically between AI models. Learn how to use OpenRouter’s Auto Router and model fallback features for optimal performance and reliability.
OpenRouter provides two options for model routing.
The Auto Router is a special model ID that you can use to choose between selected high-quality models based on your prompt, powered by NotDiamond.
{
"model": "openrouter/auto",
... // Other params
}
The resulting generation will have `model` set to the model that was used.
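For instance, a request that uses the Auto Router and then reads back which model actually served it could look like this minimal sketch (it reuses the `openai` client configured in the quickstart above):

```typescript
// Sketch: let the Auto Router pick a model, then inspect which one was used.
const completion = await openai.chat.completions.create({
  model: 'openrouter/auto',
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
});

// The response's `model` field reports the model the Auto Router selected.
console.log(completion.model);
```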
### The `models` parameter

The `models` parameter lets you automatically try other models if the primary model’s providers are down, rate-limited, or refuse to reply due to content moderation.
{
"models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
... // Other params
}
If the model you selected returns an error, OpenRouter will try to use the fallback model instead. If the fallback model is down or returns an error, OpenRouter will return that error.
By default, any error can trigger the use of a fallback model, including context length validation errors, moderation flags for filtered models, rate-limiting, and downtime.
Requests are priced using the model that was ultimately used, which will be returned in the `model` attribute of the response body.

If no fallback model is specified but `route: "fallback"` is included, OpenRouter will try the most appropriate open-source model available, with pricing less than the primary model (or very close to it).

To use the `models` array with the OpenAI SDK, include it in the `extra_body` parameter. In the example below, gpt-4o will be tried first, and the `models` array will be tried in order as fallbacks.
import OpenAI from 'openai';
const openrouterClient = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
// API key and headers
});
async function main() {
// @ts-expect-error
const completion = await openrouterClient.chat.completions.create({
model: 'openai/gpt-4o',
models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
messages: [
{
role: 'user',
content: 'What is the meaning of life?',
},
],
});
console.log(completion.choices[0].message);
}
main();
Route AI model requests across multiple providers intelligently. Learn how to optimize for cost, performance, and reliability with OpenRouter’s provider routing.
OpenRouter routes requests to the best available providers for your model. By default, requests are load balanced across the top providers to maximize uptime.
You can customize how your requests are routed using the `provider` object in the request body for Chat Completions and Completions.

The `provider` object can contain the following fields:
Field | Type | Default | Description |
---|---|---|---|
`order` | string[] | - | List of provider names to try in order (e.g. `["Anthropic", "OpenAI"]`). |
`allow_fallbacks` | boolean | `true` | Whether to allow backup providers when the primary is unavailable. |
`require_parameters` | boolean | `false` | Only use providers that support all parameters in your request. |
`data_collection` | "allow" \| "deny" | "allow" | Control whether to use providers that may store data. |
`ignore` | string[] | - | List of provider names to skip for this request. |
`quantizations` | string[] | - | List of quantization levels to filter by (e.g. `["int4", "int8"]`). |
`sort` | string | - | Sort providers by price or throughput (e.g. `"price"` or `"throughput"`). |
For each model in your request, OpenRouter’s default behavior is to load balance requests across providers, prioritizing price.
If you are more sensitive to throughput than price, you can use the `sort` field to explicitly prioritize throughput.
By default, OpenRouter load balances each request across the top providers for the model, weighting the choice toward lower-priced providers while steering around providers that have recently had outages.

If you have `sort` or `order` set in your provider preferences, load balancing will be disabled.
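The exact weighting OpenRouter applies is internal to the router. Purely as a mental model, a price-weighted, uptime-aware selection could be sketched as below; the inverse-price weighting and the `recentlyDown` flag are illustrative assumptions, not the documented algorithm.

```typescript
// Illustrative sketch only: favor cheaper providers, skip ones with recent outages.
interface Provider {
  name: string;
  pricePerMillionTokens: number;
  recentlyDown: boolean;
}

function pickProvider(providers: Provider[]): Provider {
  const healthy = providers.filter((p) => !p.recentlyDown);
  const pool = healthy.length > 0 ? healthy : providers;

  // Assumption: weight each candidate by the inverse of its price, so cheaper
  // providers win more often without always winning.
  const weights = pool.map((p) => 1 / p.pricePerMillionTokens);
  const total = weights.reduce((a, b) => a + b, 0);

  let r = Math.random() * total;
  for (let i = 0; i < pool.length; i++) {
    r -= weights[i];
    if (r <= 0) return pool[i];
  }
  return pool[pool.length - 1];
}
```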
As described above, OpenRouter load balances based on price, while taking uptime into account.
If you instead want to explicitly prioritize a particular provider attribute, you can include the `sort` field in the `provider` preferences. Load balancing will be disabled, and the router will try providers in order.
The three sort options are:

- `"price"`: prioritize lowest price
- `"throughput"`: prioritize highest throughput
- `"latency"`: prioritize lowest latency

<TSFetchCodeBlock title="Example with Fallbacks Enabled" uriPath="/api/v1/chat/completions" body={{ model: 'meta-llama/llama-3.1-70b-instruct', messages: [{ role: 'user', content: 'Hello' }], provider: { sort: 'throughput', }, }} />
To always prioritize low prices, and not apply any load balancing, set `sort` to `"price"`.

To always prioritize low latency, and not apply any load balancing, set `sort` to `"latency"`.
You can append `:nitro` to any model slug as a shortcut to sort by throughput. This is exactly equivalent to setting `provider.sort` to `"throughput"`.
<TSFetchCodeBlock title="Example using Nitro shortcut" uriPath="/api/v1/chat/completions" body={{ model: 'meta-llama/llama-3.1-70b-instruct:nitro', messages: [{ role: 'user', content: 'Hello' }], }} />
You can append `:floor` to any model slug as a shortcut to sort by price. This is exactly equivalent to setting `provider.sort` to `"price"`.
<TSFetchCodeBlock title="Example using Floor shortcut" uriPath="/api/v1/chat/completions" body={{ model: 'meta-llama/llama-3.1-70b-instruct:floor', messages: [{ role: 'user', content: 'Hello' }], }} />
You can set the providers that OpenRouter will prioritize for your request using the `order` field.
Field | Type | Default | Description |
---|---|---|---|
`order` | string[] | - | List of provider names to try in order (e.g. `["Anthropic", "OpenAI"]`). |
The router will prioritize providers in this list, and in this order, for the model you’re using. If you don’t set this field, the router will load balance across the top providers to maximize uptime.
OpenRouter will try them one at a time and proceed to other providers if none are operational. If you don’t want to allow any other providers, you should disable fallbacks as well.
This example skips over OpenAI (which doesn’t host Mixtral), tries Together, and then falls back to the normal list of providers on OpenRouter:
<TSFetchCodeBlock title="Example with Fallbacks Enabled" uriPath="/api/v1/chat/completions" body={{ model: 'mistralai/mixtral-8x7b-instruct', messages: [{ role: 'user', content: 'Hello' }], provider: { order: ['OpenAI', 'Together'], }, }} />
Here’s an example with `allow_fallbacks` set to `false` that skips over OpenAI (which doesn’t host Mixtral), tries Together, and then fails if Together fails:
<TSFetchCodeBlock title="Example with Fallbacks Disabled" uriPath="/api/v1/chat/completions" body={{ model: 'mistralai/mixtral-8x7b-instruct', messages: [{ role: 'user', content: 'Hello' }], provider: { order: ['OpenAI', 'Together'], allow_fallbacks: false, }, }} />
You can restrict requests only to providers that support all parameters in your request using the `require_parameters` field.
Field | Type | Default | Description |
---|---|---|---|
`require_parameters` | boolean | `false` | Only use providers that support all parameters in your request. |
With the default routing strategy, providers that don’t support all the LLM parameters specified in your request can still receive the request, but will ignore unknown parameters. When you set `require_parameters` to `true`, the request won’t even be routed to that provider.
For example, to only use providers that support JSON formatting:
<TSFetchCodeBlock uriPath="/api/v1/chat/completions" body={{ messages: [{ role: 'user', content: 'Hello' }], provider: { require_parameters: true, }, response_format: { type: 'json_object' }, }} />
You can restrict requests only to providers that comply with your data policies using the `data_collection` field.
Field | Type | Default | Description |
---|---|---|---|
`data_collection` | "allow" \| "deny" | "allow" | Control whether to use providers that may store data. |

- `allow`: (default) allow providers which store user data non-transiently and may train on it
- `deny`: use only providers which do not collect user data

Some model providers may log prompts, so we display them with a Data Policy tag on model pages. This is not a definitive source of third party data policies, but represents our best knowledge.
To exclude providers that don’t comply with your data policies, set `data_collection` to `deny`:
<TSFetchCodeBlock uriPath="/api/v1/chat/completions" body={{ messages: [{ role: 'user', content: 'Hello' }], provider: { data_collection: 'deny' /* or "allow" */ }, }} />
To guarantee that your request is only served by the top (lowest-cost) provider, you can disable fallbacks.
This is combined with the `order` field from Ordering Specific Providers to restrict the providers that OpenRouter will prioritize to just your chosen list.
<TSFetchCodeBlock uriPath="/api/v1/chat/completions" body={{ messages: [{ role: 'user', content: 'Hello' }], provider: { allow_fallbacks: false, }, }} />
You can ignore providers for a request by setting the `ignore` field in the `provider` object.

Field | Type | Default | Description |
---|---|---|---|
`ignore` | string[] | - | List of provider names to skip for this request. |
Note that when you ignore providers for a specific request, the list of ignored providers is merged with your account-wide ignored providers.
Here’s an example that will ignore Azure for a request calling GPT-4 Omni:
<TSFetchCodeBlock uriPath="/api/v1/chat/completions" body={{ model: 'openai/gpt-4o', messages: [{ role: 'user', content: 'Hello' }], provider: { ignore: ['Azure'], }, }} />
Quantization reduces model size and computational requirements while aiming to preserve performance. Most LLMs today use FP16 or BF16 for training and inference, cutting memory requirements in half compared to FP32. Some optimizations use FP8 or quantization to reduce size further (e.g., INT8, INT4).
Field | Type | Default | Description |
---|---|---|---|
`quantizations` | string[] | - | List of quantization levels to filter by (e.g. `["int4", "int8"]`). |
Providers can support various quantization levels for open-weight models.
By default, requests are load-balanced across all available providers, ordered by price. To filter providers by quantization level, specify the `quantizations` field in the `provider` parameter with the following values:

- `int4`: Integer (4 bit)
- `int8`: Integer (8 bit)
- `fp4`: Floating point (4 bit)
- `fp6`: Floating point (6 bit)
- `fp8`: Floating point (8 bit)
- `fp16`: Floating point (16 bit)
- `bf16`: Brain floating point (16 bit)
- `fp32`: Floating point (32 bit)
- `unknown`: Unknown

Here’s an example that will only use providers that support FP8 quantization:
<TSFetchCodeBlock uriPath="/api/v1/chat/completions" body={{ model: 'meta-llama/llama-3.1-8b-instruct', messages: [{ role: 'user', content: 'Hello' }], provider: { quantizations: ['fp8'], }, }} />
You can view the terms of service for each provider below. You may not violate the terms of service or policies of third-party providers that power the models on OpenRouter.
- OpenAI: https://openai.com/policies/row-terms-of-use/
- Anthropic: https://www.anthropic.com/legal/commercial-terms
- Google Vertex: https://cloud.google.com/terms/
- Google AI Studio: https://cloud.google.com/terms/
- Amazon Bedrock: https://aws.amazon.com/service-terms/
- Groq: https://groq.com/terms-of-use/
- SambaNova: https://sambanova.ai/terms-and-conditions
- Cohere: https://cohere.com/terms-of-use
- Mistral: https://mistral.ai/terms/#terms-of-use
- Together: https://www.together.ai/terms-of-service
- Together (lite): https://www.together.ai/terms-of-service
- Fireworks: https://fireworks.ai/terms-of-service
- DeepInfra: https://deepinfra.com/docs/data
- Lepton: https://www.lepton.ai/policies/tos
- NovitaAI: https://novita.ai/legal/terms-of-service
- Avian.io: https://avian.io/privacy
- Lambda: https://lambdalabs.com/legal/privacy-policy
- Azure: https://www.microsoft.com/en-us/legal/terms-of-use?oneroute=true
- Modal: https://modal.com/legal/terms
- AnyScale: https://www.anyscale.com/terms
- Replicate: https://replicate.com/terms
- Perplexity: https://www.perplexity.ai/hub/legal/perplexity-api-terms-of-service
- Recursal: https://featherless.ai/terms
- OctoAI: https://octo.ai/docs/faqs/privacy-and-security
- DeepSeek: https://chat.deepseek.com/downloads/DeepSeek%20Terms%20of%20Use.html
- Infermatic: https://infermatic.ai/privacy-policy/
- AI21: https://studio.ai21.com/privacy-policy
- Featherless: https://featherless.ai/terms
- Inflection: https://developers.inflection.ai/tos
- xAI: https://x.ai/legal/terms-of-service
- Cloudflare: https://www.cloudflare.com/service-specific-terms-developer-platform/#developer-platform-terms
- SF Compute: https://inference.sfcompute.com/privacy
- Minimax: https://intl.minimaxi.com/protocol/terms-of-service
- Nineteen: https://nineteen.ai/tos
- Liquid: https://www.liquid.ai/terms-conditions
- inference.net: https://inference.net/terms
- Friendli: https://friendli.ai/terms-of-service
- AionLabs: https://www.aionlabs.ai/terms/
- Alibaba: https://www.alibabacloud.com/help/en/legal/latest/alibaba-cloud-international-website-product-terms-of-service-v-3-8-0
- Nebius AI Studio: https://docs.nebius.com/legal/studio/terms-of-use/
- Chutes: https://chutes.ai/tos
- kluster.ai: https://www.kluster.ai/terms-of-use
- Crusoe: https://legal.crusoe.ai/open-router#managed-inference-tos-open-router
- Targon: https://targon.com/terms
- Ubicloud: https://www.ubicloud.com/docs/about/terms-of-service
- Parasail: https://www.parasail.io/legal/terms
- 01.AI: https://platform.01.ai/privacypolicy
- HuggingFace: https://huggingface.co/terms-of-service
- Mancer: https://mancer.tech/terms
- Mancer (private): https://mancer.tech/terms
- Hyperbolic: https://hyperbolic.xyz/privacy
- Hyperbolic (quantized): https://hyperbolic.xyz/privacy
- Lynn: https://api.lynn.app/policy

For a complete list of options, see this JSON schema:
Reduce your AI model costs with OpenRouter’s prompt caching feature. Learn how to cache and reuse responses across OpenAI, Anthropic Claude, and DeepSeek models.
To save on inference costs, you can enable prompt caching on supported providers and models.
Most providers automatically enable prompt caching, but note that some (see Anthropic below) require you to enable it on a per-message basis. Note that prompt caching does not work when switching between providers. In order to cache the prompt, LLM engines must store a memory snapshot of the processed prompt, which is not shared with other providers.
To see how much caching saved on each generation, you can click the detail button on the Activity page, or use the `/api/v1/generation` API, documented here.

The `cache_discount` field in the response body will tell you how much the response saved on cache usage. Some providers, like Anthropic, will have a negative discount on cache writes, but a positive discount (which reduces total cost) on cache reads.
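As a sketch, reading that discount for a completed generation might look like the following; the generation `id` comes from a prior chat completion response, and `cache_discount` is read from the generation endpoint’s response body as described above.

```typescript
// Sketch: look up cache savings for a previous generation by its id.
const generationId = 'gen-xxxxxxxxxxxxxx'; // id returned by /api/v1/chat/completions

const res = await fetch(
  `https://openrouter.ai/api/v1/generation?id=${generationId}`,
  { headers: { Authorization: 'Bearer <OPENROUTER_API_KEY>' } },
);

const { data } = await res.json();
// Negative values indicate a cache-write surcharge; positive values indicate savings on reads.
console.log('Cache discount:', data.cache_discount);
```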
Caching price changes:
Prompt caching with OpenAI is automated and does not require any additional configuration. There is a minimum prompt size of 1024 tokens.
Click here to read more about OpenAI prompt caching and its limitations.
Caching price changes:
Prompt caching with Anthropic requires the use of `cache_control` breakpoints. There is a limit of four breakpoints, and the cache will expire within five minutes. Therefore, it is recommended to reserve the cache breakpoints for large bodies of text, such as character cards, CSV data, RAG data, book chapters, etc.

Click here to read more about Anthropic prompt caching and its limitations.

The `cache_control` breakpoint can only be inserted into the text part of a multipart message.
System message caching example:
{
"messages": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "You are a historian studying the fall of the Roman Empire. You know the following book very well:"
},
{
"type": "text",
"text": "HUGE TEXT BODY",
"cache_control": {
"type": "ephemeral"
}
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What triggered the collapse?"
}
]
}
]
}
User message caching example:
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Given the book below:"
},
{
"type": "text",
"text": "HUGE TEXT BODY",
"cache_control": {
"type": "ephemeral"
}
},
{
"type": "text",
"text": "Name all the characters in the above book"
}
]
}
]
}
Caching price changes:
Prompt caching with DeepSeek is automated and does not require any additional configuration.
Enforce JSON Schema validation on AI model responses. Get consistent, type-safe outputs and avoid parsing errors with OpenRouter’s structured output feature.
OpenRouter supports structured outputs for compatible models, ensuring responses follow a specific JSON Schema format. This feature is particularly useful when you need consistent, well-formatted responses that can be reliably parsed by your application.
Structured outputs allow you to:

- Enforce a specific JSON Schema on model responses
- Get consistent, type-safe outputs that are easy to parse
- Avoid parsing errors caused by free-form text responses
To use structured outputs, include a `response_format` parameter in your request, with `type` set to `json_schema` and the `json_schema` object containing your schema:
{
"messages": [
{ "role": "user", "content": "What's the weather like in London?" }
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "weather",
"strict": true,
"schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City or location name"
},
"temperature": {
"type": "number",
"description": "Temperature in Celsius"
},
"conditions": {
"type": "string",
"description": "Weather conditions description"
}
},
"required": ["location", "temperature", "conditions"],
"additionalProperties": false
}
}
}
}
The model will respond with a JSON object that strictly follows your schema:
{
"location": "London",
"temperature": 18,
"conditions": "Partly cloudy with light drizzle"
}
Structured outputs are supported by select models.
You can find a list of models that support structured outputs on the models page.
To ensure your chosen model supports structured outputs:

1. Set `require_parameters: true` in your provider preferences (see Provider Routing)
2. Include `response_format` and set `type: json_schema` in the required parameters

When designing your schema:

- Include descriptions: Add clear descriptions to your schema properties to guide the model
- Use strict mode: Always set `strict: true` to ensure the model follows your schema exactly
Here’s a complete example using the Fetch API:
<TemplatedCodeBlocks templates={{ API_KEY_REF }}>
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: 'Bearer {{API_KEY_REF}}',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'openai/gpt-4',
messages: [
{ role: 'user', content: 'What is the weather like in London?' },
],
response_format: {
type: 'json_schema',
json_schema: {
name: 'weather',
strict: true,
schema: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City or location name',
},
temperature: {
type: 'number',
description: 'Temperature in Celsius',
},
conditions: {
type: 'string',
description: 'Weather conditions description',
},
},
required: ['location', 'temperature', 'conditions'],
additionalProperties: false,
},
},
},
}),
});
const data = await response.json();
const weatherInfo = data.choices[0].message.content;
Structured outputs are also supported with streaming responses. The model will stream valid partial JSON that, when complete, forms a valid response matching your schema.
To enable streaming with structured outputs, simply add `stream: true` to your request:
{
"stream": true,
"response_format": {
"type": "json_schema",
// ... rest of your schema
}
}
When using structured outputs, you may encounter errors if the selected model or provider does not support structured outputs, or if the schema you provide is invalid.
Use tools (or functions) in your prompts with OpenRouter. Learn how to use tools with OpenAI, Anthropic, and other models that support tool calling.
Tool calls (also known as function calls) give an LLM access to external tools. The LLM does not call the tools directly. Instead, it suggests the tool to call. The user then calls the tool separately and provides the results back to the LLM. Finally, the LLM formats the response into an answer to the user’s original question.
OpenRouter standardizes the tool calling interface across models and providers.
For a primer on how tool calling works in the OpenAI SDK, please see this article, or if you prefer to learn from a full end-to-end example, keep reading.
Here is Python code that gives LLMs the ability to call an external API – in this case Project Gutenberg, to search for books.
First, let’s do some basic setup:
import json, requests
from openai import OpenAI
OPENROUTER_API_KEY = "..."
# You can use any model that supports tool calling
MODEL = 'google/gemini-2.0-flash-001'
openai_client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=OPENROUTER_API_KEY,
)
task = "What are the titles of some James Joyce books?"
messages = [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": task,
}
]
Next, we define the tool that we want to call. Remember, the tool is going to get requested by the LLM, but the code we are writing here is ultimately responsible for executing the call and returning the results to the LLM.
def search_gutenberg_books(search_terms):
search_query = " ".join(search_terms)
url = "https://gutendex.com/books"
response = requests.get(url, params={"search": search_query})
simplified_results = []
for book in response.json().get("results", []):
simplified_results.append({
"id": book.get("id"),
"title": book.get("title"),
"authors": book.get("authors")
})
return simplified_results
tools = [
{
"type": "function",
"function": {
"name": "search_gutenberg_books",
"description": "Search for books in the Project Gutenberg library based on specified search terms",
"parameters": {
"type": "object",
"properties": {
"search_terms": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of search terms to find books in the Gutenberg library (e.g. ['dickens', 'great'] to search for books by Dickens with 'great' in the title)"
}
},
"required": ["search_terms"]
}
}
}
]
TOOL_MAPPING = {
"search_gutenberg_books": search_gutenberg_books
}
Note that the “tool” is just a normal Python function. We then write a JSON ‘spec’ compatible with the OpenAI function calling spec. We’ll pass that spec to the LLM so that it knows this tool is available and how to use it. It will request the tool when needed, along with any arguments. We’ll then marshal the tool call locally, make the function call, and return the results to the LLM.
Let’s make the first OpenRouter API call to the model:
request_1 = {
"model": MODEL,
"tools": tools,
"messages": messages
}
response_1 = openai_client.chat.completions.create(**request_1).choices[0].message
The LLM responds with a finish reason of tool_calls, and a tool_calls array. In a generic LLM response-handler, you would want to check the finish reason before processing tool calls, but here we will assume it’s the case. Let’s keep going, by processing the tool call:
# Append the response to the messages array so the LLM has the full context
# It's easy to forget this step!
messages.append(response_1)
# Now we process the requested tool calls, and use our book lookup tool
for tool_call in response_1.tool_calls:
'''
In this case we only provided one tool, so we know what function to call.
When providing multiple tools, you can inspect `tool_call.function.name`
to figure out what function you need to call locally.
'''
tool_name = tool_call.function.name
tool_args = json.loads(tool_call.function.arguments)
tool_response = TOOL_MAPPING[tool_name](**tool_args)
messages.append(
{
"role": "tool",
"tool_call_id": tool_call.id,
"name": tool_name,
"content": json.dumps(tool_response),
}
)
The messages array now has:
Now, we can make a second OpenRouter API call, and hopefully get our result!
request_2 = {
"model": MODEL,
"messages": messages
}
response_2 = openai_client.chat.completions.create(**request_2)
print(response_2.choices[0].message.content)
The output will be something like:
Here are some books by James Joyce:
* *Ulysses*
* *Dubliners*
* *A Portrait of the Artist as a Young Man*
* *Chamber Music*
* *Exiles: A Play in Three Acts*
We did it! We’ve successfully used a tool in a prompt.
In the example above, the calls are made explicitly and sequentially. To handle a wide variety of user inputs and tool calls, you can use an agentic loop.
Here’s an example of a simple agentic loop (using the same `tools` and initial `messages` as above):
def call_llm(msgs):
resp = openai_client.chat.completions.create(
model=MODEL,
tools=tools,
messages=msgs
)
msgs.append(resp.choices[0].message.dict())
return resp
def get_tool_response(response):
tool_call = response.choices[0].message.tool_calls[0]
tool_name = tool_call.function.name
tool_args = json.loads(tool_call.function.arguments)
# Look up the correct tool locally, and call it with the provided arguments
# Other tools can be added without changing the agentic loop
tool_result = TOOL_MAPPING[tool_name](**tool_args)
return {
"role": "tool",
"tool_call_id": tool_call.id,
"name": tool_name,
"content": tool_result,
}
while True:
resp = call_llm(messages)
if resp.choices[0].message.tool_calls is not None:
messages.append(get_tool_response(resp))
else:
break
print(messages[-1]['content'])
Transform and optimize messages before sending them to AI models. Learn about middle-out compression and context window optimization with OpenRouter.
To help with prompts that exceed the maximum context size of a model, OpenRouter supports a custom parameter called `transforms`:
{
transforms: ["middle-out"], // Compress prompts that are > context size.
messages: [...],
model // Works with any model
}
This can be useful for situations where perfect recall is not required. The transform works by removing or truncating messages from the middle of the prompt, until the prompt fits within the model’s context window.
In some cases, the issue is not the token context length, but the actual number of messages. The transform addresses this as well: For instance, Anthropic’s Claude models enforce a maximum of {anthropicMaxMessagesCount} messages. When this limit is exceeded with middle-out enabled, the transform will keep half of the messages from the start and half from the end of the conversation.
When middle-out compression is enabled, OpenRouter will first try to find models whose context length is at least half of your total required tokens (input + completion). For example, if your prompt requires 10,000 tokens total, models with at least 5,000 context length will be considered. If no models meet this criteria, OpenRouter will fall back to using the model with the highest available context length.
The compression will then attempt to fit your content within the chosen model’s context window by removing or truncating content from the middle of the prompt. If middle-out compression is disabled and your total tokens exceed the model’s context length, the request will fail with an error message suggesting you either reduce the length or enable middle-out compression.
The middle of the prompt is compressed because LLMs pay less attention to the middle of sequences.
Learn how OpenRouter maximizes AI model uptime through real-time monitoring, intelligent routing, and automatic fallbacks across multiple providers.
OpenRouter continuously monitors the health and availability of AI providers to ensure maximum uptime for your applications. We track response times, error rates, and availability across all providers in real-time, and route based on this feedback.
OpenRouter tracks response times, error rates, and availability across all providers in real-time. This data helps us make intelligent routing decisions and provides transparency about service reliability.
While our smart routing helps maintain high availability, you can also customize provider selection using request parameters. This gives you control over which providers handle your requests while still benefiting from automatic fallback when needed.
Learn more about customizing provider selection in our Provider Routing documentation.
Enable real-time web search capabilities in your AI model responses. Add factual, up-to-date information to any model’s output with OpenRouter’s web search feature.
You can incorporate relevant web search results for any model on OpenRouter by activating and customizing the `web` plugin, or by appending `:online` to the model slug:
{
"model": "openai/gpt-4o:online"
}
This is a shortcut for using the `web` plugin, and is exactly equivalent to:

{
  "model": "openai/gpt-4o",
  "plugins": [{ "id": "web" }]
}
The web search plugin is powered by Exa and uses their “auto” method (a combination of keyword search and embeddings-based web search) to find the most relevant results and augment/ground your prompt.
The maximum results allowed by the web plugin and the prompt used to attach them to your message stream can be customized:
{
"model": "openai/gpt-4o:online",
"plugins": [
{
"id": "web",
"max_results": 1, // Defaults to 5
"search_prompt": "Some relevant web results:" // See default below
}
]
}
By default, the web plugin uses the following search prompt, using the current date:
A web search was conducted on `date`. Incorporate the following web search results into your response.
IMPORTANT: Cite them using markdown links named using the domain of the source.
Example: [nytimes.com](https://nytimes.com/some-page).
The web plugin uses your OpenRouter credits and charges $4 per 1000 results. With the default `max_results` of 5, this comes out to a maximum of $0.02 per request, in addition to the LLM usage for the search result prompt tokens.
Manage OpenRouter API keys programmatically through dedicated management endpoints. Create, read, update, and delete API keys for automated key distribution and control.
OpenRouter provides endpoints to programmatically manage your API keys, enabling key creation and management for applications that need to distribute or rotate keys automatically.
To use the key management API, you first need to create a Provisioning API key:
Provisioning keys cannot be used to make API calls to OpenRouter’s completion endpoints - they are exclusively for key management operations.
Common scenarios for programmatic key management include:
All key management endpoints are under `/api/v1/keys` and require a Provisioning API key in the Authorization header.
```python title="Python"
import requests

PROVISIONING_API_KEY = "your-provisioning-key"
BASE_URL = "https://openrouter.ai/api/v1/keys"

# List the most recent 100 API keys
response = requests.get(
    BASE_URL,
    headers={
        "Authorization": f"Bearer {PROVISIONING_API_KEY}",
        "Content-Type": "application/json"
    }
)

# Create a new API key
response = requests.post(
    f"{BASE_URL}/",
    headers={
        "Authorization": f"Bearer {PROVISIONING_API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "name": "Customer Instance Key",
        "label": "customer-123",
        "limit": 1000  # Optional credit limit
    }
)

# Get a specific key
key_hash = "<YOUR_KEY_HASH>"
response = requests.get(
    f"{BASE_URL}/{key_hash}",
    headers={
        "Authorization": f"Bearer {PROVISIONING_API_KEY}",
        "Content-Type": "application/json"
    }
)

# Update a key
response = requests.patch(
    f"{BASE_URL}/{key_hash}",
    headers={
        "Authorization": f"Bearer {PROVISIONING_API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "name": "Updated Key Name",
        "disabled": True  # Disable the key
    }
)

# Delete a key
response = requests.delete(
    f"{BASE_URL}/{key_hash}",
    headers={
        "Authorization": f"Bearer {PROVISIONING_API_KEY}",
        "Content-Type": "application/json"
    }
)
```
```typescript title="TypeScript"
const PROVISIONING_API_KEY = 'your-provisioning-key';
const BASE_URL = 'https://openrouter.ai/api/v1/keys';
// List the most recent 100 API keys
const listKeys = await fetch(BASE_URL, {
headers: {
'Authorization': `Bearer ${PROVISIONING_API_KEY}`,
'Content-Type': 'application/json',
},
});
// You can paginate using the `offset` query parameter
const listKeysOffset = await fetch(`${BASE_URL}?offset=100`, {
headers: {
'Authorization': `Bearer ${PROVISIONING_API_KEY}`,
'Content-Type': 'application/json',
},
});
// Create a new API key
const createKey = await fetch(`${BASE_URL}`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${PROVISIONING_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
name: 'Customer Instance Key',
label: 'customer-123',
limit: 1000, // Optional credit limit
}),
});
// Get a specific key
const keyHash = '<YOUR_KEY_HASH>';
const getKey = await fetch(`${BASE_URL}/${keyHash}`, {
headers: {
'Authorization': `Bearer ${PROVISIONING_API_KEY}`,
'Content-Type': 'application/json',
},
});
// Update a key
const updateKey = await fetch(`${BASE_URL}/${keyHash}`, {
method: 'PATCH',
headers: {
'Authorization': `Bearer ${PROVISIONING_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
name: 'Updated Key Name',
disabled: true, // Disable the key
}),
});
// Delete a key
const deleteKey = await fetch(`${BASE_URL}/${keyHash}`, {
method: 'DELETE',
headers: {
'Authorization': `Bearer ${PROVISIONING_API_KEY}`,
'Content-Type': 'application/json',
},
});
```
API responses return JSON objects containing key information:
{
"data": [
{
"created_at": "2025-02-19T20:52:27.363244+00:00",
"updated_at": "2025-02-19T21:24:11.708154+00:00",
"hash": "<YOUR_KEY_HASH>",
"label": "sk-or-v1-customkey",
"name": "Customer Key",
"disabled": false,
"limit": 10,
"usage": 0
}
]
}
When creating a new key, the response will include the key string itself.
Comprehensive guide to OpenRouter’s API. Learn about request/response schemas, authentication, parameters, and integration with multiple AI model providers.
OpenRouter’s request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, OpenRouter normalizes the schema across models and providers so you only need to learn one.
Here is the request schema as a TypeScript type. This will be the body of your `POST` request to the `/api/v1/chat/completions` endpoint (see the quick start above for an example).

For a complete list of parameters, see the Parameters page.
// If "model" is unspecified, uses the user's default
model?: string; // See "Supported Models" section
// Allows to force the model to produce specific output format.
// See models page and note on this docs page for which models support it.
response_format?: { type: 'json_object' };
stop?: string | string[];
stream?: boolean; // Enable streaming
// See LLM Parameters (openrouter.ai/docs/api-reference/parameters)
max_tokens?: number; // Range: [1, context_length)
temperature?: number; // Range: [0, 2]
// Tool calling
// Will be passed down as-is for providers implementing OpenAI's interface.
// For providers with custom interfaces, we transform and map the properties.
// Otherwise, we transform the tools into a YAML template. The model responds with an assistant message.
// See models supporting tool calling: openrouter.ai/models?supported_parameters=tools
tools?: Tool[];
tool_choice?: ToolChoice;
// Advanced optional parameters
seed?: number; // Integer only
top_p?: number; // Range: (0, 1]
top_k?: number; // Range: [1, Infinity) Not available for OpenAI models
frequency_penalty?: number; // Range: [-2, 2]
presence_penalty?: number; // Range: [-2, 2]
repetition_penalty?: number; // Range: (0, 2]
logit_bias?: { [key: number]: number };
top_logprobs: number; // Integer only
min_p?: number; // Range: [0, 1]
top_a?: number; // Range: [0, 1]
// Reduce latency by providing the model with a predicted output
// https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs
prediction?: { type: 'content'; content: string };
// OpenRouter-only parameters
// See "Prompt Transforms" section: openrouter.ai/docs/transforms
transforms?: string[];
// See "Model Routing" section: openrouter.ai/docs/model-routing
models?: string[];
route?: 'fallback';
// See "Provider Routing" section: openrouter.ai/docs/provider-routing
provider?: ProviderPreferences;
};
// Subtypes:
type TextContent = {
  type: 'text';
  text: string;
};

type ImageContentPart = {
  type: 'image_url';
  image_url: {
    url: string; // URL or base64 encoded image data
    detail?: string; // Optional, defaults to "auto"
  };
};

type ContentPart = TextContent | ImageContentPart;

type Message =
  | {
      role: 'user' | 'assistant' | 'system';
      // ContentParts are only for the "user" role:
      content: string | ContentPart[];
      // If "name" is included, it will be prepended like this
      // for non-OpenAI models: `{name}: {content}`
      name?: string;
    }
  | {
      role: 'tool';
      content: string;
      tool_call_id: string;
      name?: string;
    };

type FunctionDescription = {
  description?: string;
  name: string;
  parameters: object; // JSON Schema object
};

type Tool = {
  type: 'function';
  function: FunctionDescription;
};

type ToolChoice =
  | 'none'
  | 'auto'
  | {
      type: 'function';
      function: {
        name: string;
      };
    };
</CodeBlocks>
The `response_format` parameter ensures you receive a structured response from the LLM. The parameter is only supported by OpenAI models, Nitro models, and some others; check the providers on the model page on openrouter.ai/models to see whether it's supported, and set `require_parameters` to true in your Provider Preferences. See [Provider Routing](/docs/features/provider-routing).
### Headers
OpenRouter allows you to specify some optional headers to identify your app and make it discoverable to users on our site.
* `HTTP-Referer`: Identifies your app on openrouter.ai
* `X-Title`: Sets/modifies your app's title
<CodeBlocks>
```typescript title="TypeScript"
fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: 'Bearer <OPENROUTER_API_KEY>',
'HTTP-Referer': '<YOUR_SITE_URL>', // Optional. Site URL for rankings on openrouter.ai.
'X-Title': '<YOUR_SITE_NAME>', // Optional. Site title for rankings on openrouter.ai.
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'openai/gpt-4o',
messages: [
{
role: 'user',
content: 'What is the meaning of life?',
},
],
}),
});
```

</CodeBlocks>
OpenRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.
To use this feature, simply include a message with `role: "assistant"` at the end of your `messages` array.
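For example, a request that asks the model to continue a partially written answer could look like the sketch below; the prefill text itself is just an illustration.

```typescript
// Sketch: the trailing assistant message is treated as a partial response to continue.
const body = {
  model: 'openai/gpt-4o',
  messages: [
    { role: 'user', content: 'List three uses for a paperclip.' },
    // Prefill: the model picks up from this partial assistant response.
    { role: 'assistant', content: '1. Holding papers together\n2.' },
  ],
};

await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(body),
});
```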
Multimodal requests are only available via the `/api/v1/chat/completions` API with a multi-part `messages` parameter. The `image_url` can either be a URL or a base64-encoded data URL.
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
]
Sample LLM response:
{
"choices": [
{
"role": "assistant",
"content": "This image depicts a scenic natural landscape featuring a long wooden boardwalk that stretches out through an expansive field of green grass. The boardwalk provides a clear path and invites exploration through the lush environment. The scene is surrounded by a variety of shrubbery and trees in the background, indicating a diverse plant life in the area."
}
]
}
For locally stored images, you can send them to the model using base64 encoding. Here’s an example:
import { readFile } from "fs/promises";
const getFlowerImage = async (): Promise<string> => {
const imagePath = new URL("flower.jpg", import.meta.url);
const imageBuffer = await readFile(imagePath);
const base64Image = imageBuffer.toString("base64");
return `data:image/jpeg;base64,${base64Image}`;
};
...
"messages": [
{
role: "user",
content: [
{
type: "text",
text: "What's in this image?",
},
{
type: "image_url",
image_url: {
url: `${await getFlowerImage()}`,
},
},
],
},
];
When sending a base64 data URL, ensure it contains the content type of the image, for example: `data:image/jpeg;base64,<BASE64_IMAGE_DATA>`.
Supported content types are:
image/png
image/jpeg
image/webp
OpenRouter normalizes the schema across models and providers to comply with the OpenAI Chat API.
This means that `choices` is always an array, even if the model only returns one completion. Each choice will contain a `delta` property if a stream was requested and a `message` property otherwise. This makes it easier to use the same code for all models.
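As a small illustration of why this normalization helps, the same extraction logic can serve both streaming and non-streaming responses; this sketch just follows the `message`/`delta` fields from the response schema below.

```typescript
// Sketch: pull text out of a choice whether it came from a stream chunk or a full response.
function extractText(choice: {
  message?: { content: string | null };
  delta?: { content: string | null };
}): string {
  // Streaming chunks carry `delta`; non-streaming responses carry `message`.
  return choice.delta?.content ?? choice.message?.content ?? '';
}
```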
Here’s the response schema as a TypeScript type:
// Definitions of subtypes are below
type Response = {
id: string;
// Depending on whether you set "stream" to "true" and
// whether you passed in "messages" or a "prompt", you
// will get a different output shape
choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
created: number; // Unix timestamp
model: string;
object: 'chat.completion' | 'chat.completion.chunk';
system_fingerprint?: string; // Only present if the provider supports it
// Usage data is always returned for non-streaming.
// When streaming, you will get one usage object at
// the end accompanied by an empty choices array.
usage?: ResponseUsage;
};
// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4 tokenizer.
type ResponseUsage = {
/** Including images and tools if any */
prompt_tokens: number;
/** The tokens generated */
completion_tokens: number;
/** Sum of the above two fields */
total_tokens: number;
};
// Subtypes:
type NonChatChoice = {
finish_reason: string | null;
text: string;
error?: ErrorResponse;
};
type NonStreamingChoice = {
finish_reason: string | null;
native_finish_reason: string | null;
message: {
content: string | null;
role: string;
tool_calls?: ToolCall[];
};
error?: ErrorResponse;
};
type StreamingChoice = {
finish_reason: string | null;
native_finish_reason: string | null;
delta: {
content: string | null;
role?: string;
tool_calls?: ToolCall[];
};
error?: ErrorResponse;
};
type ErrorResponse = {
code: number; // See "Error Handling" section
message: string;
metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};
type ToolCall = {
id: string;
type: 'function';
function: FunctionCall;
};
Here’s an example:
{
"id": "gen-xxxxxxxxxxxxxx",
"choices": [
{
"finish_reason": "stop", // Normalized finish_reason
"native_finish_reason": "stop", // The raw finish_reason from the provider
"message": {
// will be "delta" if streaming
"role": "assistant",
"content": "Hello there!"
}
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 4,
"total_tokens": 4
},
"model": "openai/gpt-3.5-turbo" // Could also be "anthropic/claude-2.1", etc, depending on the "model" that ends up being used
}
OpenRouter normalizes each model’s `finish_reason` to one of the following values: `tool_calls`, `stop`, `length`, `content_filter`, `error`.

Some models and providers may have additional finish reasons. The raw finish_reason string returned by the model is available via the `native_finish_reason` property.
The token counts that are returned in the completions API response are not counted via the model’s native tokenizer. Instead it uses a normalized, model-agnostic count (accomplished via the GPT-4o tokenizer). This is because some providers do not reliably return native token counts. This behavior is becoming more rare, however, and we may add native token counts to the response object in the future.
Credit usage and model pricing are based on the native token counts (not the ‘normalized’ token counts returned in the API response).
For precise token accounting using the model’s native tokenizer, you can retrieve the full generation information via the `/api/v1/generation` endpoint.

You can use the returned `id` to query for the generation stats (including token counts and cost) after the request is complete. This is how you can get the cost and tokens for all models and requests, streaming and non-streaming.
```typescript title="TypeScript"
const generation = await fetch(
  'https://openrouter.ai/api/v1/generation?id=$GENERATION_ID',
  { headers: { Authorization: 'Bearer <OPENROUTER_API_KEY>' } },
);
const stats = await generation.json();
```
</CodeBlocks>
```json Example response
{
"data": {
"id": "gen-nNPYi0ZB6GOK5TNCUMHJGgXo",
"model": "openai/gpt-4-32k",
"streamed": false,
"generation_time": 2,
"tokens_prompt": 24,
"tokens_completion": 29,
"total_cost": 0.00492
// ... additional stats
}
}
```
Note that token counts are also available in the `usage` field of the response body for non-streaming completions.
Learn how to implement streaming responses with OpenRouter’s API. Complete guide to Server-Sent Events (SSE) and real-time model outputs.
The OpenRouter API allows streaming responses from any model. This is useful for building chat interfaces or other applications where the UI should update as the model generates the response.
To enable streaming, you can set the `stream` parameter to `true` in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.
Here is an example of how to stream a response, and process it:
<TemplatedCodeBlocks templates={{ API_KEY_REF, MODEL: Model.GPT_4_Omni }}>
import requests
import json
question = "How would you build the tallest building ever?"
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {{API_KEY_REF}}",
"Content-Type": "application/json"
}
payload = {
"model": "{{MODEL}}",
"messages": [{"role": "user", "content": question}],
"stream": True
}
buffer = ""
with requests.post(url, headers=headers, json=payload, stream=True) as r:
for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
buffer += chunk
while True:
try:
# Find the next complete SSE line
line_end = buffer.find('\n')
if line_end == -1:
break
line = buffer[:line_end].strip()
buffer = buffer[line_end + 1:]
if line.startswith('data: '):
data = line[6:]
if data == '[DONE]':
break
try:
data_obj = json.loads(data)
content = data_obj["choices"][0]["delta"].get("content")
if content:
print(content, end="", flush=True)
except json.JSONDecodeError:
pass
except Exception:
break
const question = 'How would you build the tallest building ever?';
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: `Bearer ${{{API_KEY_REF}}}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: '{{MODEL}}',
messages: [{ role: 'user', content: question }],
stream: true,
}),
});
const reader = response.body?.getReader();
if (!reader) {
throw new Error('Response body is not readable');
}
const decoder = new TextDecoder();
let buffer = '';
try {
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Append new chunk to buffer
buffer += decoder.decode(value, { stream: true });
// Process complete lines from buffer
while (true) {
const lineEnd = buffer.indexOf('\n');
if (lineEnd === -1) break;
const line = buffer.slice(0, lineEnd).trim();
buffer = buffer.slice(lineEnd + 1);
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') break;
try {
const parsed = JSON.parse(data);
const content = parsed.choices[0].delta.content;
if (content) {
console.log(content);
}
} catch (e) {
// Ignore invalid JSON
}
}
}
}
} finally {
reader.cancel();
}
For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to prevent connection timeouts. These comments look like:
: OPENROUTER PROCESSING
Comment payload can be safely ignored per the SSE specs. However, you can leverage it to improve UX as needed, e.g. by showing a dynamic loading indicator.
Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you `JSON.stringify` the non-JSON payloads. We recommend using an SSE client library that handles comment lines correctly.
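If you are writing your own parser, one straightforward way to handle these keep-alive comments is to skip any SSE line that begins with a colon before attempting to parse JSON; the small sketch below builds on the streaming example above.

```typescript
// Sketch: ignore SSE comment lines such as ": OPENROUTER PROCESSING".
function handleSSELine(line: string): void {
  if (line.startsWith(':')) {
    // Keep-alive comment; optionally surface a loading indicator here.
    return;
  }
  if (line.startsWith('data: ')) {
    const data = line.slice(6);
    if (data === '[DONE]') return;
    const parsed = JSON.parse(data);
    console.log(parsed.choices[0]?.delta?.content ?? '');
  }
}
```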
Streaming requests can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.
To implement stream cancellation:
<TemplatedCodeBlocks templates={{ API_KEY_REF, MODEL: Model.GPT_4_Omni }}>
import requests
from threading import Event, Thread
def stream_with_cancellation(prompt: str, cancel_event: Event):
with requests.Session() as session:
response = session.post(
"https://openrouter.ai/api/v1/chat/completions",
headers={"Authorization": f"Bearer {{API_KEY_REF}}"},
json={"model": "{{MODEL}}", "messages": [{"role": "user", "content": prompt}], "stream": True},
stream=True
)
try:
for line in response.iter_lines():
if cancel_event.is_set():
response.close()
return
if line:
print(line.decode(), end="", flush=True)
finally:
response.close()
# Example usage:
cancel_event = Event()
stream_thread = Thread(target=lambda: stream_with_cancellation("Write a story", cancel_event))
stream_thread.start()
# To cancel the stream:
cancel_event.set()
const controller = new AbortController();
try {
const response = await fetch(
'https://openrouter.ai/api/v1/chat/completions',
{
method: 'POST',
headers: {
Authorization: `Bearer ${{{API_KEY_REF}}}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: '{{MODEL}}',
messages: [{ role: 'user', content: 'Write a story' }],
stream: true,
}),
signal: controller.signal,
},
);
// Process the stream...
} catch (error) {
if (error.name === 'AbortError') {
console.log('Stream cancelled');
} else {
throw error;
}
}
// To cancel the stream:
controller.abort();
Learn how to authenticate with OpenRouter using API keys and Bearer tokens. Complete guide to secure authentication methods and best practices.
You can cover model costs with OpenRouter API keys.
Our API authenticates requests using Bearer tokens. This allows you to use `curl` or the OpenAI SDK directly with OpenRouter.

API keys also let you set credit limits for apps, and they can be used in OAuth flows.
To use an API key, first create your key. Give it a name and you can optionally set a credit limit.
If you’re calling the OpenRouter API directly, set the Authorization
header to a Bearer token with your API key.
If you’re using the OpenAI TypeScript SDK, set the baseURL
to https://openrouter.ai/api/v1
and the apiKey
to your API key.
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: '<OPENROUTER_API_KEY>',
defaultHeaders: {
'HTTP-Referer': '<YOUR_SITE_URL>', // Optional. Site URL for rankings on openrouter.ai.
'X-Title': '<YOUR_SITE_NAME>', // Optional. Site title for rankings on openrouter.ai.
},
});
async function main() {
const completion = await openai.chat.completions.create({
model: 'openai/gpt-4o',
messages: [{ role: 'user', content: 'Say this is a test' }],
});
console.log(completion.choices[0].message);
}
main();
import openai
openai.api_base = "https://openrouter.ai/api/v1"
openai.api_key = "<OPENROUTER_API_KEY>"
response = openai.ChatCompletion.create(
model="openai/gpt-4o",
messages=[...],
headers={
"HTTP-Referer": "<YOUR_SITE_URL>", # Optional. Site URL for rankings on openrouter.ai.
"X-Title": "<YOUR_SITE_NAME>", # Optional. Site title for rankings on openrouter.ai.
},
)
reply = response.choices[0].message
curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "openai/gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
}'
To stream with Python, see this example from OpenAI.
OpenRouter is a GitHub secret scanning partner, and has other methods to detect exposed keys. If we determine that your key has been compromised, you will receive an email notification.
If you receive such a notification or suspect your key has been exposed, immediately visit your key settings page to delete the compromised key and create a new one.
Using environment variables and keeping keys out of your codebase is strongly recommended.
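For example, a minimal sketch of reading the key from an environment variable (assuming you have exported it as OPENROUTER_API_KEY) rather than hard-coding it:

```typescript
import OpenAI from 'openai';

// The key is read from the environment instead of being committed to source control.
const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});
```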
Learn about all available parameters for OpenRouter API requests. Configure temperature, max tokens, top_p, and other model-specific settings.
Sampling parameters shape the token generation process of the model. You may send any parameters from the following list, as well as others, to OpenRouter.
OpenRouter will default to the values listed below if certain parameters are absent from your request (for example, temperature
to 1.0). We will also transmit some provider-specific parameters, such as safe_prompt
for Mistral or raw_mode
for Hyperbolic, directly to the respective providers if specified.
Please refer to the model’s provider section to confirm which parameters are supported. For detailed guidance on managing provider-specific parameters, click here.
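As an illustration, here is a minimal sketch of a request that overrides a few of the sampling parameters described below; the specific values are arbitrary examples, not recommendations:

```typescript
fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Write a haiku about the sea.' }],
    // Optional sampling parameters; anything omitted falls back to the defaults listed below.
    temperature: 0.7,
    top_p: 0.9,
    max_tokens: 256,
  }),
});
```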
Key: temperature
Optional, float, 0.0 to 2.0
Default: 1.0
Explainer Video: Watch
This setting influences the variety in the model’s responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.
Key: top_p
Optional, float, 0.0 to 1.0
Default: 1.0
Explainer Video: Watch
This setting limits the model’s choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model’s responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.
Key: top_k
Optional, integer, 0 or above
Default: 0
Explainer Video: Watch
This limits the model’s choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default, this setting is disabled, so the model considers all choices.
Key: frequency_penalty
Optional, float, -2.0 to 2.0
Default: 0.0
Explainer Video: Watch
This setting aims to control the repetition of tokens based on how often they appear in the input. Tokens that appear more frequently in the input are penalized proportionally to how often they occur, making them less likely to be used again. The penalty scales with the number of occurrences. Negative values will encourage token reuse.
Key: presence_penalty
Optional, float, -2.0 to 2.0
Default: 0.0
Explainer Video: Watch
Adjusts how often the model repeats specific tokens already used in the input. Higher values make such repetition less likely, while negative values do the opposite. Token penalty does not scale with the number of occurrences. Negative values will encourage token reuse.
Key: repetition_penalty
Optional, float, 0.0 to 2.0
Default: 1.0
Explainer Video: Watch
Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). Token penalty scales based on original token’s probability.
Key: min_p
Optional, float, 0.0 to 1.0
Default: 0.0
Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. (The value changes depending on the confidence level of the most probable token.) If your Min-P is set to 0.1, that means it will only allow for tokens that are at least 1/10th as probable as the best possible option.
Key: top_a
Optional, float, 0.0 to 1.0
Default: 0.0
Consider only the top tokens with “sufficiently high” probabilities based on the probability of the most likely token. Think of it like a dynamic Top-P. A lower Top-A value focuses the choices based on the highest probability token but with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering process based on the maximum probability.
Key: seed
Optional, integer
If specified, the inferencing will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed for some models.
Key: max_tokens
Optional, integer, 1 or above
This sets the upper limit for the number of tokens the model can generate in response. It won’t produce more than this limit. The maximum value is the context length minus the prompt length.
Key: logit_bias
Optional, map
Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
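For example, a sketch of biasing generation away from one token and toward another; the token IDs here are hypothetical, and real IDs depend on the model's tokenizer:

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Pick a number between 1 and 10.' }],
  // Hypothetical token IDs, for illustration only.
  logit_bias: {
    '1234': -100, // effectively ban this token
    '5678': 5, // make this token somewhat more likely
  },
};
```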
Key: logprobs
Optional, boolean
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
Key: top_logprobs
Optional, integer
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
Key: response_format
Optional, map
Forces the model to produce a specific output format. Setting to { "type": "json_object" }
enables JSON mode, which guarantees the message the model generates is valid JSON.
Note: when using JSON mode, you should also instruct the model to produce JSON yourself via a system or user message.
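A minimal sketch of enabling JSON mode, including the instruction to the model mentioned in the note above:

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [
    { role: 'system', content: 'Reply only with a JSON object containing a "summary" field.' },
    { role: 'user', content: 'Summarize: OpenRouter provides a unified API for many models.' },
  ],
  // Guarantees the generated message is valid JSON.
  response_format: { type: 'json_object' },
};
```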
Key: structured_outputs
Optional, boolean
Whether the model can return structured outputs using response_format with json_schema.
Key: stop
Optional, array
Stop generation immediately if the model encounters any token specified in the stop array.
Key: tools
Optional, array
Tool calling parameter, following OpenAI’s tool calling request shape. For non-OpenAI providers, it will be transformed accordingly. Click here to learn more about tool calling
Key: tool_choice
Optional, string or object
Controls which (if any) tool is called by the model. ‘none’ means the model will not call any tool and instead generates a message. ‘auto’ means the model can pick between generating a message or calling one or more tools. ‘required’ means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}}
forces the model to call that tool.
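For instance, a sketch of forcing the model to call one specific tool (the get_weather tool here is hypothetical and only for illustration):

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'What is the weather in Lisbon?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather', // hypothetical tool
        description: 'Get the current weather for a city',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
  // Force a call to get_weather instead of letting the model answer directly.
  tool_choice: { type: 'function', function: { name: 'get_weather' } },
};
```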
Key: max_price
Optional, map
A JSON object specifying the highest provider pricing you will accept. For example, the value {"prompt": 1, "completion": 2}
will route to any provider charging <= $1/m
prompt tokens and <= $2/m
completion tokens. Some providers support per-request pricing, in which case you can use the "request" attribute of max_price. Lastly, "image" is also available, which specifies the max price per image you will accept. In practice, this field is often combined with a provider "sort" to e.g. state "Use the provider with the highest throughput, as long as it doesn't cost more than $x/m
tokens."
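For example, a sketch of capping the provider pricing for a single request (the caps are illustrative):

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
  // Only route to providers charging at most $1/m prompt tokens and $2/m completion tokens.
  max_price: { prompt: 1, completion: 2 },
};
```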
Learn about OpenRouter’s API rate limits, credit-based quotas, and DDoS protection. Configure and monitor your model usage limits effectively.
To check the rate limit or credits left on an API key, make a GET request to https://openrouter.ai/api/v1/auth/key
.
<TemplatedCodeBlocks templates={{ API_KEY_REF }}>
const response = await fetch('https://openrouter.ai/api/v1/auth/key', {
method: 'GET',
headers: {
Authorization: 'Bearer {{API_KEY_REF}}',
},
});
If you submit a valid API key, you should get a response of the form:
<TemplatedCodeBlocks templates={{ API_KEY_REF }}>
type Key = {
data: {
label: string;
usage: number; // Number of credits used
limit: number | null; // Credit limit for the key, or null if unlimited
is_free_tier: boolean; // Whether the user has paid for credits before
rate_limit: {
requests: number; // Number of requests allowed...
interval: string; // in this interval, e.g. "10s"
};
};
};
There are a few rate limits that apply to certain types of requests, regardless of account status:
Free limit: If you are using a free model variant (with an ID ending in {sep}{Variant.Free}
), then you will be limited to {freeModelRateLimitRPM} requests per minute and {freeModelRateLimitRPD} requests per day.
DDoS protection: Cloudflare’s DDoS protection will block requests that dramatically exceed reasonable usage.
For all other requests, rate limits are a function of the number of credits remaining on the key or account. Partial credits round up in your favor. For the credits available on your API key, you can make 1 request per credit per second up to the surge limit (typically 500 requests per second, but you can go higher).
For example, if your key or account has 10 credits remaining, you can make up to 10 requests per second.
If your account has a negative credit balance, you may see {HTTPStatus.S402_Payment_Required}
errors, including for free models. Adding credits to put your balance above zero allows you to use those models again.
Learn how to handle errors in OpenRouter API interactions. Comprehensive guide to error codes, messages, and best practices for error handling.
For errors, OpenRouter returns a JSON response with the following shape:
type ErrorResponse = {
error: {
code: number;
message: string;
metadata?: Record<string, unknown>;
};
};
The HTTP Response will have the same status code as error.code
, forming a request error if:
Otherwise, the returned HTTP response status will be {HTTPStatus.S200_OK}
and any error that occurs while the LLM is producing the output will be emitted in the response body or as an SSE data event.
Example code for printing errors in JavaScript:
const request = await fetch('https://openrouter.ai/...');
console.log(request.status); // Will be an error code unless the model started processing your request
const response = await request.json();
console.error(response.error?.code); // Will be an error code
console.error(response.error?.message);
If your input was flagged, the error.metadata
will contain information about the issue. The shape of the metadata is as follows:
type ModerationErrorMetadata = {
reasons: string[]; // Why your input was flagged
flagged_input: string; // The text segment that was flagged, limited to 100 characters. If the flagged input is longer than 100 characters, it will be truncated in the middle and replaced with ...
provider_name: string; // The name of the provider that requested moderation
model_slug: string;
};
If the model provider encounters an error, the error.metadata
will contain information about the issue. The shape of the metadata is as follows:
type ProviderErrorMetadata = {
provider_name: string; // The name of the provider that encountered the error
raw: unknown; // The raw error from the provider
};
Occasionally, the model may not generate any content. This typically occurs when:
Warm-up times usually range from a few seconds to a few minutes, depending on the model and provider.
If you encounter persistent no-content issues, consider implementing a simple retry mechanism or trying again with a different provider or model that has more recent activity.
Additionally, be aware that in some cases, you may still be charged for the prompt processing cost by the upstream provider, even if no content is generated.
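As a sketch of the simple retry mechanism suggested above (the attempt count and backoff are arbitrary choices, not official guidance):

```typescript
async function completionWithRetry(body: object, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: 'Bearer <OPENROUTER_API_KEY>',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    const json = await res.json();
    const content = json.choices?.[0]?.message?.content;
    if (content) return content;
    // No content was generated; wait briefly before retrying.
    await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
  }
  throw new Error('No content generated after retries');
}
```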
POST https://openrouter.ai/api/v1/completions
Content-Type: application/json
Send a completion request to a selected model (text-only format)
curl -X POST https://openrouter.ai/api/v1/completions \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "model",
"prompt": "prompt"
}'
POST https://openrouter.ai/api/v1/chat/completions
Content-Type: application/json
Send a chat completion request to a selected model
curl -X POST https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "What is the meaning of life?"
}
]
}'
GET https://openrouter.ai/api/v1/generation
Returns metadata about a specific generation request
curl -G https://openrouter.ai/api/v1/generation \
-H "Authorization: Bearer <token>" \
-d id=id
GET https://openrouter.ai/api/v1/models
Returns a list of models available through the API
curl https://openrouter.ai/api/v1/models
GET https://openrouter.ai/api/v1/models/{author}/{slug}/endpoints
Returns details about the endpoints for a specific model
curl https://openrouter.ai/api/v1/models/author/slug/endpoints
GET https://openrouter.ai/api/v1/credits
Returns the total credits purchased and used for the authenticated user
curl https://openrouter.ai/api/v1/credits \
-H "Authorization: Bearer <token>"
POST https://openrouter.ai/api/v1/credits/coinbase
Content-Type: application/json
Creates and hydrates a Coinbase Commerce charge for cryptocurrency payments
curl -X POST https://openrouter.ai/api/v1/credits/coinbase \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"amount": 1.1,
"sender": "sender",
"chain_id": 1
}'
POST https://openrouter.ai/api/v1/auth/keys
Content-Type: application/json
Exchange an authorization code from the PKCE flow for a user-controlled API key
curl -X POST https://openrouter.ai/api/v1/auth/keys \
-H "Content-Type: application/json" \
-d '{
"code": "code"
}'
GET https://openrouter.ai/api/v1/key
Get information on the API key associated with the current authentication session
curl https://openrouter.ai/api/v1/key \
-H "Authorization: Bearer <token>"
GET https://openrouter.ai/api/v1/keys
Returns a list of all API keys associated with the account. Requires a Provisioning API key.
curl https://openrouter.ai/api/v1/keys \
-H "Authorization: Bearer <token>"
POST https://openrouter.ai/api/v1/keys
Content-Type: application/json
Creates a new API key. Requires a Provisioning API key.
curl -X POST https://openrouter.ai/api/v1/keys \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"name": "name"
}'
GET https://openrouter.ai/api/v1/keys/{hash}
Returns details about a specific API key. Requires a Provisioning API key.
curl https://openrouter.ai/api/v1/keys/hash \
-H "Authorization: Bearer <token>"
DELETE https://openrouter.ai/api/v1/keys/{hash}
Deletes an API key. Requires a Provisioning API key.
curl -X DELETE https://openrouter.ai/api/v1/keys/hash \
-H "Authorization: Bearer <token>"
PATCH https://openrouter.ai/api/v1/keys/{hash}
Content-Type: application/json
Updates an existing API key. Requires a Provisioning API key.
curl -X PATCH https://openrouter.ai/api/v1/keys/hash \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{}'
Learn how to use your existing AI provider keys with OpenRouter. Integrate your own API keys while leveraging OpenRouter’s unified interface and features.
OpenRouter supports both OpenRouter credits and the option to bring your own provider keys (BYOK).
When you use OpenRouter credits, your rate limits for each provider are managed by OpenRouter.
Using provider keys enables direct control over rate limits and costs via your provider account.
Your provider keys are securely encrypted and used for all requests routed through the specified provider.
Manage keys in your account settings.
The cost of using custom provider keys on OpenRouter is 5% of what the same model/provider would cost normally on OpenRouter and will be deducted from your OpenRouter credits.
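For example, if a request would normally cost $1.00 in credits on OpenRouter, running the same request through your own provider key deducts $0.05 (5%) from your OpenRouter credits, while the inference itself is billed to your provider account.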
You can configure individual keys to act as fallbacks.
When “Use this key as a fallback” is enabled for a key, OpenRouter will prioritize using your credits. If it hits a rate limit or encounters a failure, it will then retry with your key.
Conversely, if “Use this key as a fallback” is disabled for a key, OpenRouter will prioritize using your key. If it hits a rate limit or encounters a failure, it will then retry with your credits.
To use Azure AI Services with OpenRouter, you’ll need to provide your Azure API key configuration in JSON format. Each key configuration requires the following fields:
{
"model_slug": "the-openrouter-model-slug",
"endpoint_url": "https://<resource>.services.ai.azure.com/deployments/<model-id>/chat/completions?api-version=<api-version>",
"api_key": "your-azure-api-key",
"model_id": "the-azure-model-id"
}
You can find these values in your Azure AI Services resource:
endpoint_url: Navigate to your Azure AI Services resource in the Azure portal. In the “Overview” section, you’ll find your endpoint URL. Make sure to append /chat/completions
to the base URL. You can read more in the Azure Foundry documentation.
api_key: In the same “Overview” section of your Azure AI Services resource, you can find your API key under “Keys and Endpoint”.
model_id: This is the name of your model deployment in Azure AI Services.
model_slug: This is the OpenRouter model identifier you want to use this key for.
Since Azure supports multiple model deployments, you can provide an array of configurations for different models:
[
{
"model_slug": "mistralai/mistral-large",
"endpoint_url": "https://example-project.openai.azure.com/openai/deployments/mistral-large/chat/completions?api-version=2024-08-01-preview",
"api_key": "your-azure-api-key",
"model_id": "mistral-large"
},
{
"model_slug": "openai/gpt-4o",
"endpoint_url": "https://example-project.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview",
"api_key": "your-azure-api-key",
"model_id": "gpt-4o"
}
]
Make sure to replace the URL with your own project URL. The URL should end with /chat/completions and include the API version that you would like to use.
To use Amazon Bedrock with OpenRouter, you’ll need to provide your AWS credentials in JSON format. The configuration requires the following fields:
{
"accessKeyId": "your-aws-access-key-id",
"secretAccessKey": "your-aws-secret-access-key",
"region": "your-aws-region"
}
You can find these values in your AWS account:
accessKeyId: This is your AWS Access Key ID. You can create or find your access keys in the AWS Management Console under “Security Credentials” in your AWS account.
secretAccessKey: This is your AWS Secret Access Key, which is provided when you create an access key.
region: The AWS region where your Amazon Bedrock models are deployed (e.g., “us-east-1”, “us-west-2”).
Make sure your AWS IAM user or role has the necessary permissions to access Amazon Bedrock services. At minimum, you’ll need permissions for:
bedrock:InvokeModel
bedrock:InvokeModelWithResponseStream
(for streaming responses)
Example IAM policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
}
]
}
For enhanced security, we recommend creating dedicated IAM users with limited permissions specifically for use with OpenRouter.
Learn more in the AWS Bedrock Getting Started with the API documentation, IAM Permissions Setup guide, or the AWS Bedrock API Reference.
Learn how to purchase OpenRouter credits using cryptocurrency. Complete guide to Coinbase integration, supported chains, and automated credit purchases.
You can purchase credits using cryptocurrency through our Coinbase integration. This can either happen through the UI, on your credits page, or through our API as described below. While other forms of payment are possible, this guide specifically shows how to pay with the chain’s native token.
Headless credit purchases involve three steps:
Make a POST request to /api/v1/credits/coinbase
to create a new charge. You’ll include the amount of credits you want to purchase (in USD, up to ${cryptoMaxDollarPurchase}), the address you’ll be sending the transaction from, and the EVM chain ID of the network you’ll be sending on.
Currently, we only support the following chains (mainnet only):
const response = await fetch('https://openrouter.ai/api/v1/credits/coinbase', {
method: 'POST',
headers: {
Authorization: 'Bearer <OPENROUTER_API_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
amount: 10, // Target credit amount in USD
sender: '0x9a85CB3bfd494Ea3a8C9E50aA6a3c1a7E8BACE11',
chain_id: 8453,
}),
});
const responseJSON = await response.json();
The response includes the charge details and transaction data needed to execute the on-chain payment:
{
"data": {
"id": "...",
"created_at": "2024-01-01T00:00:00Z",
"expires_at": "2024-01-01T01:00:00Z",
"web3_data": {
"transfer_intent": {
"metadata": {
"chain_id": 8453,
"contract_address": "0x03059433bcdb6144624cc2443159d9445c32b7a8",
"sender": "0x9a85CB3bfd494Ea3a8C9E50aA6a3c1a7E8BACE11"
},
"call_data": {
"recipient_amount": "...",
"deadline": "...",
"recipient": "...",
"recipient_currency": "...",
"refund_destination": "...",
"fee_amount": "...",
"id": "...",
"operator": "...",
"signature": "...",
"prefix": "..."
}
}
}
}
}
You can use viem (or another similar EVM client) to execute the transaction on-chain.
In this example, we’ll fulfill the charge using the swapAndTransferUniswapV3Native() function. Other methods of swapping are also available, and you can learn more by checking out Coinbase’s onchain payment protocol here. Note that if you are paying with a less common ERC-20, there is added complexity: you need to make sure that there is sufficient liquidity in the pool to swap the tokens.
import { createPublicClient, createWalletClient, http, parseEther } from 'viem';
import { privateKeyToAccount } from 'viem/accounts';
import { base } from 'viem/chains';
// The ABI for Coinbase's onchain payment protocol
const abi = [
{
inputs: [
{
internalType: 'contract IUniversalRouter',
name: '_uniswap',
type: 'address',
},
{ internalType: 'contract Permit2', name: '_permit2', type: 'address' },
{ internalType: 'address', name: '_initialOperator', type: 'address' },
{
internalType: 'address',
name: '_initialFeeDestination',
type: 'address',
},
{
internalType: 'contract IWrappedNativeCurrency',
name: '_wrappedNativeCurrency',
type: 'address',
},
],
stateMutability: 'nonpayable',
type: 'constructor',
},
{ inputs: [], name: 'AlreadyProcessed', type: 'error' },
{ inputs: [], name: 'ExpiredIntent', type: 'error' },
{
inputs: [
{ internalType: 'address', name: 'attemptedCurrency', type: 'address' },
],
name: 'IncorrectCurrency',
type: 'error',
},
{ inputs: [], name: 'InexactTransfer', type: 'error' },
{
inputs: [{ internalType: 'uint256', name: 'difference', type: 'uint256' }],
name: 'InsufficientAllowance',
type: 'error',
},
{
inputs: [{ internalType: 'uint256', name: 'difference', type: 'uint256' }],
name: 'InsufficientBalance',
type: 'error',
},
{
inputs: [{ internalType: 'int256', name: 'difference', type: 'int256' }],
name: 'InvalidNativeAmount',
type: 'error',
},
{ inputs: [], name: 'InvalidSignature', type: 'error' },
{ inputs: [], name: 'InvalidTransferDetails', type: 'error' },
{
inputs: [
{ internalType: 'address', name: 'recipient', type: 'address' },
{ internalType: 'uint256', name: 'amount', type: 'uint256' },
{ internalType: 'bool', name: 'isRefund', type: 'bool' },
{ internalType: 'bytes', name: 'data', type: 'bytes' },
],
name: 'NativeTransferFailed',
type: 'error',
},
{ inputs: [], name: 'NullRecipient', type: 'error' },
{ inputs: [], name: 'OperatorNotRegistered', type: 'error' },
{ inputs: [], name: 'PermitCallFailed', type: 'error' },
{
inputs: [{ internalType: 'bytes', name: 'reason', type: 'bytes' }],
name: 'SwapFailedBytes',
type: 'error',
},
{
inputs: [{ internalType: 'string', name: 'reason', type: 'string' }],
name: 'SwapFailedString',
type: 'error',
},
{
anonymous: false,
inputs: [
{
indexed: false,
internalType: 'address',
name: 'operator',
type: 'address',
},
{
indexed: false,
internalType: 'address',
name: 'feeDestination',
type: 'address',
},
],
name: 'OperatorRegistered',
type: 'event',
},
{
anonymous: false,
inputs: [
{
indexed: false,
internalType: 'address',
name: 'operator',
type: 'address',
},
],
name: 'OperatorUnregistered',
type: 'event',
},
{
anonymous: false,
inputs: [
{
indexed: true,
internalType: 'address',
name: 'previousOwner',
type: 'address',
},
{
indexed: true,
internalType: 'address',
name: 'newOwner',
type: 'address',
},
],
name: 'OwnershipTransferred',
type: 'event',
},
{
anonymous: false,
inputs: [
{
indexed: false,
internalType: 'address',
name: 'account',
type: 'address',
},
],
name: 'Paused',
type: 'event',
},
{
anonymous: false,
inputs: [
{
indexed: true,
internalType: 'address',
name: 'operator',
type: 'address',
},
{ indexed: false, internalType: 'bytes16', name: 'id', type: 'bytes16' },
{
indexed: false,
internalType: 'address',
name: 'recipient',
type: 'address',
},
{
indexed: false,
internalType: 'address',
name: 'sender',
type: 'address',
},
{
indexed: false,
internalType: 'uint256',
name: 'spentAmount',
type: 'uint256',
},
{
indexed: false,
internalType: 'address',
name: 'spentCurrency',
type: 'address',
},
],
name: 'Transferred',
type: 'event',
},
{
anonymous: false,
inputs: [
{
indexed: false,
internalType: 'address',
name: 'account',
type: 'address',
},
],
name: 'Unpaused',
type: 'event',
},
{
inputs: [],
name: 'owner',
outputs: [{ internalType: 'address', name: '', type: 'address' }],
stateMutability: 'view',
type: 'function',
},
{
inputs: [],
name: 'pause',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [],
name: 'paused',
outputs: [{ internalType: 'bool', name: '', type: 'bool' }],
stateMutability: 'view',
type: 'function',
},
{
inputs: [],
name: 'permit2',
outputs: [{ internalType: 'contract Permit2', name: '', type: 'address' }],
stateMutability: 'view',
type: 'function',
},
{
inputs: [],
name: 'registerOperator',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{ internalType: 'address', name: '_feeDestination', type: 'address' },
],
name: 'registerOperatorWithFeeDestination',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [],
name: 'renounceOwnership',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [{ internalType: 'address', name: 'newSweeper', type: 'address' }],
name: 'setSweeper',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
{
components: [
{ internalType: 'address', name: 'owner', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
],
internalType: 'struct EIP2612SignatureTransferData',
name: '_signatureTransferData',
type: 'tuple',
},
],
name: 'subsidizedTransferToken',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
{ internalType: 'uint24', name: 'poolFeesTier', type: 'uint24' },
],
name: 'swapAndTransferUniswapV3Native',
outputs: [],
stateMutability: 'payable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
{
components: [
{
components: [
{
components: [
{ internalType: 'address', name: 'token', type: 'address' },
{ internalType: 'uint256', name: 'amount', type: 'uint256' },
],
internalType: 'struct ISignatureTransfer.TokenPermissions',
name: 'permitted',
type: 'tuple',
},
{ internalType: 'uint256', name: 'nonce', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
],
internalType: 'struct ISignatureTransfer.PermitTransferFrom',
name: 'permit',
type: 'tuple',
},
{
components: [
{ internalType: 'address', name: 'to', type: 'address' },
{
internalType: 'uint256',
name: 'requestedAmount',
type: 'uint256',
},
],
internalType: 'struct ISignatureTransfer.SignatureTransferDetails',
name: 'transferDetails',
type: 'tuple',
},
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
],
internalType: 'struct Permit2SignatureTransferData',
name: '_signatureTransferData',
type: 'tuple',
},
{ internalType: 'uint24', name: 'poolFeesTier', type: 'uint24' },
],
name: 'swapAndTransferUniswapV3Token',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
{ internalType: 'address', name: '_tokenIn', type: 'address' },
{ internalType: 'uint256', name: 'maxWillingToPay', type: 'uint256' },
{ internalType: 'uint24', name: 'poolFeesTier', type: 'uint24' },
],
name: 'swapAndTransferUniswapV3TokenPreApproved',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{ internalType: 'address payable', name: 'destination', type: 'address' },
],
name: 'sweepETH',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{ internalType: 'address payable', name: 'destination', type: 'address' },
{ internalType: 'uint256', name: 'amount', type: 'uint256' },
],
name: 'sweepETHAmount',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{ internalType: 'address', name: '_token', type: 'address' },
{ internalType: 'address', name: 'destination', type: 'address' },
],
name: 'sweepToken',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{ internalType: 'address', name: '_token', type: 'address' },
{ internalType: 'address', name: 'destination', type: 'address' },
{ internalType: 'uint256', name: 'amount', type: 'uint256' },
],
name: 'sweepTokenAmount',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [],
name: 'sweeper',
outputs: [{ internalType: 'address', name: '', type: 'address' }],
stateMutability: 'view',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
],
name: 'transferNative',
outputs: [],
stateMutability: 'payable',
type: 'function',
},
{
inputs: [{ internalType: 'address', name: 'newOwner', type: 'address' }],
name: 'transferOwnership',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
{
components: [
{
components: [
{
components: [
{ internalType: 'address', name: 'token', type: 'address' },
{ internalType: 'uint256', name: 'amount', type: 'uint256' },
],
internalType: 'struct ISignatureTransfer.TokenPermissions',
name: 'permitted',
type: 'tuple',
},
{ internalType: 'uint256', name: 'nonce', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
],
internalType: 'struct ISignatureTransfer.PermitTransferFrom',
name: 'permit',
type: 'tuple',
},
{
components: [
{ internalType: 'address', name: 'to', type: 'address' },
{
internalType: 'uint256',
name: 'requestedAmount',
type: 'uint256',
},
],
internalType: 'struct ISignatureTransfer.SignatureTransferDetails',
name: 'transferDetails',
type: 'tuple',
},
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
],
internalType: 'struct Permit2SignatureTransferData',
name: '_signatureTransferData',
type: 'tuple',
},
],
name: 'transferToken',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
],
name: 'transferTokenPreApproved',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [],
name: 'unpause',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [],
name: 'unregisterOperator',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
{
components: [
{
components: [
{
components: [
{ internalType: 'address', name: 'token', type: 'address' },
{ internalType: 'uint256', name: 'amount', type: 'uint256' },
],
internalType: 'struct ISignatureTransfer.TokenPermissions',
name: 'permitted',
type: 'tuple',
},
{ internalType: 'uint256', name: 'nonce', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
],
internalType: 'struct ISignatureTransfer.PermitTransferFrom',
name: 'permit',
type: 'tuple',
},
{
components: [
{ internalType: 'address', name: 'to', type: 'address' },
{
internalType: 'uint256',
name: 'requestedAmount',
type: 'uint256',
},
],
internalType: 'struct ISignatureTransfer.SignatureTransferDetails',
name: 'transferDetails',
type: 'tuple',
},
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
],
internalType: 'struct Permit2SignatureTransferData',
name: '_signatureTransferData',
type: 'tuple',
},
],
name: 'unwrapAndTransfer',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
],
name: 'unwrapAndTransferPreApproved',
outputs: [],
stateMutability: 'nonpayable',
type: 'function',
},
{
inputs: [
{
components: [
{ internalType: 'uint256', name: 'recipientAmount', type: 'uint256' },
{ internalType: 'uint256', name: 'deadline', type: 'uint256' },
{
internalType: 'address payable',
name: 'recipient',
type: 'address',
},
{
internalType: 'address',
name: 'recipientCurrency',
type: 'address',
},
{
internalType: 'address',
name: 'refundDestination',
type: 'address',
},
{ internalType: 'uint256', name: 'feeAmount', type: 'uint256' },
{ internalType: 'bytes16', name: 'id', type: 'bytes16' },
{ internalType: 'address', name: 'operator', type: 'address' },
{ internalType: 'bytes', name: 'signature', type: 'bytes' },
{ internalType: 'bytes', name: 'prefix', type: 'bytes' },
],
internalType: 'struct TransferIntent',
name: '_intent',
type: 'tuple',
},
],
name: 'wrapAndTransfer',
outputs: [],
stateMutability: 'payable',
type: 'function',
},
{ stateMutability: 'payable', type: 'receive' },
];
// Set up viem clients
const publicClient = createPublicClient({
chain: base,
transport: http(),
});
const account = privateKeyToAccount('0x...');
const walletClient = createWalletClient({
chain: base,
transport: http(),
account,
});
// Use the calldata included in the charge response
const { contract_address } =
responseJSON.data.web3_data.transfer_intent.metadata;
const call_data = responseJSON.data.web3_data.transfer_intent.call_data;
// When transacting in ETH, a pool fees tier of 500 (the lowest) is very
// likely to be sufficient. However, if you plan to swap with a different
// contract method, using less-common ERC-20 tokens, it is recommended to
// call that chain's Uniswap QuoterV2 contract to check its liquidity.
// Depending on the results, choose the lowest fee tier which has enough
// liquidity in the pool.
const poolFeesTier = 500;
// Simulate the transaction first to prevent most common revert reasons
const { request } = await publicClient.simulateContract({
abi,
account,
address: contract_address,
functionName: 'swapAndTransferUniswapV3Native',
args: [
{
recipientAmount: BigInt(call_data.recipient_amount),
deadline: BigInt(
Math.floor(new Date(call_data.deadline).getTime() / 1000),
),
recipient: call_data.recipient,
recipientCurrency: call_data.recipient_currency,
refundDestination: call_data.refund_destination,
feeAmount: BigInt(call_data.fee_amount),
id: call_data.id,
operator: call_data.operator,
signature: call_data.signature,
prefix: call_data.prefix,
},
poolFeesTier,
],
// Transaction value in ETH. You'll want to include a little extra to
// ensure the transaction & swap is successful. All excess funds return
// back to your sender address afterwards.
value: parseEther('0.004'),
});
// Send the transaction on chain
const txHash = await walletClient.writeContract(request);
console.log('Transaction hash:', txHash);
Once the transaction succeeds on chain, we’ll add credits to your account. You can track the transaction status using the returned transaction hash.
Credit purchases lower than $500 will be immediately credited once the transaction is on chain. Above $500, there is a ~15 minute confirmation delay, ensuring the chain does not re-org your purchase.
While it is possible to simply run down the balance until your app starts receiving 402 error codes for insufficient credits, this gap in service while topping up might not be desirable.
To avoid this, you can periodically call the GET /api/v1/credits
endpoint to check your available credits.
const response = await fetch('https://openrouter.ai/api/v1/credits', {
method: 'GET',
headers: { Authorization: 'Bearer <OPENROUTER_API_KEY>' },
});
const { data } = await response.json();
The response includes your total credits purchased and usage, where your current balance is the difference between the two:
{
"data": {
"total_credits": 50.0,
"total_usage": 42.0
}
}
Note that these values are cached, and may be up to 60 seconds stale.
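As a sketch, one way to use this endpoint is to compute the remaining balance and alert (or top up) when it falls below a threshold of your choosing:

```typescript
const res = await fetch('https://openrouter.ai/api/v1/credits', {
  method: 'GET',
  headers: { Authorization: 'Bearer <OPENROUTER_API_KEY>' },
});
const { data } = await res.json();

// Remaining balance is purchases minus usage; these values may be up to ~60 seconds stale.
const remaining = data.total_credits - data.total_usage;
const LOW_WATERMARK = 5; // arbitrary example threshold, in credits
if (remaining < LOW_WATERMARK) {
  console.warn(`Credits running low (${remaining.toFixed(2)}); consider topping up.`);
}
```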
Implement secure user authentication with OpenRouter using OAuth PKCE. Complete guide to setting up and managing OAuth authentication flows.
Users can connect to OpenRouter in one click using Proof Key for Code Exchange (PKCE).
Here’s a step-by-step guide:
To start the PKCE flow, send your user to OpenRouter’s /auth
URL with a callback_url
parameter pointing back to your site:
https://openrouter.ai/auth?callback_url=<YOUR_SITE_URL>&code_challenge=<CODE_CHALLENGE>&code_challenge_method=plain
https://openrouter.ai/auth?callback_url=<YOUR_SITE_URL>
The code_challenge
parameter is optional but recommended.
Your user will be prompted to log in to OpenRouter and authorize your app. After authorization, they will be redirected back to your site with a code
parameter in the URL:

For more info, visit Auth0’s docs.
The following example leverages the Web Crypto API and the Buffer API to generate a code challenge for the S256 method. You will need a bundler to use the Buffer API in the web browser:
async function createSHA256CodeChallenge(input: string) {
  const encoder = new TextEncoder();
  const data = encoder.encode(input);
  const hash = await crypto.subtle.digest('SHA-256', data);
  return Buffer.from(hash).toString('base64url');
}

const codeVerifier = 'your-random-string';
const generatedCodeChallenge = await createSHA256CodeChallenge(codeVerifier);
</CodeBlocks>
#### Localhost Apps
If your app is a local-first app or otherwise doesn't have a public URL, it is recommended to test with `http://localhost:3000` as the callback and referrer URLs.
When moving to production, replace the localhost/private referrer URL with a public GitHub repo or a link to your project website.
### Step 2: Exchange the code for a user-controlled API key
After the user logs in with OpenRouter, they are redirected back to your site with a `code` parameter in the URL:

Extract this code using the browser API:
<CodeBlocks>
```typescript title="Extract Code"
const urlParams = new URLSearchParams(window.location.search);
const code = urlParams.get('code');
```
Then use it to make an API call to https://openrouter.ai/api/v1/auth/keys
to exchange the code for a user-controlled API key:
const response = await fetch('https://openrouter.ai/api/v1/auth/keys', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    code, // the code extracted from the URL above
    code_verifier: codeVerifier, // required if you sent a code_challenge in step 1
    code_challenge_method: 'S256', // must match the method used in step 1
  }),
});
const { key } = await response.json();
</CodeBlocks>
And that's it for the PKCE flow!
### Step 3: Use the API key
Store the API key securely within the user's browser or in your own database, and use it to [make OpenRouter requests](/api-reference/completion).
<CodeBlocks>
```typescript title="Make an OpenRouter request"
fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: 'Bearer <API_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'openai/gpt-4o',
messages: [
{
role: 'user',
content: 'Hello!',
},
],
}),
});
```
400 Invalid code_challenge_method: Make sure you’re using the same code challenge method in step 1 as in step 2.
403 Invalid code or code_verifier: Make sure your user is logged in to OpenRouter, and that code_verifier and code_challenge_method are correct.
405 Method Not Allowed: Make sure you’re using POST and HTTPS for your request.
Learn how to use MCP Servers with OpenRouter
MCP servers are a popular way of providing LLMs with tool calling abilities, and are an alternative to using OpenAI-compatible tool calling.
By converting MCP (Anthropic) tool definitions to OpenAI-compatible tool definitions, you can use MCP servers with OpenRouter.
In this example, we’ll use Anthropic’s MCP client SDK to interact with the File System MCP, all with OpenRouter under the hood.
First, some setup. In order to run this you will need to pip install the required packages (mcp, openai, python-dotenv), and create a .env
file with OPENAI_API_KEY set to your OpenRouter API key (the OpenAI client reads this variable). This example also assumes the directory /Applications
exists.
import asyncio
from typing import Optional
from contextlib import AsyncExitStack
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from openai import OpenAI
from dotenv import load_dotenv
import json
load_dotenv() # load environment variables from .env
MODEL = "anthropic/claude-3-7-sonnet"
SERVER_CONFIG = {
"command": "npx",
"args": ["-y",
"@modelcontextprotocol/server-filesystem",
f"/Applications/"],
"env": None
}
Next, our helper function to convert MCP tool definitions to OpenAI tool definitions:
def convert_tool_format(tool):
converted_tool = {
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": {
"type": "object",
"properties": tool.inputSchema["properties"],
"required": tool.inputSchema["required"]
}
}
}
return converted_tool
And, the MCP client itself; a regrettable ~100 lines of code. Note that the SERVER_CONFIG is hard-coded into the client, but of course could be parameterized for other MCP servers.
class MCPClient:
def __init__(self):
self.session: Optional[ClientSession] = None
self.exit_stack = AsyncExitStack()
self.openai = OpenAI(
base_url="https://openrouter.ai/api/v1"
)
async def connect_to_server(self, server_config):
server_params = StdioServerParameters(**server_config)
stdio_transport = await self.exit_stack.enter_async_context(stdio_client(server_params))
self.stdio, self.write = stdio_transport
self.session = await self.exit_stack.enter_async_context(ClientSession(self.stdio, self.write))
await self.session.initialize()
# List available tools from the MCP server
response = await self.session.list_tools()
print("\nConnected to server with tools:", [tool.name for tool in response.tools])
self.messages = []
async def process_query(self, query: str) -> str:
self.messages.append({
"role": "user",
"content": query
})
response = await self.session.list_tools()
available_tools = [convert_tool_format(tool) for tool in response.tools]
response = self.openai.chat.completions.create(
model=MODEL,
tools=available_tools,
messages=self.messages
)
self.messages.append(response.choices[0].message.model_dump())
final_text = []
content = response.choices[0].message
if content.tool_calls is not None:
tool_name = content.tool_calls[0].function.name
tool_args = content.tool_calls[0].function.arguments
tool_args = json.loads(tool_args) if tool_args else {}
# Execute tool call
try:
result = await self.session.call_tool(tool_name, tool_args)
final_text.append(f"[Calling tool {tool_name} with args {tool_args}]")
except Exception as e:
print(f"Error calling tool {tool_name}: {e}")
result = None
self.messages.append({
"role": "tool",
"tool_call_id": content.tool_calls[0].id,
"name": tool_name,
"content": result.content
})
response = self.openai.chat.completions.create(
model=MODEL,
max_tokens=1000,
messages=self.messages,
)
final_text.append(response.choices[0].message.content)
else:
final_text.append(content.content)
return "\n".join(final_text)
async def chat_loop(self):
"""Run an interactive chat loop"""
print("\nMCP Client Started!")
print("Type your queries or 'quit' to exit.")
while True:
try:
query = input("\nQuery: ").strip()
result = await self.process_query(query)
print("Result:")
print(result)
except Exception as e:
print(f"Error: {str(e)}")
async def cleanup(self):
await self.exit_stack.aclose()
async def main():
client = MCPClient()
try:
await client.connect_to_server(SERVER_CONFIG)
await client.chat_loop()
finally:
await client.cleanup()
if __name__ == "__main__":
import sys
asyncio.run(main())
Assembling all of the above code into mcp-client.py, you get a client that behaves as follows (some outputs truncated for brevity):
% python mcp-client.py
Secure MCP Filesystem Server running on stdio
Allowed directories: [ '/Applications' ]
Connected to server with tools: ['read_file', 'read_multiple_files', 'write_file'...]
MCP Client Started!
Type your queries or 'quit' to exit.
Query: Do I have microsoft office installed?
Result:
[Calling tool list_allowed_directories with args {}]
I can check if Microsoft Office is installed in the Applications folder:
Query: continue
Result:
[Calling tool search_files with args {'path': '/Applications', 'pattern': 'Microsoft'}]
Now let me check specifically for Microsoft Office applications:
Query: continue
Result:
I can see from the search results that Microsoft Office is indeed installed on your system.
The search found the following main Microsoft Office applications:
1. Microsoft Excel - /Applications/Microsoft Excel.app
2. Microsoft PowerPoint - /Applications/Microsoft PowerPoint.app
3. Microsoft Word - /Applications/Microsoft Word.app
4. OneDrive - /Applications/OneDrive.app (which includes Microsoft SharePoint integration)
Learn how to integrate your AI models with OpenRouter. Complete guide for providers to make their models available through OpenRouter’s unified API.
If you’d like to be a model provider and sell inference on OpenRouter, fill out our form to get started.
To be eligible to provide inference on OpenRouter you must have the following:
You must implement an endpoint that returns all models that should be served by OpenRouter. At this endpoint, please return a list of all available models on your platform. Below is an example of the response format:
{
"data": [
{
"id": "anthropic/claude-2.0",
"name": "Anthropic: Claude v2.0",
"created": 1690502400,
"description": "Anthropic's flagship model...", // Optional
"context_length": 100000, // Required
"max_completion_tokens": 4096, // Optional
"pricing": {
"prompt": "0.000008", // pricing per 1 token
"completion": "0.000024", // pricing per 1 token
"image": "0", // pricing per 1 image
"request": "0" // pricing per 1 request
}
}
]
}
NOTE: pricing
fields are in string format to avoid floating point precision issues, and must be in USD.
For OpenRouter to use the provider, we must be able to pay for inference automatically. This can be done via auto top-up or invoicing.
Learn how to use reasoning tokens to enhance AI model outputs. Implement step-by-step reasoning traces for better decision making and transparency.
For models that support it, the OpenRouter API can return Reasoning Tokens, also known as thinking tokens. OpenRouter normalizes the different ways of customizing the amount of reasoning tokens that the model will use, providing a unified interface across different providers.
Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.
Reasoning tokens are included in the response by default if the model decides to output them. Reasoning tokens will appear in the reasoning
field of each message, unless you decide to exclude them.
You can control reasoning tokens in your requests using the reasoning
parameter:
{
"model": "your-model",
"messages": [],
"reasoning": {
// One of the following (not both):
"effort": "high", // Can be "high", "medium", or "low" (OpenAI-style)
"max_tokens": 2000, // Specific token limit (Anthropic-style)
// Optional: Default is false. All models support this.
"exclude": false // Set to true to exclude reasoning tokens from response
}
}
The reasoning
config object consolidates settings for controlling reasoning strength across different models. See the Note for each option below to see which models are supported and how other models will behave.
For models that support reasoning token allocation, you can control it like this:
"max_tokens": 2000
- Directly specifies the maximum number of tokens to use for reasoningFor models that only support reasoning.effort
(see below), the max_tokens
value will be used to determine the effort level.
"effort": "high"
- Allocates a large portion of tokens for reasoning (approximately 80% of max_tokens)"effort": "medium"
- Allocates a moderate portion of tokens (approximately 50% of max_tokens)"effort": "low"
- Allocates a smaller portion of tokens (approximately 20% of max_tokens)For models that only support reasoning.max_tokens
, the effort level will be set based on the percentages above.
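For example, with "max_tokens": 10000, "effort": "high" corresponds to roughly 8,000 reasoning tokens, "medium" to roughly 5,000, and "low" to roughly 2,000.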
If you want the model to use reasoning internally but not include it in the response:

- `"exclude": true` - The model will still use reasoning, but it won't be returned in the response

Otherwise, reasoning tokens will appear in the `reasoning` field of each message.
For backward compatibility, OpenRouter still supports the following legacy parameters:

- `include_reasoning: true` - Equivalent to `reasoning: {}`
- `include_reasoning: false` - Equivalent to `reasoning: { exclude: true }`

However, we recommend using the new unified `reasoning` parameter for better control and future compatibility.
The example below requests high reasoning effort from openai/o3-mini:
import requests
import json
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {{API_KEY_REF}}",
"Content-Type": "application/json"
}
payload = {
"model": "{{MODEL}}",
"messages": [
{"role": "user", "content": "How would you build the world's tallest skyscraper?"}
],
"reasoning": {
"effort": "high" # Use high reasoning effort
}
}
response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()['choices'][0]['message']['reasoning'])
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: '<OPENROUTER_API_KEY>',
});
async function getResponseWithReasoning() {
const response = await openai.chat.completions.create({
model: 'openai/o3-mini',
messages: [
{
role: 'user',
content: "How would you build the world's tallest skyscraper?",
},
],
reasoning: {
effort: 'high', // Use high reasoning effort
},
});
console.log('REASONING:', response.choices[0].message.reasoning);
console.log('CONTENT:', response.choices[0].message.content);
}
getResponseWithReasoning();
For models that support direct token allocation (like Anthropic models), you can specify the exact number of tokens to use for reasoning:
The example below allocates an explicit reasoning token budget for anthropic/claude-3.7-sonnet:
import requests
import json
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {{API_KEY_REF}}",
"Content-Type": "application/json"
}
payload = {
"model": "{{MODEL}}",
"messages": [
{"role": "user", "content": "What's the most efficient algorithm for sorting a large dataset?"}
],
"reasoning": {
"max_tokens": 2000 # Allocate 2000 tokens (or approximate effort) for reasoning
}
}
response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()['choices'][0]['message']['reasoning'])
print(response.json()['choices'][0]['message']['content'])
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: '<OPENROUTER_API_KEY>',
});
async function getResponseWithReasoning() {
const response = await openai.chat.completions.create({
model: 'anthropic/claude-3.7-sonnet',
messages: [
{
role: 'user',
content: "How would you build the world's tallest skyscraper?",
},
],
reasoning: {
max_tokens: 2000, // Allocate 2000 tokens (or approximate effort) for reasoning
},
});
console.log('REASONING:', response.choices[0].message.reasoning);
console.log('CONTENT:', response.choices[0].message.content);
}
getResponseWithReasoning();
If you want the model to use reasoning internally but not include it in the response:
The example below uses deepseek/deepseek-r1 with reasoning excluded from the response:
import requests
import json
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {{API_KEY_REF}}",
"Content-Type": "application/json"
}
payload = {
"model": "{{MODEL}}",
"messages": [
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"reasoning": {
"effort": "high",
"exclude": true # Use reasoning but don't include it in the response
}
}
response = requests.post(url, headers=headers, data=json.dumps(payload))
# No reasoning field in the response
print(response.json()['choices'][0]['message']['content'])
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: '<OPENROUTER_API_KEY>',
});
async function getResponseWithReasoning() {
const response = await openai.chat.completions.create({
model: 'deepseek/deepseek-r1',
messages: [
{
role: 'user',
content: "How would you build the world's tallest skyscraper?",
},
],
reasoning: {
effort: 'high',
exclude: true, // Use reasoning but don't include it in the response
},
});
console.log('REASONING:', response.choices[0].message.reasoning);
console.log('CONTENT:', response.choices[0].message.content);
}
getResponseWithReasoning();
This example shows how to use reasoning tokens in a more complex workflow. It injects one model’s reasoning into another model to improve its response quality:
import requests
import json
question = "Which is bigger: 9.11 or 9.9?"
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {{API_KEY_REF}}",
"Content-Type": "application/json"
}
def do_req(model, content, reasoning_config=None):
payload = {
"model": model,
"messages": [
{"role": "user", "content": content}
],
"stop": "</think>"
}
return requests.post(url, headers=headers, data=json.dumps(payload))
# Get reasoning from a capable model
content = f"{question} Please think this through, but don't output an answer"
reasoning_response = do_req("deepseek/deepseek-r1", content)
reasoning = reasoning_response.json()['choices'][0]['message']['reasoning']
# Let's test! Here's the naive response:
simple_response = do_req("openai/gpt-4o-mini", question)
print(simple_response.json()['choices'][0]['message']['content'])
# Here's the response with the reasoning token injected:
content = f"{question}. Here is some context to help you: {reasoning}"
smart_response = do_req("openai/gpt-4o-mini", content)
print(smart_response.json()['choices'][0]['message']['content'])
The latest Claude models, such as anthropic/claude-3.7-sonnet, support working with and returning reasoning tokens.
You can enable reasoning on Anthropic models in two ways:
- Using the `:thinking` variant suffix (e.g., `anthropic/claude-3.7-sonnet:thinking`). The thinking variant defaults to high reasoning effort (see the example below).
- Using the unified `reasoning` parameter with either `effort` or `max_tokens`.
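For example, a minimal request using the `:thinking` variant might look like the sketch below; it mirrors the request format used elsewhere in this guide.

```python
import requests
import json

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": "Bearer <OPENROUTER_API_KEY>",
    "Content-Type": "application/json",
}

# The :thinking variant enables reasoning and defaults to high effort.
payload = {
    "model": "anthropic/claude-3.7-sonnet:thinking",
    "messages": [
        {"role": "user", "content": "What's bigger, 9.9 or 9.11?"}
    ],
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()['choices'][0]['message']['reasoning'])
```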
When using Anthropic models with reasoning:

- If you use the `reasoning.max_tokens` parameter, that value is used directly, with a minimum of 1024 tokens.
- If you use the `:thinking` variant suffix or the `reasoning.effort` parameter, the budget_tokens are calculated based on the `max_tokens` value.

The reasoning token allocation is capped at 32,000 tokens maximum and 1,024 tokens minimum. The formula for calculating budget_tokens is: `budget_tokens = max(min(max_tokens * effort_ratio, 32000), 1024)`

effort_ratio is 0.8 for high effort, 0.5 for medium effort, and 0.2 for low effort.
Important: `max_tokens` must be strictly higher than the reasoning budget to ensure there are tokens available for the final response after thinking.
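As a sketch of the budget calculation described above (the helper function is illustrative only, not part of the API):

```python
# Reproduces the budget_tokens formula described above; illustrative only.
EFFORT_RATIOS = {"high": 0.8, "medium": 0.5, "low": 0.2}

def anthropic_budget_tokens(max_tokens: int, effort: str) -> int:
    budget = max(min(int(max_tokens * EFFORT_RATIOS[effort]), 32000), 1024)
    # max_tokens must be strictly higher than the reasoning budget so that
    # tokens remain for the final response after thinking.
    assert max_tokens > budget, "max_tokens must exceed the reasoning budget"
    return budget

print(anthropic_budget_tokens(10000, "high"))  # 8000
```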
The example below streams a response from anthropic/claude-3.7-sonnet with an explicit reasoning token budget:
from openai import OpenAI
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="<OPENROUTER_API_KEY>",
)
def chat_completion_with_reasoning(messages):
response = client.chat.completions.create(
model="anthropic/claude-3.7-sonnet",
messages=messages,
max_tokens=10000,
# OpenRouter-specific arguments go in extra_body when using the OpenAI SDK
extra_body={
"reasoning": {
"max_tokens": 8000  # Directly specify reasoning token budget
}
},
stream=True
)
return response
for chunk in chat_completion_with_reasoning([
{"role": "user", "content": "What's bigger, 9.9 or 9.11?"}
]):
if hasattr(chunk.choices[0].delta, 'reasoning') and chunk.choices[0].delta.reasoning:
print(f"REASONING: {chunk.choices[0].delta.reasoning}")
elif chunk.choices[0].delta.content:
print(f"CONTENT: {chunk.choices[0].delta.content}")
Integrate OpenRouter using popular frameworks and SDKs. Complete guides for OpenAI SDK, LangChain, PydanticAI, and Vercel AI SDK integration.
You can find a few examples of using OpenRouter with other frameworks in this GitHub repository. Here are some examples:

- Using `pip install openai`: see the GitHub examples repository.
- Using `npm i openai`: see the GitHub examples repository.
- Using Grit to migrate existing OpenAI SDK code automatically: `npx @getgrit/launcher openrouter`.

A typical OpenAI SDK setup looks like this:
import OpenAI from 'openai';

const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: '<OPENROUTER_API_KEY>',
defaultHeaders: {
'HTTP-Referer': '<YOUR_SITE_URL>', // Optional. Site URL for rankings on openrouter.ai.
'X-Title': '<YOUR_SITE_NAME>', // Optional. Site title for rankings on openrouter.ai.
},
});

async function main() {
const completion = await openai.chat.completions.create({
model: 'openai/gpt-4o',
messages: [
{
role: 'user',
content: 'Say this is a test',
},
],
});

console.log(completion.choices[0].message);
}

main();
```python title="Python"
from openai import OpenAI
from os import getenv
# gets API key from environment variable OPENROUTER_API_KEY
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=getenv("OPENROUTER_API_KEY"),
)
completion = client.chat.completions.create(
model="openai/gpt-4o",
extra_headers={
"HTTP-Referer": "<YOUR_SITE_URL>", # Optional. Site URL for rankings on openrouter.ai.
"X-Title": "<YOUR_SITE_NAME>", # Optional. Site title for rankings on openrouter.ai.
},
# pass extra_body to access OpenRouter-only arguments.
# extra_body={
# "models": [
# "${Model.GPT_4_Omni}",
# "${Model.Mixtral_8x_22B_Instruct}"
# ]
# },
messages=[
{
"role": "user",
"content": "Say this is a test",
},
],
)
print(completion.choices[0].message.content)
```
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from os import getenv
from dotenv import load_dotenv
load_dotenv()
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = ChatOpenAI(
openai_api_key=getenv("OPENROUTER_API_KEY"),
openai_api_base=getenv("OPENROUTER_BASE_URL"),
model_name="<model_name>",
model_kwargs={
"headers": {
"HTTP-Referer": getenv("YOUR_SITE_URL"),
"X-Title": getenv("YOUR_SITE_NAME"),
}
},
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
print(llm_chain.run(question))
PydanticAI provides a high-level interface for working with various LLM providers, including OpenRouter.
pip install 'pydantic-ai-slim[openai]'
You can use OpenRouter with PydanticAI through its OpenAI-compatible interface:
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
model = OpenAIModel(
"anthropic/claude-3.5-sonnet", # or any other OpenRouter model
base_url="https://openrouter.ai/api/v1",
api_key="sk-or-...",
)
agent = Agent(model)
result = await agent.run("What is the meaning of life?")  # run inside an async context, or use agent.run_sync(...) in synchronous code
print(result)
For more details about using PydanticAI with OpenRouter, see the PydanticAI documentation.
You can use the Vercel AI SDK to integrate OpenRouter with your Next.js app. To get started, install @openrouter/ai-sdk-provider:
npm install @openrouter/ai-sdk-provider
You can then use the streamText() API to stream text from OpenRouter.
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { streamText } from 'ai';

export const getLasagnaRecipe = async (modelName: string) => {
const openrouter = createOpenRouter({
apiKey: '<OPENROUTER_API_KEY>',
});
const result = await streamText({
model: openrouter(modelName),
prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});
return result.toAIStreamResponse();
};
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { streamText } from 'ai';
import { z } from 'zod';

export const getWeather = async (modelName: string) => {
const openrouter = createOpenRouter({
apiKey: '<OPENROUTER_API_KEY>',
});
const result = await streamText({
model: openrouter(modelName),
prompt: 'What is the weather in San Francisco, CA in Fahrenheit?',
tools: {
getCurrentWeather: {
description: 'Get the current weather in a given location',
parameters: z.object({
location: z
.string()
.describe('The city and state, e.g. San Francisco, CA'),
unit: z.enum(['celsius', 'fahrenheit']).optional(),
}),
execute: async ({ location, unit = 'celsius' }) => {
// Mock response for the weather
const weatherData = {
'Boston, MA': {
celsius: '15°C',
fahrenheit: '59°F',
},
'San Francisco, CA': {
celsius: '18°C',
fahrenheit: '64°F',
},
};
const weather = weatherData[location];
if (!weather) {
return `Weather data for ${location} is not available.`;
}
return `The current weather in ${location} is ${weather[unit]}.`;
},
},
},
});
return result.toAIStreamResponse();
};
What field does the generation ID come back in from the request?