Start using the kluster.ai API

The kluster.ai API provides a straightforward way to work with Large Language Models (LLMs) at scale. It is compatible with OpenAI's API and SDKs, making it easy to integrate into your existing workflows with minimal code changes.

Get your API key

Navigate to the API Keys section of the kluster.ai developer console and create a new key. You'll need this key for all API requests.

For step-by-step instructions, refer to the Get an API key guide.
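
Since every request needs the key, a common convention (not a requirement of the API) is to keep it in an environment variable rather than hard-coding it. The variable name KLUSTERAI_API_KEY here is illustrative:

import os

# Read the key from the environment, e.g. after running:
#   export KLUSTERAI_API_KEY="INSERT_API_KEY"
api_key = os.environ["KLUSTERAI_API_KEY"]  # raises KeyError if the variable is unset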

Set up the OpenAI client library

Developers can use the official OpenAI libraries with kluster.ai without any changes. To start, install the library:

pip install "openai>=1.0.0"

Once the library is installed, instantiate an OpenAI client pointed at kluster.ai with the following code, replacing INSERT_API_KEY with your actual key:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)
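
With the client in place, requests use the standard OpenAI chat completions interface. The following sketch sends a single prompt; the model identifier shown is illustrative, so check the kluster.ai model list for the exact names:

completion = client.chat.completions.create(
    model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo",  # illustrative model ID; check the model list
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(completion.choices[0].message.content)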

Check the kluster.ai OpenAI compatibility page for detailed information about the integration.

API request limits

API request limits vary by plan. Each table below corresponds to one plan tier, ordered from the most restrictive limits to the least restrictive:

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|-----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 32k | 4k | 1000 | 20 | 30 | 1 |
| DeepSeek-V3-0324 | 32k | 4k | 1000 | 20 | 30 | 1 |
| Gemma 3 27B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 3.1 8B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 3.3 70B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 4 Maverick | 32k | 4k | 1000 | 20 | 30 | 1 |
| Meta Llama 4 Scout | 32k | 4k | 1000 | 20 | 30 | 1 |
| Mistral NeMo | 32k | 4k | 1000 | 20 | 30 | 1 |
| Qwen2.5-VL 7B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Qwen3-235B-A22B | 32k | 4k | 1000 | 20 | 30 | 1 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|-----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 163k | 163k | 100k | 100 | 600 | 10 |
| DeepSeek-V3-0324 | 163k | 163k | 100k | 100 | 600 | 10 |
| Gemma 3 27B | 64k | 8k | 100k | 100 | 600 | 10 |
| Meta Llama 3.1 8B | 131k | 131k | 100k | 100 | 600 | 10 |
| Meta Llama 3.3 70B | 131k | 131k | 100k | 100 | 600 | 10 |
| Meta Llama 4 Maverick | 1M | 1M | 100k | 100 | 600 | 10 |
| Meta Llama 4 Scout | 131k | 131k | 100k | 100 | 600 | 10 |
| Mistral NeMo | 131k | 131k | 100k | 100 | 600 | 10 |
| Qwen2.5-VL 7B | 32k | 32k | 100k | 100 | 600 | 10 |
| Qwen3-235B-A22B | 40k | 40k | 100k | 100 | 600 | 10 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|-----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 163k | 163k | 500k | 100 | 1200 | 25 |
| DeepSeek-V3-0324 | 163k | 163k | 500k | 100 | 1200 | 25 |
| Gemma 3 27B | 64k | 8k | 500k | 100 | 1200 | 25 |
| Meta Llama 3.1 8B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Meta Llama 3.3 70B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Meta Llama 4 Maverick | 1M | 1M | 500k | 100 | 1200 | 25 |
| Meta Llama 4 Scout | 131k | 131k | 500k | 100 | 1200 | 25 |
| Mistral NeMo | 131k | 131k | 500k | 100 | 1200 | 25 |
| Qwen2.5-VL 7B | 32k | 32k | 500k | 100 | 1200 | 25 |
| Qwen3-235B-A22B | 40k | 40k | 500k | 100 | 1200 | 25 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
|-------|-----------------------|---------------------|--------------------|---------------------|---------------------|--------------------------|
| DeepSeek-R1 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
| DeepSeek-V3-0324 | 163k | 163k | Unlimited | 100 | Unlimited | Unlimited |
| Gemma 3 27B | 64k | 8k | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 3.1 8B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 3.3 70B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 4 Maverick | 1M | 1M | Unlimited | 100 | Unlimited | Unlimited |
| Meta Llama 4 Scout | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Mistral NeMo | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Qwen2.5-VL 7B | 32k | 32k | Unlimited | 100 | Unlimited | Unlimited |
| Qwen3-235B-A22B | 40k | 40k | Unlimited | 100 | Unlimited | Unlimited |
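
If you exceed the requests-per-minute limit for your plan, the client raises a rate-limit error. The sketch below retries with exponential backoff; it assumes the endpoint signals throttling the way OpenAI-compatible APIs conventionally do (HTTP 429, surfaced as RateLimitError by the OpenAI Python library):

import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

def complete_with_retry(messages, model, max_retries=5):
    # Retry with exponential backoff whenever the per-minute limit is hit.
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("Rate limit retries exhausted")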

Where to go next

  • Guide: Real-time inference. Build AI-powered applications that deliver instant, real-time responses.

  • Guide: Batch inference. Process large-scale data efficiently with AI-powered batch inference.

  • Reference: API reference. Explore the complete kluster.ai API documentation and usage details.