Start using the kluster.ai API

The kluster.ai API provides a straightforward way to work with Large Language Models (LLMs) at scale. It is compatible with OpenAI's API and SDKs, making it easy to integrate into your existing workflows with minimal code changes.

Get your API key

Navigate to the API Keys section of the kluster.ai developer console and create a new key. You'll need this key for all API requests.

For step-by-step instructions, refer to the Get an API key guide.
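
Treat the key like a password. Rather than hard-coding it, you can export it as an environment variable and read it from your code. This is a minimal sketch; the variable name KLUSTER_API_KEY is an arbitrary choice for this example, not an official requirement:

import os

# Read the API key from the environment instead of hard-coding it.
# Set it first in your shell, for example: export KLUSTER_API_KEY="your-key"
api_key = os.environ["KLUSTER_API_KEY"]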

Set up the OpenAI client library

Developers can use the OpenAI client libraries with kluster.ai without any code changes. To get started, install the library:

pip install "openai>=1.0.0"
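
If you want to confirm which version is installed, you can print it; the snippets on this page assume the 1.x client interface:

import openai

# Any 1.x release works with the examples on this page.
print(openai.__version__)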

Once the library is installed, you can instantiate an OpenAI client pointing to kluster.ai with the following code, replacing INSERT_API_KEY with your actual key:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)
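
With the client configured, requests use the standard OpenAI interface. The following is a minimal chat completion sketch; the model name is a placeholder, so substitute a model ID from the kluster.ai models list:

completion = client.chat.completions.create(
    model="INSERT_MODEL_NAME",  # placeholder; use a model ID from the kluster.ai models list
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(completion.choices[0].message.content)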

Check the kluster.ai OpenAI compatibility page for detailed information about the integration.

API request limits

The following limits apply to API requests based on your plan. Each table below corresponds to a plan tier, ordered from the lowest limits to the highest:

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek R1 | 32k | 4k | 1000 | 20 | 30 | 1 |
| DeepSeek V3 | 32k | 4k | 1000 | 20 | 30 | 1 |
| DeepSeek V3 0324 | 32k | 4k | 1000 | 20 | 30 | 1 |
| Gemma 3 27B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 3.1 8B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 3.1 405B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 3.3 70B | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 4 Maverick 17B 128E | 32k | 4k | 1000 | 20 | 30 | 1 |
| Llama 4 Scout 17B 16E | 32k | 4k | 1000 | 20 | 30 | 1 |
| Qwen 2.5 7B | 32k | 4k | 1000 | 20 | 30 | 1 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek R1 | 164k | 164k | 100k | 100 | 600 | 10 |
| DeepSeek V3 | 164k | 164k | 100k | 100 | 600 | 10 |
| DeepSeek V3 0324 | 164k | 164k | 100k | 100 | 600 | 10 |
| Gemma 3 27B | 64k | 8k | 100k | 100 | 600 | 10 |
| Llama 3.1 8B | 131k | 131k | 100k | 100 | 600 | 10 |
| Llama 3.1 405B | 131k | 131k | 100k | 100 | 600 | 10 |
| Llama 3.3 70B | 131k | 131k | 100k | 100 | 600 | 10 |
| Llama 4 Maverick 17B 128E | 1M | 1M | 100k | 100 | 600 | 10 |
| Llama 4 Scout 17B 16E | 131k | 131k | 100k | 100 | 600 | 10 |
| Qwen 2.5 7B | 32k | 32k | 100k | 100 | 600 | 10 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek R1 | 164k | 164k | 500k | 100 | 1200 | 25 |
| DeepSeek V3 | 164k | 164k | 500k | 100 | 1200 | 25 |
| DeepSeek V3 0324 | 164k | 164k | 500k | 100 | 1200 | 25 |
| Gemma 3 27B | 64k | 8k | 500k | 100 | 1200 | 25 |
| Llama 3.1 8B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Llama 3.1 405B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Llama 3.3 70B | 131k | 131k | 500k | 100 | 1200 | 25 |
| Llama 4 Maverick 17B 128E | 1M | 1M | 500k | 100 | 1200 | 25 |
| Llama 4 Scout 17B 16E | 131k | 131k | 500k | 100 | 1200 | 25 |
| Qwen 2.5 7B | 32k | 32k | 500k | 100 | 1200 | 25 |

| Model | Context size [tokens] | Max output [tokens] | Max batch requests | Concurrent requests | Requests per minute | Hosted fine-tuned models |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek R1 | 164k | 164k | Unlimited | 100 | Unlimited | Unlimited |
| DeepSeek V3 | 164k | 164k | Unlimited | 100 | Unlimited | Unlimited |
| DeepSeek V3 0324 | 164k | 164k | Unlimited | 100 | Unlimited | Unlimited |
| Gemma 3 27B | 64k | 8k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 3.1 8B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 3.1 405B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 3.3 70B | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Llama 4 Maverick 17B 128E | 1M | 1M | Unlimited | 100 | Unlimited | Unlimited |
| Llama 4 Scout 17B 16E | 131k | 131k | Unlimited | 100 | Unlimited | Unlimited |
| Qwen 2.5 7B | 32k | 32k | Unlimited | 100 | Unlimited | Unlimited |
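
If you exceed your plan's requests-per-minute limit, expect requests to be rejected until the window resets; OpenAI-compatible APIs typically signal this with an HTTP 429 response. One simple way to absorb occasional rejections is the retry support built into the OpenAI client. This is a minimal sketch, and the retry count is illustrative rather than a recommended value:

from openai import OpenAI

# max_retries enables the client's built-in backoff-and-retry behavior,
# which covers transient failures such as rate-limit responses.
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key="INSERT_API_KEY",  # Replace with your actual API key
    max_retries=5,  # illustrative; tune to your plan's requests-per-minute limit
)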

Where to go next

  • Real-time inference (guide): Build AI-powered applications that deliver instant, real-time responses.

  • Batch inference (guide): Process large-scale data efficiently with AI-powered batch inference.

  • API reference: Explore the complete kluster.ai API documentation and usage details.