# Perform real-time inference jobs

## Overview
This guide explains how to use real-time inference with the kluster.ai API. This type of inference is best suited for use cases that require instant, synchronous responses, such as chat interactions, live recommendations, or real-time decision-making.
You will learn which models are supported for real-time inference jobs, how to submit a request and retrieve responses, and where to find integration guides for using kluster.ai's API with some of your favorite third-party LLM interfaces.
## Prerequisites
This guide assumes familiarity with basic Python and Large Language Model (LLM) development. Before getting started, make sure you have:
- An active kluster.ai API key - if you don't already have one, see the Get an API key guide for instructions (one way to load the key in code is sketched after this list)
- A virtual Python environment - optional but recommended; isolating dependencies in a virtual environment reduces the risk of package conflicts between your projects
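Rather than hard-coding your API key in scripts, you can read it from an environment variable. The following is a minimal sketch; the `KLUSTER_API_KEY` variable name is an illustrative choice, not one the platform requires:

```python
import os

from openai import OpenAI

# KLUSTER_API_KEY is an illustrative variable name, not one kluster.ai mandates
api_key = os.environ.get("KLUSTER_API_KEY")
if api_key is None:
    raise RuntimeError("Set the KLUSTER_API_KEY environment variable first")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.kluster.ai/v1",
)
```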
## Supported models
Real-time inference through kluster.ai supports the following models:
- `klusterai/Meta-Llama-3.1-8B-Instruct-Turbo`
- `klusterai/Meta-Llama-3.1-405B-Instruct-Turbo`
- `klusterai/Meta-Llama-3.3-70B-Instruct-Turbo`
- `deepseek-ai/DeepSeek-R1`
In addition, you can see the complete list of available models programmatically using the list supported models endpoint.
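Because the interface is OpenAI-compatible, the same client used for chat completions can query the models endpoint. The following is a minimal sketch using the OpenAI Python SDK; it assumes the endpoint behaves like the standard `/v1/models` route:

```python
from openai import OpenAI

client = OpenAI(
    api_key="INSERT_API_KEY",  # Replace with your kluster.ai API key
    base_url="https://api.kluster.ai/v1",
)

# List every model available to your API key and print its identifier
for model in client.models.list().data:
    print(model.id)
```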
## Submitting a request
The kluster.ai platform offers a simple, OpenAI-compatible interface, making it easy to integrate kluster.ai services into your existing systems.
The following examples demonstrate a pre-configured interface to get you started. Options are included for Python and curl.
```python
from openai import OpenAI
import json
import os

client = OpenAI(
    api_key="INSERT_API_KEY",
    base_url="https://api.kluster.ai/v1",
)

# Create chat completion request
completion = client.chat.completions.create(
    model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "What is the most famous street in Paris?"}
    ],
)

def log_response_to_file(response, filename="response_log.json"):
    """Logs the full AI response to a JSON file in the same directory as the script."""
    # Convert the response object to a plain dictionary
    response_data = response.model_dump()

    # Build the output path next to this script
    script_dir = os.path.dirname(os.path.abspath(__file__))
    file_path = os.path.join(script_dir, filename)

    # Write to JSON file
    with open(file_path, "w", encoding="utf-8") as json_file:
        json.dump(response_data, json_file, ensure_ascii=False, indent=4)

    print(f"Response logged to {file_path}")

# Log response to file
log_response_to_file(completion)
```
```bash
# Replace INSERT_API_KEY with your actual API key
curl https://api.kluster.ai/v1/chat/completions \
    -H "Authorization: Bearer INSERT_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
        "messages": [
            {
                "role": "user",
                "content": "What is the most famous street in Paris?"
            }
        ]
    }'
```
There are several configurable parameters when using real-time inference:

- `model` - the name of one of the supported models
- `messages` - in the `content` field, provide the query you want to process. You can pass any input here; in this example, the query is "What is the most famous street in Paris?"
Once these parameters are configured, run your script to send the request.
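For user-facing features like chat, you may prefer to stream tokens as they are generated rather than wait for the full completion. The following sketch uses the OpenAI SDK's standard `stream` parameter; confirm streaming support and behavior against the kluster.ai API reference:

```python
from openai import OpenAI

client = OpenAI(
    api_key="INSERT_API_KEY",  # Replace with your kluster.ai API key
    base_url="https://api.kluster.ai/v1",
)

# stream=True yields incremental chunks instead of a single final object
stream = client.chat.completions.create(
    model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "What is the most famous street in Paris?"}
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta; the content field may be empty for some chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```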
## Response
If the request is successful, the response should follow the structure demonstrated below and contain relevant data such as the generated output, metadata, and token usage details.
The following is an example of what a real-time response might look like:
```json
{
    "id": "chatcmpl-e9b942d1-06fb-4d1b-88c2-820c9ca7bb20",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The most famous street in Paris is the Champs-Élysées.",
                "refusal": null,
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": []
            },
            "stop_reason": null
        }
    ],
    "created": 1739960163,
    "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 16,
        "prompt_tokens": 47,
        "total_tokens": 63,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null
}
```
For a detailed breakdown of the chat completion object, see the chat completion API reference section.
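In practice, you will usually extract a few fields from this object rather than log all of it. The following sketch pairs those accesses with basic error handling using the OpenAI SDK's exception types; the exact errors the kluster.ai endpoint returns may vary:

```python
import openai
from openai import OpenAI

client = OpenAI(
    api_key="INSERT_API_KEY",  # Replace with your kluster.ai API key
    base_url="https://api.kluster.ai/v1",
)

try:
    completion = client.chat.completions.create(
        model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=[
            {"role": "user", "content": "What is the most famous street in Paris?"}
        ],
    )
except openai.APIConnectionError as exc:
    print(f"Could not reach the API: {exc}")
except openai.APIStatusError as exc:
    print(f"The API returned status {exc.status_code}: {exc.message}")
else:
    choice = completion.choices[0]
    # The generated text lives in the message's content field
    print(f"Answer: {choice.message.content}")
    # A finish_reason of "stop" means the model ended its output naturally
    print(f"Finish reason: {choice.finish_reason}")
    # Token usage is handy for cost and context-window tracking
    print(f"Total tokens: {completion.usage.total_tokens}")
```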
## Third-party integrations
You can also set up a third-party LLM interface using the kluster.ai API. For step-by-step instructions, check out the following integration guides (a minimal LangChain sketch follows the list):
- SillyTavern - multi-LLM chat interface
- LangChain - multi-turn conversational agent
- eliza - create and manage AI agents
- CrewAI - specialized agents for complex tasks
- LiteLLM - streaming response and multi-turn conversation handling
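As a quick taste of these integrations, the snippet below points LangChain's `ChatOpenAI` wrapper at the kluster.ai base URL. It is a minimal sketch that assumes the `langchain-openai` package is installed; see the LangChain guide above for a full walkthrough:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    api_key="INSERT_API_KEY",  # Replace with your kluster.ai API key
    base_url="https://api.kluster.ai/v1",
)

# invoke() sends a single-turn prompt and returns an AIMessage
response = llm.invoke("What is the most famous street in Paris?")
print(response.content)
```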
## Summary
You have now experienced the complete real-time inference job lifecycle using kluster.ai's chat completion API. In this guide, you've learned:
- How to submit a real-time inference request
- How to configure the API parameters for real-time inference
- How to interpret the chat completion object in the API response
The kluster.ai API is designed to handle your LLM workloads efficiently and reliably. If you have questions or suggestions, the support team would love to hear from you.