Perform real-time inference jobs#
Overview#
This guide explains how to use real-time inference with the kluster.ai API. Real-time inference is best suited for use cases that require instant, synchronous responses, such as user-facing chat interactions, live recommendations, or real-time decision-making.
You will learn how to submit a request and retrieve responses, and where to find integration guides for using kluster.ai's API with some of your favorite third-party LLM interfaces. Please make sure to check the API request limits.
Prerequisites#
This guide assumes familiarity with Large Language Model (LLM) development and OpenAI libraries. Before getting started, make sure you have:
- A kluster.ai account - sign up on the kluster.ai platform if you don't have one
- A kluster.ai API key - after signing in, go to the API Keys section and create a new key. For detailed instructions, check out the Get an API key guide
- A virtual Python environment - (optional) recommended for developers using Python. It helps isolate Python installations in a virtual environment to reduce the risk of environment or package conflicts between your projects
- Required Python libraries - install the following Python libraries:
    - OpenAI Python API library - to access the openai module
    - getpass - to handle API keys safely
If you plan to use cURL via the CLI, you can export your kluster.ai API key as a variable:
export API_KEY=INSERT_API_KEY
Supported models#
Please visit the Models page to learn more about all the models supported by the kluster.ai API.
In addition, you can see the complete list of available models programmatically using the list supported models endpoint.
Quickstart snippets#
The following code snippets provide a complete end-to-end real-time inference example for different models supported by kluster.ai. You can copy and paste the snippet into your local environment.
Python#
To use these snippets, run the Python script and enter your kluster.ai API key when prompted.
DeepSeek R1
from openai import OpenAI
from getpass import getpass
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
DeepSeek V3
from openai import OpenAI
from getpass import getpass
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="deepseek-ai/DeepSeek-V3",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
DeepSeek V3 0324
from openai import OpenAI
from getpass import getpass
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="deepseek-ai/DeepSeek-V3-0324",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
Gemma 3 27B
from openai import OpenAI
from getpass import getpass
image_url="https://github.com/kluster-ai/docs/blob/main/images/get-started/start-building/parking-image.jpg?raw=true"
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(api_key=api_key, base_url="https://api.kluster.ai/v1")
# Create chat completion request
completion = client.chat.completions.create(
model="google/gemma-3-27b-it",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Who can park in the area?"},
{"type": "image_url", "image_url": {"url": image_url}},
],
}
],
)
print(f"\nImage URL: {image_url}")
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
Llama 3.1 8B
from openai import OpenAI
from getpass import getpass
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
Llama 3.1 405B
from openai import OpenAI
from getpass import getpass
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="klusterai/Meta-Llama-3.1-405B-Instruct-Turbo",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
Llama 3.3 70B
from openai import OpenAI
from getpass import getpass
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
Qwen 2.5 7B
from openai import OpenAI
from getpass import getpass
image_url="https://github.com/kluster-ai/docs/blob/main/images/get-started/start-building/parking-image.jpg?raw=true"
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(api_key=api_key, base_url="https://api.kluster.ai/v1")
# Create chat completion request
completion = client.chat.completions.create(
model="Qwen/Qwen2.5-VL-7B-Instruct",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Who can park in the area?"},
{"type": "image_url", "image_url": {"url": image_url}},
],
}
],
)
print(f"\nImage URL: {image_url}")
"""Logs the full AI response to terminal."""
# Extract model name and AI-generated text
model_name = completion.model
text_response = completion.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
CLI#
Similarly, the following curl commands show how to send a chat completion request to kluster.ai for each of the supported models. These examples assume you've exported your kluster.ai API key as the variable API_KEY.
DeepSeek R1
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"deepseek-ai/DeepSeek-R1\",
\"messages\": [
{
\"role\": \"user\",
\"content\": \"What is the ultimate breakfast sandwich?\"
}
]
}"
DeepSeek V3
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"deepseek-ai/DeepSeek-V3\",
\"messages\": [
{
\"role\": \"user\",
\"content\": \"What is the ultimate breakfast sandwich?\"
}
]
}"
DeepSeek V3 0324
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"deepseek-ai/DeepSeek-V3-0324\",
\"messages\": [
{
\"role\": \"user\",
\"content\": \"What is the ultimate breakfast sandwich?\"
}
]
}"
Gemma 3 27B
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
image_url="https://github.com/kluster-ai/docs/blob/main/images/get-started/start-building/parking-image.jpg?raw=true"
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"google/gemma-3-27b-it\",
\"messages\": [
{
\"role\": \"user\",
\"content\": [
{\"type\": \"text\", \"text\": \"Who can park in the area?\"},
{\"type\": \"image_url\", \"image_url\": {\"url\": \"$image_url\"}}
]
}
]
}"
Llama 3.1 8B
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"klusterai/Meta-Llama-3.1-8B-Instruct-Turbo\",
\"messages\": [
{
\"role\": \"user\",
\"content\": \"What is the ultimate breakfast sandwich?\"
}
]
}"
Llama 3.1 405B
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"klusterai/Meta-Llama-3.1-405B-Instruct-Turbo\",
\"messages\": [
{
\"role\": \"user\",
\"content\": \"What is the ultimate breakfast sandwich?\"
}
]
}"
Llama 3.3 70B
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"klusterai/Meta-Llama-3.3-70B-Instruct-Turbo\",
\"messages\": [
{
\"role\": \"user\",
\"content\": \"What is the ultimate breakfast sandwich?\"
}
]
}"
Qwen 2.5 7B
#!/bin/bash
# Check if API_KEY is set and not empty
if [[ -z "$API_KEY" ]]; then
echo -e "\nError: API_KEY environment variable is not set.\n" >&2
    exit 1
fi
image_url="https://github.com/kluster-ai/docs/blob/main/images/get-started/start-building/parking-image.jpg?raw=true"
# Submit real-time request
curl https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"Qwen/Qwen2.5-VL-7B-Instruct\",
\"messages\": [
{
\"role\": \"user\",
\"content\": [
{\"type\": \"text\", \"text\": \"Who can park in the area?\"},
{\"type\": \"image_url\", \"image_url\": {\"url\": \"$image_url\"}}
]
}
]
}"
Real-time inference flow#
This section details the real-time inference process using the kluster.ai API and DeepSeek R1 model, but you can adapt it to any of the supported models.
Submitting a request#
The kluster.ai platform offers a simple, OpenAI-compatible interface, making it easy to integrate kluster.ai services seamlessly into your existing system.
The following code shows how to do a chat completions request using the OpenAI library.
from openai import OpenAI
from getpass import getpass
import json
import os
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="deepseek-ai/DeepSeek-V3",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
If successful, the completion variable contains the full response, which you'll need to parse to extract the answer you are looking for. For real-time inference, there are two required parameters to configure:
- `model` *string* (required) - the name of one of the supported models
- `messages` *array* (required) - a list of chat messages (`system`, `user`, or `assistant` roles, plus `image_url` entries for images). In this example, the query is "What is the ultimate breakfast sandwich?"
Once these parameters are configured, run your script to send the request.
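Beyond the two required parameters, the OpenAI-compatible interface also accepts optional sampling controls such as temperature and max_completion_tokens. The sketch below assembles a request payload without sending it; the build_request helper is hypothetical, shown only to illustrate how options are layered on top of the required fields:

```python
# Hypothetical helper that assembles the arguments for a chat completion
# request; optional sampling controls pass through as keyword arguments.
def build_request(model, prompt, **options):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    payload.update(options)  # e.g. temperature, max_completion_tokens
    return payload

request = build_request(
    "deepseek-ai/DeepSeek-R1",
    "What is the ultimate breakfast sandwich?",
    temperature=0.6,
)
print(request["model"])  # deepseek-ai/DeepSeek-R1
```

The resulting dictionary can then be unpacked straight into the client call, for example `client.chat.completions.create(**request)`.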
Fetching the response#
If the request is successful, the response is contained in the completion variable from the example above. It should follow the structure below and include relevant data such as the generated output, metadata, and token usage details.
{
"id": "a3af373493654dd195108b207e2faacf",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "The \"ultimate\" breakfast sandwich is subjective and can vary based on personal preferences, but here’s a classic, crowd-pleasing version that combines savory, sweet, and hearty elements for a satisfying morning meal:\n\n### **The Ultimate Breakfast Sandwich**\n**Ingredients:**\n- **Bread:** A toasted brioche bun, English muffin, or sourdough slice (your choice for texture and flavor).\n- **Protein:** Crispy bacon, sausage patty, or ham.\n- **Egg:** Fried, scrambled, or a fluffy omelet-style egg.\n- **Cheese:** Sharp cheddar, gooey American, or creamy Swiss.\n- **Sauce:** Spicy mayo, hollandaise, or a drizzle of maple syrup for sweetness.\n- **Extras:** Sliced avocado, caramelized onions, sautéed mushrooms, or fresh arugula for a gourmet touch.\n- **Seasoning:** Salt, pepper, and a pinch of red pepper flakes for heat.\n\n**Assembly:**\n1. Toast your bread or bun to golden perfection.\n2. Cook your protein to your desired crispiness or doneness.\n3. Prepare your egg—fried with a runny yolk is a classic choice.\n4. Layer the cheese on the warm egg or protein so it melts slightly.\n5. Add your extras (avocado, veggies, etc.) for freshness and flavor.\n6. Spread your sauce on the bread or drizzle it over the filling.\n7. Stack everything together, season with salt, pepper, or spices, and enjoy!\n\n**Optional Upgrades:**\n- Add a hash brown patty for extra crunch.\n- Swap regular bacon for thick-cut or maple-glazed bacon.\n- Use a croissant instead of bread for a buttery, flaky twist.\n\nThe ultimate breakfast sandwich is all about balance—crunchy, creamy, savory, and a hint of sweetness. Customize it to your taste and make it your own!",
"refusal": null,
"role": "assistant",
"audio": null,
"function_call": null,
"tool_calls": null
},
"matched_stop": 1
}
],
"created": 1742378836,
"model": "deepseek-ai/DeepSeek-V3",
"object": "chat.completion",
"service_tier": null,
"system_fingerprint": null,
"usage": {
"completion_tokens": 398,
"prompt_tokens": 10,
"total_tokens": 408,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
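The fields you'll most often need can be pulled out with plain dictionary access. The sketch below walks a trimmed-down sample of the structure above (a hand-written stand-in, not a live response):

```python
# Trimmed-down stand-in for the chat completion structure shown above.
sample = {
    "id": "a3af373493654dd195108b207e2faacf",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"role": "assistant", "content": "The ultimate breakfast sandwich is..."},
        }
    ],
    "model": "deepseek-ai/DeepSeek-V3",
    "usage": {"completion_tokens": 398, "prompt_tokens": 10, "total_tokens": 408},
}

# The generated text lives under choices[0].message.content
answer = sample["choices"][0]["message"]["content"]
# finish_reason == "stop" means the model completed naturally (vs. hitting a limit)
finished = sample["choices"][0]["finish_reason"] == "stop"
# Token usage: prompt + completion should equal the reported total
total = sample["usage"]["prompt_tokens"] + sample["usage"]["completion_tokens"]

print(answer)
print(finished, total == sample["usage"]["total_tokens"])  # True True
```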
The following snippet demonstrates how to extract the data, log it to the console, and save it to a JSON file.
def log_response_to_file(response, filename="response_log.json"):
"""Logs the full AI response to a JSON file in the same directory as the script."""
# Extract model name and AI-generated text
model_name = response.model
text_response = response.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
# Convert response to dictionary
response_data = response.model_dump()
# Get the script directory
script_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(script_dir, filename)
# Write to JSON file
with open(file_path, "w", encoding="utf-8") as json_file:
json.dump(response_data, json_file, ensure_ascii=False, indent=4)
print(f"💾 Response saved to {file_path}")
# Log response to file
log_response_to_file(completion)
For a detailed breakdown of the chat completion object, see the chat completion API reference section.
View the complete script
from openai import OpenAI
from getpass import getpass
import json
import os
# Get API key from user input
api_key = getpass("Enter your kluster.ai API key: ")
# Initialize OpenAI client pointing to kluster.ai API
client = OpenAI(
api_key=api_key,
base_url="https://api.kluster.ai/v1"
)
# Create chat completion request
completion = client.chat.completions.create(
model="deepseek-ai/DeepSeek-V3",
messages=[
{"role": "user", "content": "What is the ultimate breakfast sandwich?"}
]
)
def log_response_to_file(response, filename="response_log.json"):
"""Logs the full AI response to a JSON file in the same directory as the script."""
# Extract model name and AI-generated text
model_name = response.model
text_response = response.choices[0].message.content
# Print response to console
print(f"\n🔍 AI response (model: {model_name}):")
print(text_response)
# Convert response to dictionary
response_data = response.model_dump()
# Get the script directory
script_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(script_dir, filename)
# Write to JSON file
with open(file_path, "w", encoding="utf-8") as json_file:
json.dump(response_data, json_file, ensure_ascii=False, indent=4)
print(f"💾 Response saved to {file_path}")
# Log response to file
log_response_to_file(completion)
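In production, real-time requests can fail transiently (network hiccups, rate limits). A minimal retry sketch with exponential backoff is shown below; the exception type, attempt count, and delays are placeholders rather than kluster.ai specifics, and in a real script you would wrap the client.chat.completions.create call:

```python
import time

# Minimal retry-with-backoff sketch; RuntimeError stands in for whatever
# transient error your client raises (e.g. a rate-limit exception).
def with_retries(fn, attempts=3, base_delay=0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, ...

# Demo: a function that fails twice, then succeeds on the third call
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(with_retries(flaky))  # ok
```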
Third-party integrations#
You can also set up third-party LLM integrations using the kluster.ai API. For step-by-step instructions, check out the following integration guides:
- SillyTavern - multi-LLM chat interface
- LangChain - multi-turn conversational agent
- eliza - create and manage AI agents
- CrewAI - specialized agents for complex tasks
- LiteLLM - streaming response and multi-turn conversation handling
Summary#
You have now experienced the complete real-time inference job lifecycle using kluster.ai's chat completion API. In this guide, you've learned:
- How to submit a real-time inference request
- How to configure real-time inference-related API parameters
- How to interpret the chat completion object API response
The kluster.ai API is designed to handle your LLM workloads efficiently and reliably. If you have questions or suggestions, the support team would love to hear from you.