Getting started with the kluster.ai API
Welcome to the kluster.ai getting started notebook!
kluster.ai is a high-performance platform designed to make large-scale AI workloads accessible, efficient, and affordable. Our Batch API is an asynchronous service with higher rate limits, predictable turnaround times, and unmatched value. It enables a variety of use cases such as summarization, classification, translation, and much more, all without the need to manage infrastructure.
This notebook is designed to help you get started quickly. It walks you through the essential code snippets from the Getting started guide, all in one place.
By running this notebook, you’ll:
- Learn how to use the API.
- Submit a simple batch request using our open source LLMs.
- Understand how to handle and interpret the API’s responses.
Setup
This step ensures that the openai Python library is installed or updated to the required version. This library will serve as the client for interacting with the kluster.ai API.
pip install -q "openai>=1.0.0"
Note: you may need to restart the kernel to use updated packages.
Creating inference jobs as JSONL files
This step defines a collection of requests for the API to process. Each request includes a unique identifier (custom_id), the HTTP method (POST), the chat completions endpoint (/v1/chat/completions), and a body field that contains the request you want to send to the chat completions endpoint, together with the model to be used and the conversational context ("messages"). These tasks are saved as a JSON Lines (.jsonl) file for efficient handling of multiple requests in a single upload.
You'll need to enter your personal kluster.ai API key (make sure it contains no blank spaces). If you don't have one yet, remember to create a key at platform.kluster.ai.
from openai import OpenAI
import json
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)
tasks = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of Argentina?"},
            ],
            "max_tokens": 1000,
        },
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "klusterai/Meta-Llama-3.1-70B-Instruct-Turbo",
            "messages": [
                {"role": "system", "content": "You are a maths tutor."},
                {"role": "user", "content": "Explain the Pythagorean theorem."},
            ],
            "max_tokens": 1000,
        },
    },
    # Additional tasks can be added here
]
# Save tasks to a JSONL file (newline-delimited JSON)
file_name = "my_inference_test.jsonl"
with open(file_name, "w") as file:
    for task in tasks:
        file.write(json.dumps(task) + "\n")
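Before uploading, it can be worth sanity-checking that every line in the file parses as JSON and carries the fields the Batch API expects. Here is a minimal standalone sketch; the one-request sample file and the validation rules are illustrative, not an official schema:

```python
import json

# Write a one-request sample file, mirroring the structure used above
sample_task = {
    "custom_id": "request-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {"model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo", "messages": []},
}
with open("sample_check.jsonl", "w") as f:
    f.write(json.dumps(sample_task) + "\n")

def validate_jsonl(path):
    """Return the number of valid requests; raise on a malformed line."""
    count = 0
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            task = json.loads(line)  # raises ValueError on invalid JSON
            for field in ("custom_id", "method", "url", "body"):
                assert field in task, f"line {lineno}: missing {field!r}"
            count += 1
    return count

print(validate_jsonl("sample_check.jsonl"))  # 1
```

Running the same check on my_inference_test.jsonl before the upload step catches malformed lines early, before the batch job is charged for them.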
Uploading Batch inference job files
This step uploads the input JSONL file to kluster.ai via the API. Once the file is uploaded, the API assigns a unique file ID. This ID is essential for subsequent steps, as it allows you to specify which file the batch job should use for processing.
inference_input_file = client.files.create(
    file=open(file_name, "rb"),
    purpose="batch",
)
inference_input_file.to_dict()
{'id': '67533eff1d7be1b5a7507b75', 'bytes': 602, 'created_at': 1733508863, 'filename': '6750b85c7da9ad513c97bea1/02d026ed-0742-46a8-9ff9-8fdaa5ea768f-4e3e0531-deab-4489-8da1-5df2466bd176', 'object': 'file', 'purpose': 'batch'}
Submit your Batch job
This step starts your job by providing the uploaded file ID and setting the endpoint and completion window, initiating the batch inference process.
inference_request = client.batches.create(
    input_file_id=inference_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
inference_request.to_dict()
{'id': '67533f6d1d7be1b5a7507be5', 'completion_window': '24h', 'created_at': 1733508973, 'endpoint': '/v1/chat/completions', 'input_file_id': '67533eff1d7be1b5a7507b75', 'object': 'batch', 'status': 'pre_schedule', 'completed_at': None, 'errors': [], 'expires_at': 1733595373, 'failed_at': None, 'finalizing_at': None, 'in_progress_at': None, 'metadata': None, 'request_counts': {'completed': 0, 'failed': 0, 'total': 0}}
Monitor job progress
In this step, the job status is checked repeatedly to track its progress. You’ll see updates on the overall status and the number of completed tasks until the job is finished, failed, or cancelled.
import time

# Poll the job's status until it's complete
while True:
    inference_status = client.batches.retrieve(inference_request.id)
    print(f"Job status: {inference_status.status}")
    print(
        f"Completed tasks: {inference_status.request_counts.completed} / {inference_status.request_counts.total}"
    )
    if inference_status.status.lower() in ["completed", "failed", "cancelled"]:
        break
    time.sleep(10)  # Wait for 10 seconds before checking again
inference_status.to_dict()
Job status: in_progress
Completed tasks: 1 / 2
Job status: in_progress
Completed tasks: 1 / 2
Job status: in_progress
Completed tasks: 1 / 2
Job status: completed
Completed tasks: 2 / 2
{'id': '67533f6d1d7be1b5a7507be5', 'completion_window': '24h', 'created_at': 1733508973, 'endpoint': '/v1/chat/completions', 'input_file_id': '67533eff1d7be1b5a7507b75', 'object': 'batch', 'status': 'completed', 'completed_at': 1733509004, 'errors': [], 'expires_at': 1733595373, 'failed_at': None, 'finalizing_at': 1733509004, 'in_progress_at': 1733508973, 'metadata': None, 'output_file_id': '67533f8c2ab49d0df6b4a583', 'request_counts': {'completed': 2, 'failed': 0, 'total': 2}}
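The fixed 10-second sleep works fine for short jobs, but for larger batches you may prefer an exponential backoff so the loop polls less often the longer the job runs. Here is a sketch of the delay schedule only; the initial, factor, and cap values are arbitrary choices for illustration, not kluster.ai recommendations:

```python
def backoff_delays(initial=5.0, factor=2.0, cap=60.0):
    """Yield polling intervals that grow geometrically up to a cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)

# First five polling intervals in seconds
delays = backoff_delays()
print([next(delays) for _ in range(5)])  # [5.0, 10.0, 20.0, 40.0, 60.0]
```

To use it, you would replace the `time.sleep(10)` in the polling loop above with `time.sleep(next(delays))`.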
Retrieve results
In this step, the results of the job are retrieved if it completed successfully. The output is downloaded and saved to a local file for you to review. If the job failed, the status will indicate the issue.
# Check if the job completed successfully
if inference_status.status.lower() == "completed":
    # Retrieve the results
    result_file_id = inference_status.output_file_id
    results = client.files.content(result_file_id).content

    # Save results to a file
    result_file_name = "inference_results.jsonl"
    with open(result_file_name, "wb") as file:
        file.write(results)
    print(f"Results saved to {result_file_name}")
else:
    print(f"Job failed with status: {inference_status.status}")
Results saved to inference_results.jsonl
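Each line of the output file is a JSON object pairing your custom_id with the model's response. The sketch below parses lines in the OpenAI-style batch output shape (`response.body.choices[0].message.content`); that field layout is an assumption based on the OpenAI batch format, so adjust the accessors if your output differs:

```python
import json

# A sample result line in the assumed OpenAI-style batch output shape
sample_line = json.dumps({
    "custom_id": "request-1",
    "response": {
        "status_code": 200,
        "body": {
            "choices": [
                {"message": {"role": "assistant", "content": "Buenos Aires."}}
            ]
        },
    },
    "error": None,
})

def extract_answers(lines):
    """Map each custom_id to its assistant reply."""
    answers = {}
    for line in lines:
        record = json.loads(line)
        body = record["response"]["body"]
        answers[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return answers

print(extract_answers([sample_line]))  # {'request-1': 'Buenos Aires.'}
```

In the notebook you would feed it the downloaded file instead, e.g. `extract_answers(open("inference_results.jsonl"))`.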
List all Batch jobs
This step lists the most recent jobs, providing an overview of their statuses and details.
client.batches.list(limit=2).to_dict()
{'data': [{'id': '67533f6d1d7be1b5a7507be5', 'completion_window': '24h', 'created_at': 1733508973, 'endpoint': '/v1/chat/completions', 'input_file_id': '67533eff1d7be1b5a7507b75', 'object': 'batch', 'status': 'completed', 'completed_at': 1733509004, 'errors': [], 'expires_at': 1733595373, 'failed_at': None, 'finalizing_at': 1733509004, 'in_progress_at': 1733508973, 'metadata': None, 'output_file_id': '67533f8c2ab49d0df6b4a583', 'request_counts': {'completed': 2, 'failed': 0, 'total': 2}}, {'id': '67533f041d7be1b5a7507b7b', 'completion_window': '24h', 'created_at': 1733508868, 'endpoint': '/v1/chat/completions', 'input_file_id': '67533eff1d7be1b5a7507b75', 'object': 'batch', 'status': 'completed', 'completed_at': 1733508903, 'errors': [], 'expires_at': 1733595268, 'failed_at': None, 'finalizing_at': 1733508903, 'in_progress_at': 1733508868, 'metadata': None, 'output_file_id': '67533f272ab49d0df6b4a561', 'request_counts': {'completed': 2, 'failed': 0, 'total': 2}}], 'object': 'list', 'first_id': '67533f6d1d7be1b5a7507be5', 'last_id': '67533f041d7be1b5a7507b7b', 'has_more': True}
Cancelling a Batch job
To cancel a job that is currently in progress, invoke the cancel endpoint by providing the request ID.
client.batches.cancel(inference_request.id)
List supported models
Find the right model for your job by first checking the list models endpoint. Choose from our range of models, optimized for different performance needs.
client.models.list().to_dict()
{'data': [{'id': 'klusterai/Meta-Llama-3.1-405B-Instruct-Turbo', 'created': 1731336418, 'object': 'model', 'owned_by': 'klusterai'}, {'id': 'klusterai/Meta-Llama-3.1-70B-Instruct-Turbo', 'created': 1731336610, 'object': 'model', 'owned_by': 'klusterai'}, {'id': 'klusterai/Meta-Llama-3.1-8B-Instruct-Turbo', 'created': 1731336610, 'object': 'model', 'owned_by': 'klusterai'}], 'object': 'list'}
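One practical use of this listing is picking a model size programmatically, e.g. the smallest model for a cheap smoke test before committing a large batch to a bigger one. A sketch that parses the parameter count out of the model names shown above; the `-<N>B-` naming pattern is an observation from this particular listing, not a guaranteed convention:

```python
import re

# Model IDs as returned by the listing above
model_ids = [
    "klusterai/Meta-Llama-3.1-405B-Instruct-Turbo",
    "klusterai/Meta-Llama-3.1-70B-Instruct-Turbo",
    "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
]

def param_count(model_id):
    """Extract the parameter count in billions from the model name."""
    match = re.search(r"-(\d+)B-", model_id)
    return int(match.group(1)) if match else 0

# Sort smallest first: handy for quick, inexpensive test runs
print(sorted(model_ids, key=param_count))
```

With a live client you would build the list from the API response instead: `model_ids = [m.id for m in client.models.list().data]`.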