API reference#
Chat#
Create chat completion#
POST https://api.kluster.ai/v1/chat/completions
To create a chat completion, send a request to the chat/completions endpoint.
Request
model
string required
ID of the model to use. You can use the models endpoint to retrieve the list of supported models.
messages
array required
A list of messages comprising the conversation so far. Each message object has one of three roles: system, user, or assistant.
Possible types
System message object
Properties
content
string or array
The contents of the system message.
role
string or null required
The role of the message's author, in this case, system.
User message object
Properties
content
string or array
The contents of the user message.
role
string or null required
The role of the message's author, in this case, user.
Assistant message object
Properties
content
string or array
The contents of the assistant message.
role
string or null required
The role of the message's author, in this case, assistant.
frequency_penalty
number or null
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. Defaults to 0.
logit_bias
map
Modify the likelihood of specified tokens appearing in the completion. Defaults to null.
Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
logprobs
boolean or null
Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token returned in the content of message. Defaults to false.
top_logprobs
integer or null
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
max_completion_tokens
integer or null
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
presence_penalty
number or null
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Defaults to 0.
seed
integer or null
If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.
stop
string or array or null
Up to four sequences where the API will stop generating further tokens. Defaults to null.
stream
boolean or null
If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Defaults to false.
temperature
number or null
The sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 1.
It is generally recommended to alter this or top_p but not both.
top_p
number or null
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Defaults to 1.
It is generally recommended to alter this or temperature but not both.
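The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the kluster.ai API: the function name validate_chat_params is hypothetical, and it checks only the constraints stated in the parameter descriptions above.

```python
def validate_chat_params(params):
    """Return a list of problems found in a chat completion request body."""
    problems = []

    def check_range(name, low, high):
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    check_range("frequency_penalty", -2.0, 2.0)
    check_range("presence_penalty", -2.0, 2.0)
    check_range("temperature", 0, 2)
    check_range("top_logprobs", 0, 20)

    # top_logprobs is only valid when logprobs is true.
    if params.get("top_logprobs") is not None and not params.get("logprobs"):
        problems.append("top_logprobs requires logprobs to be true")

    # stop accepts a single string or a list of up to four sequences.
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most four sequences")

    # Each logit_bias value must lie between -100 and 100.
    for token_id, bias in (params.get("logit_bias") or {}).items():
        if not -100 <= bias <= 100:
            problems.append(f"logit_bias[{token_id}] must be between -100 and 100")

    return problems


print(validate_chat_params({"temperature": 3, "top_logprobs": 5}))
```

An empty list means the body satisfies the documented constraints; the server may still apply additional checks.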
Returns
The created Chat completion object.
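When stream is set, the response arrives as server-sent events instead of a single object. The sketch below shows one way to reassemble the text from raw event lines. It assumes each event carries a JSON chunk shaped like choices[].delta.content, which is the usual OpenAI-compatible layout but is not spelled out in this reference; the sample lines are made up for illustration.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker described above
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumed chunk shape: choices[].delta.content
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Buenos"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " Aires"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))
```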
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY" # Replace with your actual API key
)
chat_completion = client.chat.completions.create(
model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of Argentina?"},
],
)
print(chat_completion.to_dict())
curl -s https://api.kluster.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Argentina?"
}
]
}'
{
"id": "chat-d187c103e189483485b3bcd3eb899c62",
"object": "chat.completion",
"created": 1736136422,
"model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Argentina is Buenos Aires.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 48,
"total_tokens": 57,
"completion_tokens": 9
},
"prompt_logprobs": null
}
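The logit_bias description says the bias is added to the logits before sampling. The following worked example illustrates that arithmetic with a plain softmax; the token IDs and logit values are invented, and real requests key the map by token ID from the model's tokenizer.

```python
import math


def softmax(logits):
    """Convert a {token_id: logit} map into probabilities."""
    peak = max(logits.values())
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_logit_bias(logits, logit_bias):
    """Add each bias to the matching logit, leaving other tokens unchanged."""
    return {tok: value + logit_bias.get(tok, 0.0) for tok, value in logits.items()}


logits = {101: 2.0, 102: 1.0, 103: 0.5}  # made-up token IDs and logits
before = softmax(logits)
banned = softmax(apply_logit_bias(logits, {101: -100}))

print(f"token 101 before: {before[101]:.3f}")
print(f"token 101 with bias -100: {banned[101]:.3e}")
```

A bias of -100 drives the token's probability to effectively zero, which is why the description calls it a ban.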
Chat completion object#
id
string
Unique identifier for the chat completion.
object
string
The object type, which is always chat.completion.
created
integer
The Unix timestamp (in seconds) of when the chat completion was created.
model
string
The model used for the chat completion. You can use the models endpoint to retrieve the list of supported models.
choices
array
A list of chat completion choices.
Properties
index
integer
The index of the choice in the list of returned choices.
message
object
A chat completion message generated by the model. Can be one of system, user, or assistant.
Properties
content
string or array
The contents of the message.
role
string or null
The role of the message's author. Can be one of system, user, or assistant.
logprobs
object or null
Log probability information for the choice. Populated only when logprobs is set to true in the request; otherwise null.
finish_reason
string
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters, tool_calls if the model called a tool, or function_call (deprecated) if the model called a function.
stop_reason
string or null
The reason the model stopped generating text.
usage
object
Usage statistics for the completion request.
Properties
completion_tokens
integer
Number of tokens in the generated completion.
prompt_tokens
integer
Number of tokens in the prompt.
total_tokens
integer
Total number of tokens used in the request (prompt + completion).
{
"id": "chat-d187c103e189483485b3bcd3eb899c62",
"object": "chat.completion",
"created": 1736136422,
"model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Argentina is Buenos Aires.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 48,
"total_tokens": 57,
"completion_tokens": 9
},
"prompt_logprobs": null
}
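As a sketch of how the fields above fit together, the snippet below reads the answer text, checks whether generation was cut off, and confirms the usage totals. It parses a trimmed copy of the example object with the standard library only; with the OpenAI client the same fields are available as attributes.

```python
import json

raw = """
{
  "id": "chat-d187c103e189483485b3bcd3eb899c62",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of Argentina is Buenos Aires."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 48, "total_tokens": 57, "completion_tokens": 9}
}
"""

completion = json.loads(raw)
choice = completion["choices"][0]

answer = choice["message"]["content"]
# finish_reason "length" means the token limit was reached before a natural stop.
truncated = choice["finish_reason"] == "length"

usage = completion["usage"]
totals_match = usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]

print(answer)
print("truncated:", truncated, "| totals match:", totals_match)
```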
Batch#
Submit a Batch job#
POST https://api.kluster.ai/v1/batches
To submit a Batch job, send a request to the batches endpoint.
Request
input_file_id
string required
The ID of an uploaded file that contains requests for the new Batch.
Your input file must be formatted as a JSONL file and must be uploaded with the purpose batch. The file can contain up to 50,000 requests and is currently limited to 6GB per file.
endpoint
string required
The endpoint to be used for all requests in the Batch. Currently, only /v1/chat/completions is supported.
completion_window
string required
The supported completion windows are 1, 3, 6, 12, and 24 hours to accommodate a range of use cases and budget requirements. The code samples provided utilize the 24-hour completion window.
Learn more about how completion window selection affects cost by visiting the pricing section of the kluster.ai website.
metadata
object or null
Custom metadata for the Batch.
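The request body can be assembled and checked before it is sent. This is a minimal sketch with a hypothetical helper name, build_batch_request. The reference only shows the string "24h", so the "<hours>h" format used for the other supported windows is an assumption.

```python
SUPPORTED_WINDOW_HOURS = (1, 3, 6, 12, 24)


def build_batch_request(input_file_id, window_hours=24, metadata=None):
    """Build the JSON body for POST /v1/batches."""
    if window_hours not in SUPPORTED_WINDOW_HOURS:
        raise ValueError(f"completion window must be one of {SUPPORTED_WINDOW_HOURS} hours")
    body = {
        "input_file_id": input_file_id,
        "endpoint": "/v1/chat/completions",  # the only supported endpoint
        # Assumed format: only "24h" appears in the reference samples.
        "completion_window": f"{window_hours}h",
    }
    if metadata is not None:
        body["metadata"] = metadata
    return body


print(build_batch_request("myfile-123"))
```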
Returns
The created Batch object.
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY", # Replace with your actual API key
)
batch_request = client.batches.create(
input_file_id="myfile-123",
endpoint="/v1/chat/completions",
completion_window="24h",
)
print(batch_request.to_dict())
curl -s https://api.kluster.ai/v1/batches \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "myfile-123",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'
{
"id": "mybatch-123",
"completion_window": "24h",
"created_at": 1733832777,
"endpoint": "/v1/chat/completions",
"input_file_id": "myfile-123",
"object": "batch",
"status": "validating",
"cancelled_at": null,
"cancelling_at": null,
"completed_at": null,
"error_file_id": null,
"errors": null,
"expired_at": null,
"expires_at": 1733919177,
"failed_at": null,
"finalizing_at": null,
"in_progress_at": null,
"metadata": {},
"output_file_id": null,
"request_counts": {
"completed": 0,
"failed": 0,
"total": 0
}
}
Retrieve a Batch#
GET https://api.kluster.ai/v1/batches/{batch_id}
To retrieve a Batch job, send a request to the batches endpoint with your batch_id.
You can also monitor jobs in the Batch tab of the kluster.ai platform UI.
Path parameters
batch_id
string required
The ID of the Batch to retrieve.
Returns
The Batch object matching the specified batch_id.
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY", # Replace with your actual API key
)
client.batches.retrieve("mybatch-123")
curl -s https://api.kluster.ai/v1/batches/mybatch-123 \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json"
{
"id": "mybatch-123",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "myfile-123",
"completion_window": "24h",
"status": "completed",
"output_file_id": "myfile-123-output",
"error_file_id": null,
"created_at": "1733832777",
"in_progress_at": "1733832777",
"expires_at": "1733919177",
"finalizing_at": "1733832781",
"completed_at": "1733832781",
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 4,
"completed": 4,
"failed": 0
},
"metadata": {}
}
Cancel a Batch#
POST https://api.kluster.ai/v1/batches/{batch_id}/cancel
To cancel a Batch job that is currently in progress, send a request to the cancel endpoint with your batch_id. Note that cancellation may take up to 10 minutes to complete, during which time the status will show as cancelling.
Path parameters
batch_id
string required
The ID of the Batch to cancel.
Returns
The Batch object matching the specified ID.
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY" # Replace with your actual API key
)
client.batches.cancel("mybatch-123") # Replace with your batch id
curl -s https://api.kluster.ai/v1/batches/$BATCH_ID/cancel \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-X POST
{
"id": "mybatch-123",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "myfile-123",
"completion_window": "24h",
"status": "cancelling",
"output_file_id": "myfile-123-output",
"error_file_id": null,
"created_at": "1730821906",
"in_progress_at": "1730821911",
"expires_at": "1730821906",
"finalizing_at": null,
"completed_at": null,
"failed_at": null,
"expired_at": null,
"cancelling_at": "1730821906",
"cancelled_at": null,
"request_counts": {
"total": 3,
"completed": 3,
"failed": 0
},
"metadata": {}
}
List all Batch jobs#
GET https://api.kluster.ai/v1/batches
To list all Batch jobs, send a request to the batches endpoint without specifying a batch_id. To constrain the query response, you can also use a limit parameter.
Query parameters
after
string
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
limit
integer
A limit on the number of objects to be returned. Limit can range between 1 and 100. Default is 20.
Returns
A list of paginated Batch objects.
The status of a Batch object can be one of the following:
Status | Description |
---|---|
validating | The input file is being validated. |
failed | The input file failed the validation process. |
in_progress | The input file was successfully validated and the Batch is in progress. |
finalizing | The Batch job has completed and the results are being finalized. |
completed | The Batch has completed and the results are ready. |
expired | The Batch was not completed within the selected completion window. |
cancelling | The Batch is being cancelled (may take up to 10 minutes). |
cancelled | The Batch was cancelled. |
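The statuses in the table split into ones that can still change and ones that are final, which is what a polling loop needs to know. The sketch below is one possible loop; wait_for_batch and its parameters are hypothetical names, and the status lookup is passed in as a callable so the logic can be exercised without network access.

```python
import time

# Statuses after which a Batch no longer changes.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(get_status, poll_seconds=10, max_polls=360, sleep=time.sleep):
    """Poll get_status() until the Batch reaches a terminal status."""
    for _ in range(max_polls):
        status = get_status().lower()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("Batch did not reach a terminal status in time")


# Offline demonstration with a canned sequence of statuses.
canned = iter(["validating", "in_progress", "finalizing", "completed"])
final_status = wait_for_batch(lambda: next(canned), sleep=lambda seconds: None)
print(final_status)
```

With the OpenAI client, the callable could be lambda: client.batches.retrieve("mybatch-123").status.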
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY" # Replace with your actual API key
)
print(client.batches.list(limit=2).to_dict())
curl -s https://api.kluster.ai/v1/batches \
-H "Authorization: Bearer $API_KEY"
{
"object": "list",
"data": [
{
"id": "mybatch-123",
"object": "batch",
"endpoint": "/v1/chat/completions",
"errors": null,
"input_file_id": "myfile-123",
"completion_window": "24h",
"status": "completed",
"output_file_id": "myfile-123-output",
"error_file_id": null,
"created_at": "1733832777",
"in_progress_at": "1733832777",
"expires_at": "1733919177",
"finalizing_at": "1733832781",
"completed_at": "1733832781",
"failed_at": null,
"expired_at": null,
"cancelling_at": null,
"cancelled_at": null,
"request_counts": {
"total": 4,
"completed": 4,
"failed": 0
},
"metadata": {}
},
{ ... },
],
"first_id": "mybatch-123",
"last_id": "mybatch-789",
"has_more": false,
"count": 1,
"page": 1,
"page_count": -1,
"items_per_page": 9223372036854775807
}
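To walk through every page, pass the ID of the last object received as the after cursor until has_more is false. The sketch below shows that loop under stated assumptions: iter_all and fake_fetch_page are hypothetical names, and the fake page function stands in for a real GET /v1/batches call so the example runs offline.

```python
def iter_all(fetch_page, limit=20):
    """Yield every object from a cursor-paginated list endpoint."""
    after = None
    while True:
        page = fetch_page(after=after, limit=limit)
        yield from page["data"]
        if not page.get("has_more") or not page["data"]:
            break
        # The cursor is the ID of the last object on the current page.
        after = page["data"][-1]["id"]


# Stand-in for the API: five Batch objects served two at a time.
ITEMS = [{"id": f"mybatch-{n}", "object": "batch"} for n in range(5)]


def fake_fetch_page(after=None, limit=20):
    start = 0
    if after is not None:
        start = next(i for i, item in enumerate(ITEMS) if item["id"] == after) + 1
    chunk = ITEMS[start:start + limit]
    return {"object": "list", "data": chunk, "has_more": start + limit < len(ITEMS)}


batch_ids = [item["id"] for item in iter_all(fake_fetch_page, limit=2)]
print(batch_ids)
```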
Batch object#
id
string
The ID of the Batch.
object
string
The object type, which is always batch.
endpoint
string
The kluster.ai API endpoint used by the Batch.
errors
object
Properties
object
string
The object type, which is always list.
data
array
Properties
code
string
An error code identifying the error type.
message
string
A human-readable message providing more details about the error.
param
string or null
The name of the parameter that caused the error, if applicable.
line
integer or null
The line number of the input file where the error occurred, if applicable.
input_file_id
string
The ID of the input file for the Batch.
completion_window
string
The time frame within which the Batch should be processed.
status
string
The current status of the Batch.
output_file_id
string
The ID of the file containing the outputs of successfully executed requests.
error_file_id
string
The ID of the file containing the outputs of requests with errors.
created_at
integer
The Unix timestamp (in seconds) for when the Batch was created.
in_progress_at
integer
The Unix timestamp (in seconds) for when the Batch started processing.
expires_at
integer
The Unix timestamp (in seconds) for when the Batch will expire.
finalizing_at
integer
The Unix timestamp (in seconds) for when the Batch started finalizing.
completed_at
integer
The Unix timestamp (in seconds) for when the Batch was completed.
failed_at
integer
The Unix timestamp (in seconds) for when the Batch failed.
expired_at
integer
The Unix timestamp (in seconds) for when the Batch expired.
cancelling_at
integer
The Unix timestamp (in seconds) for when the Batch started cancelling.
cancelled_at
integer
The Unix timestamp (in seconds) for when the Batch was cancelled.
request_counts
object
The request counts for different statuses within the Batch.
Properties
total
integer
Total number of requests in the Batch.
completed
integer
Number of requests that have been completed successfully.
failed
integer
Number of requests that have failed.
{
"id": "mybatch-123",
"completion_window": "24h",
"created_at": 1733832777,
"endpoint": "/v1/chat/completions",
"input_file_id": "myfile-123",
"object": "batch",
"status": "validating",
"cancelled_at": null,
"cancelling_at": null,
"completed_at": null,
"error_file_id": null,
"errors": null,
"expired_at": null,
"expires_at": 1733919177,
"failed_at": null,
"finalizing_at": null,
"in_progress_at": null,
"metadata": {},
"output_file_id": null,
"request_counts": {
"completed": 0,
"failed": 0,
"total": 0
}
}
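The timestamp fields are Unix seconds, so the completion window can be recovered by subtraction. In the example above, expires_at minus created_at is 86,400 seconds, which matches the 24h window. The helper below is an illustrative sketch (batch_window is a hypothetical name); it wraps values in int() because some sample responses in this reference show timestamps as strings.

```python
from datetime import datetime, timezone


def batch_window(batch):
    """Return creation time, expiry time, and window length for a Batch."""
    created_at = int(batch["created_at"])  # int() also accepts "1733832777"
    expires_at = int(batch["expires_at"])
    return {
        "created": datetime.fromtimestamp(created_at, tz=timezone.utc),
        "expires": datetime.fromtimestamp(expires_at, tz=timezone.utc),
        "window_hours": (expires_at - created_at) / 3600,
    }


info = batch_window({"created_at": 1733832777, "expires_at": 1733919177})
print(info["created"].isoformat(), info["window_hours"])
```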
The request input object#
The per-line object of the Batch input file.
custom_id
string
A developer-provided per-request ID.
method
string
The HTTP method to be used for the request. Currently, only POST is supported.
url
string
The /v1/chat/completions endpoint.
body
map
The JSON body of the request.
[
{
"custom_id": "request-1",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Argentina?"
}
],
"max_tokens": 1000
}
}
]
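Because the input file is JSONL, each request input object sits on its own line rather than inside a JSON array. The sketch below builds such a file body from a list of prompts; build_batch_lines is a hypothetical helper, and the custom_id scheme simply mirrors the request-1 example.

```python
import json


def build_batch_lines(prompts, model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo"):
    """Return JSONL text with one request input object per line."""
    lines = []
    for number, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{number}",
            "method": "POST",  # only POST is supported
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        # json.dumps emits no raw newlines, so each request stays on one line.
        lines.append(json.dumps(request))
    return "\n".join(lines) + "\n"


jsonl_text = build_batch_lines(
    ["What is the capital of Argentina?", "What is the capital of Chile?"]
)
print(jsonl_text)
```

Writing jsonl_text to a .jsonl file gives a body that can be uploaded with the purpose batch.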
The request output object#
The per-line object of the Batch output files.
id
string
A unique identifier for the batch request.
custom_id
string
A developer-provided per-request ID that will be used to match outputs to inputs.
response
object or null
Properties
status_code
integer
The HTTP status code of the response.
request_id
string
A unique identifier for the request. You can reference this request ID if you need to contact support for assistance.
body
map
The JSON body of the response.
error
object or null
For requests that failed with a non-HTTP error, this will contain more information on the cause of the failure.
Properties
code
string
A machine-readable error code.
message
string
A human-readable error message.
{
"id": "batch-req-123",
"custom_id": "request-1",
"response": {
"status_code": 200,
"request_id": "req-123",
"body": {
"id": "chatcmpl-5a5ba6c6-2f95-4136-815b-23275c4f1efb",
"object": "chat.completion",
"created": 1737472126,
"model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of Argentina is Buenos Aires.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 48,
"total_tokens": 57,
"completion_tokens": 9,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
}
}
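Each output line can be matched back to its input through custom_id. The following sketch separates successful answers from failures; index_results is a hypothetical name, and the second sample record, including its error code, is invented purely to exercise the error branch.

```python
import json


def index_results(jsonl_text):
    """Map custom_id to answer text, collecting failed requests separately."""
    answers, failures = {}, {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response")
        if record.get("error") or response is None or response.get("status_code") != 200:
            failures[record["custom_id"]] = record.get("error") or response
            continue
        message = response["body"]["choices"][0]["message"]
        answers[record["custom_id"]] = message["content"]
    return answers, failures


sample_output = "\n".join([
    json.dumps({
        "id": "batch-req-123",
        "custom_id": "request-1",
        "response": {
            "status_code": 200,
            "request_id": "req-123",
            "body": {"choices": [{"index": 0, "message": {
                "role": "assistant",
                "content": "The capital of Argentina is Buenos Aires.",
            }}]},
        },
        "error": None,
    }),
    # Hypothetical failed request, for illustration only.
    json.dumps({
        "id": "batch-req-124",
        "custom_id": "request-2",
        "response": None,
        "error": {"code": "example_error", "message": "Hypothetical failure"},
    }),
])

answers, failures = index_results(sample_output)
print(answers)
print(failures)
```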
Files#
Upload files#
POST https://api.kluster.ai/v1/files/
Upload a JSON Lines file to the files endpoint.
You can also view all your uploaded files in the Files tab of the kluster.ai platform.
Request
file
file required
The File object (not file name) to be uploaded.
purpose
string required
The intended purpose of the uploaded file. Use batch for the Batch API.
Returns
The uploaded File object.
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY" # Replace with your actual API key
)
batch_input_file = client.files.create(
file=open("mybatchtest.jsonl", "rb"), # Replace with the path to your JSONL file
purpose="batch"
)
print(batch_input_file.to_dict())
curl -s https://api.kluster.ai/v1/files \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: multipart/form-data" \
-F "file=@mybatchtest.jsonl" \
-F "purpose=batch"
{
"id": "myfile-123",
"bytes": 2797,
"created_at": "1733832768",
"filename": "mybatchtest.jsonl",
"object": "file",
"purpose": "batch"
}
Retrieve file content#
GET https://api.kluster.ai/v1/files/{output_file_id}/content
To retrieve the content of your Batch job's output file, send a request to the files endpoint specifying the output_file_id. The output file will be a JSONL file, where each line contains the custom_id from your input file request, and the corresponding response.
Path parameters
file_id
string required
The ID of the file to retrieve, such as the output_file_id of a completed Batch.
Returns
The file content. Refer to the input and output format specifications for batch requests.
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY" # Replace with your actual API key
)
# Get the status of the Batch, which returns the output_file_id
batch_status = client.batches.retrieve("mybatch-123")  # Replace with your batch id
# Check if the Batch completed successfully
if batch_status.status.lower() == "completed":
    # Retrieve the results
    result_file_id = batch_status.output_file_id
    results = client.files.content(result_file_id).content
    # Save results to a file
    result_file_name = "batch_results.jsonl"
    with open(result_file_name, "wb") as file:
        file.write(results)
    print(f"Results saved to {result_file_name}")
else:
    print(f"Batch not completed; current status: {batch_status.status}")
curl -s https://api.kluster.ai/v1/files/kluster-output-file-123/content \
-H "Authorization: Bearer $API_KEY" > batch_output.jsonl
File object#
id
string
The file identifier, which can be referenced in the API endpoints.
object
string
The object type, which is always file.
bytes
integer
The size of the file, in bytes.
created_at
integer
The Unix timestamp (in seconds) for when the file was created.
filename
string
The name of the file.
purpose
string
The intended purpose of the file. Currently, only batch is supported.
{
"id": "myfile-123",
"bytes": 2797,
"created_at": "1733832768",
"filename": "mybatchtest.jsonl",
"object": "file",
"purpose": "batch"
}
Models#
List supported models#
GET https://api.kluster.ai/v1/models
Lists the currently available models.
You can use this endpoint to retrieve a list of all available models for the kluster.ai API. Currently supported models include:
klusterai/Meta-Llama-3.1-8B-Instruct-Turbo
klusterai/Meta-Llama-3.1-405B-Instruct-Turbo
klusterai/Meta-Llama-3.3-70B-Instruct-Turbo
deepseek-ai/DeepSeek-R1
Returns
id
string
The model identifier, which can be referenced in the API endpoints.
created
integer
The Unix timestamp (in seconds) when the model was created.
object
string
The object type, which is always model.
owned_by
string
The organization that owns the model.
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY" # Replace with your actual API key
)
print(client.models.list().to_dict())
curl https://api.kluster.ai/v1/models \
-H "Authorization: Bearer $API_KEY"
{
"object": "list",
"data": [
{
"id": "klusterai/Meta-Llama-3.1-405B-Instruct-Turbo",
"created": 1731336418,
"object": "model",
"owned_by": "klusterai"
},
{
"id": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
"created": 1731336610,
"object": "model",
"owned_by": "klusterai"
},
{
"id": "klusterai/Meta-Llama-3.3-70B-Instruct-Turbo",
"created": 1733777629,
"object": "model",
"owned_by": "klusterai"
},
{
"id": "deepseek-ai/DeepSeek-R1",
"created": 1737385699,
"object": "model",
"owned_by": "klusterai"
}
]
}
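A common use of this list is to confirm that a model ID is available before using it in a request. The sketch below works on a trimmed copy of the response above; pick_model is a hypothetical helper, not part of the API.

```python
models_response = {
    "object": "list",
    "data": [
        {"id": "klusterai/Meta-Llama-3.1-405B-Instruct-Turbo", "object": "model", "owned_by": "klusterai"},
        {"id": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo", "object": "model", "owned_by": "klusterai"},
        {"id": "klusterai/Meta-Llama-3.3-70B-Instruct-Turbo", "object": "model", "owned_by": "klusterai"},
        {"id": "deepseek-ai/DeepSeek-R1", "object": "model", "owned_by": "klusterai"},
    ],
}


def pick_model(response, wanted):
    """Return wanted if the models list contains it, otherwise raise."""
    available = {model["id"] for model in response["data"]}
    if wanted not in available:
        raise ValueError(f"{wanted} is not in the list of supported models")
    return wanted


print(pick_model(models_response, "deepseek-ai/DeepSeek-R1"))
```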
Fine-tuning#
Fine-tuning is the process of refining a pre-trained model on specialized data. By adjusting the parameters with new, domain-specific examples, the model performs better on targeted tasks while retaining the general knowledge learned in its original training.
Supported models#
Currently, two base models are supported for Fine-tuning:
klusterai/Meta-Llama-3.1-8B-Instruct-Turbo - has a 64,000-token max context window; best for long-context tasks and cost-sensitive scenarios
klusterai/Meta-Llama-3.3-70B-Instruct-Turbo - has a 32,000-token max context window; best for complex reasoning and high-stakes accuracy
Create a Fine-tuning job#
POST https://api.kluster.ai/v1/fine_tuning/jobs
To initiate a Fine-tuning job for one of the supported models, first upload the dataset file (see Files section for instructions).
Request
training_file
string required
ID of an uploaded file that will serve as training data. This file must have purpose="fine-tune".
model
string required
The base model ID to fine-tune. Must be a fine-tunable model, for example meta-llama/Meta-Llama-3.1-8B-Instruct or meta-llama/Meta-Llama-3.3-70B-Instruct-Turbo.
validation_file
string or null
Optionally specify a separate file to serve as your validation dataset.
hyperparameters
object or null
Optionally specify an object containing hyperparameters for Fine-tuning:
Properties
batch_size
number
The number of training examples processed in one forward/backward pass. Larger batch sizes reduce the frequency of weight updates per epoch, leading to more stable gradients but slower updates. Gradient accumulation is used, so larger batches may increase the duration of the job.
learning_rate_multiplier
number
A multiplier for the base step size used in model weight updates. Lower values slow training but improve precision (helping avoid overshooting optimal weights or overfitting). Higher values speed up convergence but risk instability. Adjust carefully to balance training efficiency and model performance.
n_epochs
number
The number of times the entire training dataset is passed through the model. More epochs can improve learning but risk overfitting if the model memorizes training data. Monitor validation metrics to determine the optimal number.
nickname
string or null
Add a custom suffix that will be appended to the output model name. This can help identify a fine-tuned model.
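The interaction between batch_size and n_epochs can be made concrete with a little arithmetic. The sketch below counts weight updates under a simplifying assumption: one update per batch, with the final partial batch kept. The service's actual scheduling, including its gradient accumulation, may differ, and weight_updates is a hypothetical helper.

```python
import math


def weight_updates(n_examples, batch_size, n_epochs):
    """Estimate total weight updates, assuming one update per batch."""
    updates_per_epoch = math.ceil(n_examples / batch_size)
    return updates_per_epoch * n_epochs


# 1,000 training examples over 3 epochs at two batch sizes.
print(weight_updates(1000, batch_size=4, n_epochs=3))
print(weight_updates(1000, batch_size=8, n_epochs=3))
```

Doubling the batch size halves the number of updates per epoch, which is the trade-off described for batch_size.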
Returns
The created Fine-tuning job object.
from openai import OpenAI
# Configure OpenAI client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY" # Replace with your actual API key
)
job = client.fine_tuning.jobs.create(
training_file="INSERT_TRAINING_FILE_ID", # ID from uploaded training file
model="meta-llama/Meta-Llama-3.1-8B-Instruct",
hyperparameters={
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
}
)
print(job.to_dict())
curl -X POST https://api.kluster.ai/v1/fine_tuning/jobs \
-H "Authorization: Bearer INSERT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"training_file": "INSERT_TRAINING_FILE_ID",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"hyperparameters": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
}
}'
{
"object": "fine_tuning.job",
"id": "67ae81b59b08392687ea5f69",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"created_at": 1739489717,
"result_files": [],
"status": "queued",
"training_file": "67ae81587772e8a89c8fd5cf",
"hyperparameters": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
},
"method": {
"type": "supervised",
"supervised": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
}
},
"integrations": []
}
Retrieve a Fine-tuning job#
GET https://api.kluster.ai/v1/fine_tuning/jobs/{fine_tuning_job_id}
Fetch details of a single Fine-tuning job by specifying its fine_tuning_job_id.
Path parameters
fine_tuning_job_id
string required
The ID of the Fine-tuning job to retrieve.
Returns
The Fine-tuning job object matching the specified fine_tuning_job_id.
from openai import OpenAI
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY"
)
job_details = client.fine_tuning.jobs.retrieve("INSERT_JOB_ID")
print(job_details.to_dict())
curl -s https://api.kluster.ai/v1/fine_tuning/jobs/INSERT_JOB_ID \
-H "Authorization: Bearer INSERT_API_KEY"
{
"object": "fine_tuning.job",
"id": "67ae81b59b08392687ea5f69",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"created_at": 1739489717,
"result_files": [],
"status": "running",
"training_file": "67ae81587772e8a89c8fd5cf",
"hyperparameters": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
},
"method": {
"type": "supervised",
"supervised": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
}
},
"integrations": []
}
List all Fine-tuning jobs#
GET https://api.kluster.ai/v1/fine_tuning/jobs
Retrieve a paginated list of all Fine-tuning jobs.
Query parameters
after
string
A cursor for use in pagination.
limit
integer
A limit on the number of objects returned (1 to 100). Default is 20.
Returns
A paginated list of Fine-tuning job objects.
from openai import OpenAI
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY"
)
jobs = client.fine_tuning.jobs.list(limit=3)
print(jobs.to_dict())
curl -s https://api.kluster.ai/v1/fine_tuning/jobs \
-H "Authorization: Bearer $API_KEY"
{
"object": "list",
"data": [
{
"object": "fine_tuning.job",
"id": "67ae81b59b08392687ea5f69",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"created_at": 1739489717,
"result_files": [],
"status": "running",
"training_file": "67ae81587772e8a89c8fd5cf",
"hyperparameters": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
},
"method": {
"type": "supervised",
"supervised": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
}
},
"integrations": []
},
{
"object": "fine_tuning.job",
"id": "67ae7f7d965c187d5cda039f",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"created_at": 1739489149,
"result_files": [],
"status": "cancelled",
"training_file": "67ae7f7c965c187d5cda0397",
"hyperparameters": {
"batch_size": 1,
"learning_rate_multiplier": 1,
"n_epochs": 10
},
"method": {
"type": "supervised",
"supervised": {
"batch_size": 1,
"learning_rate_multiplier": 1,
"n_epochs": 10
}
},
"integrations": []
}
],
"first_id": "67ae81b59b08392687ea5f69",
"last_id": "67abefddbee1f22fb0a742ef",
"has_more": true
}
Cancel a Fine-tuning job#
POST https://api.kluster.ai/v1/fine_tuning/jobs/{fine_tuning_job_id}/cancel
To cancel a job that is in progress, send a POST request to the cancel endpoint with the job ID.
Path parameters
fine_tuning_job_id
string required
The ID of the Fine-tuning job to cancel.
Returns
The Fine-tuning job object with updated status.
from openai import OpenAI
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key="INSERT_API_KEY"
)
cancelled_job = client.fine_tuning.jobs.cancel("67ae7f7d965c187d5cda039f")
print(cancelled_job.to_dict())
curl -X POST https://api.kluster.ai/v1/fine_tuning/jobs/67ae7f7d965c187d5cda039f/cancel \
-H "Authorization: Bearer INSERT_API_KEY" \
-H "Content-Type: application/json"
{
"id": "67ae7f7d965c187d5cda039f",
"object": "fine_tuning.job",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"fine_tuned_model": null,
"status": "cancelling",
"created_at": 1738382911,
"training_file": "file-123abc",
"validation_file": null,
"hyperparameters": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 3
},
"metrics": {},
"error": null
}
Fine-tuning job object#
object
string
The object type, which is always fine_tuning.job.
id
string
Unique identifier for the Fine-tuning job.
model
string
ID of the base model being fine-tuned.
created_at
integer
Unix timestamp (in seconds) when the Fine-tuning job was created.
finished_at
integer
Unix timestamp (in seconds) when the Fine-tuning job was completed.
fine_tuned_model
string or null
The ID of the resulting fine-tuned model if the job succeeded; otherwise null.
result_files
array
Array of file IDs associated with the Fine-tuning job results.
status
string
The status of the Fine-tuning job, e.g. pending, running, succeeded, failed, or cancelled.
training_file
string
ID of the uploaded file used for training data.
hyperparameters
object
Training hyperparameters used in the job (e.g., batch_size, n_epochs, learning_rate_multiplier).
method
object
Details about the Fine-tuning method used, including type and specific parameters.
trained_tokens
integer
The total number of tokens processed during training.
integrations
array
Array of integrations associated with the Fine-tuning job.
{
"object": "fine_tuning.job",
"id": "67ad3877720af9f9ba78b684",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"created_at": 1739405431,
"finished_at": 1739405521,
"fine_tuned_model": "ft:meta-llama:Llama-3.1-8B-Instruct:personal:805b5d69",
"result_files": [],
"status": "succeeded",
"training_file": "67ad38760272045e7006171b",
"hyperparameters": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 2
},
"method": {
"type": "supervised",
"supervised": {
"batch_size": 4,
"learning_rate_multiplier": 1,
"n_epochs": 2
}
},
"trained_tokens": 3065,
"integrations": []
}
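Once a job reaches succeeded, fine_tuned_model holds the ID to use as the model in later requests, and finished_at minus created_at gives the run time. The sketch below summarizes a job object along those lines; summarize_job is a hypothetical helper, and it treats any status other than succeeded, failed, or cancelled as still in progress.

```python
def summarize_job(job):
    """Return a one-line summary of a Fine-tuning job object."""
    status = job["status"]
    if status == "succeeded":
        duration = job["finished_at"] - job["created_at"]
        return f"{job['fine_tuned_model']} ready after {duration}s"
    if status in ("failed", "cancelled"):
        return f"job {job['id']} ended with status {status}"
    return f"job {job['id']} is still {status}"


job = {
    "object": "fine_tuning.job",
    "id": "67ad3877720af9f9ba78b684",
    "created_at": 1739405431,
    "finished_at": 1739405521,
    "fine_tuned_model": "ft:meta-llama:Llama-3.1-8B-Instruct:personal:805b5d69",
    "status": "succeeded",
}
print(summarize_job(job))
```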