
API reference


Create chat completion


To create a chat completion, send a request to the chat/completions endpoint.


model string required

ID of the model to use. You can use the models endpoint to retrieve the list of supported models.

messages array required

A list of messages comprising the conversation so far. Each message object is one of three types: system, user, or assistant.

System message object

content string or array

The contents of the system message.

role string or null required

The role of the message's author, in this case, system.

User message object

content string or array

The contents of the user message.

role string or null required

The role of the message's author, in this case, user.

Assistant message object

content string or array

The contents of the assistant message.

role string or null required

The role of the message's author, in this case, assistant.

frequency_penalty number or null

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. Defaults to 0.

logit_bias map

Modify the likelihood of specified tokens appearing in the completion. Defaults to null.

Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
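To make the arithmetic concrete, the sketch below adds a bias to raw logits and renormalizes with a softmax. The token IDs, logit values, and the helper name apply_logit_bias are hypothetical and only illustrate the idea; how the service samples internally may differ.

```python
import math

def softmax(logits):
    """Convert a {token_id: logit} map into probabilities."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_logit_bias(logits, logit_bias):
    """Add each bias to the matching logit before sampling (hypothetical helper)."""
    return {tok: val + logit_bias.get(tok, 0.0) for tok, val in logits.items()}

# Made-up token IDs and logits, for illustration only.
logits = {101: 2.0, 102: 1.0, 103: 0.5}

before = softmax(logits)
banned = softmax(apply_logit_bias(logits, {101: -100}))  # token 101 effectively banned
forced = softmax(apply_logit_bias(logits, {103: 100}))   # token 103 effectively forced

print(before[101], banned[101], forced[103])
```

A bias of -100 pushes the token's probability to practically zero, while +100 makes it the only realistic choice, which matches the "ban or exclusive selection" behavior described above.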

logprobs boolean or null

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. Defaults to false.

top_logprobs integer or null

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
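Log probabilities are natural logarithms, so they can be turned back into ordinary probabilities with an exponential. The sketch below shows that conversion; the candidate tokens and values are invented for illustration and are not real API output.

```python
import math

def logprob_to_probability(logprob):
    """Convert a natural-log probability back to a probability in [0, 1]."""
    return math.exp(logprob)

# Invented candidates for a single token position, for illustration only.
candidates = [
    {"token": "Buenos", "logprob": -0.05},
    {"token": "The", "logprob": -3.2},
]

for candidate in candidates:
    probability = logprob_to_probability(candidate["logprob"])
    print(f'{candidate["token"]!r}: {probability:.1%}')
```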

max_completion_tokens integer or null

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

presence_penalty number or null

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Defaults to 0.
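As a rough illustration of how frequency_penalty and presence_penalty differ, the sketch below uses one common formulation: the frequency penalty scales with how often a token has already appeared, while the presence penalty is a flat, one-time deduction. Whether this service applies exactly this formula is an assumption, and the tokens and logits are made up.

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated text."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1.0 if count > 0 else 0.0
        # Frequency penalty grows with each repeat; presence penalty applies once.
        adjusted[token] = logit - count * frequency_penalty - seen * presence_penalty
    return adjusted

# Made-up logits and history, for illustration only.
logits = {"the": 3.0, "cat": 2.0, "dog": 2.0}
generated = ["the", "cat", "the"]

adjusted = apply_penalties(logits, generated, frequency_penalty=0.5, presence_penalty=0.25)
print(adjusted)
```

Here "the" (seen twice) is penalized more than "cat" (seen once), and "dog" (never seen) is untouched.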

seed integer or null

If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.

stop string or array or null

Up to four sequences where the API will stop generating further tokens. Defaults to null.

stream boolean or null

If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Defaults to false.
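A streamed response can be consumed by reading data lines until the data: [DONE] terminator. The sketch below parses such lines offline; the chunk shape (choices[0].delta.content) is an assumption based on OpenAI-compatible APIs, and the sample lines are invented.

```python
import json

def iter_stream_text(lines):
    """Yield text deltas from data-only server-sent events until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # the stream is finished
        chunk = json.loads(payload)
        # Assumed chunk shape: choices[i].delta.content
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text

# Invented sample stream, for illustration only.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Buenos"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " Aires"}}]}',
    "data: [DONE]",
]

print("".join(iter_stream_text(sample)))
```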

temperature number or null

The sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 1.

It is generally recommended to alter this or top_p but not both.
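The effect of temperature can be sketched as dividing the logits by the temperature before the softmax. The logits below are made up, and the sketch only handles temperatures above 0; how the service treats a temperature of exactly 0 is not covered here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities after scaling logits by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up values, for illustration only

focused = softmax_with_temperature(logits, 0.2)
default = softmax_with_temperature(logits, 1.0)
flatter = softmax_with_temperature(logits, 2.0)

print(focused[0], default[0], flatter[0])
```

Lower temperatures concentrate probability on the top token, and higher temperatures spread it across alternatives.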

top_p number or null

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Defaults to 1.

It is generally recommended to alter this or temperature but not both.
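Nucleus sampling can be sketched as keeping the most likely tokens until their cumulative probability reaches top_p, then renormalizing. The probabilities and the helper name nucleus_filter are hypothetical; the service's actual implementation may differ in details such as tie handling.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, mass = [], 0.0
    for token, probability in ranked:
        kept.append((token, probability))
        mass += probability
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: probability / mass for token, probability in kept}

# Made-up distribution, for illustration only.
probs = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}

print(nucleus_filter(probs, 0.1))
print(nucleus_filter(probs, 0.75))
```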


Returns the created Chat completion object.

Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

# Send the conversation to the chat/completions endpoint
chat_completion = client.chat.completions.create(
    model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Argentina?"},
    ],
)

print(chat_completion.to_dict())

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix
curl -s "$BASE_URL/chat/completions" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What is the capital of Argentina?"
            }
        ]
    }'

Response
{
    "id": "chat-d187c103e189483485b3bcd3eb899c62",
    "object": "chat.completion",
    "created": 1736136422,
    "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of Argentina is Buenos Aires.",
                "tool_calls": []
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "usage": {
        "prompt_tokens": 48,
        "total_tokens": 57,
        "completion_tokens": 9
    },
    "prompt_logprobs": null
}

Chat completion object

id string

Unique identifier for the chat completion.

object string

The object type, which is always chat.completion.

created integer

The Unix timestamp (in seconds) of when the chat completion was created.

model string

The model used for the chat completion. You can use the models endpoint to retrieve the list of supported models.

choices array

A list of chat completion choices.

index integer

The index of the choice in the list of returned choices.

message object

A chat completion message generated by the model. Can be one of system, user, or assistant.

content string or array

The contents of the message.

role string or null

The role of the message's author. Can be one of system, user, or assistant.

logprobs boolean or null

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. Defaults to false.

finish_reason string

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters, tool_calls if the model called a tool, or function_call (deprecated) if the model called a function.

stop_reason string or null

The reason the model stopped generating text.

usage object

Usage statistics for the completion request.

completion_tokens integer

Number of tokens in the generated completion.

prompt_tokens integer

Number of tokens in the prompt.

total_tokens integer

Total number of tokens used in the request (prompt + completion).
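As an illustration of how these fields fit together, the sketch below reads the reply text, checks finish_reason for truncation, and verifies the token arithmetic on a Chat completion represented as a plain dict. The helper name summarize_completion is hypothetical, and the sample values are taken from the example object in this reference.

```python
def summarize_completion(completion):
    """Extract the reply text and a few sanity checks from a Chat completion dict."""
    choice = completion["choices"][0]
    usage = completion["usage"]
    return {
        "text": choice["message"]["content"],
        # finish_reason "length" means the token limit cut the reply short
        "truncated": choice["finish_reason"] == "length",
        "usage_consistent": (
            usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
        ),
    }

completion = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of Argentina is Buenos Aires.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 48, "total_tokens": 57, "completion_tokens": 9},
}

summary = summarize_completion(completion)
print(summary["text"])
```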

Chat completion object
{
    "id": "chat-d187c103e189483485b3bcd3eb899c62",
    "object": "chat.completion",
    "created": 1736136422,
    "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of Argentina is Buenos Aires.",
                "tool_calls": []
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "usage": {
        "prompt_tokens": 48,
        "total_tokens": 57,
        "completion_tokens": 9
    },
    "prompt_logprobs": null
}


Submit a Batch job


To submit a Batch job, send a request to the batches endpoint.


input_file_id string required

The ID of an uploaded file that contains requests for the new Batch.

Your input file must be formatted as a JSONL file and uploaded with the purpose batch. The file can contain up to 50,000 requests and is currently limited to 6 GB.

endpoint string required

The endpoint to be used for all requests in the Batch. Currently, only /v1/chat/completions is supported.

completion_window string required

The supported completion windows are 1, 3, 6, 12, and 24 hours to accommodate a range of use cases and budget requirements. The code samples provided utilize the 24-hour completion window.

Learn more about how completion window selection affects cost by visiting the pricing section of the website.

metadata object or null

Custom metadata for the Batch.


Returns the created Batch object.

Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

# Create the Batch from a previously uploaded input file
batch_request = client.batches.create(
    input_file_id="myfile-123",  # Replace with your input file id
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

print(batch_request.to_dict())

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix
curl -s "$BASE_URL/batches" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "input_file_id": "myfile-123",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
    }'

Response
{
    "id": "mybatch-123",
    "completion_window": "24h",
    "created_at": 1733832777,
    "endpoint": "/v1/chat/completions",
    "input_file_id": "myfile-123",
    "object": "batch",
    "status": "validating",
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "error_file_id": null,
    "errors": null,
    "expired_at": null,
    "expires_at": 1733919177,
    "failed_at": null,
    "finalizing_at": null,
    "in_progress_at": null,
    "metadata": {},
    "output_file_id": null,
    "request_counts": {
        "completed": 0,
        "failed": 0,
        "total": 0
    }
}

Retrieve a Batch


To retrieve a Batch job, send a request to the batches endpoint with your batch_id.

You can also monitor jobs in the Batch tab of the platform UI.

Path parameters

batch_id string required

The ID of the Batch to retrieve.


Returns the Batch object matching the specified batch_id.

Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

# Retrieve the Batch by its id
batch = client.batches.retrieve("mybatch-123")  # Replace with your batch id

print(batch.to_dict())

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix
curl -s "$BASE_URL/batches/$BATCH_ID" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json"

Response
{
  "id": "mybatch-123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "myfile-123",
  "completion_window": "24h",
  "status": "completed",
  "output_file_id": "myfile-123-output",
  "error_file_id": null,
  "created_at": "1733832777",
  "in_progress_at": "1733832777",
  "expires_at": "1733919177",
  "finalizing_at": "1733832781",
  "completed_at": "1733832781",
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 4,
    "completed": 4,
    "failed": 0
  },
  "metadata": {}
}

Cancel a Batch


To cancel a Batch job that is currently in progress, send a request to the cancel endpoint with your batch_id. Note that cancellation may take up to 10 minutes to complete, during which time the status will show as cancelling.

Path parameters

batch_id string required

The ID of the Batch to cancel.


Returns the Batch object matching the specified batch_id.

Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

client.batches.cancel("mybatch-123")  # Replace with your batch id

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix
curl -s "$BASE_URL/batches/$BATCH_ID/cancel" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -X POST

Response
{
  "id": "mybatch-123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "myfile-123",
  "completion_window": "24h",
  "status": "cancelling",
  "output_file_id": "myfile-123-output",
  "error_file_id": null,
  "created_at": "1730821906",
  "in_progress_at": "1730821911",
  "expires_at": "1730821906",
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": "1730821906",
  "cancelled_at": null,
  "request_counts": {
    "total": 3,
    "completed": 3,
    "failed": 0
  },
  "metadata": {}
}

List all Batch jobs


To list all Batch jobs, send a request to the batches endpoint without specifying a batch_id. To constrain the query response, you can also use a limit parameter.

Query parameters

after string

A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.

limit integer

A limit on the number of objects to be returned. Limit can range between 1 and 100. Default is 20.
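Cursor pagination with after and limit can be sketched as a loop that keeps requesting pages until has_more is false. In the sketch below, fetch_page is a hypothetical callable standing in for a real request to the batches endpoint, and fake_fetch_page is an in-memory stand-in so the example runs offline.

```python
def iter_all(fetch_page, limit=20):
    """Yield every object across pages, following the `after` cursor."""
    after = None
    while True:
        page = fetch_page(after=after, limit=limit)
        items = page.get("data", [])
        yield from items
        if not page.get("has_more") or not items:
            break
        after = items[-1]["id"]  # the last object's ID becomes the next cursor

# In-memory stand-in for the batches endpoint, for illustration only.
_BATCHES = [{"id": f"mybatch-{n}"} for n in range(1, 6)]

def fake_fetch_page(after=None, limit=20):
    start = 0
    if after is not None:
        start = next(i for i, b in enumerate(_BATCHES) if b["id"] == after) + 1
    data = _BATCHES[start:start + limit]
    return {"object": "list", "data": data, "has_more": start + limit < len(_BATCHES)}

ids = [batch["id"] for batch in iter_all(fake_fetch_page, limit=2)]
print(ids)
```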


Returns a paginated list of Batch objects.

The status of a Batch object can be one of the following:

Status        Description
validating    The input file is being validated.
failed        The input file failed the validation process.
in_progress   The input file was successfully validated and the Batch is in progress.
finalizing    The Batch job has completed and the results are being finalized.
completed     The Batch has completed and the results are ready.
expired       The Batch was not completed within its completion window.
cancelling    The Batch is being cancelled (may take up to 10 minutes).
cancelled     The Batch was cancelled.
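Because a Batch moves through several of these statuses before it finishes, client code usually polls until a terminal status is reached. The sketch below shows one way to do that; wait_for_batch and its retrieve argument are hypothetical, and treating completed, failed, expired, and cancelled as terminal is an assumption drawn from the table above. A fake retrieve function is used so the example runs offline.

```python
import time

# Assumed terminal statuses, based on the status table above.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(retrieve, batch_id, poll_seconds=10.0, max_polls=360, sleep=time.sleep):
    """Poll retrieve(batch_id) until the Batch reaches a terminal status."""
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)
    raise TimeoutError(f"Batch {batch_id} did not finish after {max_polls} polls")

# Fake retrieve function that steps through a typical lifecycle.
_statuses = iter(["validating", "in_progress", "finalizing", "completed"])

def fake_retrieve(batch_id):
    return {"id": batch_id, "status": next(_statuses)}

final = wait_for_batch(fake_retrieve, "mybatch-123", sleep=lambda seconds: None)
print(final["status"])
```

With the OpenAI Python client, retrieve could be a small wrapper such as lambda batch_id: client.batches.retrieve(batch_id).to_dict().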
Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

# List Batch jobs, returning at most two per page
print(client.batches.list(limit=2).to_dict())

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix
curl -s "$BASE_URL/batches" \
    -H "Authorization: Bearer $API_KEY"

Response
{
"object": "list",
"data": [
    {
    "id": "mybatch-123",
    "object": "batch",
    "endpoint": "/v1/chat/completions",
    "errors": null,
    "input_file_id": "myfile-123",
    "completion_window": "24h",
    "status": "completed",
    "output_file_id": "myfile-123-output",
    "error_file_id": null,
    "created_at": "1733832777",
    "in_progress_at": "1733832777",
    "expires_at": "1733919177",
    "finalizing_at": "1733832781",
    "completed_at": "1733832781",
    "failed_at": null,
    "expired_at": null,
    "cancelling_at": null,
    "cancelled_at": null,
    "request_counts": {
        "total": 4,
        "completed": 4,
        "failed": 0
    },
    "metadata": {}
    },
    { ... }
],
"first_id": "mybatch-123",
"last_id": "mybatch-789",
"has_more": false,
"count": 1,
"page": 1,
"page_count": -1,
"items_per_page": 9223372036854775807
}

Batch object

id string

The ID of the Batch.

object string

The object type, which is always batch.

endpoint string

The API endpoint used by the Batch.

errors object

object string

The object type, which is always list.

data array

code string

An error code identifying the error type.

message string

A human-readable message providing more details about the error.

param string or null

The name of the parameter that caused the error, if applicable.

line integer or null

The line number of the input file where the error occurred, if applicable.

input_file_id string

The ID of the input file for the Batch.

completion_window string

The time frame within which the Batch should be processed.

status string

The current status of the Batch.

output_file_id string

The ID of the file containing the outputs of successfully executed requests.

error_file_id string

The ID of the file containing the outputs of requests with errors.

created_at integer

The Unix timestamp (in seconds) for when the Batch was created.

in_progress_at integer

The Unix timestamp (in seconds) for when the Batch started processing.

expires_at integer

The Unix timestamp (in seconds) for when the Batch will expire.

finalizing_at integer

The Unix timestamp (in seconds) for when the Batch started finalizing.

completed_at integer

The Unix timestamp (in seconds) for when the Batch was completed.

failed_at integer

The Unix timestamp (in seconds) for when the Batch failed.

expired_at integer

The Unix timestamp (in seconds) for when the Batch expired.

cancelling_at integer

The Unix timestamp (in seconds) for when the Batch started cancelling.

cancelled_at integer

The Unix timestamp (in seconds) for when the Batch was cancelled.

request_counts object

The request counts for different statuses within the Batch.

total integer

Total number of requests in the Batch.

completed integer

Number of requests that have been completed successfully.

failed integer

Number of requests that have failed.

Batch object
{
    "id": "mybatch-123",
    "completion_window": "24h",
    "created_at": 1733832777,
    "endpoint": "/v1/chat/completions",
    "input_file_id": "myfile-123",
    "object": "batch",
    "status": "validating",
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "error_file_id": null,
    "errors": null,
    "expired_at": null,
    "expires_at": 1733919177,
    "failed_at": null,
    "finalizing_at": null,
    "in_progress_at": null,
    "metadata": {},
    "output_file_id": null,
    "request_counts": {
        "completed": 0,
        "failed": 0,
        "total": 0
    }
}

The request input object

The per-line object of the Batch input file.

custom_id string

A developer-provided per-request ID.

method string

The HTTP method to be used for the request. Currently, only POST is supported.

url string

The /v1/chat/completions endpoint.

body map

The JSON body of the request.
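Since the input file holds one such object per line, it can be generated programmatically. The sketch below builds JSONL lines from a list of prompts using the fields described above; the helper name build_batch_lines, the custom_id numbering scheme, and the prompts are illustrative assumptions.

```python
import json

def build_batch_lines(prompts, model, system_prompt="You are a helpful assistant."):
    """Return one JSON-encoded request per prompt, ready to write as JSONL."""
    lines = []
    for number, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{number}",  # must let you match outputs to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        lines.append(json.dumps(request))  # json.dumps keeps each request on one line
    return lines

lines = build_batch_lines(
    ["What is the capital of Argentina?", "What is the capital of Peru?"],
    model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
)
jsonl_text = "\n".join(lines) + "\n"
print(jsonl_text)
```

Writing jsonl_text to a file such as mybatchtest.jsonl gives an input file that can then be uploaded with the purpose batch.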

Request input object
{
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
            "messages": [
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": "What is the capital of Argentina?"
                }
            ],
            "max_tokens": 1000
        }
}

The request output object

The per-line object of the Batch output files.

id string

A unique identifier for the batch request.

custom_id string

A developer-provided per-request ID that will be used to match outputs to inputs.

response object or null

status_code integer

The HTTP status code of the response.

request_id string

A unique identifier for the request. You can reference this request ID if you need to contact support for assistance.

body map

The JSON body of the response.

error object or null

For requests that failed with a non-HTTP error, this will contain more information on the cause of the failure.

code string

A machine-readable error code.

message string

A human-readable error message.
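Once the output file has been downloaded, each line can be matched back to its input through custom_id. The sketch below indexes results that way; the helper name index_results, the rule that only status code 200 counts as success, and both sample records (including the error code) are illustrative assumptions.

```python
import json

def index_results(jsonl_text):
    """Map each custom_id to its reply text, or to its error details."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        response = record.get("response")
        if response and response.get("status_code") == 200:
            message = response["body"]["choices"][0]["message"]
            results[record["custom_id"]] = {"ok": True, "content": message["content"]}
        else:
            results[record["custom_id"]] = {"ok": False, "error": record.get("error")}
    return results

# Invented output lines, for illustration only.
sample = "\n".join([
    json.dumps({
        "id": "batch-req-123",
        "custom_id": "request-1",
        "response": {
            "status_code": 200,
            "request_id": "req-123",
            "body": {"choices": [{"index": 0, "message": {
                "role": "assistant",
                "content": "The capital of Argentina is Buenos Aires."}}]},
        },
    }),
    json.dumps({
        "id": "batch-req-456",
        "custom_id": "request-2",
        "response": None,
        "error": {"code": "example_error", "message": "Illustrative failure."},
    }),
])

results = index_results(sample)
print(results["request-1"]["content"])
```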

Request output object
{
    "id": "batch-req-123",
    "custom_id": "request-1",
    "response": {
        "status_code": 200,
        "request_id": "req-123",
        "body": {
            "id": "chatcmpl-5a5ba6c6-2f95-4136-815b-23275c4f1efb",
            "object": "chat.completion",
            "created": 1737472126,
            "model": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
            "choices": [
                {
                    "index": 0,
                    "message": {
                        "role": "assistant",
                        "content": "The capital of Argentina is Buenos Aires.",
                        "tool_calls": []
                    },
                    "logprobs": null,
                    "finish_reason": "stop",
                    "stop_reason": null
                }
            ],
            "usage": {
                "prompt_tokens": 48,
                "total_tokens": 57,
                "completion_tokens": 9,
                "prompt_tokens_details": null
            },
            "prompt_logprobs": null
        }
    }
}


Upload files


Upload a JSON Lines file to the files endpoint.

You can also view all your uploaded files in the Files tab of the platform.


file file required

The File object (not file name) to be uploaded.

purpose string required

The intended purpose of the uploaded file. Use batch for the Batch API.


Returns the uploaded File object.

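Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

file_name = "mybatchtest.jsonl"  # Path to your JSON Lines input file

# Upload the file with the purpose "batch"
with open(file_name, "rb") as file:
    batch_input_file = client.files.create(file=file, purpose="batch")

print(batch_input_file.to_dict())

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix
curl -s "$BASE_URL/files" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: multipart/form-data" \
    -F "file=@mybatchtest.jsonl" \
    -F "purpose=batch"

Response
{
  "id": "myfile-123",
  "bytes": 2797,
  "created_at": "1733832768",
  "filename": "mybatchtest.jsonl",
  "object": "file",
  "purpose": "batch"
}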

Retrieve file content


To retrieve the content of your Batch job's output file, send a request to the files endpoint specifying the output_file_id. The output file is a JSONL file in which each line contains the custom_id from your input file request and the corresponding response.

Path parameters

file_id string required

The ID of the file to use for this request.


Returns the file content. Refer to the input and output format specifications for batch requests.

Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

# Get the status of the Batch, which returns the output_file_id
batch_status = client.batches.retrieve("mybatch-123")  # Replace with your batch id

# Check if the Batch completed successfully
if batch_status.status.lower() == "completed":
    # Retrieve the results
    result_file_id = batch_status.output_file_id
    results = client.files.content(result_file_id).content

    # Save results to a file
    result_file_name = "batch_results.jsonl"
    with open(result_file_name, "wb") as file:
        file.write(results)
    print(f"Results saved to {result_file_name}")
else:
    print(f"Batch failed with status: {batch_status.status}")

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix;
# the /content path segment is assumed from the client.files.content call above
curl -s "$BASE_URL/files/$FILE_ID/content" \
    -H "Authorization: Bearer $API_KEY" > batch_output.jsonl

File object

id string

The file identifier, which can be referenced in the API endpoints.

object string

The object type, which is always file.

bytes integer

The size of the file, in bytes.

created_at integer

The Unix timestamp (in seconds) for when the file was created.

filename string

The name of the file.

purpose string

The intended purpose of the file. Currently, only batch is supported.

File object
{
  "id": "myfile-123",
  "bytes": 2797,
  "created_at": "1733832768",
  "filename": "mybatchtest.jsonl",
  "object": "file",
  "purpose": "batch"
}


List supported models


Lists the currently available models.

You can use this endpoint to retrieve a list of all available models for the API. Currently supported models include:

  • klusterai/Meta-Llama-3.1-8B-Instruct-Turbo
  • klusterai/Meta-Llama-3.1-405B-Instruct-Turbo
  • klusterai/Meta-Llama-3.3-70B-Instruct-Turbo
  • deepseek-ai/DeepSeek-R1


id string

The model identifier, which can be referenced in the API endpoints.

created integer

The Unix timestamp (in seconds) when the model was created.

object string

The object type, which is always model.

owned_by string

The organization that owns the model.

Example request (Python)
from openai import OpenAI

# Configure OpenAI client
client = OpenAI(
    base_url="INSERT_BASE_URL",  # Replace with the API base URL
    api_key="INSERT_API_KEY",  # Replace with your actual API key
)

# List the currently available models
print(client.models.list().to_dict())

Example request (curl)
# $BASE_URL is a placeholder for the API base URL, including the /v1 prefix
curl "$BASE_URL/models" \
    -H "Authorization: Bearer $API_KEY"

Response
{
  "object": "list",
  "data": [
    {
      "id": "klusterai/Meta-Llama-3.1-405B-Instruct-Turbo",
      "created": 1731336418,
      "object": "model",
      "owned_by": "klusterai"
    },
    {
      "id": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
      "created": 1731336610,
      "object": "model",
      "owned_by": "klusterai"
    },
    {
      "id": "klusterai/Meta-Llama-3.3-70B-Instruct-Turbo",
      "created": 1733777629,
      "object": "model",
      "owned_by": "klusterai"
    },
    {
      "id": "deepseek-ai/DeepSeek-R1",
      "created": 1737385699,
      "object": "model",
      "owned_by": "klusterai"
    }
  ]
}