Multiple inference requests with kluster.ai

In other notebooks, we used AI models to perform simple tasks like text classification, sentiment analysis and keyword extraction.

This tutorial runs through a notebook where you'll learn how to use the kluster.ai batch API to combine different tasks into a single batch file. Note that each task in the JSONL file can have its own model, system prompt, and particular request.

You can adapt this example to your own data and to the tasks relevant to your use case. The same approach applies to datasets of any size.

Prerequisites

Before getting started, ensure you have the following:

A kluster.ai account - sign up on the kluster.ai platform if you don't have one
A kluster.ai API key - after signing in, go to the API Keys section and create a new key. For detailed instructions, check out the Get an API key guide

Setup

In this notebook, we'll use Python's getpass module so the key isn't echoed to the screen. When you run the cell, enter your kluster.ai API key (with no leading or trailing spaces).

In [1]:

from getpass import getpass

api_key = getpass("Enter your kluster.ai API key: ")
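
If you run the notebook non-interactively, a prompt may not be practical. The sketch below is not part of the original notebook: it reads the key from an environment variable first and falls back to getpass. The variable name KLUSTER_API_KEY and the helper load_api_key are hypothetical names chosen for illustration.

```python
import os
from getpass import getpass


def load_api_key(env_var="KLUSTER_API_KEY", prompt=getpass):
    """Return the API key from the environment, or prompt for it if unset."""
    # env_var is a hypothetical name; use whatever your environment defines
    key = os.environ.get(env_var) or prompt("Enter your kluster.ai API key: ")
    # Strip accidental surrounding whitespace from the key
    return key.strip()
```

Calling `api_key = load_api_key()` would then replace the getpass call above.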

Next, ensure you've installed the OpenAI Python library:

In [2]:

%pip install -q openai

Note: you may need to restart the kernel to use updated packages.

With the OpenAI Python library installed, we import the necessary dependencies for the tutorial:

In [3]:

from openai import OpenAI

import pandas as pd
import time
import json
import os
import urllib.request
import requests
from IPython.display import clear_output, display

pd.set_option('display.max_columns', 1000, 'display.width', 1000, 'display.max_rows',1000, 'display.max_colwidth', 500)

Then, initialize the client by pointing it to the kluster.ai endpoint and passing your API key.

In [4]:

# Set up the client
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key=api_key,
)

Get the data

Now that you've initialized an OpenAI-compatible client pointing to kluster.ai, we can discuss the data.

This notebook includes three sample datasets: Amazon musical instruments reviews, Top 1000 IMDb Movies, and AG News sample.

The following code fetches the selected dataset and keeps only its last three rows as a small sample. Feel free to change this or bring your own dataset.

In [5]:

# Datasets
#1. Amazon musical instruments reviews sample dataset
#url = "https://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Musical_Instruments_5.json.gz"
#2. IMDB top 1000 sample dataset
#url = "https://raw.githubusercontent.com/kluster-ai/klusterai-cookbook/refs/heads/main/data/imdb_top_1000.csv" 
#3. AG News sample dataset
url = "https://raw.githubusercontent.com/kluster-ai/klusterai-cookbook/refs/heads/main/data/ag_news.csv"

In [6]:

def fetch_dataset(url, file_path=None):
  
  # Set the default file path based on the URL if none is provided
  if not file_path:
    file_path = os.path.join("data", os.path.basename(url))

  # Create the directory if it does not exist
  os.makedirs(os.path.dirname(file_path), exist_ok=True)

  # Download the file if it doesn't already exist
  if not os.path.exists(file_path):
    urllib.request.urlretrieve(url, file_path)
    print(f"Dataset downloaded and saved as {file_path}")
  else:
    print(f"Using cached file at {file_path}")

  # Load and process the dataset based on URL content
  if "imdb_top_1000.csv" in url:
    df = pd.read_csv(file_path)
    df['text'] = df['Series_Title'].astype(str) + ": " + df['Overview'].astype(str)
    df = df[['text']]
  elif "ag_news" in url:
    df = pd.read_csv(file_path, header=None, names=["label", "title", "description"])
    df['text'] = df['title'].astype(str) + ": " + df['description'].astype(str)
    df = df[['text']]
  elif "reviews_Musical_Instruments_5.json.gz" in url:
    df = pd.read_json(file_path, compression='gzip', lines=True)
    df.rename(columns={'reviewText': 'text'}, inplace=True)
    df = df[['text']]
  else:
    raise ValueError("URL does not match any known dataset format.")

  return df[['text']].tail(3).reset_index(drop=True) # Return last 3 entries resetting the index

# Fetch dataset
df = fetch_dataset(url=url, file_path=None)
df.head()

Using cached file at data/ag_news.csv

Out[6]:

	text
0	Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan
1	New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan
2	Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan

Now that we've fetched and saved the dataset, let's move to the batch inference flow.

Define the requests

For this particular tutorial, we predefined five requests for the model to execute based on common customer use cases:

Sentiment analysis - determine whether the text has a positive, neutral, or negative tone
Translation - translate the text into another language (Spanish in this example)
Summarization - express the text in a concise form
Topic classification - assign the text to one of a given set of categories
Keyword extraction - extract up to five keywords that represent the text

Each request is defined as a system prompt. Because this example covers several types of requests, the prompts are stored in a dictionary keyed by task name. For each use case, we also define the JSON structure we expect in the model's response.

If you’re happy with these requests and structures, you can simply run the code as-is. If you’d like to customize them, modify the prompts (or add new ones) to fit your own use case.

In [7]:

SYSTEM_PROMPTS = {
  'sentiment': '''
  Analyze the sentiment of the given text. Provide only a JSON object with the following structure:
  {
    "sentiment": string, // "positive", "negative", or "neutral"
    "confidence": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis
  }
  ''',

  'translation': '''
  Translate the given text from English to Spanish, paraphrase, rewrite or perform cultural adaptations for the text to make sense in Spanish. Provide only a JSON object with the following structure:
  {
    "translation": string, // The Spanish translation
    "notes": string // Any notes about the translation, such as cultural adaptations or challenging phrases (max 500 words). Write this mainly in English.
  }
  ''',

  'summary': '''
  Summarize the main points of the given text. Provide only a JSON object with the following structure:
  {
    "summary": string, // A concise summary of the text (max 100 words)
  }
  ''',

  'topic_classification': '''
  Classify the main topic of the given text based on the following categories: "politics", "sports", "technology", "science", "business", "entertainment", "health", "other". Provide only a JSON object with the following structure:
  {
    "category": string, // The primary category of the provided text
    "confidence": float, // A value between 0 and 1 indicating confidence in the classification
  }
  ''',

  'keyword_extraction': '''
  Extract relevant keywords from the given text. Provide only a JSON object with the following structure:
  {
    "keywords": string[], // An array of up to 5 keywords that best represent the text content
    "context": string // Briefly explain how each keyword is relevant to the text (max 200 words)
  }
  '''
}

Create the batch job file

This example uses the deepseek-ai/DeepSeek-V3-0324 model. If you'd like to use a different model, feel free to change it by modifying the model field.

Please refer to the Supported models section for a list of the models we support.

The following snippets prepare one JSONL file per task, where each line represents a different request. Note that each separate batch request can have its own model. We use a temperature of 0.5, but feel free to change it and compare the outcomes.

In [8]:

# Models
# model="deepseek-ai/DeepSeek-R1"
model = "deepseek-ai/DeepSeek-V3-0324"
# model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo"
# model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo"
# model="Qwen/Qwen2.5-VL-7B-Instruct"


def create_batch_file(df, inference_type, system_prompt):
    batch_list = []
    for index, row in df.iterrows():
        content = row["text"]

        # Build the request for a given model, prompt, and data
        request = {
            "custom_id": f"{inference_type}-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "temperature": 0.5,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": content},
                ],
            },
        }
        batch_list.append(request)
    return batch_list

# Save file as JSON lines
def save_batch_file(batch_list, inference_type):
    filename = f"data/batch_request_{inference_type}.jsonl"
    with open(filename, "w") as file:
        for request in batch_list:
            file.write(json.dumps(request) + "\n")
    return filename
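
The functions above write one file per task, all using the same model. Since every line carries its own body, a single file could also mix tasks and models. The following is a minimal sketch of that idea, not the notebook's own code: build_mixed_requests, the shortened prompts, the sample texts, and the task-to-model mapping are illustrative assumptions (the model names are taken from the list above).

```python
import json


def build_mixed_requests(texts, prompts, models, temperature=0.5):
    """Build one request per (task, text) pair, each with its own model and prompt."""
    requests_list = []
    for task, system_prompt in prompts.items():
        for index, text in enumerate(texts):
            requests_list.append({
                "custom_id": f"{task}-{index}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": models[task],
                    "temperature": temperature,
                    "messages": [
                        {"role": "system", "content": system_prompt},
                        {"role": "user", "content": text},
                    ],
                },
            })
    return requests_list


# Illustrative inputs (hypothetical prompts and texts)
prompts = {
    "sentiment": "Analyze the sentiment of the given text.",
    "summary": "Summarize the main points of the given text.",
}
models = {
    "sentiment": "deepseek-ai/DeepSeek-V3-0324",
    "summary": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
}
texts = ["First sample text.", "Second sample text."]

mixed_requests = build_mixed_requests(texts, prompts, models)

# Serialize as JSON Lines: one request per line
mixed_jsonl = "\n".join(json.dumps(request) for request in mixed_requests) + "\n"
```

Writing mixed_jsonl to a single .jsonl file would then require only one upload and one batch job. Whether to mix models in one file or keep one file per task, as this notebook does, is a design choice.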

In [9]:

batch_requests = []
filenames = []

# Loop through all the different prompts
for inference_type, system_prompt in SYSTEM_PROMPTS.items():
    batch_list = create_batch_file(df, inference_type, system_prompt)
    filename = save_batch_file(batch_list, inference_type)
    batch_requests.append((inference_type, filename))
    filenames.append(filename)
    print(filename)

data/batch_request_sentiment.jsonl
data/batch_request_translation.jsonl
data/batch_request_summary.jsonl
data/batch_request_topic_classification.jsonl
data/batch_request_keyword_extraction.jsonl

Next, we can preview what a single batch file looks like:

In [10]:

!head -n 5 data/batch_request_sentiment.jsonl

{"custom_id": "sentiment-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3-0324", "temperature": 0.5, "messages": [{"role": "system", "content": "\n  Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n  {\n    \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n    \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n  }\n  "}, {"role": "user", "content": "Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan"}]}}
{"custom_id": "sentiment-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3-0324", "temperature": 0.5, "messages": [{"role": "system", "content": "\n  Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n  {\n    \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n    \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n  }\n  "}, {"role": "user", "content": "New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan"}]}}
{"custom_id": "sentiment-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3-0324", "temperature": 0.5, "messages": [{"role": "system", "content": "\n  Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n  {\n    \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n    \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n  }\n  "}, {"role": "user", "content": "Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings  #151; including what appear to be new species of fish and squid  #151; could be used to protect marine ecosystems worldwide.: nan"}]}}

Upload batch job files to kluster.ai

Now that we've prepared our input files, it's time to upload them to the kluster.ai platform. To do so, you can use the files.create endpoint of the client, with the purpose set to batch. This returns the file ID, which we need for the next steps. We repeat the process for each batch file created.

In [11]:

def upload_batch_file(data_dir):
  print(f"Creating request for {data_dir}")
  
  with open(data_dir, 'rb') as file:
    upload_response = client.files.create(
      file=file,
      purpose="batch"
    )

  # Print file ID
  file_id = upload_response.id
  print(f"File uploaded successfully. File ID: {file_id}")

  return upload_response

In [12]:

batch_files = []

# Loop through all batch request files created earlier
for data_dir in filenames:
    print(f"Uploading file {data_dir}")
    job = upload_batch_file(data_dir)
    batch_files.append(job)

Uploading file data/batch_request_sentiment.jsonl
Creating request for data/batch_request_sentiment.jsonl
File uploaded successfully. File ID: 6801436295e6d3f11461e80c
Uploading file data/batch_request_translation.jsonl
Creating request for data/batch_request_translation.jsonl
File uploaded successfully. File ID: 68014362d82e57647b763060
Uploading file data/batch_request_summary.jsonl
Creating request for data/batch_request_summary.jsonl
File uploaded successfully. File ID: 680143635f3f96e4d1ddda37
Uploading file data/batch_request_topic_classification.jsonl
Creating request for data/batch_request_topic_classification.jsonl
File uploaded successfully. File ID: 68014363186ab8d64b39630d
Uploading file data/batch_request_keyword_extraction.jsonl
Creating request for data/batch_request_keyword_extraction.jsonl
File uploaded successfully. File ID: 680143638225fce1bc87b575

All files are now uploaded, and we can proceed with creating the batch jobs.

Start the batch job

Once all the files have been successfully uploaded, we're ready to start (create) the batch jobs by providing the file ID of each file, which we got in the previous step. To start each job, we use the batches.create method, setting the endpoint to /v1/chat/completions. This returns the details of each batch job, including its ID.

In [13]:

# Create batch job with completions endpoint
def create_batch_job(file_id):
  batch_job = client.batches.create(
    input_file_id=file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
  )

  print(f"Batch job created with ID {batch_job.id}")
  return batch_job

In [14]:

batch_jobs = []

# Loop through all batch file IDs and start each job
for batch_file in batch_files:
    print(f"Creating batch job for file ID {batch_file.id}")
    batch_job = create_batch_job(batch_file.id)
    batch_jobs.append(batch_job)

Creating batch job for file ID 6801436295e6d3f11461e80c
Batch job created with ID 680143645f3f96e4d1ddda3f
Creating batch job for file ID 68014362d82e57647b763060
Batch job created with ID 68014364fba4aabfd069c948
Creating batch job for file ID 680143635f3f96e4d1ddda37
Batch job created with ID 6801436477b0762a42e196e6
Creating batch job for file ID 68014363186ab8d64b39630d
Batch job created with ID 680143658cc246574d8aa01d
Creating batch job for file ID 680143638225fce1bc87b575
Batch job created with ID 680143658225fce1bc87b58d

All requests are currently being processed.

Check job progress

Now that your batch jobs have been created, you can track their progress.

To monitor a job's progress, we can use the batches.retrieve method and pass the batch job ID. The response contains a status field that tells us whether the job has completed, along with counts of completed and total requests. We repeat this process for every batch job ID we got in the previous step.

The following snippet checks the status of all batch jobs every 10 seconds until the entire batch is completed.

In [15]:

def monitor_batch_jobs(batch_jobs):
    all_completed = False

    # Loop until all jobs are completed
    while not all_completed:
        all_completed = True
        output_lines = []

        # Loop through all batch jobs
        for job in batch_jobs:
            updated_job = client.batches.retrieve(job.id)
            status = updated_job.status

            # If job is completed
            if status == "completed":
                output_lines.append("Job completed!")
            # If job failed, cancelled or expired, report it and keep checking the other jobs
            elif status in ["failed", "cancelled", "expired"]:
                output_lines.append(f"Job ended with status: {status}")
            # If job is ongoing
            else:
                all_completed = False
                completed = updated_job.request_counts.completed
                total = updated_job.request_counts.total
                output_lines.append(
                    f"Job status: {status} - Progress: {completed}/{total}"
                )

        # Clear terminal
        clear_output(wait=True)
        for line in output_lines:
            display(line)

        # Check every 10 seconds
        if not all_completed:
            time.sleep(10)
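
The loop above keeps polling for as long as any job is still running. As a hedged variant, the sketch below adds an overall timeout and returns the final status of every job. It is not part of the original notebook: wait_for_jobs and its retrieve_status parameter are hypothetical names, and the set of terminal statuses is limited to the four that the loop above already checks for.

```python
import time

# Statuses the monitoring loop above treats as finished
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_jobs(job_ids, retrieve_status, poll_seconds=10,
                  timeout_seconds=3600, sleep=time.sleep):
    """Poll until every job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    statuses = {}
    while True:
        for job_id in job_ids:
            # Only re-check jobs that have not finished yet
            if statuses.get(job_id) not in TERMINAL_STATUSES:
                statuses[job_id] = retrieve_status(job_id)
        if all(status in TERMINAL_STATUSES for status in statuses.values()):
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Jobs still running after {timeout_seconds}s: {statuses}")
        sleep(poll_seconds)
```

With the client from this notebook, retrieve_status could be `lambda job_id: client.batches.retrieve(job_id).status`, and job_ids could be `[job.id for job in batch_jobs]`. Passing the status lookup as a parameter keeps the function testable without network access.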

In [16]:

monitor_batch_jobs(batch_jobs)

'Job completed!'

'Job completed!'

'Job completed!'

'Job completed!'

'Job completed!'

Get the results

With all jobs completed, we can retrieve the results and review the response generated for each request. To fetch the results from the platform, retrieve the output_file_id from the batch job, then pass that file ID to the files.content endpoint. The output is a JSONL file, which we parse line by line. We repeat this for every batch job ID. Note that a job's status must be completed before you can retrieve its results.

In [17]:

# Parse the JSONL results into a list of JSON objects
def parse_json_objects(data_string):
  if isinstance(data_string, bytes):
    data_string = data_string.decode('utf-8')

  json_strings = data_string.strip().split('\n')
  json_objects = []

  for json_str in json_strings:
    try:
      json_obj = json.loads(json_str)
      json_objects.append(json_obj)
    except json.JSONDecodeError as e:
      print(f"Error parsing JSON: {e}")

  return json_objects
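
parse_json_objects handles the outer JSONL layer only. The model's reply inside each record is still a plain string, and it may arrive wrapped in a Markdown code fence even though the prompt asks for a JSON object only. A hedged sketch of a second parsing step follows; extract_json_payload is a hypothetical helper, and returning None for replies that are not valid JSON is one possible design choice.

```python
import json
import re

# Three backticks, built programmatically to keep this example readable
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_payload(content):
    """Parse a model reply as JSON, unwrapping a Markdown code fence if present."""
    match = FENCED_BLOCK.search(content)
    candidate = match.group(1) if match else content
    try:
        return json.loads(candidate.strip())
    except json.JSONDecodeError:
        # The reply was free text rather than the requested JSON object
        return None
```

Applied to the content string of each record, this returns a dictionary when the model followed the requested structure and None otherwise, which makes non-conforming replies easy to filter or retry.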

In [18]:

# Go through all batch jobs, providing the output file ID
for batch_job in batch_jobs:
  job_status = client.batches.retrieve(batch_job.id)
  result_file_id = job_status.output_file_id
  raw_results = client.files.content(result_file_id).content
  results = parse_json_objects(raw_results)

  # For each request in the job, print the original text and the model's response
  for res in results:
    inference_id = res['custom_id']
    index = inference_id.split('-')[-1]
    result = res['response']['body']['choices'][0]['message']['content']
    text = df.iloc[int(index)]['text']
    print('\n -------------------------- \n')
    print(f"Inference ID: {inference_id}. \n\nTEXT: {text}\n\nRESULT: {result}")

--------------------------

Inference ID: sentiment-0.

TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan

RESULT: ```json
{
"sentiment": "negative",
"confidence": 0.75
}
```

--------------------------

Inference ID: sentiment-1.

TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan

RESULT: ```json
{
"sentiment": "neutral",
"confidence": 0.85
}
```

--------------------------

Inference ID: sentiment-2.

TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan

RESULT: ```json
{
"sentiment": "positive",
"confidence": 0.9
}
```

--------------------------

Inference ID: translation-0.

TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan

RESULT: The article discusses allegations that the U.S. Forest Service exaggerated the impact of wildfires on California spotted owls in order to justify increasing logging in the Sierra Nevada. A longtime agency expert who worked on the plan is cited as making these claims. The article suggests that the Forest Service may have misrepresented data to support their logging plans.

--------------------------

Inference ID: translation-1.

TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan

RESULT: The article is about a new method that could predict earthquakes weeks in advance by monitoring the presence of metals like zinc and copper in subsoil water near earthquake sites. The method was developed by Swedish geologists and was announced on Wednesday.

--------------------------

Inference ID: translation-2.

TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan

RESULT: # Protecting Marine Ecosystems Worldwide

## Key Strategies for Protecting Marine Ecosystems

### 1. **Marine Protected Areas (MPAs)**
- Establish and expand MPAs to safeguard marine biodiversity.
- Limit human activities like fishing and drilling in these areas.

### 2. **Sustainable Fishing Practices**
- Implement sustainable fishing quotas and gear restrictions.
- Promote selective fishing to avoid overfishing and bycatch.

### 3. **Pollution Control**
- Reduce plastic waste and marine debris through waste management.
- Limit chemical discharges from industries and agriculture.

### 4. **Climate Action**
- Reduce greenhouse gas emissions to mitigate ocean acidification.
- Protect coastal areas from rising sea levels and storms.

### 5. **Public Awareness and Education**
- Educate communities on marine conservation.
- Promote eco-friendly tourism and fishing practices.

### 6. **International Cooperation**
- Collaborate globally for marine conservation efforts.
- Enforce international agreements like the UNCLOS.

## Conclusion
Protecting marine ecosystems requires global cooperation, sustainable practices, and public awareness to ensure ocean health for future generations.

--------------------------

Inference ID: summary-0.

TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan

RESULT: The Forest Service's plan to increase logging in the Sierra Nevada has been criticized for exaggerating the effect of wildfires on California spotted owls. A longtime agency expert who worked on the plan has stated that the agency overstated the impact of wildfires on the owls to justify the logging increase. This suggests that the Forest Service may have misrepresented the situation to support its logging plans.

--------------------------

Inference ID: summary-1.

TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan

RESULT: ### Understanding the Science Behind Predicting Earthquakes Using Subsoil Water Chemistry

#### **1. What is the scientific basis behind this method?**
The study suggests that earthquakes might be preceded by changes in the levels of metals like zinc and copper in subsoil water. The underlying idea is that tectonic stress and rock deformation before an earthquake could release these metals from the earth’s crust into groundwater. By monitoring these changes, scientists might be able to predict earthquakes weeks in advance.

#### **2. How does tectonic stress lead to the release of metals?**
Tectonic stress causes microfractures in rocks. As these fractures form, they can release minerals and metals that were trapped within the rocks. These metals then dissolve into groundwater, leading to measurable changes in water chemistry.

#### **3. What are the challenges in this method?**
- **Variability:** Earthquakes vary in size, depth, and location. The method would need to account for these differences to be universally applicable.
- **Monitoring:** Continuous monitoring of subsoil water is required, which can be logistically and financially challenging.
- **False positives:** Other geological processes (like volcanic activity) might also release metals into groundwater, leading to false earthquake predictions.

#### **4. How does this compare to other earthquake prediction methods?**
- **Seismic activity:** Traditional methods rely on monitoring seismic waves, but they often only provide seconds or minutes of warning.
- **Animal behavior:** Some studies suggest animals might sense earthquakes before they happen, but this is not scientifically proven.
- **Radon gas:** Some research suggests that radon gas levels might change before earthquakes, but this is also not consistently reliable.

This method is unique because it suggests a measurable change in water chemistry weeks before an earthquake, offering a longer warning period.

#### **5. What are the implications if this method is confirmed?**
- **Early warning:** If scientists can reliably predict earthquakes weeks in advance, it could save lives and reduce damage by allowing for better preparation.
- **Economic benefits:** Communities could take measures to strengthen infrastructure, evacuate areas, or prepare emergency responses.
- **Scientific advancement:** It would represent a significant breakthrough in understanding earthquake precursors and tectonic activity.

#### **6. What are the next steps in this research?**
- **Field studies:** More studies are needed to confirm the relationship between metal levels and earthquakes.
- **Technology development:** Better monitoring tools might be needed to detect subtle changes in water chemistry.
- **Public policy:** If proven reliable, governments might need to invest in monitoring networks.

### Conclusion
This research is promising because it suggests a measurable way to predict earthquakes weeks in advance. However, more research is needed to confirm this method’s reliability and applicability across different earthquake scenarios. If successful, it could revolutionize earthquake prediction and save countless lives.

---

### **Key Takeaways**
- **Scientific basis:** Changes in subsoil water chemistry might predict earthquakes.
- **Mechanism:** Tectonic stress causes microfractures, releasing metals into groundwater.
- **Challenges:** Variability, monitoring logistics, and false positives.
- **Implications:** Early warning could save lives and reduce damage.
- **Next steps:** More research, technology development, and public policy considerations.

--------------------------

Inference ID: summary-2.

TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan

RESULT: ### The Deep Ocean Exploration and Its Implications for Marine Conservation

The deep waters of the Atlantic Ocean have long been a source of mystery and intrigue, hiding secrets that only advanced technology can uncover. Recently, a team of researchers announced their findings from an expedition to these depths, revealing what appears to be new species of fish and squid. These discoveries not only expand our understanding of marine biodiversity but also provide a foundation for protecting marine ecosystems worldwide.

#### The Expedition and Its Discoveries

The expedition, conducted by a team of scientists, utilized cutting-edge technology to explore the deep waters of the Atlantic Ocean. The team deployed deep-sea submersibles and remotely operated vehicles (ROVs) to capture images and samples from the ocean floor. Their findings include what seem to be new species of fish and squid, highlighting the vast biodiversity that remains undiscovered in the deep ocean.

The discovery of new species is particularly significant because it underscores the vastness of marine life that remains unknown to science. The deep ocean, with its extreme pressures and darkness, presents a unique environment that has led to the evolution of creatures with distinct adaptations. These findings provide a glimpse into the complex web of life that exists beneath the ocean's surface.

#### Implications for Marine Conservation

The discoveries made by the expedition have far-reaching implications for marine conservation. By identifying new species, the researchers can better understand the biodiversity of the Atlantic Ocean and its ecological dynamics. This knowledge is essential for developing effective conservation strategies that protect marine ecosystems from threats such as overfishing, climate change, and deep-sea mining.

One of the key takeaways from the expedition is the need for international cooperation in marine conservation. The deep waters of the Atlantic Ocean are a shared resource, and their protection requires a global effort. The findings can be used to advocate for the establishment of marine protected areas (MPAs) that safeguard the habitats of these newly discovered species.

#### The Role of Technology in Ocean Exploration

The success of the expedition highlights the importance of technology in ocean exploration. The use of submersibles and ROVs allowed the researchers to reach depths that were previously inaccessible. These technologies provide scientists with the tools to study the deep ocean in unprecedented detail, leading to new discoveries and a better understanding of marine ecosystems.

Moreover, the data collected during the expedition can be used to develop models that predict the impacts of human activities on the deep ocean. This is crucial for making informed decisions about resource extraction and other activities that could harm marine ecosystems.

#### Conclusion

The expedition to the deep waters of the Atlantic Ocean has yielded remarkable findings, including what appear to be new species of fish and squid. These discoveries not only expand our knowledge of marine biodiversity but also provide a foundation for protecting marine ecosystems worldwide. By leveraging these findings, scientists can advocate for policies that safeguard the deep ocean and its inhabitants, ensuring a sustainable future for marine life.

The exploration of the deep ocean is a testament to the power of scientific inquiry and the potential for new discoveries to drive conservation efforts. As we continue to uncover the mysteries of the Atlantic Ocean, we must also commit to protecting its fragile ecosystems for future generations.

--------------------------

Inference ID: topic_classification-0.

TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan

RESULT: The Sierra Nevada is a mountain range in the western United States, stretching across California and Nevada. The area is known for its stunning scenery, diverse wildlife, and rich natural resources. The Sierra Nevada is also a popular destination for outdoor activities such as hiking, camping, and skiing.

The Sierra Nevada is home to a variety of wildlife, including black bears, mountain lions, deer, and elk. The area is also home to a variety of birds, such as eagles, hawks, and owls. The Sierra Nevada is also home to a variety of plant life, including pine trees, oak trees, and wildflowers.

The Sierra Nevada is a popular destination for outdoor activities such as hiking, camping, and skiing. The area is also home to a variety of recreational activities, such as fishing, hunting, and horseback riding. The Sierra Nevada is also a popular destination for rock climbing, mountain biking, and whitewater rafting.

The Sierra Nevada is a beautiful and diverse area that is home to a variety of wildlife and recreational activities. The area is a popular destination for outdoor activities and is a great place to visit for anyone who loves the outdoors.

--------------------------

Inference ID: topic_classification-1.

TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan

RESULT: ```json
{
"primary_category": "science",
"confidence": 0.95
}
```

--------------------------

Inference ID: topic_classification-2.

TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan

RESULT: ```json
{
"primary_category": "science",
"confidence": 0.9
}
```

--------------------------

Inference ID: keyword_extraction-0.

TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan

RESULT: ```json
{
"keywords": ["Forest Service", "wildfires", "California spotted owls", "logging", "Sierra Nevada"],
"context": "The keywords highlight the main elements of the controversy. 'Forest Service' refers to the agency accused of exaggerating wildfire impacts. 'Wildfires' are central to the debate over their effect on 'California spotted owls', a protected species. 'Logging' is the planned activity justified by the disputed claims, and 'Sierra Nevada' is the region where this environmental conflict is taking place."
}
```

--------------------------

Inference ID: keyword_extraction-1.

TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan

RESULT: ```json
{
"keywords": ["earthquakes", "prediction", "geologists", "metals", "subsoil water"],
"context": "The text discusses a new method for predicting earthquakes weeks in advance by Swedish geologists. 'Earthquakes' is the central topic, while 'prediction' highlights the innovative aspect of forecasting them earlier. 'Geologists' refers to the scientists involved in the research. 'Metals' like zinc and copper are key indicators being monitored, and 'subsoil water' is the medium where these metals are detected to predict seismic activity."
}
```

--------------------------

Inference ID: keyword_extraction-2.

TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan

RESULT: ```json
{
"keywords": ["Marine expedition", "New species", "Atlantic Ocean", "Norwegian scientists", "Marine ecosystems"],
"context": "The keywords highlight the main aspects of the text. 'Marine expedition' refers to the scientific exploration mentioned. 'New species' underscores the discovery of unidentified fish and squid. 'Atlantic Ocean' specifies the location of the research. 'Norwegian scientists' identifies the group conducting the study. 'Marine ecosystems' relates to the potential global impact of the findings for conservation efforts."
}
```

Summary¶

This tutorial used the chat completions endpoint to run multiple tasks through the kluster.ai batch API. The example performed five tasks on each element of the dataset: sentiment analysis, translation (to Spanish), summarization, topic classification, and keyword extraction.

To submit a batch job, we:

Created the JSONL file, in which each line represents a separate request (one per task and dataset element)
Submitted the file to the platform
Started the batch job and monitored its progress
Fetched the results once the job completed
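The four steps above can be sketched compactly as follows. This is a minimal illustration, not the notebook's exact code: the task definitions, model names, and the helper names `build_requests`, `write_jsonl`, and `run_batch` are hypothetical, and `client` is assumed to be an `openai.OpenAI` instance pointed at the kluster.ai API.

```python
import json
import time

# Hypothetical task definitions: model names and prompts are placeholders,
# not the ones used in this notebook.
TASKS = {
    "sentiment": {
        "model": "example-model-a",
        "system_prompt": "Classify the sentiment of the text.",
    },
    "summary": {
        "model": "example-model-b",
        "system_prompt": "Summarize the text in one sentence.",
    },
}


def build_requests(texts, tasks):
    """Return one batch request per (task, text) pair, each with its own model and prompt."""
    requests = []
    for task_name, task in tasks.items():
        for index, text in enumerate(texts):
            requests.append({
                "custom_id": f"{task_name}-{index}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": task["model"],
                    "messages": [
                        {"role": "system", "content": task["system_prompt"]},
                        {"role": "user", "content": text},
                    ],
                },
            })
    return requests


def write_jsonl(requests, path):
    """Write one JSON object per line (step 1)."""
    with open(path, "w", encoding="utf-8") as f:
        for request in requests:
            f.write(json.dumps(request) + "\n")


def run_batch(client, jsonl_path, poll_seconds=10):
    """Upload the file, start the batch, poll until it ends, and return parsed results."""
    # Step 2: submit the file to the platform.
    with open(jsonl_path, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")

    # Step 3: start the batch job and monitor its progress.
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    while batch.status not in ("completed", "failed", "cancelled", "expired"):
        time.sleep(poll_seconds)
        batch = client.batches.retrieve(batch.id)
    if batch.status != "completed":
        raise RuntimeError(f"Batch ended with status {batch.status}")

    # Step 4: fetch the results, one JSON object per line.
    raw = client.files.content(batch.output_file_id).content
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]
```

To use it, write the requests with `write_jsonl(build_requests(texts, TASKS), "batch.jsonl")` and pass the path to `run_batch` along with a client created as `OpenAI(base_url=..., api_key=api_key)`. The base URL to use is the kluster.ai API endpoint, which is not shown in this section. Because `custom_id` encodes the task name and row index, each result can be matched back to its source text.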

All of this was done with the OpenAI Python library and API, with no changes needed!

The kluster.ai batch API lets you scale this workflow to large datasets. As next steps, feel free to create your own dataset or build on this example. Good luck!
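One possible next step is post-processing the results. In the run above, several replies came back as JSON wrapped in a Markdown code fence, while others (for example, topic_classification-0) were free text. The sketch below handles both cases; the helper name `parse_result` is hypothetical, and returning `None` for non-JSON replies is a design choice for this illustration rather than part of the notebook.

```python
import json
import re

# Matches an optional ```json ... ``` fence and captures what is inside it.
FENCE_PATTERN = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def parse_result(content):
    """Return the parsed JSON from a model reply, or None if the reply is not JSON."""
    match = FENCE_PATTERN.search(content)
    candidate = match.group(1) if match else content.strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


fenced_reply = '```json\n{\n"sentiment": "positive",\n"confidence": 0.9\n}\n```'
free_text_reply = "The Sierra Nevada is a mountain range in the western United States."

print(parse_result(fenced_reply))
print(parse_result(free_text_reply))
```

Replies that come back as `None` can then be flagged for review or resubmitted with a stricter system prompt.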