Multiple inference requests with kluster.ai
In other notebooks, we used AI models to perform simple tasks like text classification, sentiment analysis and keyword extraction.
This tutorial runs through a notebook where you'll learn how to use the kluster.ai batch API to run several different tasks over the same data. The notebook writes one JSONL batch file per task, and each request line in those files specifies its own model, system prompt, and input.
You can adapt this example by using your own data and the categories relevant to your use case. The same approach works for datasets of any size and returns structured, categorized results from the language model you choose.
Prerequisites
Before getting started, ensure you have the following:
- A kluster.ai account - sign up on the kluster.ai platform if you don't have one
- A kluster.ai API key - after signing in, go to the API Keys section and create a new key. For detailed instructions, check out the Get an API key guide.
Setup
In this notebook, we'll use Python's getpass module to enter the key securely. After running the cell, provide your unique kluster.ai API key (make sure it contains no spaces).
from getpass import getpass
api_key = getpass("Enter your kluster.ai API key: ")
Enter your kluster.ai API key: ········
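When the notebook runs as a script or scheduled job, there may be no one to answer the getpass prompt. A minimal sketch of an alternative, assuming you export the key in an environment variable; the name `KLUSTER_API_KEY` and the helper `get_api_key` are hypothetical choices, not something kluster.ai requires:

```python
import os
from getpass import getpass

# Hypothetical variable name; kluster.ai does not require any particular name
ENV_VAR = "KLUSTER_API_KEY"

def get_api_key(env_var=ENV_VAR):
    """Read the API key from an environment variable, falling back to a getpass prompt."""
    key = os.environ.get(env_var, "").strip()  # strip() drops stray spaces or newlines
    if key:
        return key
    return getpass("Enter your kluster.ai API key: ").strip()
```

With `KLUSTER_API_KEY` exported in your shell, `api_key = get_api_key()` returns it without prompting; otherwise it behaves like the getpass cell above.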
Next, make sure you've installed the OpenAI Python library:
%pip install -q openai
Note: you may need to restart the kernel to use updated packages.
With the OpenAI Python library installed, we import the necessary dependencies for the tutorial:
from openai import OpenAI
import pandas as pd
import time
import json
import os
import urllib.request
from IPython.display import clear_output, display
pd.set_option('display.max_columns', 1000, 'display.width', 1000, 'display.max_rows',1000, 'display.max_colwidth', 500)
Then, initialize the client by pointing it to the kluster.ai endpoint and passing your API key.
# Set up the client
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key=api_key,
)
Get the data
Now that you've initialized an OpenAI-compatible client pointing to kluster.ai, we can discuss the data.
This notebook includes three sample datasets: Amazon musical instruments reviews, Top 1000 IMDb Movies, and AG News sample.
The following code fetches one of the datasets and keeps only its last three rows, so the example stays small. Feel free to pick a different dataset or bring your own.
# Datasets
#1. Amazon musical instruments reviews sample dataset
#url = "https://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Musical_Instruments_5.json.gz"
#2. IMDB top 1000 sample dataset
#url = "https://raw.githubusercontent.com/kluster-ai/klusterai-cookbook/refs/heads/main/data/imdb_top_1000.csv"
#3. AG News sample dataset
url = "https://raw.githubusercontent.com/kluster-ai/klusterai-cookbook/refs/heads/main/data/ag_news.csv"
def fetch_dataset(url, file_path=None):
    # Set the default file path based on the URL if none is provided
    if not file_path:
        file_path = os.path.join("data", os.path.basename(url))
    # Create the directory if it does not exist
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    # Download the file if it doesn't already exist
    if not os.path.exists(file_path):
        urllib.request.urlretrieve(url, file_path)
        print(f"Dataset downloaded and saved as {file_path}")
    else:
        print(f"Using cached file at {file_path}")
    # Load and process the dataset based on URL content
    if "imdb_top_1000.csv" in url:
        df = pd.read_csv(file_path)
        df['text'] = df['Series_Title'].astype(str) + ": " + df['Overview'].astype(str)
        df = df[['text']]
    elif "ag_news" in url:
        df = pd.read_csv(file_path, header=None, names=["label", "title", "description"])
        df['text'] = df['title'].astype(str) + ": " + df['description'].astype(str)
        df = df[['text']]
    elif "reviews_Musical_Instruments_5.json.gz" in url:
        df = pd.read_json(file_path, compression='gzip', lines=True)
        df.rename(columns={'reviewText': 'text'}, inplace=True)
        df = df[['text']]
    else:
        raise ValueError("URL does not match any known dataset format.")
    return df[['text']].tail(3).reset_index(drop=True)  # Return the last 3 entries, resetting the index
# Fetch dataset
df = fetch_dataset(url=url, file_path=None)
df.head()
Dataset downloaded and saved as data/ag_news.csv
|   | text |
|---|---|
| 0 | Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan |
| 1 | New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan |
| 2 | Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan |
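Each text in the preview ends with `: nan`. That suffix appears when the `description` value is missing, because `astype(str)` turns a missing (NaN) cell into the string `'nan'`. Whether the missing value comes from the layout of `ag_news.csv` is an assumption worth checking against the file. A minimal sketch of a join that skips missing parts; `join_fields` is a hypothetical helper:

```python
import math

def join_fields(*parts, sep=": "):
    """Join text fields with sep, skipping None, NaN, and empty values."""
    kept = []
    for part in parts:
        if part is None:
            continue
        if isinstance(part, float) and math.isnan(part):
            continue  # pandas represents missing cells as float NaN
        text = str(part).strip()
        if text:
            kept.append(text)
    return sep.join(kept)

# With pandas, this could replace the string concatenation in fetch_dataset:
# df['text'] = [join_fields(t, d) for t, d in zip(df['title'], df['description'])]
```

This keeps the `title: description` format when both fields exist and drops the separator when one is missing.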
Now that we've fetched and saved the dataset, let's move on to the batch inference flow.
Define the requests
For this particular tutorial, we predefined five requests for the model to execute based on common customer use cases:
- Sentiment analysis - determine whether the text expresses a positive, neutral, or negative sentiment
- Translation - translate the text into another language (Spanish in this example)
- Summarization - express the text in a concise form
- Topic classification - classify the text into one of a given set of categories
- Keyword extraction - extract up to five keywords that represent the text
Each request is defined by a system prompt. Because this example covers several types of requests, the prompts are stored in a Python dictionary keyed by task name, and each prompt spells out the JSON structure we expect the model to return.
If you're happy with these requests and structures, you can run the code as-is. To customize them, modify the prompts or add new entries to the dictionary.
SYSTEM_PROMPTS = {
'sentiment': '''
Analyze the sentiment of the given text. Provide only a JSON object with the following structure:
{
"sentiment": string, // "positive", "negative", or "neutral"
"confidence": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis
}
''',
'translation': '''
Translate the given text from English to Spanish, paraphrase, rewrite or perform cultural adaptations for the text to make sense in Spanish. Provide only a JSON object with the following structure:
{
"translation": string, // The Spanish translation
"notes": string // Any notes about the translation, such as cultural adaptations or challenging phrases (max 500 words). Write this mainly in English.
}
''',
'summary': '''
Summarize the main points of the given text. Provide only a JSON object with the following structure:
{
"summary": string, // A concise summary of the text (max 100 words)
}
''',
'topic_classification': '''
Classify the main topic of the given text based on the following categories: "politics", "sports", "technology", "science", "business", "entertainment", "health", "other". Provide only a JSON object with the following structure:
{
"category": string, // The primary category of the provided text
"confidence": float, // A value between 0 and 1 indicating confidence in the classification
}
''',
'keyword_extraction': '''
Extract relevant keywords from the given text. Provide only a JSON object with the following structure:
{
"keywords": string[], // An array of up to 5 keywords that best represent the text content
"context": string // Briefly explain how each keyword is relevant to the text (max 200 words)
}
'''
}
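Because each prompt spells out the keys it expects back, the parsed responses can be checked against them. A minimal sketch; `EXPECTED_KEYS` is transcribed by hand from `SYSTEM_PROMPTS` above (it is not derived automatically), and `missing_keys` and `confidence_ok` are hypothetical helpers:

```python
# Keys each system prompt asks the model to return (copied by hand from SYSTEM_PROMPTS)
EXPECTED_KEYS = {
    "sentiment": {"sentiment", "confidence"},
    "translation": {"translation", "notes"},
    "summary": {"summary"},
    "topic_classification": {"category", "confidence"},
    "keyword_extraction": {"keywords", "context"},
}

def missing_keys(inference_type, parsed):
    """Return the expected keys absent from a parsed reply (empty set if none)."""
    expected = EXPECTED_KEYS.get(inference_type, set())
    return expected - set(parsed)

def confidence_ok(parsed):
    """Check that an optional 'confidence' value is a number between 0 and 1."""
    value = parsed.get("confidence")
    if value is None:
        return True  # tasks without a confidence field pass trivially
    return isinstance(value, (int, float)) and 0 <= value <= 1
```

Run these on each reply after parsing it with `json.loads`; a non-empty result from `missing_keys` flags a reply that ignored the requested structure.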
Create the batch job file
This example uses the deepseek-ai/DeepSeek-V3 model. To use a different model, change the value of the model variable: comment out the DeepSeek-V3 line and uncomment the model you want to try.
Please refer to the Supported models section for a list of the models we support.
The following snippets prepare one JSONL file per task, where each line represents a separate request. Because the model is set inside each request, separate batch requests can use different models. We use a temperature of 0.5, but feel free to change it and compare the outcomes.
# Models
# model="deepseek-ai/DeepSeek-R1"
model = "deepseek-ai/DeepSeek-V3"
# model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo"
# model="klusterai/Meta-Llama-3.1-405B-Instruct-Turbo"
# model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo"
# model="Qwen/Qwen2.5-VL-7B-Instruct"
def create_batch_file(df, inference_type, system_prompt):
    batch_list = []
    for index, row in df.iterrows():
        content = row["text"]
        # Build the request for a given model, prompt, and data
        request = {
            "custom_id": f"{inference_type}-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "temperature": 0.5,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": content},
                ],
            },
        }
        batch_list.append(request)
    return batch_list
# Save file as JSON lines
def save_batch_file(batch_list, inference_type):
    filename = f"data/batch_request_{inference_type}.jsonl"
    with open(filename, "w") as file:
        for request in batch_list:
            file.write(json.dumps(request) + "\n")
    return filename
batch_requests = []
filenames = []
# Loop through all the different prompts
for inference_type, system_prompt in SYSTEM_PROMPTS.items():
    batch_list = create_batch_file(df, inference_type, system_prompt)
    filename = save_batch_file(batch_list, inference_type)
    batch_requests.append((inference_type, filename))
    filenames.append(filename)
    print(filename)
data/batch_request_sentiment.jsonl
data/batch_request_translation.jsonl
data/batch_request_summary.jsonl
data/batch_request_topic_classification.jsonl
data/batch_request_keyword_extraction.jsonl
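The notebook writes one file per task. Since, as noted above, each request carries its own model, the same requests could instead be written to a single file. A minimal sketch, assuming the platform accepts requests for different models in one batch file (worth confirming in the kluster.ai docs); `TASK_MODELS`, `build_combined_requests`, and `write_jsonl` are hypothetical names:

```python
import json

# Hypothetical per-task model choice; tasks not listed fall back to default_model
TASK_MODELS = {
    "sentiment": "deepseek-ai/DeepSeek-V3",
    "summary": "klusterai/Meta-Llama-3.3-70B-Instruct-Turbo",
}

def build_combined_requests(texts, prompts, task_models, default_model, temperature=0.5):
    """Build one request per (task, text) pair, each with its own model."""
    requests_out = []
    for task, system_prompt in prompts.items():
        model_id = task_models.get(task, default_model)
        for index, text in enumerate(texts):
            requests_out.append({
                "custom_id": f"{task}-{index}",  # same ID scheme as create_batch_file
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model_id,
                    "temperature": temperature,
                    "messages": [
                        {"role": "system", "content": system_prompt},
                        {"role": "user", "content": text},
                    ],
                },
            })
    return requests_out

def write_jsonl(records, path):
    """Write one JSON object per line and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path
```

For example, `write_jsonl(build_combined_requests(df["text"].tolist(), SYSTEM_PROMPTS, TASK_MODELS, model), "data/batch_request_all.jsonl")` would produce one file to upload instead of five. Because the `custom_id` values follow the same `task-index` scheme, the results loop later in the notebook should read them unchanged.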
Next, we can preview the contents of a single batch file:
!head -n 5 data/batch_request_sentiment.jsonl
{"custom_id": "sentiment-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3", "temperature": 0.5, "messages": [{"role": "system", "content": "\n Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n {\n \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n }\n "}, {"role": "user", "content": "Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan"}]}}
{"custom_id": "sentiment-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3", "temperature": 0.5, "messages": [{"role": "system", "content": "\n Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n {\n \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n }\n "}, {"role": "user", "content": "New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan"}]}}
{"custom_id": "sentiment-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3", "temperature": 0.5, "messages": [{"role": "system", "content": "\n Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n {\n \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n }\n "}, {"role": "user", "content": "Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan"}]}}
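Before uploading, a quick local check can save a round trip: every line should parse as JSON, carry the fields shown in the preview, and use a unique `custom_id`. A minimal sketch; `validate_batch_file` is a hypothetical helper, and the required-field list simply mirrors the preview above rather than an official schema:

```python
import json

# Top-level fields every request line in the preview above carries
REQUIRED_FIELDS = ("custom_id", "method", "url", "body")

def validate_batch_file(path):
    """Return a list of problems found in a JSONL batch file (empty list if none)."""
    problems = []
    seen_ids = set()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            for field in REQUIRED_FIELDS:
                if field not in record:
                    problems.append(f"line {line_no}: missing '{field}'")
            custom_id = record.get("custom_id")
            if custom_id is not None and custom_id in seen_ids:
                problems.append(f"line {line_no}: duplicate custom_id '{custom_id}'")
            seen_ids.add(custom_id)
            body = record.get("body")
            if not isinstance(body, dict) or "model" not in body:
                problems.append(f"line {line_no}: body has no 'model'")
    return problems
```

In the notebook, `for name in filenames: print(name, validate_batch_file(name) or "OK")` checks every generated file.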
Upload batch job files to kluster.ai
Now that we've prepared our input files, it's time to upload them to the kluster.ai platform. To do so, use the client's files.create method with purpose set to batch. Each call returns a file object whose ID we need for the next steps. We repeat the process for each batch file created.
def upload_batch_file(file_path):
    print(f"Creating request for {file_path}")
    with open(file_path, 'rb') as file:
        upload_response = client.files.create(
            file=file,
            purpose="batch"
        )
    # Print the file ID
    file_id = upload_response.id
    print(f"File uploaded successfully. File ID: {file_id}")
    return upload_response
batch_files = []
# Loop through all the .jsonl files created in the previous step
for file_path in filenames:
    print(f"Uploading file {file_path}")
    uploaded_file = upload_batch_file(file_path)
    batch_files.append(uploaded_file)
Uploading file data/batch_request_sentiment.jsonl
Creating request for data/batch_request_sentiment.jsonl
File uploaded successfully. File ID: 67e677a2c04383db4bd5141d
Uploading file data/batch_request_translation.jsonl
Creating request for data/batch_request_translation.jsonl
File uploaded successfully. File ID: 67e677a3711c9502a75a01ad
Uploading file data/batch_request_summary.jsonl
Creating request for data/batch_request_summary.jsonl
File uploaded successfully. File ID: 67e677a330398a707d4a884d
Uploading file data/batch_request_topic_classification.jsonl
Creating request for data/batch_request_topic_classification.jsonl
File uploaded successfully. File ID: 67e677a3711c9502a75a01b3
Uploading file data/batch_request_keyword_extraction.jsonl
Creating request for data/batch_request_keyword_extraction.jsonl
File uploaded successfully. File ID: 67e677a41f2fb6ea20485e37
All files are now uploaded, and we can proceed with creating the batch jobs.
Start the batch job
Once all the files have been uploaded, we're ready to start (create) the batch jobs by providing the file ID of each file from the previous step. To start each job, we use the batches.create method with the endpoint set to /v1/chat/completions. This returns the details of each batch job, including its ID.
# Create batch job with completions endpoint
def create_batch_job(file_id):
    batch_job = client.batches.create(
        input_file_id=file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    print(f"Batch job created with ID {batch_job.id}")
    return batch_job
batch_jobs = []
# Loop through all uploaded file IDs and start a job for each
for batch_file in batch_files:
    print(f"Creating batch job for file ID {batch_file.id}")
    batch_job = create_batch_job(batch_file.id)
    batch_jobs.append(batch_job)
Creating batch job for file ID 67e677a2c04383db4bd5141d
Batch job created with ID 67e677a88d3b27ee9af94ce3
Creating batch job for file ID 67e677a3711c9502a75a01ad
Batch job created with ID 67e677b230398a707d4a893d
Creating batch job for file ID 67e677a330398a707d4a884d
Batch job created with ID 67e677bc1f2fb6ea20486001
Creating batch job for file ID 67e677a3711c9502a75a01b3
Batch job created with ID 67e677c730398a707d4a8a77
Creating batch job for file ID 67e677a41f2fb6ea20485e37
Batch job created with ID 67e677d2711c9502a75a042e
All requests are currently being processed.
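The notebook builds `batch_requests`, a list of `(inference_type, filename)` pairs, but tracks uploads and jobs only by list position. A minimal sketch that keys each batch job ID by task name and saves the mapping to disk, so monitoring can resume after a kernel restart within the 24-hour completion window. It reuses the same `files.create` and `batches.create` calls as above; `submit_tasks`, `save_job_ids`, `load_job_ids`, and the `data/batch_jobs.json` path are hypothetical:

```python
import json

def submit_tasks(client, batch_requests):
    """Upload each task's JSONL file, start a batch job, and return {inference_type: batch_id}."""
    job_ids = {}
    for inference_type, path in batch_requests:
        with open(path, "rb") as f:
            uploaded = client.files.create(file=f, purpose="batch")
        job = client.batches.create(
            input_file_id=uploaded.id,
            endpoint="/v1/chat/completions",
            completion_window="24h",
        )
        job_ids[inference_type] = job.id
    return job_ids

def save_job_ids(job_ids, path="data/batch_jobs.json"):
    """Persist the task-to-job mapping as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(job_ids, f, indent=2)

def load_job_ids(path="data/batch_jobs.json"):
    """Reload a mapping saved by save_job_ids."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `submit_tasks(client, batch_requests)` starts a new set of jobs, so use it instead of the upload and create loops above, not in addition to them. After a restart, any ID from `load_job_ids()` can be passed to `client.batches.retrieve`.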
Check job progress
Now that your batch jobs have been created, you can track their progress.
To monitor a job's progress, we can use the batches.retrieve method and pass the batch job ID. The response contains a status field that tells us whether the job has finished, and a request_counts field with the number of completed and total requests. We repeat this for every batch job ID we got in the previous step.
The following snippet checks the status of all batch jobs every 10 seconds until the entire batch is completed.
def monitor_batch_jobs(batch_jobs):
    all_completed = False
    # Loop until every job has reached a final status
    while not all_completed:
        all_completed = True
        output_lines = []
        # Loop through all batch jobs
        for job in batch_jobs:
            updated_job = client.batches.retrieve(job.id)
            status = updated_job.status
            # If job is completed
            if status == "completed":
                output_lines.append("Job completed!")
            # If job failed, was cancelled, or expired (final, so keep checking the other jobs)
            elif status in ["failed", "cancelled", "expired"]:
                output_lines.append(f"Job ended with status: {status}")
            # If job is ongoing
            else:
                all_completed = False
                counts = updated_job.request_counts  # may be None while the job is validating
                progress = f"{counts.completed}/{counts.total}" if counts else "n/a"
                output_lines.append(
                    f"Job status: {status} - Progress: {progress}"
                )
        # Clear the previous cell output and show the latest status of each job
        clear_output(wait=True)
        for line in output_lines:
            display(line)
        # Check every 10 seconds
        if not all_completed:
            time.sleep(10)
monitor_batch_jobs(batch_jobs)
'Job completed!'
'Job completed!'
'Job completed!'
'Job completed!'
'Job completed!'
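`monitor_batch_jobs` keeps polling for as long as any job has not reached a final status. A minimal sketch of a bounded variant: it treats the four statuses the notebook checks for (`completed`, `failed`, `cancelled`, `expired`) as final, gives up after a timeout, and takes the retrieve function as a parameter so the polling logic can be tried without network calls. The name `wait_for_jobs` is hypothetical, and the 24-hour default timeout is an assumption chosen to match the `completion_window` used above:

```python
import time

FINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}

def wait_for_jobs(job_ids, retrieve, poll_seconds=10, timeout_seconds=24 * 3600, sleep=time.sleep):
    """Poll until every job is in a final status or the timeout elapses.

    Returns {job_id: status} with the last status seen for each job.
    """
    deadline = time.monotonic() + timeout_seconds
    statuses = {}
    while True:
        for job_id in job_ids:
            if statuses.get(job_id) in FINAL_STATUSES:
                continue  # no need to query a job that has already finished
            statuses[job_id] = retrieve(job_id).status
        if all(statuses[j] in FINAL_STATUSES for j in job_ids):
            return statuses
        if time.monotonic() >= deadline:
            return statuses  # the caller decides what to do with unfinished jobs
        sleep(poll_seconds)
```

In the notebook, `wait_for_jobs([job.id for job in batch_jobs], client.batches.retrieve)` returns once all five jobs have finished, one way or another.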
Get the results
With all jobs completed, we'll retrieve and parse the results, then review the response generated for each request. To fetch a job's results, retrieve the output_file_id from the batch job and pass it to the files.content method. We repeat this for every batch job. Note that a job's status must be completed before you can retrieve its results!
# Parse results as a list of JSON objects
def parse_json_objects(data_string):
    if isinstance(data_string, bytes):
        data_string = data_string.decode('utf-8')
    json_strings = data_string.strip().split('\n')
    json_objects = []
    for json_str in json_strings:
        try:
            json_obj = json.loads(json_str)
            json_objects.append(json_obj)
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")
    return json_objects
# Go through all batch jobs, providing the output file ID
for batch_job in batch_jobs:
    job_status = client.batches.retrieve(batch_job.id)
    result_file_id = job_status.output_file_id
    # Jobs that did not complete may have no output file
    if not result_file_id:
        print(f"No output file for batch job {batch_job.id} (status: {job_status.status})")
        continue
    raw_output = client.files.content(result_file_id).content
    results = parse_json_objects(raw_output)
    # For each, print the result
    for res in results:
        inference_id = res['custom_id']
        index = inference_id.split('-')[-1]
        result = res['response']['body']['choices'][0]['message']['content']
        text = df.iloc[int(index)]['text']
        print(f'\n -------------------------- \n')
        print(f"Inference ID: {inference_id}. \n\nTEXT: {text}\n\nRESULT: {result}")
-------------------------- Inference ID: sentiment-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: ```json { "sentiment": "negative", "confidence": 0.75 } ``` -------------------------- Inference ID: sentiment-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: ```json { "sentiment": "neutral", "confidence": 0.95 } ``` -------------------------- Inference ID: sentiment-2. TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: ```json { "sentiment": "positive", "confidence": 0.85 } ``` -------------------------- Inference ID: translation-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: { "translation": "Acusan a las autoridades federales de exagerar el impacto de los incendios (AP): AP - El Servicio Forestal exageró el efecto de los incendios forestales en los búhos manchados de California para justificar un aumento planificado de la tala en la Sierra Nevada, según un experto de larga trayectoria en la agencia que trabajó en el plan.", "notes": "The translation maintains the original structure and meaning of the text. 
The term 'Feds' was translated as 'autoridades federales' to convey the same informal yet official tone. 'Fire impact' was translated as 'impacto de los incendios' to ensure clarity. The phrase 'California spotted owls' was translated as 'búhos manchados de California' to accurately reflect the species name. The term 'logging' was translated as 'tala,' which is the standard term used in Spanish for this context. The phrase 'longtime agency expert' was translated as 'experto de larga trayectoria en la agencia' to emphasize the person's extensive experience within the organization." } -------------------------- Inference ID: translation-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: { "translation": "Nuevo Método Podría Predecir Terremotos con Semanas de Anticipación (AP): AP - Geólogos suecos podrían haber encontrado una manera de predecir terremotos semanas antes de que ocurran, mediante el monitoreo de la cantidad de metales como zinc y cobre en el agua subterránea cerca de los sitios de terremotos, dijeron científicos el miércoles.", "notes": "The translation maintains the original meaning and structure of the text. The phrase 'subsoil water' was translated as 'agua subterránea,' which is the most common term used in Spanish for this concept. The cultural context remains the same as the topic is scientific and universally understood. No significant cultural adaptations were necessary." } -------------------------- Inference ID: translation-2. 
TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: { "translation": "Expedición marina descubre nuevas especies (AP): AP - Científicos noruegos que exploraron las aguas profundas del Océano Atlántico anunciaron el jueves que sus hallazgos —incluyendo lo que parecen ser nuevas especies de peces y calamares— podrían ser utilizados para proteger los ecosistemas marinos a nivel mundial.", "notes": "The translation maintains the original structure and intent of the text. The use of em dashes (—) is preserved to indicate a break in thought, which is also common in Spanish. The phrase 'could be used to protect marine ecosystems worldwide' is translated directly, as the concept of protecting ecosystems is universally understood and relevant in Spanish-speaking contexts. No significant cultural adaptations were necessary, as the topic of scientific discovery and environmental protection is globally applicable." } -------------------------- Inference ID: summary-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: { "summary": "A Forest Service expert claims the agency overstated wildfires' impact on California spotted owls to justify increased logging in the Sierra Nevada." } -------------------------- Inference ID: summary-1. 
TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: { "summary": "Swedish geologists have potentially developed a method to predict earthquakes weeks in advance by monitoring zinc and copper levels in subsoil water near earthquake-prone areas." } -------------------------- Inference ID: summary-2. TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: { "summary": "Norwegian scientists discovered potential new species of fish and squid during a deep-sea expedition in the Atlantic Ocean. Their findings aim to aid in the protection of global marine ecosystems." } -------------------------- Inference ID: topic_classification-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: ```json { "category": "politics", "confidence": 0.8 } ``` -------------------------- Inference ID: topic_classification-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: ```json { "category": "science", "confidence": 0.95 } ``` -------------------------- Inference ID: topic_classification-2. 
TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: ```json { "category": "science", "confidence": 0.95 } ``` -------------------------- Inference ID: keyword_extraction-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: ```json { "keywords": ["Forest Service", "wildfires", "California spotted owls", "logging", "Sierra Nevada"], "context": "The keywords highlight the core elements of the text. 'Forest Service' refers to the agency accused of exaggerating wildfire impacts. 'Wildfires' are the natural disaster in question, impacting 'California spotted owls', a species affected by the fires. 'Logging' is the planned activity justified by the exaggerated claims, and 'Sierra Nevada' is the geographic region where these events are taking place." } ``` -------------------------- Inference ID: keyword_extraction-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: ```json { "keywords": ["earthquake prediction", "geologists", "zinc", "copper", "subsoil water"], "context": "The text discusses a new method developed by Swedish geologists to predict earthquakes weeks in advance. The method involves monitoring the levels of metals such as zinc and copper in subsoil water near earthquake sites. 
'Earthquake prediction' is the main focus, while 'geologists' refers to the scientists involved. 'Zinc' and 'copper' are the metals being monitored, and 'subsoil water' is the medium where these metals are measured." } ``` -------------------------- Inference ID: keyword_extraction-2. TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: { "keywords": ["Marine Expedition", "New Species", "Atlantic Ocean", "Norwegian Scientists", "Marine Ecosystems"], "context": "The keywords highlight the core aspects of the text. 'Marine Expedition' refers to the scientific exploration conducted in the deep waters. 'New Species' emphasizes the discovery of previously unknown fish and squid. 'Atlantic Ocean' specifies the location of the expedition. 'Norwegian Scientists' identifies the group responsible for the research. 'Marine Ecosystems' underscores the broader goal of using the findings to protect ocean habitats globally." }
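Some replies above arrive wrapped in a Markdown `json` code fence and others as bare JSON, so the loop only prints the raw content. A minimal sketch that turns the output records into Python dicts: it strips an optional fence, parses the JSON, and groups results by dataset row and task. It assumes the output-record layout used in the loop above (`custom_id`, `response.body.choices[0].message.content`) and the `task-index` custom ID scheme; records without a usable reply are collected as errors instead of raising. `parse_model_json` and `collect_results` are hypothetical names:

```python
import json
import re

TICKS = "`" * 3  # a Markdown code fence marker
# Matches an optional fence (with an optional 'json' tag) around the whole reply
FENCE_RE = re.compile(r"^\s*" + TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS + r"\s*$", re.DOTALL)

def parse_model_json(content):
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    match = FENCE_RE.match(content)
    if match:
        content = match.group(1)
    return json.loads(content)

def collect_results(records):
    """Group parsed replies as {row_index: {inference_type: parsed_dict}}.

    Returns (table, errors), where errors lists (custom_id, message) pairs.
    """
    table, errors = {}, []
    for record in records:
        custom_id = record.get("custom_id", "")
        # custom_id looks like "<inference_type>-<row index>", e.g. "topic_classification-0"
        inference_type, _, index = custom_id.rpartition("-")
        try:
            row = int(index)
            content = record["response"]["body"]["choices"][0]["message"]["content"]
            parsed = parse_model_json(content)
        except (KeyError, IndexError, TypeError, ValueError) as exc:
            # ValueError also covers json.JSONDecodeError and a malformed row index
            errors.append((custom_id, repr(exc)))
            continue
        table.setdefault(row, {})[inference_type] = parsed
    return table, errors
```

Collecting the `results` of every job into one list inside the loop above and passing it to `collect_results` gives one dict per dataset row, with one entry per task.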
Summary
This tutorial used the chat completions endpoint to perform several tasks via the kluster.ai batch API. This particular example performed five different tasks on each element of the dataset: sentiment analysis, translation (to Spanish), summarization, topic classification, and keyword extraction.
To run the batch jobs, we:
- Created one JSONL file per task, where each line represented a separate request for one element of the dataset
- Uploaded the files to the platform
- Started the batch jobs and monitored their progress
- Fetched the results once the jobs completed
All of this used the OpenAI Python library and API, with no changes needed!
The kluster.ai batch API lets you scale these workflows to large datasets. As next steps, feel free to create your own dataset or expand on this example. Good luck!