Multiple inference requests with kluster.ai¶
In other notebooks, we used AI models to perform simple tasks like text classification, sentiment analysis and keyword extraction.
This tutorial runs through a notebook where you'll learn how to use the kluster.ai batch API to run several different tasks over the same dataset, with one batch file per task. Note that each request in a JSONL file can have its own model, system prompt, and user content.
You can adapt this example by using your own data and the tasks relevant to your use case. With this approach, you can process datasets of any scale, big or small, and obtain results for every task from a state-of-the-art language model.
Prerequisites¶
Before getting started, ensure you have the following:
- A kluster.ai account - sign up on the kluster.ai platform if you don't have one
- A kluster.ai API key - after signing in, go to the API Keys section and create a new key. For detailed instructions, check out the Get an API key guide
Setup¶
In this notebook, we'll use Python's getpass module to enter the key without echoing it to the screen. When prompted after running the cell, provide your kluster.ai API key (make sure there are no leading or trailing spaces).
from getpass import getpass
api_key = getpass("Enter your kluster.ai API key: ")
Enter your kluster.ai API key: ········
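If you run the notebook unattended, prompting for the key may not be practical. The following is a minimal sketch of one alternative: read the key from an environment variable and fall back to getpass only when it is not set. The variable name KLUSTER_API_KEY and the helper load_api_key are assumptions made for this example, not something the platform requires.

```python
import os
from getpass import getpass


def load_api_key(env_var="KLUSTER_API_KEY", prompt=getpass):
    """Return the API key from the environment, or prompt for it if unset."""
    # KLUSTER_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to an interactive prompt that does not echo the input.
        key = prompt("Enter your kluster.ai API key: ").strip()
    if not key:
        raise ValueError("No API key provided.")
    return key
```

Calling `api_key = load_api_key()` would then replace the getpass cell above. Stripping whitespace guards against the stray spaces mentioned earlier.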
Next, ensure you've installed the OpenAI Python library:
pip install -q openai
Note: you may need to restart the kernel to use updated packages.
With the OpenAI Python library installed, we import the necessary dependencies for the tutorial:
from openai import OpenAI
import pandas as pd
import time
import json
import os
import urllib.request
import requests
from IPython.display import clear_output, display
pd.set_option('display.max_columns', 1000, 'display.width', 1000, 'display.max_rows',1000, 'display.max_colwidth', 500)
And then, initialize the client by pointing it to the kluster.ai endpoint and passing your API key.
# Set up the client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key=api_key,
)
Get the data¶
Now that you've initialized an OpenAI-compatible client pointing to kluster.ai, we can discuss the data.
This notebook includes three sample datasets: Amazon musical instruments reviews, Top 1000 IMDb Movies, and AG News sample.
The following code fetches the data and returns the last 3 data points of the selected dataset. Feel free to change this or bring your own dataset.
# Datasets
#1. Amazon musical instruments reviews sample dataset
#url = "https://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Musical_Instruments_5.json.gz"
#2. IMDB top 1000 sample dataset
#url = "https://raw.githubusercontent.com/kluster-ai/klusterai-cookbook/refs/heads/main/data/imdb_top_1000.csv"
#3. AG News sample dataset
url = "https://raw.githubusercontent.com/kluster-ai/klusterai-cookbook/refs/heads/main/data/ag_news.csv"
def fetch_dataset(url, file_path=None):
    # Set the default file path based on the URL if none is provided
    if not file_path:
        file_path = os.path.join("data", os.path.basename(url))
    # Create the directory if it does not exist
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    # Download the file if it doesn't already exist
    if not os.path.exists(file_path):
        urllib.request.urlretrieve(url, file_path)
        print(f"Dataset downloaded and saved as {file_path}")
    else:
        print(f"Using cached file at {file_path}")
    # Load and process the dataset based on URL content
    if "imdb_top_1000.csv" in url:
        df = pd.read_csv(file_path)
        df['text'] = df['Series_Title'].astype(str) + ": " + df['Overview'].astype(str)
        df = df[['text']]
    elif "ag_news" in url:
        df = pd.read_csv(file_path, header=None, names=["label", "title", "description"])
        df['text'] = df['title'].astype(str) + ": " + df['description'].astype(str)
        df = df[['text']]
    elif "reviews_Musical_Instruments_5.json.gz" in url:
        df = pd.read_json(file_path, compression='gzip', lines=True)
        df.rename(columns={'reviewText': 'text'}, inplace=True)
        df = df[['text']]
    else:
        raise ValueError("URL does not match any known dataset format.")
    return df[['text']].tail(3).reset_index(drop=True)  # Return last 3 entries resetting the index
# Fetch dataset
df = fetch_dataset(url=url, file_path=None)
df.head()
Dataset downloaded and saved as data/ag_news.csv
text | |
---|---|
0 | Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan |
1 | New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan |
2 | Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan |
Now that we've fetched and saved the dataset, let's move on to the batch inference flow.
Define the requests¶
For this particular tutorial, we predefined five requests for the model to execute based on common customer use cases:
- Sentiment analysis - review the text to determine whether the statement has a positive, neutral, or negative connotation
- Translation - translate the text to any other language, in this example, Spanish
- Summarization - express the text in a concise form
- Topic classification - classify the text between a given set of categories
- Keyword extraction - provide a number of keywords
Each request type is defined as a system prompt. Because this example runs several types of requests, the prompts are stored in a dictionary keyed by task name. For each use case, the prompt also specifies the JSON structure we expect in the model's response.
If you're happy with these requests and structures, you can simply run the code as-is. To customize them, modify the prompts (or add new ones) to suit your own use case.
SYSTEM_PROMPTS = {
'sentiment': '''
Analyze the sentiment of the given text. Provide only a JSON object with the following structure:
{
"sentiment": string, // "positive", "negative", or "neutral"
"confidence": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis
}
''',
'translation': '''
Translate the given text from English to Spanish, paraphrase, rewrite or perform cultural adaptations for the text to make sense in Spanish. Provide only a JSON object with the following structure:
{
"translation": string, // The Spanish translation
"notes": string // Any notes about the translation, such as cultural adaptations or challenging phrases (max 500 words). Write this mainly in English.
}
''',
'summary': '''
Summarize the main points of the given text. Provide only a JSON object with the following structure:
{
"summary": string, // A concise summary of the text (max 100 words)
}
''',
'topic_classification': '''
Classify the main topic of the given text based on the following categories: "politics", "sports", "technology", "science", "business", "entertainment", "health", "other". Provide only a JSON object with the following structure:
{
"category": string, // The primary category of the provided text
"confidence": float, // A value between 0 and 1 indicating confidence in the classification
}
''',
'keyword_extraction': '''
Extract relevant keywords from the given text. Provide only a JSON object with the following structure:
{
"keywords": string[], // An array of up to 5 keywords that best represent the text content
"context": string // Briefly explain how each keyword is relevant to the text (max 200 words)
}
'''
}
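To illustrate how a new request type could be added, here is a minimal sketch that registers a sixth prompt following the same "instruction plus expected JSON structure" pattern. The helper add_prompt, the entity_extraction task, and its fields are hypothetical and only serve as an example; a one-entry stand-in for SYSTEM_PROMPTS keeps the snippet self-contained.

```python
# Stand-in for the SYSTEM_PROMPTS dictionary defined above (one entry shown).
SYSTEM_PROMPTS = {
    "summary": "Summarize the main points of the given text.",
}


def add_prompt(prompts, name, instruction, fields):
    """Add a system prompt that requests a JSON object with the given fields.

    fields maps each JSON key to a (type hint, description) pair.
    """
    if name in prompts:
        raise KeyError(f"A prompt named {name!r} already exists.")
    items = list(fields.items())
    lines = []
    for position, (key, (type_hint, description)) in enumerate(items):
        # Every field except the last one is followed by a comma.
        comma = "," if position < len(items) - 1 else ""
        lines.append(f'    "{key}": {type_hint}{comma} // {description}')
    structure = "{\n" + "\n".join(lines) + "\n}"
    prompts[name] = (
        f"{instruction} Provide only a JSON object with the following structure:\n"
        f"{structure}\n"
    )
    return prompts[name]


# Hypothetical sixth task: named entity extraction.
print(add_prompt(
    SYSTEM_PROMPTS,
    "entity_extraction",
    "List the named entities mentioned in the given text.",
    {
        "entities": ("string[]", "People, organizations, and places in the text"),
        "count": ("integer", "Number of entities found"),
    },
))
```

Because the rest of the notebook loops over the dictionary, a prompt added this way would get its own batch file and batch job without further changes.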
Create the batch job file¶
This example selects the deepseek-ai/DeepSeek-V3 model. If you'd like to use a different model, change the model field. In this notebook, you can also comment out DeepSeek V3 and uncomment whichever model you want to try.
Please refer to the Supported models section for a list of the models we support.
The following snippets prepare the JSONL files, where each line represents a different request. Note that each separate batch request can have its own model. We use a temperature of 0.5, but feel free to change it and compare the outcomes.
# Models
#model="deepseek-ai/DeepSeek-R1"
model="deepseek-ai/DeepSeek-V3"
#model="klusterai/Meta-Llama-3.1-8B-Instruct-Turbo"
#model="klusterai/Meta-Llama-3.1-405B-Instruct-Turbo"
#model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo"
#model="Qwen/Qwen2.5-VL-7B-Instruct"
def create_batch_file(df, inference_type, system_prompt):
    batch_list = []
    for index, row in df.iterrows():
        content = row["text"]
        request = {
            "custom_id": f"{inference_type}-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "temperature": 0.5,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": content},
                ],
            },
        }
        batch_list.append(request)
    return batch_list

def save_batch_file(batch_list, inference_type):
    filename = f"data/batch_request_{inference_type}.jsonl"
    with open(filename, "w") as file:
        for request in batch_list:
            file.write(json.dumps(request) + "\n")
    return filename

batch_requests = []
# Loop through all the different prompts
for inference_type, system_prompt in SYSTEM_PROMPTS.items():
    batch_list = create_batch_file(df, inference_type, system_prompt)
    filename = save_batch_file(batch_list, inference_type)
    batch_requests.append((inference_type, filename))
    print(f"File {filename} saved")
File data/batch_request_sentiment.jsonl saved
File data/batch_request_translation.jsonl saved
File data/batch_request_summary.jsonl saved
File data/batch_request_topic_classification.jsonl saved
File data/batch_request_keyword_extraction.jsonl saved
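Since every JSONL line is an independent request with its own model field, different tasks could also be routed to different models and collected in one list. The sketch below shows that idea in memory only, without writing to the data folder. The task-to-model pairing in TASK_MODELS, the shortened prompts, and the helper names are assumptions made for illustration; the model names themselves are taken from the list above.

```python
import json

# Hypothetical per-task model choices (the pairing is only an illustration).
TASK_MODELS = {
    "sentiment": "klusterai/Meta-Llama-3.1-8B-Instruct-Turbo",
    "summary": "deepseek-ai/DeepSeek-V3",
}
DEFAULT_MODEL = "deepseek-ai/DeepSeek-V3"


def build_batch(texts, prompts, task_models, default_model, temperature=0.5):
    """Build one request per (task, text) pair, choosing the model per task."""
    batch = []
    for task, system_prompt in prompts.items():
        # Tasks without an explicit entry fall back to the default model.
        task_model = task_models.get(task, default_model)
        for index, text in enumerate(texts):
            batch.append({
                "custom_id": f"{task}-{index}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": task_model,
                    "temperature": temperature,
                    "messages": [
                        {"role": "system", "content": system_prompt},
                        {"role": "user", "content": text},
                    ],
                },
            })
    return batch


def to_jsonl(batch):
    """Serialize the requests as JSONL: one JSON object per line."""
    return "".join(json.dumps(request) + "\n" for request in batch)


# Shortened stand-ins for the prompts and dataset used in this notebook.
demo_prompts = {
    "sentiment": "Analyze the sentiment of the given text.",
    "summary": "Summarize the main points of the given text.",
    "translation": "Translate the given text from English to Spanish.",
}
demo_texts = ["First sample text.", "Second sample text."]

demo_batch = build_batch(demo_texts, demo_prompts, TASK_MODELS, DEFAULT_MODEL)
print(len(to_jsonl(demo_batch).splitlines()), "requests")
```

The custom_id keeps the task name and row index together, so results from a mixed file can still be traced back to both.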
Next, we can preview what the requests in a single batch file look like:
!head -n 5 data/batch_request_sentiment.jsonl
{"custom_id": "sentiment-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3", "temperature": 0.5, "messages": [{"role": "system", "content": "\n Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n {\n \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n }\n "}, {"role": "user", "content": "Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan"}]}}
{"custom_id": "sentiment-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3", "temperature": 0.5, "messages": [{"role": "system", "content": "\n Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n {\n \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n }\n "}, {"role": "user", "content": "New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan"}]}}
{"custom_id": "sentiment-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "deepseek-ai/DeepSeek-V3", "temperature": 0.5, "messages": [{"role": "system", "content": "\n Analyze the sentiment of the given text. Provide only a JSON object with the following structure:\n {\n \"sentiment\": string, // \"positive\", \"negative\", or \"neutral\"\n \"confidence\": float, // A value between 0 and 1 indicating your confidence in the sentiment analysis\n }\n "}, {"role": "user", "content": "Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan"}]}}
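Before uploading, it can be worth checking a batch file programmatically rather than by eye. The sketch below is one possible sanity check, not a platform requirement: it verifies that each line is valid JSON, that the four top-level keys used in this notebook are present, and that no custom_id repeats. The helper name validate_batch_lines is hypothetical.

```python
import json

# Top-level keys used by every request in this notebook.
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def validate_batch_lines(lines):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Ignore blank lines, such as a trailing newline.
        try:
            request = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: invalid JSON ({error.msg})")
            continue
        if not isinstance(request, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        missing = REQUIRED_KEYS - request.keys()
        if missing:
            problems.append(f"line {number}: missing keys {sorted(missing)}")
        custom_id = request.get("custom_id")
        if custom_id is not None:
            if custom_id in seen_ids:
                problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
            seen_ids.add(custom_id)
    return problems


# In-memory demo: one valid line, one duplicate ID, one malformed line.
good = json.dumps({"custom_id": "sentiment-0", "method": "POST",
                   "url": "/v1/chat/completions", "body": {}})
demo_problems = validate_batch_lines([good, good, "{not json"])
for problem in demo_problems:
    print(problem)
```

For a file on disk, the same function accepts an open file handle, for example `validate_batch_lines(open("data/batch_request_sentiment.jsonl"))`, since iterating over a file yields its lines.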
Upload batch job files to kluster.ai¶
Now that we've prepared our input files, it's time to upload them to the kluster.ai platform. To do so, you can use the files.create endpoint of the client, where the purpose is set to batch. This will return the file ID, which we need to log for the next steps. We will repeat the process for each batch file created.
def upload_batch_file(data_dir):
    print(f"Creating request for {data_dir}")
    with open(data_dir, 'rb') as file:
        upload_response = client.files.create(
            file=file,
            purpose="batch"
        )
    # Print file ID
    file_id = upload_response.id
    print(f"File uploaded successfully. File ID: {file_id}")
    return upload_response

batch_files = []
DATA_FOLDER = "data"
# Loop through all .jsonl files in the data folder
for file in os.listdir(DATA_FOLDER):
    if file.endswith(".jsonl"):
        data_dir = os.path.join(DATA_FOLDER, file)
        print(f"Uploading file {data_dir}")
        job = upload_batch_file(data_dir)
        batch_files.append((data_dir, job))
Uploading file data/batch_request_topic_classification.jsonl
Creating request for data/batch_request_topic_classification.jsonl
File uploaded successfully. File ID: 67e40792a135957969fda091
Uploading file data/batch_request_keyword_extraction.jsonl
Creating request for data/batch_request_keyword_extraction.jsonl
File uploaded successfully. File ID: 67e40792c72788860894306f
Uploading file data/batch_request_translation.jsonl
Creating request for data/batch_request_translation.jsonl
File uploaded successfully. File ID: 67e40793a135957969fda097
Uploading file data/batch_request_sentiment.jsonl
Creating request for data/batch_request_sentiment.jsonl
File uploaded successfully. File ID: 67e407930fdb60564c2bb237
Uploading file data/batch_request_summary.jsonl
Creating request for data/batch_request_summary.jsonl
File uploaded successfully. File ID: 67e40794ce6b9bab9cadaf34
All files are now uploaded, and we can proceed with creating the batch jobs.
Start the batch job¶
Once all the files have been successfully uploaded, we're ready to start (create) the batch jobs by providing the file ID of each file, which we got in the previous step. To start each job, we use the batches.create method, for which we need to set the endpoint to /v1/chat/completions. This returns the details of each batch job, including its ID.
# Create batch job with completions endpoint
def create_batch_job(file_id):
    batch_job = client.batches.create(
        input_file_id=file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    print(f"Batch job created with ID {batch_job.id}")
    return batch_job

batch_jobs = []
for dir_path, batch_file in batch_files:
    print(f"Creating batch job for file ID {batch_file.id}")
    batch_job = create_batch_job(batch_file.id)
    batch_jobs.append(batch_job)
Creating batch job for file ID 67e40792a135957969fda091
Batch job created with ID 67e40798ce6b9bab9cadaf56
Creating batch job for file ID 67e40792c72788860894306f
Batch job created with ID 67e407a2a135957969fda1b4
Creating batch job for file ID 67e40793a135957969fda097
Batch job created with ID 67e407adc7278886089431ea
Creating batch job for file ID 67e407930fdb60564c2bb237
Batch job created with ID 67e407b7c727888608943275
Creating batch job for file ID 67e40794ce6b9bab9cadaf34
Batch job created with ID 67e407c2a135957969fda38e
Check job progress¶
Now that your batch jobs have been created, you can track their progress.
To monitor a job's progress, we can use the batches.retrieve method and pass the batch job ID. The response contains a status field that tells us whether the job is completed, along with request counts showing how many of its requests have finished. We repeat this for every batch job ID we got in the previous step.
The following snippet checks the status of all batch jobs every 10 seconds until all of them are completed.
def monitor_batch_jobs(batch_jobs, refresh_interval=10):
    # Poll every job until all of them report the "completed" status
    all_completed = False
    while not all_completed:
        all_completed = True
        output_lines = []
        for job in batch_jobs:
            updated_job = client.batches.retrieve(job.id)
            if updated_job.status != "completed":
                all_completed = False
                completed = updated_job.request_counts.completed
                total = updated_job.request_counts.total
                output_lines.append(f"Job {job.id} status: {updated_job.status} - Progress: {completed}/{total}")
            else:
                output_lines.append(f"Job {job.id} completed!")
        # Clear the cell output and display the latest status of each job
        clear_output(wait=True)
        for line in output_lines:
            display(line)
        # Wait refresh_interval seconds before checking again
        if not all_completed:
            time.sleep(refresh_interval)

monitor_batch_jobs(batch_jobs)
'Job 67e40798ce6b9bab9cadaf56 completed!'
'Job 67e407a2a135957969fda1b4 completed!'
'Job 67e407adc7278886089431ea completed!'
'Job 67e407b7c727888608943275 completed!'
'Job 67e407c2a135957969fda38e completed!'
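A loop that waits only for the completed status would keep polling forever if a job ended in another final state. The sketch below shows one way to guard against that, under the assumption that kluster.ai reports the same terminal statuses as the OpenAI Batch API (completed, failed, expired, cancelled). The helper wait_for_batches and its get_status callable are hypothetical; in the notebook, get_status could be `lambda job_id: client.batches.retrieve(job_id).status`.

```python
import time

# Terminal statuses of the OpenAI Batch API; assumed to apply to kluster.ai too.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batches(get_status, job_ids, interval=10, timeout=3600, sleep=time.sleep):
    """Poll until every job reaches a terminal status or the timeout elapses.

    get_status is a callable that maps a job ID to its current status string.
    Returns a dictionary of job ID to final status.
    """
    deadline = time.monotonic() + timeout
    statuses = {}
    pending = list(job_ids)
    while pending:
        # Refresh only the jobs that have not finished yet.
        for job_id in pending:
            statuses[job_id] = get_status(job_id)
        pending = [job_id for job_id in pending
                   if statuses[job_id] not in TERMINAL_STATUSES]
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Jobs still pending after {timeout}s: {pending}")
        sleep(interval)
    return statuses


# Offline demo with scripted statuses instead of real API calls.
scripted = {
    "job-a": iter(["in_progress", "completed"]),
    "job-b": iter(["failed"]),
}
final_statuses = wait_for_batches(lambda job_id: next(scripted[job_id]),
                                  ["job-a", "job-b"], interval=0)
print(final_statuses)
```

Returning the final statuses lets the caller decide what to do with failed or expired jobs before attempting to download any results.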
Get the results¶
With all jobs completed, we'll retrieve the results and review the responses generated for each request. To fetch the results from the platform, retrieve the output_file_id from the batch job, and then use the files.content endpoint, providing that specific file ID. The output file is in JSONL format, so we parse it line by line. We repeat this for every batch job ID. Note that the job status must be completed for you to retrieve the results!
# Parse results as JSON objects, one per line
def parse_json_objects(data_string):
    if isinstance(data_string, bytes):
        data_string = data_string.decode('utf-8')
    json_strings = data_string.strip().split('\n')
    json_objects = []
    for json_str in json_strings:
        try:
            json_obj = json.loads(json_str)
            json_objects.append(json_obj)
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")
    return json_objects

for batch_job in batch_jobs:
    job_status = client.batches.retrieve(batch_job.id)
    result_file_id = job_status.output_file_id
    result = client.files.content(result_file_id).content
    results = parse_json_objects(result)
    for res in results:
        inference_id = res['custom_id']
        index = inference_id.split('-')[-1]
        result = res['response']['body']['choices'][0]['message']['content']
        text = df.iloc[int(index)]['text']
        print(f'\n -------------------------- \n')
        print(f"Inference ID: {inference_id}. \n\nTEXT: {text}\n\nRESULT: {result}")
-------------------------- Inference ID: topic_classification-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: ```json { "category": "politics", "confidence": 0.7 } ``` -------------------------- Inference ID: topic_classification-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: { "category": "science", "confidence": 0.95 } -------------------------- Inference ID: topic_classification-2. TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: ```json { "category": "science", "confidence": 0.95 } ``` -------------------------- Inference ID: keyword_extraction-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: ```json { "keywords": ["Forest Service", "wildfires", "California spotted owls", "logging", "Sierra Nevada"], "context": "The Forest Service is accused of overstating the impact of wildfires on California spotted owls to justify increased logging in the Sierra Nevada. 'Forest Service' is central as it is the agency involved. 'Wildfires' are the environmental event being discussed. 
'California spotted owls' are the species allegedly affected. 'Logging' is the planned activity being justified. 'Sierra Nevada' is the geographic location where these events are taking place." } ``` -------------------------- Inference ID: keyword_extraction-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: { "keywords": ["earthquake prediction", "geologists", "zinc", "copper", "subsoil water"], "context": "The text discusses a new method for earthquake prediction developed by Swedish geologists. 'Earthquake prediction' is the central focus, as the method aims to forecast quakes weeks in advance. 'Geologists' are the scientists involved in this research. 'Zinc' and 'copper' are metals monitored in subsoil water near earthquake sites, which are key indicators in this predictive method. 'Subsoil water' is the medium where these metals are measured, making it a critical component of the study." } -------------------------- Inference ID: keyword_extraction-2. TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: ```json { "keywords": ["Marine Expedition", "New Species", "Atlantic Ocean", "Norwegian Scientists", "Marine Ecosystems"], "context": "The keywords highlight the core aspects of the text. 'Marine Expedition' refers to the scientific exploration conducted in the deep waters. 'New Species' emphasizes the discovery of previously unknown fish and squid. 'Atlantic Ocean' specifies the location of the expedition. 'Norwegian Scientists' identifies the researchers involved in the study. 
'Marine Ecosystems' underscores the broader goal of using these findings to protect oceanic environments globally." } ``` -------------------------- Inference ID: translation-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: { "translation": "Acusan a las autoridades federales de exagerar el impacto de los incendios (AP): AP - El Servicio Forestal exageró el efecto de los incendios forestales en los búhos manchados de California para justificar un aumento planificado en la tala de árboles en Sierra Nevada, según un experto de larga trayectoria en la agencia que trabajó en el plan.", "notes": "The translation maintains the original structure and meaning of the text. The term 'Feds' was translated as 'autoridades federales' to convey the informal tone in a way that is natural in Spanish. 'California spotted owls' was translated directly as 'búhos manchados de California' since it is a specific species name. 'Logging' was translated as 'tala de árboles' to clearly convey the activity. No significant cultural adaptations were needed, as the topic is universally understood in Spanish-speaking contexts." } -------------------------- Inference ID: translation-1. 
TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: { "translation": "Nuevo método podría predecir terremotos con semanas de anticipación (AP): AP - Geólogos suecos podrían haber encontrado una manera de predecir terremotos semanas antes de que ocurran, mediante el monitoreo de la cantidad de metales como zinc y cobre en el agua subterránea cerca de zonas sísmicas, dijeron científicos el miércoles.", "notes": "The translation maintains the original meaning and structure of the text. The phrase 'New Method May Predict Quakes Weeks Ahead' was translated as 'Nuevo método podría predecir terremotos con semanas de anticipación' to ensure clarity and accuracy. The term 'subsoil water' was translated as 'agua subterránea,' which is the common term used in Spanish for water found beneath the ground. The mention of 'Swedish geologists' and 'scientists' was kept as 'geólogos suecos' and 'científicos' respectively, as these terms are directly translatable and widely understood in Spanish. The date 'Wednesday' was translated as 'miércoles,' which is the standard translation for the day of the week in Spanish." } -------------------------- Inference ID: translation-2. 
TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: { "translation": "Expedición Marina Descubre Nuevas Especies (AP): AP - Científicos noruegos que exploraron las aguas profundas del Océano Atlántico anunciaron el jueves que sus hallazgos —incluyendo lo que parecen ser nuevas especies de peces y calamares— podrían ser utilizados para proteger los ecosistemas marinos a nivel mundial.", "notes": "The translation maintains the original structure and intent of the text. The phrase 'could be used to protect marine ecosystems worldwide' was translated directly as 'podrían ser utilizados para proteger los ecosistemas marinos a nivel mundial,' which is both accurate and culturally appropriate. No significant cultural adaptations were necessary. The use of 'nivel mundial' (worldwide) is common in Spanish to convey global scope." } -------------------------- Inference ID: sentiment-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: ```json { "sentiment": "negative", "confidence": 0.75 } ``` -------------------------- Inference ID: sentiment-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: ```json { "sentiment": "neutral", "confidence": 0.95 } ``` -------------------------- Inference ID: sentiment-2. 
TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: ```json { "sentiment": "positive", "confidence": 0.9 } ``` -------------------------- Inference ID: summary-0. TEXT: Feds Accused of Exaggerating Fire Impact (AP): AP - The Forest Service exaggerated the effect of wildfires on California spotted owls in justifying a planned increase in logging in the Sierra Nevada, according to a longtime agency expert who worked on the plan.: nan RESULT: { "summary": "A Forest Service expert claims the agency overstated the impact of wildfires on California spotted owls to justify increased logging in the Sierra Nevada." } -------------------------- Inference ID: summary-1. TEXT: New Method May Predict Quakes Weeks Ahead (AP): AP - Swedish geologists may have found a way to predict earthquakes weeks before they happen by monitoring the amount of metals like zinc and copper in subsoil water near earthquake sites, scientists said Wednesday.: nan RESULT: { "summary": "Swedish geologists have developed a potential method to predict earthquakes weeks in advance by monitoring levels of metals like zinc and copper in subsoil water near earthquake sites." } -------------------------- Inference ID: summary-2. TEXT: Marine Expedition Finds New Species (AP): AP - Norwegian scientists who explored the deep waters of the Atlantic Ocean said Thursday their findings #151; including what appear to be new species of fish and squid #151; could be used to protect marine ecosystems worldwide.: nan RESULT: { "summary": "Norwegian scientists exploring the Atlantic Ocean's deep waters discovered potential new species of fish and squid, which could aid in global marine ecosystem protection." }
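As the output above shows, the model sometimes wraps its JSON answer in a Markdown code fence and sometimes returns it bare. The following is a minimal sketch of a post-processing step that tolerates both forms and groups the parsed answers by dataset row and task. The helper names parse_model_json and split_custom_id are hypothetical, and the sample records are shortened copies of results shown above.

```python
import json
import re

FENCE = "`" * 3  # Three backticks, the Markdown code fence marker.
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_model_json(content):
    """Parse a model reply as JSON, tolerating an optional Markdown fence."""
    match = FENCED_BLOCK.search(content)
    payload = match.group(1) if match else content.strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None  # Leave it to the caller to handle unparseable replies.


def split_custom_id(custom_id):
    """Split an ID such as 'topic_classification-2' into (task, row index)."""
    task, _, index = custom_id.rpartition("-")
    return task, int(index)


# Shortened sample replies: one fenced, one bare.
sample_replies = {
    "sentiment-0": FENCE + 'json\n{ "sentiment": "negative", "confidence": 0.75 }\n' + FENCE,
    "topic_classification-1": '{ "category": "science", "confidence": 0.95 }',
}

# Group the parsed answers as {row index: {task: parsed JSON}}.
by_row = {}
for custom_id, reply in sample_replies.items():
    task, row = split_custom_id(custom_id)
    by_row.setdefault(row, {})[task] = parse_model_json(reply)
print(by_row)
```

Using rpartition rather than a plain split keeps task names intact even if they contain a hyphen themselves.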
Summary¶
This tutorial used the chat completion endpoint to perform several tasks via the kluster.ai batch API. This particular example performed five different tasks for each element of the dataset: sentiment analysis, translation (to Spanish), summarization, topic classification, and keyword extraction.
To submit the batch jobs, we:
- Created the JSONL files, where each line represents a separate request (one per task and dataset element)
- Submitted the files to the platform
- Started the batch jobs and monitored their progress
- Fetched the results once the jobs completed
All of this was done using the OpenAI Python library and API, with no changes needed!
Kluster.ai's batch API empowers you to scale your workflows seamlessly, making it an invaluable tool for processing extensive datasets. As next steps, feel free to create your own dataset, or expand on top of this existing example. Good luck!