Sentiment analysis with kluster.ai API
Welcome to the sentiment analysis notebook using the kluster.ai Batch API!
In this notebook, we’ll guide you through how to apply the kluster.ai Batch API to perform sentiment analysis on text data. For illustration, we’ll use a sample from the Amazon musical instrument reviews dataset to determine the sentiment of each review. You can easily customize this example to work with your own data and specific use case. This technique allows for efficient processing of datasets, whether small or large, with results neatly categorized using a cutting-edge language model.
To begin, just enter your API key and run the preloaded cells to perform the sentiment analysis. If you don’t have an API key, you can sign up for free on our platform.
Setup
Provide your unique kluster.ai API key (ensure there are no spaces). If you don’t have one yet, don’t forget to sign up.
from getpass import getpass
api_key = getpass("Enter your kluster.ai API key: ")
Enter your kluster.ai API key: ········
%pip install -q openai
Note: you may need to restart the kernel to use updated packages.
from openai import OpenAI
import pandas as pd
import time
import json
from IPython.display import clear_output, display
# Set up the client
client = OpenAI(
base_url="https://api.kluster.ai/v1",
api_key=api_key,
)
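If you'd like to confirm the client is configured correctly before continuing, an optional sanity check is to list the models exposed by the endpoint. This assumes the kluster.ai API serves the standard OpenAI-compatible /v1/models route; if it doesn't, simply skip this cell.
# Optional sanity check: list available models (assumes the endpoint exposes /v1/models)
for model in client.models.list().data:
    print(model.id)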
Get the data
We’ve preloaded a sample dataset for you, sourced from Amazon’s reviews of musical instruments. This dataset contains customer feedback on various music-related products, ready for you to analyze. No further setup is required—just jump into the next steps to start working with the data.
df = pd.DataFrame({
"text": [
"It hums, crackles, and I think I'm having problems with my equipment. As soon as I use any of my other cords then the problem is gone. Hosa makes some other products that have good value. But based on my experience I don't recommend this one.",
"I bought this to use with my keyboard. I wasn't really aware that there were other options for keyboard pedals. It doesn't work as smoothly as the pedals do on an acoustic piano, which is what I'd always used. Doesn't have the same feel either. Nowhere close.In my opinion, a sustain pedal like the M-Audio SP-2 Sustain Pedal with Piano Style Action or other similar pedal is a much better choice. The price difference is only a few dollars and the feel and action are so much better.",
"This cable disproves the notion that you get what you pay for. It's quality outweighs its price. Let's face it, a cable is a cable is a cable. But the quality of these cables can vary greatly. I replaced a lighter cable with this one and I was surprised at the difference in the quality of the sound from my amp. I have an Ibanez ART series guitar into an Ibanez 15 watt amp set up in my home. With nothing changed but the cable, there was a significant difference in quality and volume. So much so that I checked with my guitar teacher who said he was not surprised. The quality appears good. The ends are heavy duty and the little bit of hum I had due to the proximity of everything was attenuated to the point where it was inconsequential. I've seen more expensive cables and this one is (so far) great.Hosa GTR210 Guitar Cable 10 Ft",
"Bought this to hook up a Beta 58 to a Panasonic G2 DSLR and a Kodak Zi8 for interviews. Works the way it's supposed to. 90 degree TRS is a nice touch. Good price.",
"96 Just received this cord and it seems to work as expected. What can you say about an adapter cord? It is well made, good construction and sound from my DSLR with my mic is superb."
]
})
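If you'd rather analyze your own data, you can load it into the same shape instead of using the preloaded sample. A minimal sketch is below; the file name reviews.csv and its text column are hypothetical placeholders, and the only requirement is a DataFrame with one document per row in a text column.
# Optional: load your own data instead of the sample above.
# 'reviews.csv' and its 'text' column are hypothetical placeholders.
# df = pd.read_csv("reviews.csv")[["text"]]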
Batch inference
To run the inference job, we’ll follow three simple steps:
- Create the batch input file - we’ll create a file containing the requests to be processed by the model.
- Upload the batch input file to kluster.ai - once the file is ready, we’ll upload it to the kluster.ai platform using the API, where it will be queued for processing.
- Start the job - after the upload, we’ll trigger the job to process the data.
Everything has already been set up for you—simply run the cells below and watch it work!
Create the batch input file
In this example, we use the klusterai/Meta-Llama-3.3-70B-Instruct-Turbo model. To switch to another model, simply change the model name in the next cell. For a complete list of available models, please refer to our documentation.
def create_inference_file(df):
    """Build one chat completion request per row of the DataFrame."""
    inference_list = []
    for index, row in df.iterrows():
        content = row['text']
        request = {
            "custom_id": f"sentiment-analysis-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "klusterai/Meta-Llama-3.3-70B-Instruct-Turbo",
                "temperature": 0.5,
                "response_format": {"type": "json_object"},
                "messages": [
                    {"role": "system", "content": 'Analyze the sentiment of this text and respond with one word: positive, negative, or neutral.'},
                    {"role": "user", "content": content}
                ],
            }
        }
        inference_list.append(request)
    return inference_list
def save_inference_file(inference_list):
    """Write the requests to a JSON Lines file, one request per line."""
    filename = "sentiment_analysis_inference_request.jsonl"
    with open(filename, 'w') as file:
        for request in inference_list:
            file.write(json.dumps(request) + '\n')
    return filename
inference_list = create_inference_file(df)
filename = save_inference_file(inference_list)
Let’s preview what that request file looks like:
!head -n 1 sentiment_analysis_inference_request.jsonl
{"custom_id": "sentiment-analysis-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "klusterai/Meta-Llama-3.3-70B-Instruct-Turbo", "temperature": 0.5, "response_format": {"type": "json_object"}, "messages": [{"role": "system", "content": "Analyze the sentiment of this text and respond with one word: positive, negative, or neutral."}, {"role": "user", "content": "It hums, crackles, and I think I'm having problems with my equipment. As soon as I use any of my other cords then the problem is gone. Hosa makes some other products that have good value. But based on my experience I don't recommend this one."}]}}
Upload the inference file to kluster.ai
With our input file ready, the next step is to upload it to the kluster.ai platform.
inference_input_file = client.files.create(
file=open(filename, "rb"),
purpose="batch"
)
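The returned file object includes an ID that the batch job will reference in the next step. Printing it is a quick optional way to confirm the upload succeeded:
# The uploaded file's ID is what the batch job references below
print(f"Uploaded file ID: {inference_input_file.id}")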
Start the job
Once the file has been successfully uploaded, we’re ready to start the inference job.
inference_job = client.batches.create(
input_file_id=inference_input_file.id,
endpoint="/v1/chat/completions",
completion_window="24h"
)
The job has been created, and all requests are now queued for processing!
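If you want to take a quick look at the job before polling it, the batch object returned by the API exposes a few useful fields, such as its ID and initial status:
# Inspect the newly created batch job
print(f"Job ID: {inference_job.id}")
print(f"Initial status: {inference_job.status}")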
Check job progress
Next, we'll monitor the job's progress by polling its status every few seconds until all requests have completed.
def parse_json_objects(data_string):
    """Parse a JSON Lines string (or bytes) into a list of dictionaries."""
    if isinstance(data_string, bytes):
        data_string = data_string.decode('utf-8')
    json_strings = data_string.strip().split('\n')
    json_objects = []
    for json_str in json_strings:
        try:
            json_obj = json.loads(json_str)
            json_objects.append(json_obj)
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")
    return json_objects
all_completed = False
while not all_completed:
    all_completed = True
    output_lines = []

    # Fetch the latest status of the batch job
    updated_job = client.batches.retrieve(inference_job.id)

    if updated_job.status != "completed":
        all_completed = False
        completed = updated_job.request_counts.completed
        total = updated_job.request_counts.total
        output_lines.append(f"Job status: {updated_job.status} - Progress: {completed}/{total}")
    else:
        output_lines.append("Job completed!")

    # Clear the output and display the updated status
    clear_output(wait=True)
    for line in output_lines:
        display(line)

    if not all_completed:
        time.sleep(10)
'Job completed!'
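If some requests fail, the batch can also reference an error file you can download the same way as the results. The snippet below assumes the batch response mirrors the OpenAI-style batch schema (request_counts.failed and error_file_id); treat those fields as an assumption for kluster.ai and skip this check if they aren't present.
# Optional: check for failed requests (assumes an OpenAI-style batch response schema)
final_job = client.batches.retrieve(inference_job.id)
print(f"Completed: {final_job.request_counts.completed}, Failed: {final_job.request_counts.failed}")
if final_job.error_file_id:
    errors = parse_json_objects(client.files.content(final_job.error_file_id).content)
    print(errors)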
Get the results
Now that the job is complete, we’ll fetch the results and examine the responses generated for each request.
job = client.batches.retrieve(inference_job.id)
result_file_id = job.output_file_id
result = client.files.content(result_file_id).content
results = parse_json_objects(result)
for res in results:
    task_id = res['custom_id']
    index = task_id.split('-')[-1]
    result = res['response']['body']['choices'][0]['message']['content']
    text = df.iloc[int(index)]['text']

    print('\n -------------------------- \n')
    print(f"Task ID: {task_id}. \n\nINPUT TEXT: {text}\n\nLLM OUTPUT: {result}")
 -------------------------- 

Task ID: sentiment-analysis-0. 

INPUT TEXT: It hums, crackles, and I think I'm having problems with my equipment. As soon as I use any of my other cords then the problem is gone. Hosa makes some other products that have good value. But based on my experience I don't recommend this one.

LLM OUTPUT: Negative.

 -------------------------- 

Task ID: sentiment-analysis-1. 

INPUT TEXT: I bought this to use with my keyboard. I wasn't really aware that there were other options for keyboard pedals. It doesn't work as smoothly as the pedals do on an acoustic piano, which is what I'd always used. Doesn't have the same feel either. Nowhere close.In my opinion, a sustain pedal like the M-Audio SP-2 Sustain Pedal with Piano Style Action or other similar pedal is a much better choice. The price difference is only a few dollars and the feel and action are so much better.

LLM OUTPUT: Negative.

 -------------------------- 

Task ID: sentiment-analysis-2. 

INPUT TEXT: This cable disproves the notion that you get what you pay for. It's quality outweighs its price. Let's face it, a cable is a cable is a cable. But the quality of these cables can vary greatly. I replaced a lighter cable with this one and I was surprised at the difference in the quality of the sound from my amp. I have an Ibanez ART series guitar into an Ibanez 15 watt amp set up in my home. With nothing changed but the cable, there was a significant difference in quality and volume. So much so that I checked with my guitar teacher who said he was not surprised. The quality appears good. The ends are heavy duty and the little bit of hum I had due to the proximity of everything was attenuated to the point where it was inconsequential. I've seen more expensive cables and this one is (so far) great.Hosa GTR210 Guitar Cable 10 Ft

LLM OUTPUT: Positive.

 -------------------------- 

Task ID: sentiment-analysis-3. 

INPUT TEXT: Bought this to hook up a Beta 58 to a Panasonic G2 DSLR and a Kodak Zi8 for interviews. Works the way it's supposed to. 90 degree TRS is a nice touch. Good price.

LLM OUTPUT: Positive.

 -------------------------- 

Task ID: sentiment-analysis-4. 

INPUT TEXT: 96 Just received this cord and it seems to work as expected. What can you say about an adapter cord? It is well made, good construction and sound from my DSLR with my mic is superb.

LLM OUTPUT: Positive.
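To keep each prediction alongside its original review, you can also fold the responses back into the DataFrame as a new column. A minimal sketch follows; the sentiment column name is just a suggestion.
# Attach each model response to its source row using the index encoded in custom_id
sentiments = {}
for res in results:
    row_index = int(res['custom_id'].split('-')[-1])
    sentiments[row_index] = res['response']['body']['choices'][0]['message']['content']

df['sentiment'] = df.index.map(sentiments)
df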
Conclusion
Congratulations on successfully completing the sentiment analysis task with the kluster.ai Batch API! This example demonstrates how simple it is to work with large datasets and derive meaningful insights from them. The Batch API enables you to scale your workflows seamlessly, making it a vital tool for handling large-scale data processing.