Sentiment analysis with kluster.ai API¶

Sentiment analysis is the process of reviewing text to determine whether there is positive, neutral, or negative connotation to the statement. LLMs can be extremely powerful, processing a lot of data quickly, helping understand the overall sentiment of a large dataset.

This tutorial runs through a notebook where you'll learn how to use the kluster.ai batch API to run a sentiment analysis on sample data.

The example uses an extract from the Amazon musical instrument reviews dataset to determine the sentiment of each review.

You can adapt this example by using your data and categories relevant to your use case. With this approach, you can effortlessly process datasets of any scale, big or small, and obtain categorized results powered by a state-of-the-art language model.

Prerequisites¶

Before getting started, ensure you have the following:

A kluster.ai account - sign up on the kluster.ai platform if you don't have one
A kluster.ai API key - after signing in, go to the API Keys section and create a new key. For detailed instructions, check out the Get an API key guide

Setup¶

In this notebook, we'll use Python's getpass module to input the key safely. After execution, please provide your unique kluster.ai API key (ensure no spaces).

In [1]:

Copied!

from getpass import getpass

api_key = getpass("Enter your kluster.ai API key: ")
from getpass import getpass

api_key = getpass("Enter your kluster.ai API key: ")

Enter your kluster.ai API key:  ········

Next, ensure you've installed OpenAI Python library:

In [ ]:

Copied!

%pip install -q openai
%pip install -q openai

Note: you may need to restart the kernel to use updated packages.

With the OpenAI Python library installed, we import the necessary dependencies for the tutorial:

In [3]:

Copied!





from openai import OpenAI

import pandas as pd
import time
import json
import os
from IPython.display import clear_output, display
from openai import OpenAI

import pandas as pd
import time
import json
import os
from IPython.display import clear_output, display

And then, initialize the client by pointing it to the kluster.ai endpoint, and passing your API key.

In [4]:

Copied!





# Set up the client
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key=api_key,
)
# Set up the client
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key=api_key,
)

Get the data¶

Now that you've initialized an OpenAI-compatible client pointing to kluster.ai, we can talk about the data.

This notebook includes a preloaded sample dataset sourced from Amazon's reviews of musical instruments. It contains customer feedback on various music-related products. No additional setup is needed. Proceed to the next steps to begin working with this data.

In [5]:

Copied!





df = pd.DataFrame({
    "text": [
        "It hums, crackles, and I think I'm having problems with my equipment. As soon as I use any of my other cords then the problem is gone. Hosa makes some other products that have good value. But based on my experience I don't recommend this one.",
        "I bought this to use with my keyboard. I wasn't really aware that there were other options for keyboard pedals. It doesn't work as smoothly as the pedals do on an acoustic piano, which is what I'd always used. Doesn't have the same feel either. Nowhere close.In my opinion, a sustain pedal like the M-Audio SP-2 Sustain Pedal with Piano Style Action or other similar pedal is a much better choice. The price difference is only a few dollars and the feel and action are so much better.",
        "This cable disproves the notion that you get what you pay for. It's quality outweighs its price. Let's face it, a cable is a cable is a cable. But the quality of these cables can vary greatly. I replaced a lighter cable with this one and I was surprised at the difference in the quality of the sound from my amp. I have an Ibanez ART series guitar into an Ibanez 15 watt amp set up in my home. With nothing changed but the cable, there was a significant difference in quality and volume. So much so that I checked with my guitar teacher who said he was not surprised. The quality appears good. The ends are heavy duty and the little bit of hum I had due to the proximity of everything was attenuated to the point where it was inconsequential. I've seen more expensive cables and this one is (so far) great.Hosa GTR210 Guitar Cable 10 Ft",
        "Bought this to hook up a Beta 58 to a Panasonic G2 DSLR and a Kodak Zi8 for interviews. Works the way it's supposed to. 90 degree TRS is a nice touch. Good price.",
        "96	Just received this cord and it seems to work as expected. What can you say about an adapter cord? It is well made, good construction and sound from my DSLR with my mic is superb."
    ]
})
df = pd.DataFrame({
    "text": [
        "It hums, crackles, and I think I'm having problems with my equipment. As soon as I use any of my other cords then the problem is gone. Hosa makes some other products that have good value. But based on my experience I don't recommend this one.",
        "I bought this to use with my keyboard. I wasn't really aware that there were other options for keyboard pedals. It doesn't work as smoothly as the pedals do on an acoustic piano, which is what I'd always used. Doesn't have the same feel either. Nowhere close.In my opinion, a sustain pedal like the M-Audio SP-2 Sustain Pedal with Piano Style Action or other similar pedal is a much better choice. The price difference is only a few dollars and the feel and action are so much better.",
        "This cable disproves the notion that you get what you pay for. It's quality outweighs its price. Let's face it, a cable is a cable is a cable. But the quality of these cables can vary greatly. I replaced a lighter cable with this one and I was surprised at the difference in the quality of the sound from my amp. I have an Ibanez ART series guitar into an Ibanez 15 watt amp set up in my home. With nothing changed but the cable, there was a significant difference in quality and volume. So much so that I checked with my guitar teacher who said he was not surprised. The quality appears good. The ends are heavy duty and the little bit of hum I had due to the proximity of everything was attenuated to the point where it was inconsequential. I've seen more expensive cables and this one is (so far) great.Hosa GTR210 Guitar Cable 10 Ft",
        "Bought this to hook up a Beta 58 to a Panasonic G2 DSLR and a Kodak Zi8 for interviews. Works the way it's supposed to. 90 degree TRS is a nice touch. Good price.",
        "96	Just received this cord and it seems to work as expected. What can you say about an adapter cord? It is well made, good construction and sound from my DSLR with my mic is superb."
    ]
})

Perform batch inference¶

To execute the batch inference job, we'll take the following steps:

Create the batch job file - we'll generate a JSON lines file with the desired requests to be processed by the model
Upload the batch job file - once it is ready, we'll upload it to the kluster.ai platform using the API, where it will be processed. We'll receive a unique ID associated with our file
Start the batch job - after the file is uploaded, we'll initiate the job to process the uploaded data, using the file ID obtained before
Monitor job progress - (optional) track the status of the batch job to ensure it has been successfully completed
Retrieve results - once the job has completed execution, we can access and process the resultant data

This notebook is prepared for you to follow along. Run the cells below to watch it all come together.

Create the batch input file¶

This example selects the klusterai/Meta-Llama-3.3-70B-Instruct-Turbo model. If you'd like to use a different model, feel free to change it by modifying the model field. In this notebook, you can also comment Llama 3.3 70B, and uncomment whatever model you want to try out.

Please refer to the Supported models section for a list of the models we support.

The following snippets prepare the JSONL file, where each line represents a different request. Note that each separate batch request can have its own model. Also, we are using a temperature of 0.5 but feel free to change it and play around with the different outcomes.

In [6]:

Copied!





# Prompt
SYSTEM_PROMPT = '''
    Analyze the sentiment of this text and respond with one word: positive, negative, or neutral.
    '''

# Model
model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo"

# Ensure the directory exists
os.makedirs("sentiment_analysis", exist_ok=True)

# Create the batch job file with the prompt and content
def create_batch_file(df):
    batch_list = []
    for index, row in df.iterrows():
        content = row['text']

        request = {
            "custom_id": f"sentiment-analysis-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "temperature": 0.5,
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": content}
                ],
            }
        }
        batch_list.append(request)
    return batch_list

# Save file
def save_batch_file(batch_list):
    filename = f"sentiment_analysis/batch_job_request.jsonl"
    with open(filename, 'w') as file:
        for request in batch_list:
            file.write(json.dumps(request) + '\n')
    return filename
# Prompt
SYSTEM_PROMPT = '''
    Analyze the sentiment of this text and respond with one word: positive, negative, or neutral.
    '''

# Model
model="klusterai/Meta-Llama-3.3-70B-Instruct-Turbo"

# Ensure the directory exists
os.makedirs("sentiment_analysis", exist_ok=True)

# Create the batch job file with the prompt and content
def create_batch_file(df):
    batch_list = []
    for index, row in df.iterrows():
        content = row['text']

        request = {
            "custom_id": f"sentiment-analysis-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "temperature": 0.5,
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": content}
                ],
            }
        }
        batch_list.append(request)
    return batch_list

# Save file
def save_batch_file(batch_list):
    filename = f"sentiment_analysis/batch_job_request.jsonl"
    with open(filename, 'w') as file:
        for request in batch_list:
            file.write(json.dumps(request) + '\n')
    return filename

Let's run the functions we've defined before:

In [7]:

Copied!

batch_list = create_batch_file(df)
data_dir = save_batch_file(batch_list)
print(data_dir)
batch_list = create_batch_file(df)
data_dir = save_batch_file(batch_list)
print(data_dir)

sentiment_analysis/batch_job_request.jsonl

Next, we can preview what that batch job file looks like:

In [8]:

Copied!

!head -n 1 sentiment_analysis/batch_job_request.jsonl
!head -n 1 sentiment_analysis/batch_job_request.jsonl

{"custom_id": "sentiment-analysis-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "klusterai/Meta-Llama-3.3-70B-Instruct-Turbo", "temperature": 0.5, "messages": [{"role": "system", "content": "\n    Analyze the sentiment of this text and respond with one word: positive, negative, or neutral.\n    "}, {"role": "user", "content": "It hums, crackles, and I think I'm having problems with my equipment. As soon as I use any of my other cords then the problem is gone. Hosa makes some other products that have good value. But based on my experience I don't recommend this one."}]}}

Upload inference file to kluster.ai¶

Now that we've prepared our input file, it's time to upload it to the kluster.ai platform. To do so, you can use the files.create endpoint of the client, where the purpose is set to batch. This will return the file ID, which we need to log for the next steps.

In [9]:

Copied!





# Upload batch job request file
with open(data_dir, 'rb') as file:
    upload_response = client.files.create(
        file=file,
        purpose="batch"
    )

    # Print job ID
    file_id = upload_response.id
    print(f"File uploaded successfully. File ID: {file_id}")
# Upload batch job request file
with open(data_dir, 'rb') as file:
    upload_response = client.files.create(
        file=file,
        purpose="batch"
    )

    # Print job ID
    file_id = upload_response.id
    print(f"File uploaded successfully. File ID: {file_id}")

File uploaded successfully. File ID: 67e57e7933090e20560503db

Start the job¶

Once the file has been successfully uploaded, we're ready to start (create) the batch job by providing the file ID we got in the previous step. To do so, we use the batches.create method, for which we need to set the endpoint to /v1/chat/completions. This will return the batch job details, with the ID.

In [10]:

Copied!





# Create batch job with completions endpoint
batch_job = client.batches.create(
    input_file_id=file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print("\nBatch job created:")
batch_dict = batch_job.model_dump()
print(json.dumps(batch_dict, indent=2))
# Create batch job with completions endpoint
batch_job = client.batches.create(
    input_file_id=file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print("\nBatch job created:")
batch_dict = batch_job.model_dump()
print(json.dumps(batch_dict, indent=2))

Batch job created:
{
  "id": "67e57e7d33090e20560504e2",
  "completion_window": "24h",
  "created_at": 1743093373,
  "endpoint": "/v1/chat/completions",
  "input_file_id": "67e57e7933090e20560503db",
  "object": "batch",
  "status": "pre_schedule",
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": null,
  "error_file_id": null,
  "errors": [],
  "expired_at": null,
  "expires_at": 1743179773,
  "failed_at": null,
  "finalizing_at": null,
  "in_progress_at": null,
  "metadata": {},
  "output_file_id": null,
  "request_counts": {
    "completed": 0,
    "failed": 0,
    "total": 0
  }
}

All requests are currently being processed.

Check job progress¶

Now that your batch job has been created, you can track its progress.

To monitor the job's progress, you can use the batches.retrieve method and pass the batch job ID. The response contains an status field that tells us if it is completed or not, and the subsequent status of each job separately.

The following snippet checks the status every 10 seconds until the entire batch is completed:

In [11]:

Copied!





all_completed = False

# Loop to check status every 10 seconds
while not all_completed:
    all_completed = True
    output_lines = []

    updated_job = client.batches.retrieve(batch_job.id)

    if updated_job.status != "completed":
        all_completed = False
        completed = updated_job.request_counts.completed
        total = updated_job.request_counts.total
        output_lines.append(f"Job status: {updated_job.status} - Progress: {completed}/{total}")
    else:
        output_lines.append(f"Job completed!")

    # Clear the output and display updated status
    clear_output(wait=True)
    for line in output_lines:
        display(line)

    if not all_completed:
        time.sleep(10)
all_completed = False

# Loop to check status every 10 seconds
while not all_completed:
    all_completed = True
    output_lines = []

    updated_job = client.batches.retrieve(batch_job.id)

    if updated_job.status != "completed":
        all_completed = False
        completed = updated_job.request_counts.completed
        total = updated_job.request_counts.total
        output_lines.append(f"Job status: {updated_job.status} - Progress: {completed}/{total}")
    else:
        output_lines.append(f"Job completed!")

    # Clear the output and display updated status
    clear_output(wait=True)
    for line in output_lines:
        display(line)

    if not all_completed:
        time.sleep(10)

'Job completed!'

Get the results¶

With the job completed, we'll retrieve the results and review the responses generated for each request. We then parse these results. To fetch them from the platform, retrieve the output_file_id from the batch job, then use the files.content endpoint with that file ID. Note that the job status must be completed before you can retrieve the results!

In [12]:

Copied!





#Parse results as a JSON object
def parse_json_objects(data_string):
    if isinstance(data_string, bytes):
        data_string = data_string.decode('utf-8')

    json_strings = data_string.strip().split('\n')
    json_objects = []

    for json_str in json_strings:
        try:
            json_obj = json.loads(json_str)
            json_objects.append(json_obj)
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")

    return json_objects

# Retrieve results with job ID
job = client.batches.retrieve(batch_job.id)
result_file_id = job.output_file_id
result = client.files.content(result_file_id).content

# Parse JSON results
parsed_result = parse_json_objects(result)

# Extract and print only the content of each response
print("\nExtracted Responses:")
for item in parsed_result:
    try:
        content = item["response"]["body"]["choices"][0]["message"]["content"]
        print(content)
    except KeyError as e:
        print(f"Missing key in response: {e}")
#Parse results as a JSON object
def parse_json_objects(data_string):
    if isinstance(data_string, bytes):
        data_string = data_string.decode('utf-8')

    json_strings = data_string.strip().split('\n')
    json_objects = []

    for json_str in json_strings:
        try:
            json_obj = json.loads(json_str)
            json_objects.append(json_obj)
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")

    return json_objects

# Retrieve results with job ID
job = client.batches.retrieve(batch_job.id)
result_file_id = job.output_file_id
result = client.files.content(result_file_id).content

# Parse JSON results
parsed_result = parse_json_objects(result)

# Extract and print only the content of each response
print("\nExtracted Responses:")
for item in parsed_result:
    try:
        content = item["response"]["body"]["choices"][0]["message"]["content"]
        print(content)
    except KeyError as e:
        print(f"Missing key in response: {e}")

Extracted Responses:
Negative.
Negative.
Positive.
Positive.
Positive.

Summary¶

This tutorial used the chat completion endpoint to perform a simple sentiment analysis task with batch inference. This particular example classified a series of reviews to understand if they had a positive, neutral or negative note.

To submit a batch job, we've:

Created the JSONL file, where each line of the file represented a separate request
Submitted the file to the platform
Started the batch job, and monitored its progress
Once completed, we fetched the results

All of this using the OpenAI Python library and API, no changes needed!

Kluster.ai's batch API empowers you to scale your workflows seamlessly, making it an invaluable tool for processing extensive datasets. As next steps, feel free to create your own dataset, or expand on top of this existing example. Good luck!