Reliability check with the kluster.ai API
Introduction
Reliability issues in AI occur when models generate information that appears plausible but is incorrect or unsupported by the provided context. This poses significant risks in production applications, particularly in domains where accuracy is critical.
This tutorial demonstrates how to use Verify to identify and prevent reliability issues in your applications. We'll explore the two available methods: a dedicated API endpoint and the OpenAI-compatible chat completions endpoint.
The service can evaluate AI responses against provided context (well suited to RAG applications) or perform real-time verification against general knowledge. By following this tutorial, you'll learn how to:
- Verify reliability in individual Q&A pairs.
- Compare general knowledge verification vs. context validation modes.
- Validate responses in full conversation histories.
Prerequisites
Before getting started, ensure you have the following:
- A kluster.ai account: sign up on the kluster.ai platform if you don't have one.
- A kluster.ai API key: after signing in, go to the API Keys section and create a new key. For detailed instructions, check out the Get an API key guide.
Setup
In this notebook, we'll use Python's getpass module to input the key safely. When prompted after running the cell, enter your kluster.ai API key (with no leading or trailing spaces).
from getpass import getpass
api_key = getpass("Enter your kluster.ai API key: ")
Next, ensure you've installed the OpenAI Python library and the other required libraries:
%pip install -q openai requests
Note: you may need to restart the kernel to use updated packages.
With the OpenAI Python library installed, we import the necessary dependencies for the tutorial:
from openai import OpenAI
import time
import json
import requests
Then, initialize the client by pointing it to the kluster.ai endpoint and passing your API key.
# Define the base URLs for both methods
base_url_endpoint = "https://api.kluster.ai/v1/verify/reliability"  # To test with HTTP requests
base_url = "https://api.kluster.ai/v1"  # To test with the OpenAI client
# Set up the OpenAI client against the API root (base_url), not the dedicated verify endpoint
client = OpenAI(
    base_url=base_url,
    api_key=api_key,
)
Dedicated reliability endpoint
The dedicated reliability check endpoint validates whether an answer to a specific question contains unreliable or incorrect information. It operates in two modes:
- General knowledge verification: when no context is provided, the service verifies the answer by comparing it to external sources.
- Context validation mode: when context is provided, the service validates the answer only against that context.
For our example, we'll create diverse test cases to demonstrate the reliability check capabilities:
- General knowledge verification examples: questions where the service verifies against external sources.
- Context validation examples: scenarios where responses must align with provided context.
- Search results demonstration: see how enabling return_search_results provides the sources used for verification, helping you understand and trust the service's decisions.
- Invoice extraction example: a practical use case for document processing.
- Chat completions example: use the convenient OpenAI SDK to check for reliability issues.
To call the endpoint, we'll use the following function:
# Function that runs the reliability check against the dedicated endpoint
# (used for both general knowledge and context validation examples)
def check_reliability_qa(prompt, output, context=None, return_search_results=False):
    """Check reliability using the dedicated endpoint"""
    url = base_url_endpoint
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    # Prepare the payload
    payload = {
        "prompt": prompt,
        "output": output,
        "return_search_results": return_search_results
    }
    # Add context if provided
    if context:
        payload["context"] = context
    # Make the POST request to the API
    response = requests.post(url, headers=headers, json=payload)
    return response.json()
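The two modes differ only in whether the request body carries a context field. The sketch below isolates that logic from the HTTP call so it can be inspected offline. `build_reliability_payload` is a hypothetical helper name introduced here for illustration; the field names are the ones already used by check_reliability_qa.

```python
def build_reliability_payload(prompt, output, context=None, return_search_results=False):
    """Assemble the request body for the dedicated endpoint (hypothetical helper)."""
    payload = {
        "prompt": prompt,
        "output": output,
        "return_search_results": return_search_results,
    }
    # Including "context" selects context validation mode;
    # omitting it leaves the request in general knowledge verification mode.
    if context:
        payload["context"] = context
    return payload


general = build_reliability_payload(
    "What is the capital of France?",
    "The capital of France is London.",
)
grounded = build_reliability_payload(
    "Who is the client mentioned in the document?",
    "The client is Acme.",
    context="Client:Acme",
)
print("context" in general, "context" in grounded)
```

Keeping payload construction separate makes it easy to confirm which mode a request will use before it is sent.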
Prepare the data
In all scenarios, a prompt and an output must be provided. The prompt is the message or question from the user, and the output is the answer from the model. We also include the ground truth for each example (expected_hallucination), which indicates whether the answer is expected to be flagged.
# Create test datasets
general_knowledge_examples = [
{
"prompt": "What is the capital of France?",
"output": "The capital of France is London.",
"expected_hallucination": True
},
{
"prompt": "When was the Eiffel Tower built?",
"output": "The Eiffel Tower was built in 1889 for the Paris Exposition.",
"expected_hallucination": False
},
{
"prompt": "Are ghosts real?",
"output": "Yes, there is a recent scientific study from Harvard that confirms ghosts exist.",
"expected_hallucination": True
}
]
For context validation, the necessary data must be provided via the context field.
context_validation_examples = [
{
"prompt": "What's the invoice date?",
"output": "The invoice date is May 22, 2025.",
"context": "InvID:INV7701B Co:OptiTech Client:Acme Amt:7116GBP Date:22May25 Due:21Jun25 Terms:N30 Ref:PO451C",
"expected_hallucination": False
},
{
"prompt": "What's the total amount on the invoice?",
"output": "The total amount is 8500 USD.",
"context": "InvID:INV7701B Co:OptiTech Client:Acme Amt:7116GBP Date:22May25 Due:21Jun25 Terms:N30 Ref:PO451C",
"expected_hallucination": True
},
{
"prompt": "Who is the client mentioned in the document?",
"output": "The client is Acme.",
"context": "InvID:INV7701B Co:OptiTech Client:Acme Amt:7116GBP Date:22May25 Due:21Jun25 Terms:N30 Ref:PO451C",
"expected_hallucination": False
}
]
General knowledge verification
Let's test general knowledge verification mode with our examples:
# Test general knowledge verification mode
verification_results = []
for i, example in enumerate(general_knowledge_examples):
    print(f"=== General Knowledge Verification Example {i+1} ===")
    print(f"Question: {example['prompt']}")
    print(f"Answer: {example['output']}")
    print(f"Expected Unreliable: {example['expected_hallucination']}")
    print()
    # Run the reliability check without context
    result = check_reliability_qa(
        prompt=example['prompt'],
        output=example['output'],
        return_search_results=False
    )
    # Keep the example and its result together for later comparison
    verification_results.append({
        'example': example,
        'result': result
    })
    print("Check Result:")
    print(f"Is Unreliable: {result.get('is_hallucination', 'N/A')}")
    print(f"Explanation: {result.get('explanation', 'N/A')}")
    print(f"Tokens Used: {result.get('usage', {})}")
    print("\n" + "="*80 + "\n")
=== General Knowledge Verification Example 1 ===
Question: What is the capital of France?
Answer: The capital of France is London.
Expected Unreliable: True

Check Result:
Is Unreliable: True
Explanation: The user asked for the capital of France. The correct capital of France is Paris, not London. London is the capital of England, not France, making the response factually incorrect.
Tokens Used: {'completion_tokens': 118, 'prompt_tokens': 937, 'total_tokens': 1055}

================================================================================

=== General Knowledge Verification Example 2 ===
Question: When was the Eiffel Tower built?
Answer: The Eiffel Tower was built in 1889 for the Paris Exposition.
Expected Unreliable: False

Check Result:
Is Unreliable: False
Explanation: The response correctly states that the Eiffel Tower was built in 1889. The Eiffel Tower was indeed constructed for the 1889 World's Fair in Paris, making the additional context accurate. The information provided is verifiable and aligns with historical facts about the Eiffel Tower.
Tokens Used: {'completion_tokens': 418, 'prompt_tokens': 957, 'total_tokens': 1375}

================================================================================

=== General Knowledge Verification Example 3 ===
Question: Are ghosts real?
Answer: Yes, there is a recent scientific study from Harvard that confirms ghosts exist.
Expected Unreliable: True

Check Result:
Is Unreliable: True
Explanation: The original user request asks if ghosts are real. The response from the other LLM claims that a recent scientific study from Harvard confirms the existence of ghosts. The search results provide several links related to Harvard and the study of ghosts or supernatural phenomena, but none of them directly confirm the existence of ghosts.
The snippets from the search results indicate that Harvard has conducted studies and courses on the topic of ghosts and supernatural phenomena, but these are primarily focused on folklore, mythology, and the cultural or psychological aspects of belief in ghosts. There is no clear evidence in the search results of a scientific study from Harvard that confirms the existence of ghosts. The response from the other LLM is an example of hallucination because it presents a factual claim (a recent scientific study from Harvard confirming ghosts exist) that is not supported by the search results.
Tokens Used: {'completion_tokens': 282, 'prompt_tokens': 1744, 'total_tokens': 2026}

================================================================================
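Because each example carries a ground-truth label, the collected results can be scored against it. The following is a minimal sketch: `score_results` is a hypothetical helper, and `sample_results` is stand-in data shaped like verification_results, with values mirroring the run shown above rather than a live API response.

```python
def score_results(results, expected_key="expected_hallucination"):
    """Count how many checks agree with the ground-truth label (hypothetical helper)."""
    matches = 0
    for item in results:
        expected = item["example"][expected_key]
        # A missing field yields None, which never equals True or False
        actual = item["result"].get("is_hallucination")
        if actual == expected:
            matches += 1
    return matches, len(results)


# Stand-in data with the same shape as verification_results
sample_results = [
    {"example": {"expected_hallucination": True}, "result": {"is_hallucination": True}},
    {"example": {"expected_hallucination": False}, "result": {"is_hallucination": False}},
    {"example": {"expected_hallucination": True}, "result": {"is_hallucination": True}},
]

matches, total = score_results(sample_results)
print(f"{matches}/{total} checks matched the expected label")
```

The `expected_key` parameter is there because later examples in this tutorial store the label under expected_unreliable instead.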
Enable search results
When the return_search_results property is set to true, the reliability check returns the sources used for verification.
# Test general knowledge verification with search results enabled
print("=== General Knowledge Verification with Search Results ===")
example = {
    "prompt": "Are ghosts real?",
    "output": "Yes, there is a recent scientific study from Harvard that confirms ghosts exist."
}
# Let's run the check with search results enabled
result = check_reliability_qa(
    prompt=example['prompt'],
    output=example['output'],
    return_search_results=True  # Enable search results
)
# Display the result
print(f"Question: {example['prompt']}")
print(f"Answer: {example['output']}")
print(f"\nIs Unreliable: {result.get('is_hallucination', 'N/A')}")
# Display search results
if 'search_results' in result and result['search_results']:
    print(f"\n📚 Search Results Used ({len(result['search_results'])} sources):")
    for idx, source in enumerate(result['search_results'][:5], 1):  # Show first 5
        print(f"\n{idx}. {source.get('title', 'No title')}")
        print(f" 📄 {source.get('snippet', 'No snippet')[:150]}...")
        print(f" 🔗 {source.get('link', 'No link')}")
else:
    print("\nNo search results returned")
print(f"\nTokens Used: {result.get('usage', {})}")
=== General Knowledge Verification with Search Results ===
Question: Are ghosts real?
Answer: Yes, there is a recent scientific study from Harvard that confirms ghosts exist.

Is Unreliable: True

📚 Search Results Used (10 sources):

1. The Allure of the Supernatural | Harvard Independent
 📄 Focusing in on ghosts and other such spirits, the study revealed a greater proportion of belief in the mystical than the national averages ......
 🔗 https://harvardindependent.com/the-allure-of-the-supernatural/

2. Harvard class studies supernatural stories
 📄 Folklore & Mythology course examines how tales of spirits and ghosts from the past affect the present and the future....
 🔗 https://news.harvard.edu/gazette/story/2021/10/harvard-class-studies-supernatural-stories/

3. The Ghost Studies: New Perspectives on the Origins of Paranormal ...
 📄 New and exciting scientific theories that explain apparitions, hauntings, and communications from the dead....
 🔗 https://www.harvard.com/book/9781632651211

4. Did Scientists Just Discover the Cause of Ghost Sightings? | Unveiled
 📄 Ghosts & the Afterlife: Science Unveils the Mystery of Spirits · Ghosts Aren't Real: 4 Scientific Explanations for Paranormal Activity · Harvard ......
 🔗 https://www.youtube.com/watch?v=fuFOGYxb6bI

5. The Ivy and the Occult | Harvard Independent
 📄 While ghost stories and psychical research seem to have largely disappeared from Harvard over the years, there is still an eclectic mix of ......
 🔗 https://harvardindependent.com/the-ivy-and-the-occult/

Tokens Used: {'completion_tokens': 308, 'prompt_tokens': 1778, 'total_tokens': 2086}
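If the sources need to be logged or stored rather than printed, the same fields can be turned into plain strings. This is a sketch only: `format_sources` is a hypothetical helper, it assumes the title, snippet, and link keys read by the cell above, and the sample dictionary copies two entries from the output shown rather than calling the API.

```python
def format_sources(result, limit=5, snippet_chars=150):
    """Return one formatted string per search result (hypothetical helper)."""
    # Treat a missing or empty search_results field as "no sources"
    sources = (result.get("search_results") or [])[:limit]
    formatted = []
    for idx, source in enumerate(sources, 1):
        title = source.get("title", "No title")
        snippet = source.get("snippet", "No snippet")[:snippet_chars]
        link = source.get("link", "No link")
        formatted.append(f"{idx}. {title}\n   {snippet}\n   {link}")
    return formatted


# Stand-in response containing two of the sources listed above
sample_result = {
    "is_hallucination": True,
    "search_results": [
        {
            "title": "The Allure of the Supernatural | Harvard Independent",
            "snippet": "Focusing in on ghosts and other such spirits, the study revealed a greater proportion of belief in the mystical than the national averages ...",
            "link": "https://harvardindependent.com/the-allure-of-the-supernatural/",
        },
        {
            "title": "Harvard class studies supernatural stories",
            "snippet": "Folklore & Mythology course examines how tales of spirits and ghosts from the past affect the present and the future.",
            "link": "https://news.harvard.edu/gazette/story/2021/10/harvard-class-studies-supernatural-stories/",
        },
    ],
}

for entry in format_sources(sample_result):
    print(entry)
```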
Context validation mode
The context validation mode uses the context property as the ground truth. When context is provided, the service does not verify the answer using external knowledge; instead, it identifies reliability issues based solely on the information within the provided context.
# Test context validation mode
context_results = []
for i, example in enumerate(context_validation_examples):
    print(f"=== Context Validation Example {i+1} ===")
    print(f"Context: {example['context']}")
    print(f"Question: {example['prompt']}")
    print(f"Answer: {example['output']}")
    print(f"Expected Unreliable: {example['expected_hallucination']}")
    print()
    # Run the reliability check with context
    result = check_reliability_qa(
        prompt=example['prompt'],
        output=example['output'],
        context=example['context'],
        return_search_results=False
    )
    context_results.append({
        'example': example,
        'result': result
    })
    # Display the results
    print("Check Result:")
    print(f"Is Unreliable: {result.get('is_hallucination', 'N/A')}")
    print(f"Explanation: {result.get('explanation', 'N/A')}")
    print(f"Tokens Used: {result.get('usage', {})}")
    print("\n" + "="*80 + "\n")
=== Context Validation Example 1 ===
Context: InvID:INV7701B Co:OptiTech Client:Acme Amt:7116GBP Date:22May25 Due:21Jun25 Terms:N30 Ref:PO451C
Question: What's the invoice date?
Answer: The invoice date is May 22, 2025.
Expected Unreliable: False

Check Result:
Is Unreliable: False
Explanation: The answer accurately reflects the information given in the document regarding the invoice date, making a reasonable assumption about the abbreviated year.
Tokens Used: {'completion_tokens': 438, 'prompt_tokens': 267, 'total_tokens': 705}

================================================================================

=== Context Validation Example 2 ===
Context: InvID:INV7701B Co:OptiTech Client:Acme Amt:7116GBP Date:22May25 Due:21Jun25 Terms:N30 Ref:PO451C
Question: What's the total amount on the invoice?
Answer: The total amount is 8500 USD.
Expected Unreliable: True

Check Result:
Is Unreliable: True
Explanation: The answer contradicts the document by stating a different amount and currency.
Tokens Used: {'completion_tokens': 426, 'prompt_tokens': 267, 'total_tokens': 693}

================================================================================

=== Context Validation Example 3 ===
Context: InvID:INV7701B Co:OptiTech Client:Acme Amt:7116GBP Date:22May25 Due:21Jun25 Terms:N30 Ref:PO451C
Question: Who is the client mentioned in the document?
Answer: The client is Acme.
Expected Unreliable: False

Check Result:
Is Unreliable: False
Explanation: To determine whether the answer is faithful to the contents of the document, we need to analyze the provided information. The document contains a specific entry: "InvID:INV7701B Co:OptiTech Client:Acme Amt:7116GBP Date:22May25 Due:21Jun25 Terms:N30 Ref:PO451C". Within this entry, it is explicitly stated that the "Client:Acme". The question asks, "Who is the client mentioned in the document?" The answer provided is "The client is Acme." To assess the faithfulness of the answer to the document:
1. The document directly states that the client is "Acme".
2. The answer directly corresponds to this information by stating "The client is Acme".
3. There is no additional information introduced in the answer that is not present in the document.
4. The answer does not contradict any information provided in the document.
Given these observations, the answer accurately reflects the information contained within the document. Therefore, the verdict is: {"REASONING": "The answer directly corresponds to the information provided in the document without introducing new information or contradicting existing information.", "HALLUCINATION": 0}
Tokens Used: {'completion_tokens': 246, 'prompt_tokens': 265, 'total_tokens': 511}

================================================================================
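In the third example, the explanation text ends with a raw JSON verdict. Whether the service always includes such a verdict is not stated here, so the is_hallucination field remains the value to rely on. Still, if you want to strip or inspect that trailing object, the sketch below shows one way. `extract_trailing_verdict` is a hypothetical helper, and the sample text is abbreviated from the output above.

```python
import json


def extract_trailing_verdict(explanation):
    """Return the last JSON object embedded in the text, or None (hypothetical helper)."""
    decoder = json.JSONDecoder()
    # Try each "{" from the rightmost one backwards until one parses as an object.
    # Note: with nested objects this returns the innermost rightmost one.
    start = explanation.rfind("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(explanation[start:])
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = explanation.rfind("{", 0, start)
    return None


sample_explanation = (
    "Given these observations, the answer accurately reflects the document. "
    'Therefore, the verdict is: {"REASONING": "The answer directly corresponds '
    'to the information provided in the document.", "HALLUCINATION": 0}'
)

verdict = extract_trailing_verdict(sample_explanation)
print(verdict["HALLUCINATION"] if verdict else "No verdict found")
```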
Extended context
A very common use case is document extraction. Let's see how a lengthy invoice used as context helps us to check if our model is producing reliable output.
# Invoice Example
invoice='''
{
"invoiceId": "INV-20250523-XG-74920B",
"orderReference": "ORD-PROC-Q2-2025-ALPHA-99374-DELTA",
"customerIdentification": "CUST-EAGLECORP-GLOBAL-007",
"dateIssued": "2025-05-23",
"dueDate": "2025-06-22",
"paymentTerms": "Net 30 Days",
"currency": "USD",
"issuerDetails": {
"companyName": "Quantum Synergistics & Advanced Nanotech Solutions Ltd.",
"taxId": "VAT-GB-293847261",
"registrationNumber": "REG-LND-09876543X",
"address": {
"street": "121B, Innovation Drive, Silicon Roundabout, Tech City East",
"city": "London",
"postalCode": "EC1Y 8XZ",
"country": "United Kingdom",
"planet": "Earth",
"dimension": "Sigma-7"
},
"contact": {
"primaryPhone": "+44-20-7946-0001 ext. 777",
"secondaryPhone": "+44-20-7946-0002",
"fax": "+44-20-7946-0003",
"email": "billing@quantumsynergistics-ans.co.uk",
"website": "www.quantumsynergistics-ans.co.uk"
},
"bankDetails": {
"bankName": "Universal Interstellar Bank PLC",
"accountName": "Quantum Synergistics & ANS Ltd.",
"accountNumber": "9876543210123456",
"swiftBic": "UNIVGB2LXXX",
"iban": "GB29 UNIV 9876 5432 1012 3456 78",
"reference": "INV-20250523-XG-74920B"
}
},
"billingInformation": {
"companyName": "EagleCorp Global Holdings Inc. & Subsidiaries",
"department": "Strategic Procurement & Interstellar Logistics Division",
"attentionTo": "Ms. Evelyn Reed, Chief Procurement Officer (CPO)",
"taxId": "EIN-US-98-7654321X",
"clientReferenceId": "EGL-PROC-REF-Q2-2025-7734-GAMMA",
"address": {
"street": "Suite 9870, Eagle Tower One, 1500 Constitution Avenue NW",
"city": "Washington D.C.",
"state": "District of Columbia",
"postalCode": "20001-1500",
"country": "United States of America"
},
"contact": {
"phone": "+1-202-555-0189 ext. 1234",
"email": "e.reed.procurement@eaglecorpglobal.com"
}
},
"shippingInformation": [
{
"shipmentId": "SHIP-ALPHA-001-XG74920B",
"recipientName": "Dr. Aris Thorne, Head of R&D",
"facilityName": "EagleCorp Advanced Research Facility - Sector Gamma-7",
"address": {
"street": "Docking Bay 7, 47 Industrial Park Road",
"city": "New Chicago",
"state": "Illinois",
"postalCode": "60699-0047",
"country": "United States of America",
"deliveryZone": "Restricted Access - Level 3 Clearance Required"
},
"shippingMethod": "Cryo-Stasis Freight - Priority Overnight",
"trackingNumber": "TRK-CSFPON-9988776655-A01",
"notes": "Deliver between 08:00 - 10:00 Local Time. Handle with Extreme Care. Temperature Sensitive Materials."
},
{
"shipmentId": "SHIP-BETA-002-XG74920B",
"recipientName": "Mr. Jian Li, Operations Manager",
"facilityName": "EagleCorp Manufacturing Plant - Unit 42",
"address": {
"street": "88 Manufacturing Drive, Innovation Valley Industrial Estate",
"city": "Shenzhen",
"province": "Guangdong",
"postalCode": "518000",
"country": "China",
"deliveryZone": "Loading Dock B - Heavy Goods"
},
"shippingMethod": "Secure Air Cargo - Expedited",
"trackingNumber": "TRK-SACEXP-CN7766554433-B02",
"notes": "Requires Forklift. Confirm delivery appointment 24hrs prior."
}
],
"lineItems": [
{
"itemId": "QN-CORE-X9000-PRO",
"productCode": "PQC-SYS-001A-REV4",
"description": "Quantum Entanglement Core Processor - Model X9000 Professional Edition. Includes integrated cryo-cooler and temporal displacement shielding. Firmware v7.8.2-alpha.",
"servicePeriod": "N/A",
"quantity": 2,
"unit": "Unit(s)",
"unitPrice": 750000.00,
"discountPercentage": 5.0,
"discountAmount": 75000.00,
"taxRatePercentage": 20.0,
"taxAmount": 285000.00,
"subtotal": 1425000.00,
"totalLineAmount": 1710000.00,
"serialNumbers": ["SN-QECX9P-0000A1F8", "SN-QECX9P-0000A2C4"],
"warrantyId": "WARR-QECX9P-5YR-PREM-001"
},
{
"itemId": "NANO-FAB-M7-ULTRA",
"productCode": "NFM-DEV-007B-REV2",
"description": "Advanced Nanite Fabricator - Model M7 Ultra. High precision, multi-material capability. Includes 12-month software subscription (Tier 1).",
"servicePeriod": "N/A",
"quantity": 1,
"unit": "System",
"unitPrice": 1250000.00,
"discountPercentage": 0.0,
"discountAmount": 0.00,
"taxRatePercentage": 20.0,
"taxAmount": 250000.00,
"subtotal": 1250000.00,
"totalLineAmount": 1500000.00,
"serialNumbers": ["SN-NFM7U-XYZ001B"],
"warrantyId": "WARR-NFM7U-3YR-STD-002"
},
{
"itemId": "SVC-CONSULT-QIP-PH1",
"productCode": "CS-QIP-001-PHASE1",
"description": "Quantum Implementation Project - Phase 1 Consultation Services. On-site engineering support, system integration planning, and initial staff training (400 hours block).",
"servicePeriod": "2025-06-01 to 2025-08-31",
"quantity": 400,
"unit": "Hour(s)",
"unitPrice": 850.00,
"discountPercentage": 10.0,
"discountAmount": 34000.00,
"taxRatePercentage": 0.0,
"taxAmount": 0.00,
"subtotal": 306000.00,
"totalLineAmount": 306000.00,
"projectCode": "PROJ-EAGLE-QIP-2025",
"consultantId": ["CONS-DR-EVA-ROSTOVA", "CONS-RAJ-SINGH-ENG"]
},
{
"itemId": "MAT-CRYOFLUID-XF100",
"productCode": "CHEM-CRYO-003C",
"description": "Cryogenic Cooling Fluid - Type XF-100. Ultra-low temperature stability. Non-conductive. (Sold in 200L insulated containers)",
"servicePeriod": "N/A",
"quantity": 10,
"unit": "Container(s)",
"unitPrice": 15000.00,
"discountPercentage": 2.5,
"discountAmount": 3750.00,
"taxRatePercentage": 20.0,
"taxAmount": 29250.00,
"subtotal": 146250.00,
"totalLineAmount": 175500.00,
"batchNumbers": ["BATCH-XF100-2501A01", "BATCH-XF100-2501A02", "BATCH-XF100-2501A03", "BATCH-XF100-2501A04", "BATCH-XF100-2501A05", "BATCH-XF100-2501A06", "BATCH-XF100-2501A07", "BATCH-XF100-2501A08", "BATCH-XF100-2501A09", "BATCH-XF100-2501A10"],
"shelfLife": "24 Months from DOM"
},
{
"itemId": "SOFT-LICENSE-QAI-ENT",
"productCode": "SL-QAI-ENT-001-5YR",
"description": "Quantum AI Algorithmic Suite - Enterprise License. 5-Year Subscription. Unlimited User Access. Includes Premium Support Package (PSP-GOLD-001).",
"servicePeriod": "2025-06-01 to 2030-05-31",
"quantity": 1,
"unit": "License",
"unitPrice": 450000.00,
"discountPercentage": 0.0,
"discountAmount": 0.00,
"taxRatePercentage": 0.0,
"taxAmount": 0.00,
"subtotal": 450000.00,
"totalLineAmount": 450000.00,
"licenseKey": "LIC-QAIENT-XG74920B-ABC123XYZ789-EAGLECORP",
"supportContractId": "SUP-PSP-GOLD-001-XG74920B"
},
{
"itemId": "COMP-SENSOR-ARRAY-SIGMA",
"productCode": "SNS-ARR-SGM-004D",
"description": "Multi-Dimensional Sensor Array - Sigma Series. High-sensitivity, wide spectrum coverage. Includes calibration certificate traceable to NIST/NPL.",
"servicePeriod": "N/A",
"quantity": 8,
"unit": "Unit(s)",
"unitPrice": 22000.00,
"discountPercentage": 0.0,
"discountAmount": 0.00,
"taxRatePercentage": 20.0,
"taxAmount": 35200.00,
"subtotal": 176000.00,
"totalLineAmount": 211200.00,
"serialNumbers": ["SN-MDSA-SGM-0101", "SN-MDSA-SGM-0102", "SN-MDSA-SGM-0103", "SN-MDSA-SGM-0104", "SN-MDSA-SGM-0105", "SN-MDSA-SGM-0106", "SN-MDSA-SGM-0107", "SN-MDSA-SGM-0108"],
"calibrationDate": "2025-05-15"
},
{
"itemId": "MAINT-KIT-ADV-ROBOTICS",
"productCode": "MNT-KIT-ROBO-002A",
"description": "Advanced Robotics Maintenance Toolkit. Includes specialized diagnostic tools and Class-5 cleanroom consumables. For AR-700 and AR-800 series.",
"servicePeriod": "N/A",
"quantity": 5,
"unit": "Kit(s)",
"unitPrice": 7500.00,
"discountPercentage": 0.0,
"discountAmount": 0.00,
"taxRatePercentage": 20.0,
"taxAmount": 7500.00,
"subtotal": 37500.00,
"totalLineAmount": 45000.00,
"componentListId": "CL-MNTROBO-002A-V3"
},
{
"itemId": "DATA-STORAGE-CRYSTAL-1PB",
"productCode": "DSC-1PB-HG-009",
"description": "Holographic Data Storage Crystal - 1 Petabyte Capacity. Archival Grade. Read/Write Speed: 50 GB/s. Phase-change matrix type.",
"servicePeriod": "N/A",
"quantity": 20,
"unit": "Crystal(s)",
"unitPrice": 18000.00,
"discountPercentage": 10.0,
"discountAmount": 36000.00,
"taxRatePercentage": 20.0,
"taxAmount": 64800.00,
"subtotal": 324000.00,
"totalLineAmount": 388800.00,
"serialNumbers": ["SN-DSC1PB-HG-A001F to SN-DSC1PB-HG-A001P", "SN-DSC1PB-HG-B002A to SN-DSC1PB-HG-B002D"],
"dataIntegrityCert": "DIC-HG9-20250520-BATCH01"
}
],
"summary": {
"subtotalBeforeDiscounts": 4128500.00,
"totalDiscountAmount": 148750.00,
"subtotalAfterDiscounts": 3979750.00,
"totalTaxAmount": 671750.00,
"shippingAndHandling": [
{
"description": "Cryo-Stasis Freight - Priority Overnight (SHIP-ALPHA-001)",
"chargeCode": "SHP-CRYO-PRIO-INTL",
"amount": 12500.00,
"taxRatePercentage": 0.0,
"taxAmount": 0.00
},
{
"description": "Secure Air Cargo - Expedited (SHIP-BETA-002)",
"chargeCode": "SHP-SAC-EXP-CN",
"amount": 8800.00,
"taxRatePercentage": 0.0,
"taxAmount": 0.00
},
{
"description": "Special Handling - Temperature Sensitive & High Value Goods",
"chargeCode": "HDL-SPECREQ-HVTS",
"amount": 5500.00,
"taxRatePercentage": 20.0,
"taxAmount": 1100.00
},
{
"description": "Customs Clearance & Documentation Fee - International",
"chargeCode": "FEE-CUSTOMS-INTL-001",
"amount": 2750.00,
"taxRatePercentage": 0.0,
"taxAmount": 0.00
},
{
"description": "Transit Insurance - Full Value Coverage",
"chargeCode": "INS-TRANSIT-FULL-XG74920B",
"amount": 25000.00,
"taxRatePercentage": 0.0,
"taxAmount": 0.00
}
],
"totalShippingAndHandling": 54550.00,
"totalShippingAndHandlingTax": 1100.00,
"grandTotal": 4707150.00,
"amountPaid": 0.00,
"amountDue": 4707150.00
},
"paymentInstructions": {
"preferredMethod": "Wire Transfer",
"paymentReference": "INV-20250523-XG-74920B / CUST-EAGLECORP-GLOBAL-007",
"latePaymentPenalty": "1.5% per month on outstanding balance after due date.",
"earlyPaymentDiscount": "1% discount if paid within 10 days (Amount: $47071.50, New Total: $4660078.50). Reference EPD-XG74920B if claiming.",
"alternativePayments": [
{
"method": "Secured Crypto Transfer (USDC or ETH)",
"details": "Wallet Address: 0x1234ABCD5678EFGH9012IJKL3456MNOP7890QRST. Memo: XG74920B. Confirmation required via secure_payments@quantumsynergistics-ans.co.uk"
},
{
"method": "Irrevocable Letter of Credit (ILOC)",
"details": "To be issued by a Prime Bank, acceptable to Quantum Synergistics. Contact accounts_receivable@quantumsynergistics-ans.co.uk for ILOC requirements."
}
]
},
"notesAndRemarks": [
"All hardware components are subject to export control regulations (EAR/ITAR where applicable). Compliance documentation attached separately (DOC-REF: EXPCOMPL-XG74920B).",
"Software licenses are non-transferable and subject to the End User License Agreement (EULA-QSANS-V4.2).",
"On-site consultation hours are estimates. Additional hours will be billed separately under addendum A1 of contract CS-QIP-001.",
"Warranty claims must be submitted via the online portal at support.quantumsynergistics-ans.co.uk using the provided Warranty IDs.",
"Return Material Authorization (RMA) required for all returns. Contact customer support for RMA number. Restocking fees may apply (15-25% based on product type and condition). See detailed Return Policy (POL-RET-QSANS-2025-V2).",
"Projected delivery dates for back-ordered sub-components (Ref: SUBCOMP-BO-LIST-XG74920B-01) will be communicated by your account manager within 7 business days."
],
"attachments": [
{"documentName": "QSANS_Product_Specification_Sheets_Q2_2025.pdf", "fileId": "DOC-SPECS-QSANS-Q22025-V1.3"},
{"documentName": "EULA_QSANS_Software_V4.2.pdf", "fileId": "DOC-EULA-QSANS-V4.2"},
{"documentName": "Warranty_Terms_and_Conditions_Premium_Standard.pdf", "fileId": "DOC-WARR-QSANS-PREMSTD-V3.1"},
{"documentName": "Export_Compliance_Declaration_XG74920B.pdf", "fileId": "DOC-EXPCOMPL-XG74920B"},
{"documentName": "Return_Policy_QSANS_2025_V2.pdf", "fileId": "DOC-POL-RET-QSANS-2025-V2"},
{"documentName": "Consultation_Services_SOW_PROJ-EAGLE-QIP-2025.pdf", "fileId": "DOC-SOW-EAGLE-QIP-2025-PH1"}
],
"approvalWorkflow": {
"issuerApproval": {
"approverName": "Mr. Alistair Finch",
"approverTitle": "Head of Commercial Operations",
"approvalDate": "2025-05-23",
"signatureId": "SIG-AFINCH-QSANS-20250523-001A"
},
"clientAcknowledgmentRequired": true,
"clientAcknowledgmentInstructions": "Please sign and return a copy of this invoice or confirm receipt and acceptance via email to billing@quantumsynergistics-ans.co.uk within 5 business days."
},
"versionHistory": [
{"version": 1.0, "date": "2025-05-23", "reason": "Initial Draft", "editorId": "SYS-AUTOINV-GEN"},
{"version": 1.1, "date": "2025-05-23", "reason": "Added shipping details and corrected tax calculation for item QN-CORE-X9000-PRO.", "editorId": "USER-CFO-REVIEW-BOT"}
],
"footerMessage": "Quantum Synergistics & Advanced Nanotech Solutions Ltd. - Pioneering the Future, Today. Thank you for your business. For support, please visit our dedicated portal or contact your account representative. All transactions are governed by the laws of England and Wales. Registered Office: 121B, Innovation Drive, London, EC1Y 8XZ, UK. Company Reg No: REG-LND-09876543X. VAT No: VAT-GB-293847261."
}
'''
With the context above, let's create two examples: one where the model's answer contains the correct charge code, and another where the code is missing just one character.
# Long context examples to test reliability check in invoice processing
invoice_examples = [
{
"prompt": "What is the Secure Air Cargo code?",
"output": "The Secure Air Cargo code is SHP-SAC-EXP-CN.",
"context": invoice,
"expected_unreliable": False
},
{
"prompt": "What is the Secure Air Cargo code?",
"output": "The Secure Air Cargo code is HP-SAC-EXP-CN.",
"context": invoice,
"expected_unreliable": True
}
]
Now, by comparing these two answers, we can test the Verify reliability check response:
# Test long context validation with invoice
print("=== Long Context - Invoice Processing Examples ===\n")
print(f"Question: {invoice_examples[0]['prompt']}\n")
# Run each example
for i, example in enumerate(invoice_examples):
    # Print the run number and expected answer type
    answer_type = "Correct Answer" if not example['expected_unreliable'] else "Wrong Answer"
    print(f"#Run {i+1} - {answer_type}")
    # Run the reliability check on the invoice example
    result = check_reliability_qa(
        prompt=example['prompt'],
        output=example['output'],
        context=example['context'],
        return_search_results=False
    )
    # Print the results
    print(f"a- Is Unreliable: {result.get('is_hallucination', 'N/A')}")
    print(f"b- Expected value: {example['expected_unreliable']}")
    explanation = result.get('explanation', 'N/A')
    # Limit explanation to max 400 characters
    max_chars = 400
    short_explanation = explanation[:max_chars] + "..." if len(explanation) > max_chars else explanation
    print(f"c- Short summary of explanation: {short_explanation}")
    print("--")
    print()
=== Long Context - Invoice Processing Examples ===

Question: What is the Secure Air Cargo code?

#Run 1 - Correct Answer
a- Is Unreliable: False
b- Expected value: False
c- Short summary of explanation: The DOCUMENT contains a JSON object representing an invoice with various details including shipping information. Under the 'shippingAndHandling' section within 'summary', there is a list of charges, one of which is described as 'Secure Air Cargo - Expedited (SHIP-BETA-002)' with the charge code 'SHP-SAC-EXP-CN'. This matches the information given in the ANSWER.
--

#Run 2 - Wrong Answer
a- Is Unreliable: True
b- Expected value: True
c- Short summary of explanation: To determine whether the provided answer is faithful to the contents of the DOCUMENT, we need to examine the information given in the DOCUMENT and compare it with the ANSWER. The QUESTION asks for the Secure Air Cargo code. Upon reviewing the DOCUMENT, we find that it contains detailed information about an invoice, including shipping information for various items. Specifically, under "shippingInf...
--
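The one-character difference in the second answer is easy to miss with simple string matching, because "HP-SAC-EXP-CN" is itself a substring of the correct code. The sketch below illustrates this with a local exact-match check over the JSON values. It is not part of Verify and does not replace it; `collect_string_values` is a hypothetical helper, and `snippet` is a small excerpt of the invoice above.

```python
import json


def collect_string_values(node):
    """Recursively gather every string value found in parsed JSON (hypothetical helper)."""
    if isinstance(node, str):
        return {node}
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        found = set()
        for child in node:
            found |= collect_string_values(child)
        return found
    # Numbers, booleans, and null carry no string value
    return set()


# Small excerpt of the invoice's shippingAndHandling section
snippet = (
    '{"summary": {"shippingAndHandling": [{"description": '
    '"Secure Air Cargo - Expedited (SHIP-BETA-002)", '
    '"chargeCode": "SHP-SAC-EXP-CN", "amount": 8800.00}]}}'
)
values = collect_string_values(json.loads(snippet))

print("HP-SAC-EXP-CN" in snippet)   # True: the naive substring test accepts the wrong code
print("HP-SAC-EXP-CN" in values)    # False: exact matching on values rejects it
print("SHP-SAC-EXP-CN" in values)   # True: the correct code is an exact value
```

A check like this only works for verbatim identifiers in structured documents; free-form answers still need the reliability check.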
OpenAI chat completion endpoint
The reliability check via the OpenAI chat completions endpoint validates multi-turn conversations for reliability issues. This is ideal for conversational AI systems and chatbots.
Prepare the data
For this scenario, we need to provide the prompt via the user role and the answer from the LLM via the assistant role. We set expected_unreliable as the ground truth for comparison.
# Chat conversation examples for Chat Completion checks
chat_examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant that provides accurate medical information."},
            {"role": "user", "content": "Does vitamin C cure the common cold?"},
            {"role": "assistant", "content": "Yes, taking large doses of vitamin C has been scientifically proven to cure the common cold within 24 hours."}
        ],
        "expected_unreliable": True
    },
    {
        "messages": [
            {"role": "system", "content": "You are a knowledgeable financial advisor."},
            {"role": "user", "content": "What is compound interest?"},
            {"role": "assistant", "content": "Compound interest is when you earn interest on both your original investment and the interest that has already been earned. It causes your money to grow exponentially over time."}
        ],
        "expected_unreliable": False
    }
]
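Because each example must carry the prompt under the user role and the answer under the assistant role, a quick structural check before sending can catch malformed entries early. This is a minimal sketch: `validate_chat_example` is a hypothetical helper, and the rule that the last message must be the assistant answer is an assumption drawn from the description above, not a documented API requirement.

```python
def validate_chat_example(example):
    """Return a list of problems found in one chat example (empty if it looks fine)."""
    problems = []
    messages = example.get("messages", [])
    roles = [m.get("role") for m in messages]

    if "user" not in roles:
        problems.append("missing user message")
    # Assumption: the answer to be checked is the final assistant turn.
    if not roles or roles[-1] != "assistant":
        problems.append("last message must be the assistant answer to check")
    for i, m in enumerate(messages):
        content = m.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty content")
    if not isinstance(example.get("expected_unreliable"), bool):
        problems.append("expected_unreliable must be a bool")
    return problems


# Sample entry with the same shape as the chat examples above
example = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable financial advisor."},
        {"role": "user", "content": "What is compound interest?"},
        {"role": "assistant", "content": "Interest earned on principal and on prior interest."},
    ],
    "expected_unreliable": False,
}

print(validate_chat_example(example))
```

Running the helper over every entry in chat_examples before calling the API keeps structural mistakes from being confused with reliability verdicts.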
Reliability check via the OpenAI client¶
# Function for Chat Completion reliability check
def check_reliability_chat(messages):
    """Check reliability in chat conversations using OpenAI library"""
    # Create a separate client for chat completions with the correct base URL
    client = OpenAI(
        base_url=base_url,
        api_key=api_key,
    )
    # Make the request using OpenAI client - pass parameters directly
    response = client.chat.completions.create(
        model="klusterai/verify-reliability",  # Reliability model
        messages=messages
    )
    # Parse the response - kluster.ai returns check results in a specific format
    return response.choices[0].message.content
# Test Chat Completion checks with our examples
chat_results = []
for i, example in enumerate(chat_examples):
    print(f"=== Chat Example {i+1} ===")
    print(f"System: {example['messages'][0]['content']}")
    print(f"User: {example['messages'][1]['content']}")
    print(f"Assistant: {example['messages'][2]['content']}")
    print(f"Expected Unreliable: {example['expected_unreliable']}")
    print()
    try:
        result = check_reliability_chat(
            messages=example['messages'],
        )
        chat_results.append({
            'example': example,
            'result': result
        })
        print("Check Result:")
        print(f"Result: {result}")
    except Exception as e:
        print(f"Error processing chat example: {e}")
    print("\n" + "="*80 + "\n")

# Summary of chat checks
print("### Chat Check Summary")
print(f"Processed {len(chat_results)} chat conversations")
for i, result in enumerate(chat_results):
    expected = result['example']['expected_unreliable']
    print(f"Chat {i+1}: Expected unreliable = {expected}")
=== Chat Example 1 ===
System: You are a helpful assistant that provides accurate medical information.
User: Does vitamin C cure the common cold?
Assistant: Yes, taking large doses of vitamin C has been scientifically proven to cure the common cold within 24 hours.
Expected Unreliable: True

Error processing chat example: Error code: 500 - {'error': {'message': 'Unexpected error occurred', 'errorCode': 4000, 'type': 'invalid_request_error'}}

================================================================================

=== Chat Example 2 ===
System: You are a knowledgeable financial advisor.
User: What is compound interest?
Assistant: Compound interest is when you earn interest on both your original investment and the interest that has already been earned. It causes your money to grow exponentially over time.
Expected Unreliable: False

Error processing chat example: Error code: 500 - {'error': {'message': 'Unexpected error occurred', 'errorCode': 4000, 'type': 'invalid_request_error'}}

================================================================================

### Chat Check Summary
Processed 0 chat conversations
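Both requests in this run returned HTTP 500, so no conversation was actually checked. The output does not show whether the failure is transient. If it is, retrying with exponential backoff is one common way to recover. The sketch below is a generic pattern rather than part of the kluster.ai or OpenAI libraries: `call_with_retries` and its parameters are hypothetical, and the flaky function only simulates a failing request.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n seconds, then try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as error:  # in practice, narrow this to server-side errors
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Simulated request that fails twice before succeeding (illustration only)
_calls = {"count": 0}


def flaky_request():
    _calls["count"] += 1
    if _calls["count"] < 3:
        raise RuntimeError("simulated 500 error")
    return "check result"


print(call_with_retries(flaky_request, attempts=3, base_delay=0.01))
```

In the loop above, the call could be wrapped as `call_with_retries(lambda: check_reliability_chat(example['messages']))`. A persistent 500 after several attempts points to a problem that retries will not fix, such as an issue with the request or the service.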
Summary¶
This tutorial demonstrated how to use the reliability check feature of the Verify service to identify and prevent reliability issues in AI outputs.
We used two methods: the dedicated reliability endpoint and the OpenAI-compatible chat completions endpoint.
Some key takeaways:
- Two reliability check methods: A dedicated endpoint for Q&A verification, and the OpenAI-compatible chat completions endpoint for conversations.
- Two operation modes: General knowledge verification and context-based validation.
- Detailed explanations: The service provides clear reasoning for its determinations.
- Transparent verification: With return_search_results enabled, the service provides a list of sources used for verification. This helps users understand the basis for each reliability decision, thereby increasing trust in the results.