Dedicated deployments

Dedicated deployments let you run a private instance of any Hugging Face text model on hardware reserved just for you. Enjoy full control, predictable per‑minute billing, and zero per‑token costs.

This page covers how to create, use, and stop your dedicated deployments.

Create a deployment

Make sure you're logged in to the kluster.ai platform, navigate to the Dedicated deployments page, and press Launch deployment.

Then, complete the following fields to configure your deployment:

  1. Deployment name: Enter a clear deployment name (e.g., mydedicated) so you can spot it later in the console.
  2. Model selection: Paste the Hugging Face model ID or URL (e.g., deepseek-ai/DeepSeek-R1). If the model is private, provide a Hugging Face access token.
  3. Select hardware: Confirm a GPU configuration.
  4. Specify auto-shutdown: Set an auto‑shutdown window, between 15 minutes and 12 hours, after which an inactive instance powers down.
  5. Launch: Review the estimated price, then click Launch deployment. Spin‑up takes ≈20–30 min; once the status shows Running, copy the endpoint ID, which you'll use to submit requests.

Use your dedicated deployment

Once your instance has spun up (about 20–30 minutes after launch), you can call it by passing the endpoint ID as the model name in your requests. If you're unsure of your endpoint ID, you can find it on the Dedicated deployments page.

The following examples send a chat completion request to your dedicated deployment, first with the OpenAI Python client and then with curl. Replace INSERT_ENDPOINT_ID with your endpoint ID:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.kluster.ai/v1"
)

response = client.chat.completions.create(
    model="INSERT_ENDPOINT_ID",   # Your endpoint ID
    messages=[{"role": "user", "content": "What is the best taco place in SF?"}],
)

print(response.choices[0].message.content)

curl https://api.kluster.ai/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "INSERT_ENDPOINT_ID",
    "messages": [{"role": "user", "content": "What is the best taco place in SF?"}]
  }'
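
Because spin‑up takes about 20–30 minutes, a script may want to wait for the deployment before sending real traffic. The following is a minimal sketch, not a documented kluster.ai feature: it assumes that requests to the endpoint fail until the status shows Running, and the helper name wait_until_ready and all of its parameters are hypothetical. The probe is any callable that sends one request and raises an exception on failure.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], object],
    timeout_s: float = 45 * 60,   # assumption: a bit longer than the 20-30 min spin-up
    interval_s: float = 60.0,     # assumption: one probe per minute is frequent enough
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` until it succeeds or `timeout_s` elapses.

    Returns True as soon as one probe call completes without raising,
    and False if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            probe()
            return True
        except Exception:
            # Broad on purpose for this sketch; with the OpenAI client you
            # would likely narrow this to openai.APIError.
            if clock() >= deadline:
                return False
            sleep(interval_s)
```

With the client from the Python example above, the probe could be `lambda: client.chat.completions.create(model="INSERT_ENDPOINT_ID", messages=[{"role": "user", "content": "ping"}], max_tokens=1)`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.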

Stop your deployment

Click Stop next to your deployment on the Dedicated deployments page to shut your VM down immediately. Billing ends the moment it powers off.

Otherwise, the deployment shuts down automatically once it has been inactive for the auto‑shutdown period you chose at launch (between 15 minutes and 12 hours).
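
Since billing is per minute and ends when the instance powers off, the auto‑shutdown window adds to the bill whenever you rely on the timer instead of pressing Stop. The sketch below estimates that effect under stated assumptions: the per‑minute rate is a placeholder you supply (use the estimated price shown at launch), idle minutes before auto‑shutdown are assumed to be billed like active minutes, and any charge for spin‑up time is ignored. estimate_cost is a hypothetical helper, not part of any kluster.ai API.

```python
from typing import Optional

MIN_WINDOW_MIN = 15        # shortest auto-shutdown window (15 minutes)
MAX_WINDOW_MIN = 12 * 60   # longest auto-shutdown window (12 hours)


def estimate_cost(
    active_minutes: float,
    rate_per_minute: float,
    shutdown_window_min: Optional[int] = None,
) -> float:
    """Estimate the bill for one session of a dedicated deployment.

    shutdown_window_min=None models pressing Stop right after the last
    request; otherwise the idle window is added to the billed time
    (an assumption, see the text above).
    """
    if active_minutes < 0 or rate_per_minute < 0:
        raise ValueError("minutes and rate must be non-negative")

    billed_minutes = active_minutes
    if shutdown_window_min is not None:
        if not MIN_WINDOW_MIN <= shutdown_window_min <= MAX_WINDOW_MIN:
            raise ValueError("auto-shutdown window must be 15 to 720 minutes")
        billed_minutes += shutdown_window_min

    return round(billed_minutes * rate_per_minute, 2)
```

For example, two hours of use followed by a 30‑minute auto‑shutdown window is estimated as 150 billed minutes, versus 120 if you press Stop yourself.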

Questions? Email support@kluster.ai, and we’ll be happy to help!