Models on kluster.ai#
kluster.ai offers a wide variety of open-source models for both real-time and batch inferences, with more being constantly added.
This page covers all the models the API supports, with the API request limits for each.
Model names#
Each model supported by kluster.ai has a unique name that must be used when defining the model
in the request.
Model | Model API name |
---|---|
DeepSeek R1 | deepseek-ai/DeepSeek-R1 |
DeepSeek V3 | deepseek-ai/DeepSeek-V3 |
DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 |
Gemma 3 27B | google/gemma-3-27b-it |
Llama 3.1 8B | klusterai/Meta-Llama-3.1-8B-Instruct-Turbo |
Llama 3.1 405B | klusterai/Meta-Llama-3.1-405B-Instruct-Turbo |
Llama 3.3 70B | klusterai/Meta-Llama-3.3-70B-Instruct-Turbo |
Qwen 2.5 7B | Qwen/Qwen2.5-VL-7B-Instruct |
Model comparison table#
Model | Main use case |
Real-time inference support |
Batch inference support |
Fine-tuning support |
Image analysis |
Function calling |
---|---|---|---|---|---|---|
DeepSeek R1 | Code generation Complex data analysis |
|||||
DeepSeek V3 | Natural language generation Contextually rich writing |
|||||
DeepSeek V3 0324 | Natural language generation Contextually rich writing |
|||||
Gemma 3 27B | Multilingual applications Image analysis Complex reasoning |
|||||
Llama 3.1 8B | Low-latency or simple tasks Cost-efficient inference |
|||||
Llama 3.1 405B | Detailed analysis Maximum accuracy |
|||||
Llama 3.3 70B | General-purpose AI Balanced cost-performance |
|||||
Qwen 2.5 7B | Document analysis Image-based reasoning Multimodal chat |
API request limits#
The following limits apply to API requests based on your plan tier (notation is free tier | standard tier
):
Model | Context size |
Max output |
Max batch requests |
Concurrent requests |
Requests per minute |
---|---|---|---|---|---|
DeepSeek R1 | 32k | 162k | 4k | 162k | <1000 | No limit | 2 | 10 | 1 | 60 |
DeepSeek V3 | 32k | 131k | 4k | 131k | <1000 | No limit | 2 | 10 | 1 | 60 |
DeepSeek V3 0324 | 32k | 131k | 4k | 131k | <1000 | No limit | 2 | 10 | 1 | 60 |
Gemma 3 27B | 32k | 32k | 4k | 8k | <1000 | No limit | 2 | 10 | 1 | 60 |
Llama 3.1 8B | 32k | 131k | 4k | 131k | <1000 | No limit | 2 | 10 | 1 | 60 |
Llama 3.1 405B | 32k | 131k | 4k | 131k | <1000 | No limit | 2 | 10 | 1 | 60 |
Llama 3.3 70B | 32k | 131k | 4k | 131k | <1000 | No limit | 2 | 10 | 1 | 60 |
Qwen 2.5 7B | 32k | 32k | 4k | 8k | <1000 | No limit | 2 | 10 | 1 | 60 |