Built for scale, priced for control
Scale from prototype to production with transparent pricing, lightning-fast performance, and no lock-in. Pay only for what you use with no hidden fees.
Inference pricing
Build with cutting-edge, open-weight multimodal models for chat, vision, code, and more.
Pricing varies with completion window

| MODEL | Real time | 24 hours | 48 hours | 72 hours |
| --- | --- | --- | --- | --- |
| Qwen3-235B-A22B | $0.15 input / $2.00 output | $0.10 input / $1.50 output | $0.08 input / $1.00 output | $0.06 input / $0.75 output |
| Qwen2.5-VL-7B-Instruct | $0.30 input/output | $0.15 | $0.10 | $0.05 |
| Llama 4 Maverick | $0.20 input / $0.80 output | $0.25 | $0.20 | $0.15 |
| Llama 4 Scout | $0.80 input / $0.45 output | $0.15 | $0.12 | $0.10 |
| DeepSeek-V3-0324 | $0.70 input / $1.40 output | $0.63 | $0.50 | $0.35 |
| DeepSeek-R1-0528 | $3.00 input / $5.00 output | $3.50 | $3.00 | $2.50 |
| DeepSeek-R1 | $3.00 input / $5.00 output | $3.50 | $3.00 | $2.50 |
| Gemma 3 | $0.35 input/output | $0.30 | $0.25 | $0.20 |
| Llama 8B Instruct Turbo | $0.18 input/output | $0.05 | $0.04 | $0.03 |
| Llama 70B Instruct Turbo | $0.70 input/output | $0.20 | $0.18 | $0.15 |
| M3-Embeddings | $0.01 input | $0.005 | $0.005 | $0.005 |
| Mistral NeMo | $0.025 input / $0.07 output | $0.02 input / $0.06 output | $0.018 input / $0.05 output | $0.017 input / $0.045 output |
| Magistral-Small-2506 | $0.10 input / $0.30 output | $0.10 input / $0.30 output | $0.08 input / $0.25 output | $0.07 input / $0.22 output |
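
To see how the completion window can change a bill, here is a minimal sketch using the Qwen3-235B-A22B rates listed above. It assumes the rates are in US dollars per 1M tokens; the inference table does not state its unit, although the fine-tuning and guardrails tables on this page are priced per 1M tokens. The names `QWEN3_RATES` and `job_cost` are illustrative and not part of any SDK.

```python
# Rates for Qwen3-235B-A22B copied from the inference pricing table.
# ASSUMPTION: prices are USD per 1M tokens; the table does not state the unit.
QWEN3_RATES = {
    "real_time": (0.15, 2.00),  # (input rate, output rate)
    "24h": (0.10, 1.50),
    "48h": (0.08, 1.00),
    "72h": (0.06, 0.75),
}


def job_cost(input_tokens: int, output_tokens: int, window: str) -> float:
    """Estimate the cost in USD of one job for the given completion window."""
    input_rate, output_rate = QWEN3_RATES[window]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    for window in QWEN3_RATES:
        print(f"{window}: ${job_cost(10_000_000, 2_000_000, window):.2f}")
```

Under that assumption, this hypothetical workload would cost about $5.50 in real time and about $2.10 with a 72-hour window.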
Fine-tuning pricing
Fine-tune leading open-weight models on your own datasets for domain-specific accuracy, more reliable behavior, and cost-effective deployment.
| MODEL | Price per 1M tokens |
| --- | --- |
| Llama 8B | $0.48 |
| Llama 70B | $2.90 |
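
As a rough budgeting aid, the per-token prices above can be turned into an estimate for a training run. The sketch below assumes billed tokens equal dataset tokens multiplied by the number of epochs; the page does not say how training tokens are counted, so treat that rule, and the names `FINE_TUNE_PRICE_PER_1M` and `fine_tune_cost`, as hypothetical.

```python
# Fine-tuning prices from the table, in USD per 1M tokens.
FINE_TUNE_PRICE_PER_1M = {"Llama 8B": 0.48, "Llama 70B": 2.90}


def fine_tune_cost(model: str, dataset_tokens: int, epochs: int = 1) -> float:
    """Estimate the cost in USD of a fine-tuning run."""
    # ASSUMPTION: billed tokens = dataset tokens x epochs. The pricing page
    # does not specify how training tokens are counted.
    billed_tokens = dataset_tokens * epochs
    return billed_tokens * FINE_TUNE_PRICE_PER_1M[model] / 1_000_000


if __name__ == "__main__":
    # Hypothetical run: a 50M-token dataset trained for 3 epochs.
    for model in FINE_TUNE_PRICE_PER_1M:
        print(f"{model}: ${fine_tune_cost(model, 50_000_000, epochs=3):.2f}")
```

With those assumptions, the hypothetical run comes to about $72 on Llama 8B and about $435 on Llama 70B.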
Guardrails pricing
Use Verify to run real-time checks on LLM outputs to flag unreliable or low-confidence content.
| INPUT/OUTPUT TOKENS | Price per 1M tokens |
| --- | --- |
| Input tokens | $4.00 |
| Output tokens | $7.00 |
Trial
Ideal for experimentation and basic usage.
Price: Free
Requests per minute: 30
Context window: Up to 32K tokens
Max output: Up to 4K tokens
Batch queue: 1K
Hosted fine-tuned models: 1 model
Support: Community
30+ Features

Core
Flexible for occasional usage or smaller projects.
Price: Pay as you go (minimum $10)
Requests per minute: 600
Context window: Max
Max output: Max
Batch queue: 100K
Hosted fine-tuned models: 10 models
Team account: Yes
Support: Community
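
The requests-per-minute limits translate directly into a minimum spacing between calls: 30 per minute is one request every 2 seconds, and 600 per minute is one every 0.1 seconds. The sketch below paces a client accordingly. It is an illustration only: the page does not describe how the limit is enforced (fixed window, sliding window, or bursts), so even spacing is a conservative assumption, and `PLAN_RPM` and `RequestPacer` are hypothetical names.

```python
import time

# Requests-per-minute limits from the Trial and Core plan descriptions.
PLAN_RPM = {"Trial": 30, "Core": 600}


class RequestPacer:
    """Spaces calls evenly, 60 / rpm seconds apart.

    ASSUMPTION: even spacing is enough to respect the limit; the actual
    enforcement policy is not documented on the pricing page.
    """

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

A client would call `pacer.wait()` immediately before each request, for example with `pacer = RequestPacer(PLAN_RPM["Trial"])`. The injectable `clock` and `sleep` arguments exist only to make the pacing logic testable without real delays.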