Built for scale, priced for control

 Scale from prototype to production with transparent pricing, lightning-fast performance, and no lock-in. Pay only for what you use with no hidden fees.

Inference pricing

Build with cutting-edge, open-weight multimodal models for chat, vision, code, and more.

Pricing varies with the completion window: real-time, or within 24, 48, or 72 hours.

| Model | Real-time | 24 hours | 48 hours | 72 hours |
|---|---|---|---|---|
| Qwen3-235B-A22B | $0.15 input / $2.00 output | $0.10 input / $1.50 output | $0.08 input / $1.00 output | $0.06 input / $0.75 output |
| Qwen2.5-VL-7B-Instruct | $0.30 input/output | $0.15 | $0.10 | $0.05 |
| Llama 4 Maverick | $0.20 input / $0.80 output | $0.25 | $0.20 | $0.15 |
| Llama 4 Scout | $0.80 input / $0.45 output | $0.15 | $0.12 | $0.10 |
| DeepSeek-V3-0324 | $0.70 input / $1.40 output | $0.63 | $0.50 | $0.35 |
| DeepSeek-R1-0528 | $3.00 input / $5.00 output | $3.50 | $3.00 | $2.50 |
| DeepSeek-R1 | $3.00 input / $5.00 output | $3.50 | $3.00 | $2.50 |
| Gemma 3 | $0.35 input/output | $0.30 | $0.25 | $0.20 |
| Llama 8B Instruct Turbo | $0.18 input/output | $0.05 | $0.04 | $0.03 |
| Llama 70B Instruct Turbo | $0.70 input/output | $0.20 | $0.18 | $0.15 |
| M3-Embeddings | $0.01 input | $0.005 | $0.005 | $0.005 |
| Mistral NeMo | $0.025 input / $0.07 output | $0.02 input / $0.06 output | $0.018 input / $0.05 output | $0.017 input / $0.045 output |
| Magistral-Small-2506 | $0.10 input / $0.30 output | $0.10 input / $0.30 output | $0.08 input / $0.25 output | $0.07 input / $0.22 output |
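To show how the completion window affects the cost of a job, here is a minimal sketch that estimates spend from the inference prices listed above. It assumes prices are USD per 1M tokens (the table itself does not state the unit) and includes only models whose input and output prices are listed separately in every window. The names `PRICES` and `estimate_cost` are hypothetical and not part of any official SDK.

```python
# Hypothetical cost estimator for the inference price table.
# Assumption: listed prices are USD per 1M tokens (unit not stated in the table).
PRICES = {
    # model: {completion window: (input price, output price)}
    "Qwen3-235B-A22B": {
        "real-time": (0.15, 2.00),
        "24h": (0.10, 1.50),
        "48h": (0.08, 1.00),
        "72h": (0.06, 0.75),
    },
    "Mistral NeMo": {
        "real-time": (0.025, 0.07),
        "24h": (0.02, 0.06),
        "48h": (0.018, 0.05),
        "72h": (0.017, 0.045),
    },
    "Magistral-Small-2506": {
        "real-time": (0.10, 0.30),
        "24h": (0.10, 0.30),
        "48h": (0.08, 0.25),
        "72h": (0.07, 0.22),
    },
}

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumed pricing unit


def estimate_cost(model: str, window: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a job for the given completion window."""
    input_price, output_price = PRICES[model][window]
    return (input_tokens * input_price + output_tokens * output_price) / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Compare windows for a job with 10M input tokens and 2M output tokens.
    for window in ("real-time", "24h", "48h", "72h"):
        cost = estimate_cost("Qwen3-235B-A22B", window, 10_000_000, 2_000_000)
        print(f"{window:>9}: ${cost:,.2f}")
```

Under the per-1M-token assumption, that example job comes to about $5.50 in real time and about $2.10 with a 72-hour window.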

Interested in a custom large-scale deployment?

Fine-tuning pricing

Fine-tune leading open-weight models on your own datasets for domain-specific accuracy, more reliable behavior, and cost-effective deployment.

| Model | Price per 1M tokens |
|---|---|
| Llama 8B | $0.48 |
| Llama 70B | $2.90 |
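A fine-tuning budget can be estimated directly from these per-1M-token prices. The sketch below assumes, as a labeled hypothesis, that billed tokens equal dataset tokens multiplied by the number of training epochs; the page does not say how training tokens are counted, and `fine_tuning_cost` is a hypothetical helper.

```python
# Hypothetical fine-tuning cost estimate using the listed per-1M-token prices.
# Assumption: billed tokens = dataset tokens x number of training epochs.
FINE_TUNING_PRICE_PER_1M = {"Llama 8B": 0.48, "Llama 70B": 2.90}


def fine_tuning_cost(model: str, dataset_tokens: int, epochs: int = 1) -> float:
    """Estimate the USD cost of a fine-tuning job under the stated assumption."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    billed_tokens = dataset_tokens * epochs
    return billed_tokens * FINE_TUNING_PRICE_PER_1M[model] / 1_000_000


if __name__ == "__main__":
    # 5M-token dataset trained for 3 epochs on Llama 70B.
    print(f"${fine_tuning_cost('Llama 70B', 5_000_000, epochs=3):.2f}")
```

With those inputs the estimate is 15M billed tokens, or about $43.50; if the service counts training tokens differently, only the `billed_tokens` line needs to change.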

Guardrails pricing

Use Verify to run real-time checks on LLM outputs to flag unreliable or low-confidence content.

| Token type | Price per 1M tokens |
|---|---|
| Input tokens | $4.00 |
| Output tokens | $7.00 |
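Because Verify bills input and output tokens at different rates, the cost of a batch of checks depends on how large each check is. This is a minimal sketch assuming every check has a similar token count; `verify_cost` and its parameters are hypothetical, and what counts as a Verify input token (for example, prompt plus the LLM response being checked) is an assumption, not something this page specifies.

```python
# Hypothetical estimate of Verify (guardrails) spend from the listed prices.
VERIFY_INPUT_PER_1M = 4.00   # USD per 1M input tokens
VERIFY_OUTPUT_PER_1M = 7.00  # USD per 1M output tokens


def verify_cost(checks: int, input_tokens_per_check: int, output_tokens_per_check: int) -> float:
    """Estimate the USD cost of `checks` Verify calls of roughly equal size."""
    total_input = checks * input_tokens_per_check
    total_output = checks * output_tokens_per_check
    return (total_input * VERIFY_INPUT_PER_1M + total_output * VERIFY_OUTPUT_PER_1M) / 1_000_000


if __name__ == "__main__":
    # 1,000 checks, each sending 1,500 tokens and returning 200 tokens.
    print(f"${verify_cost(1_000, 1_500, 200):.2f}")
```

In that example, 1.5M input tokens cost $6.00 and 0.2M output tokens cost $1.40, for about $7.40 in total.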

Trial
Ideal for experimentation and basic usage.
Price: Free
Requests per minute: 30
Context window: Up to 32K tokens
Max output: Up to 4K tokens
Batch queue: 1K
Hosted fine-tuned models: 1 model
Support: Community
30+ features

Core
Flexible for occasional usage or smaller projects.
Price: Pay as you go (minimum $10)
Requests per minute: 600
Context window: Max
Max output: Max
Batch queue: 100K
Hosted fine-tuned models: 10 models
Team account: Yes
Support: Community

Scale
Perfect for growing projects.
Price: Custom
Requests per minute: 1,200
Context window: Max
Max output: Max
Batch queue: 500K
Hosted fine-tuned models: 25 models
Team account: Yes
Support: Email

Enterprise
Customized solutions for enterprise and high-scale demands.
Price: Custom
Requests per minute: Unlimited
Context window: Max
Max output: Max
Batch queue: Unlimited
Hosted fine-tuned models: Unlimited
Team account: Yes
Support: Dedicated account manager
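Each plan caps requests per minute (30 on Trial, 600 on Core, 1,200 on Scale), so a client may want to throttle itself instead of hitting the limit. The sketch below is a client-side sliding-window limiter. It assumes the limit is enforced over a rolling 60-second window, which this page does not specify, and the class `RequestsPerMinuteLimiter` is hypothetical, not part of any provider SDK. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestsPerMinuteLimiter:
    """Client-side sliding-window limiter that keeps calls under a per-minute cap.

    Assumption: the service counts requests over a rolling 60-second window.
    """

    def __init__(
        self,
        rpm: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rpm < 1:
            raise ValueError("rpm must be positive")
        self.rpm = rpm
        self.window = 60.0  # seconds
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of requests in the window

    def acquire(self) -> None:
        """Block until another request may be sent, then record its timestamp."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the 60-second window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.rpm:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request expires, then re-check.
            self._sleep(self.window - (now - self._sent[0]))
```

On the Trial plan, for instance, one would create `RequestsPerMinuteLimiter(30)` and call `acquire()` before each API request; the 31st call within a minute blocks until the oldest request leaves the window.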

Looking for something else?