
Pricing

Transparent pricing.
No hidden fees.

Pick a model, deploy in minutes, and pay by the hour. There is a 1-hour minimum per deployment; unused whole hours are refunded to your account balance.
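The billing rule works out to a simple calculation. The sketch below is illustrative only (`billed_hours` is a hypothetical helper, not part of any Inferfly API):

```python
import math

def billed_hours(prepaid_hours: int, hours_used: float) -> int:
    """Hours actually charged: prepaid hours minus refunded whole
    unused hours, with a 1-hour minimum per deployment."""
    unused_whole = math.floor(prepaid_hours - hours_used)  # only *whole* hours are refunded
    return max(1, prepaid_hours - max(0, unused_whole))

# Example: prepay 10 hours on a Medium deployment (~$3.50/hr) and
# shut it down after 3.5 hours -> 6 whole unused hours are refunded.
hours = billed_hours(prepaid_hours=10, hours_used=3.5)
print(hours, hours * 3.50)  # 4 billed hours, $14.00
```

So a deployment stopped after 3.5 hours is billed for 4, and one stopped after 10 minutes is billed for the 1-hour minimum.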

Small

Up to 13B params

~$1.50/hr

Llama 3.1 8B, Qwen 3.5 9B

Medium

20B – 40B params

From ~$3.50/hr

Qwen 3.5 27B, Qwen 3.5 35B, GPT OSS 20B

Large

70B – 130B params

From ~$6.50/hr

Llama 3.3 70B, Qwen 3.5 122B, Qwen3 Coder Next

XL / MoE

200B+ params

~$17.00/hr

Llama 4 Scout, MiniMax M2.5

Exact pricing depends on model. See the full catalog below.

Full model catalog

Every row is a deployable configuration.

Llama
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| Llama 3.1 8B Instruct | 8B | 128k | $1.50 |
| Llama 3.3 70B Instruct | 70B | 128k | $12.00 |
| Llama 4 Scout 17B-16E Instruct | 109B (17 active) | 10M | $17.00 |
Qwen
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| Qwen 3.5 9B | 9B | 256k | $1.50 |
| Qwen 3.5 27B | 27B | 256k | $6.00 |
| Qwen 3.5 35B | 35B | 256k | $6.00 |
| Qwen 3.5 122B (FP8) | 125B | 256k | $13.00 |
| Qwen3 Coder Next | 80B | 256k | $12.00 |
| Qwen3 Coder Next (FP8) | 80B | 256k | $6.50 |
GPT OSS
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| GPT OSS 20B | 22B | 128k | $3.50 |
| GPT OSS 120B | 120B | 128k | $6.50 |
MiniMax
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| MiniMax M2.5 | 229B | 200k | $17.00 |

Disclaimer: The above prices are for reference only. Exact pricing depends on the model, replica count, number of concurrent users, and the GPU type and number of GPUs.

Bring your own model

Have a fine-tuned or custom model?
Contact us and we'll help you deploy it on Inferfly.

Raw GPU hourly rates (discounts available)

| GPU | VRAM | Price / hr |
|---|---|---|
| L40S | 48 GB | $1.50 |
| A100 SXM | 80 GB | $3.00 |
| H100 SXM | 80 GB | $5.00 |
| H200 SXM | 141 GB | $6.50 |
| B200 | 180 GB | $8.50 |
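A rough way to match a model to a GPU is to estimate weight memory: about 2 bytes per parameter at FP16 (roughly 1 at FP8), plus headroom for activations and KV cache. This is our own back-of-envelope sketch, not an official sizing tool; real requirements vary with context length and batch size:

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                   headroom: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate: parameter count x bytes per
    parameter (FP16 ~ 2, FP8 ~ 1), with ~20% headroom for activations
    and KV cache. Assumed formula, not an official sizing tool."""
    return params_billions * bytes_per_param * headroom

print(weight_vram_gb(8))         # ~19 GB: an 8B model fits a single L40S (48 GB)
print(weight_vram_gb(70))        # ~168 GB: a 70B FP16 model spans multiple GPUs
print(weight_vram_gb(122, 1.0))  # ~146 GB: 122B at FP8
```

By this estimate an 8B model fits comfortably on one L40S, while 70B-class models at FP16 need more than a single H200.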

Not sure which GPU fits your model? Get in touch.

Enterprise

Need something bigger?

Looking to run Llama 4 Maverick, Qwen 3.5 397B, Kimi K2.5, or other large-scale models? We provision these on demand. Tell us what you need and we'll get you a dedicated deployment with custom pricing.

Talk to us