Pricing
Pick a model, deploy in minutes, pay by the hour. 1-hour minimum per deployment. Unused whole hours are refunded to your account balance.
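As a rough illustration of the billing rules above, here is a minimal sketch of how a deployment might be settled. It assumes hours are paid up front and that a partially used hour counts as a full hour; neither detail is stated on this page. The function name `settle_deployment` and its parameters are hypothetical, not part of any Inferfly API.

```python
import math
from decimal import Decimal


def settle_deployment(rate_per_hour: Decimal, prepaid_hours: int, used_hours: float):
    """Return (charged, refunded) amounts for one deployment.

    Assumptions (not confirmed by the pricing page):
    - hours are prepaid in whole units,
    - a partially used hour is billed as a full hour.
    """
    if prepaid_hours < 1:
        raise ValueError("a deployment has a 1-hour minimum")
    if used_hours < 0 or used_hours > prepaid_hours:
        raise ValueError("used_hours must be between 0 and prepaid_hours")

    # Round partial hours up, then apply the 1-hour minimum.
    billed_hours = max(1, math.ceil(used_hours))
    # Only whole hours that were never started are refunded.
    refunded_hours = prepaid_hours - billed_hours
    return rate_per_hour * billed_hours, rate_per_hour * refunded_hours


charged, refunded = settle_deployment(Decimal("6.50"), prepaid_hours=5, used_hours=2.25)
print(f"charged ${charged}, refunded ${refunded}")
```

Under these assumptions, prepaying 5 hours at $6.50/hr and stopping after 2.25 hours is charged as 3 hours ($19.50), with 2 whole hours ($13.00) returned to the account balance.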
| Tier | Parameters | Approx. price / hr | Example models |
|---|---|---|---|
| Small | Up to 13B | ~$1.50 | Llama 3.1 8B, Qwen 3.5 9B |
| Medium | 20B – 40B | ~$3.50 | Qwen 3.5 27B, Qwen 3.5 35B, GPT OSS 20B |
| Large | 70B – 130B | ~$6.50 | Llama 3.3 70B, Qwen 3.5 122B, Qwen3 Coder Next |
| XL / MoE | 200B+ | ~$17.00 | Llama 4 Scout, MiniMax M2.5 |
Tier prices are approximate starting rates. Exact pricing depends on the model; see the full catalog below.
Every row is a deployable configuration.
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| Llama 3.1 8B Instruct | 8B | 128k | $1.50 |
| Llama 3.3 70B Instruct | 70B | 128k | $12.00 |
| Llama 4 Scout 17B-16E Instruct | 109B (17 active) | 10M | $17.00 |
| Qwen 3.5 9B | 9B | 256k | $1.50 |
| Qwen 3.5 27B | 27B | 256k | $6.00 |
| Qwen 3.5 35B | 35B | 256k | $6.00 |
| Qwen 3.5 122B (FP8) | 125B | 256k | $13.00 |
| Qwen3 Coder Next | 80B | 256k | $12.00 |
| Qwen3 Coder Next (FP8) | 80B | 256k | $6.50 |
| GPT OSS 20B | 22B | 128k | $3.50 |
| GPT OSS 120B | 120B | 128k | $6.50 |
| MiniMax M2.5 | 229B | 200k | $17.00 |
Disclaimer: The prices above are for reference only. Exact pricing depends on the model, the number of replicas, the number of concurrent users, and the type and number of GPUs.
Have a fine-tuned or custom model?
Contact us and we'll help you deploy it on Inferfly.
Raw GPU hourly rates (Discounts Available)
| GPU | VRAM | Price / hr |
|---|---|---|
| L40S | 48 GB | $1.50 |
| A100 SXM | 80 GB | $3.00 |
| H100 SXM | 80 GB | $5.00 |
| H200 SXM | 141 GB | $6.50 |
| B200 | 180 GB | $8.50 |
Not sure which GPU fits your model? Get in touch.
Enterprise
Looking to run Llama 4 Maverick, Qwen 3.5 397B, Kimi K2.5, or other large-scale models? We provision these on demand. Tell us what you need and we'll get you a dedicated deployment with custom pricing.
Talk to us