
Pricing

Transparent pricing.
No hidden fees.

Pick a model, deploy in minutes, and pay by the hour. There is a 1-hour minimum per deployment; unused whole hours are refunded to your account balance.
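The billing rule works out to a simple calculation. The sketch below is illustrative only (`billed_hours` is a hypothetical helper, not part of any Inferfly API):

```python
import math

def billed_hours(prepaid_hours: int, hours_used: float) -> int:
    """Hours actually charged: prepaid hours minus refunded whole
    unused hours, with a 1-hour minimum per deployment."""
    unused_whole = math.floor(prepaid_hours - hours_used)  # only *whole* hours are refunded
    return max(1, prepaid_hours - max(0, unused_whole))

# Example: prepay 10 hours on a Medium deployment (~$3.50/hr) and
# shut it down after 3.5 hours -> 6 whole unused hours are refunded.
hours = billed_hours(prepaid_hours=10, hours_used=3.5)
print(hours, hours * 3.50)  # 4 billed hours, $14.00
```

So a deployment stopped after 3.5 hours is billed for 4, and one stopped after 10 minutes is billed for the 1-hour minimum.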

Small

Up to 13B params

~$1.50/hr

Llama 3.1 8B, Qwen 3.5 9B

Medium

20B – 40B params

From ~$3.50/hr

Qwen 3.5 27B, Qwen 3.5 35B, GPT OSS 20B

Large

70B – 130B params

From ~$6.50/hr

Llama 3.3 70B, Qwen 3.5 122B, Qwen3 Coder Next

XL / MoE

200B+ params

~$17.00/hr

Llama 4 Scout, MiniMax M2.5

Exact pricing depends on model. See the full catalog below.

Full model catalog

Every row is a deployable configuration.

Llama
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| Llama 3.1 8B Instruct | 8B | 128k | $1.50 |
| Llama 3.3 70B Instruct | 70B | 128k | $12.00 |
| Llama 4 Scout 17B-16E Instruct | 109B (17 active) | 10M | $17.00 |
Qwen
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| Qwen 3.5 9B | 9B | 256k | $1.50 |
| Qwen 3.5 27B | 27B | 256k | $6.00 |
| Qwen 3.5 35B | 35B | 256k | $6.00 |
| Qwen 3.5 122B (FP8) | 125B | 256k | $13.00 |
| Qwen3 Coder Next | 80B | 256k | $12.00 |
| Qwen3 Coder Next (FP8) | 80B | 256k | $6.50 |
GPT OSS
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| GPT OSS 20B | 22B | 128k | $3.50 |
| GPT OSS 120B | 120B | 128k | $6.50 |
MiniMax
| Model name | Parameters | Max context length | Price / hr |
|---|---|---|---|
| MiniMax M2.5 | 229B | 200k | $17.00 |

Disclaimer: The above prices are for reference only. Exact pricing depends on the model, replica count, number of concurrent users, and the GPU type and number of GPUs.

Bring your own model

Have a fine-tuned or custom model?
Contact us and we'll help you deploy it on Inferfly.

Raw GPU hourly rates (discounts available)

| GPU | VRAM | Price / hr |
|---|---|---|
| L40S | 48 GB | $1.50 |
| A100 SXM | 80 GB | $3.00 |
| H100 SXM | 80 GB | $5.00 |
| H200 SXM | 141 GB | $6.50 |
| B200 | 180 GB | $8.50 |
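A rough way to match a model to a GPU is to estimate weight memory: about 2 bytes per parameter at FP16 (roughly 1 at FP8), plus headroom for activations and KV cache. This is our own back-of-envelope sketch, not an official sizing tool; real requirements vary with context length and batch size:

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                   headroom: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate: parameter count x bytes per
    parameter (FP16 ~ 2, FP8 ~ 1), with ~20% headroom for activations
    and KV cache. Assumed formula, not an official sizing tool."""
    return params_billions * bytes_per_param * headroom

print(weight_vram_gb(8))         # ~19 GB: an 8B model fits a single L40S (48 GB)
print(weight_vram_gb(70))        # ~168 GB: a 70B FP16 model spans multiple GPUs
print(weight_vram_gb(122, 1.0))  # ~146 GB: 122B at FP8
```

By this estimate an 8B model fits comfortably on one L40S, while 70B-class models at FP16 need more than a single H200.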

Not sure which GPU fits your model? Get in touch.

Enterprise

Need something bigger?

Looking to run Llama 4 Maverick, Qwen 3.5 397B, Kimi K2.5, or other large-scale models? We provision these on demand. Tell us what you need and we'll get you a dedicated deployment with custom pricing.

Talk to us