
Run Models.
Skip the Ops.

OpenAI-compatible inference endpoints on production-ready GPUs. No cluster management. Just your model, live in minutes.

inferfly — bash
curl \
  https://{id}.api.inferfly.ai/v1/chat/completions \
  -H "Authorization: Bearer ifk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{model}",
    "messages": [{"role": "user", "content": "{message}"}]
  }'
OpenAI-compatible · vLLM-powered · GPU-backed · Bring your own model

From zero to inference in three steps

Choose a model

Browse the catalog of pre-validated open-source models. Each comes with a tested GPU configuration.

Configure your deployment

Select your GPUs and deployment hours. Pricing is transparent and prepaid, with a 1-hour minimum. Unused whole hours are refunded.
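The prepaid model above comes down to simple arithmetic. A minimal sketch, assuming partial hours round up and the 1-hour minimum applies (the exact proration rules are an assumption, not inferfly's published policy):

```python
import math

def refund_hours(prepaid_hours: int, hours_used: float, min_hours: int = 1) -> int:
    """Whole unused hours returned to your balance after a deployment ends.

    Assumes partial hours are billed as whole hours, with a 1-hour minimum.
    """
    billable = max(min_hours, math.ceil(hours_used))
    return max(0, prepaid_hours - billable)

# Prepay 8 hours, tear down after 3.5 hours: billed 4, refunded 4 whole hours.
print(refund_hours(8, 3.5))  # 4
```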

Get your endpoint

Your OpenAI-compatible API endpoint goes live in under 5 minutes. Drop it into your existing code — LangChain, LlamaIndex, or raw HTTP.
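Because the endpoint speaks the OpenAI wire format, pointing existing code at it is just a base-URL swap. A minimal stdlib-only sketch that builds (but does not send) such a request — the deployment ID and API key below are placeholders, not real credentials:

```python
import json
import urllib.request

BASE_URL = "https://YOUR_DEPLOYMENT_ID.api.inferfly.ai/v1"  # placeholder deployment ID
API_KEY = "ifk_YOUR_KEY"                                    # placeholder key

def chat_request(model: str, message: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request against the endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("gpt-oss-20b", "Hello!")
print(req.full_url)
```

The same swap works with the OpenAI SDK, LangChain, or LlamaIndex by passing the base URL to the client constructor.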

Run the models that matter

Every model ships with a pre-validated GPU configuration. No trial-and-error, no OOM surprises.

GPT-OSS 20B

OpenAI

RTX Pro 6000 / H200

Managed inference. Not managed complexity.

Everything you need to run models in production.

API surface

OpenAI-compatible by default

Your existing LangChain, LlamaIndex, or OpenAI SDK code works unchanged. Swap the base URL. That is it.

Hardware

Pre-validated GPU configs

Every model ships with a tested GPU pairing. No quantization guesswork, no OOM surprises at 3 AM.

Observability

Built-in metrics dashboard

Latency, throughput, and token usage at a glance. All in one place.

Governance

Per-key rate limiting

Usage tracking and rate limits per API key. Built for teams and agencies serving multiple clients.

Realtime

Streaming SSE out of the box

Server-sent events for real-time token streaming. Same interface as the OpenAI streaming API.
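Consuming that stream takes only a few lines. A sketch assuming OpenAI's streaming chunk format (`data: <json>` lines carrying `choices[0].delta`, terminated by `data: [DONE]`):

```python
import json
from typing import Iterable, Iterator

def stream_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# The kind of lines an OpenAI-compatible stream produces:
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(stream_tokens(raw)))  # Hello
```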

Billing

Unused hours refunded

Pay by the hour, prepaid. Unused whole hours are returned as account balance.

Ready to ship

Your model.
Your endpoint.
Live in minutes.