GPU inference marketplace

Any model. Any SDK.
Distributed inference.

Access every open-weight LLM through the SDK you already use — OpenAI, Anthropic, or Gemini. Capacity sourced from verified data center fleets with cryptographic attestation.

OpenAI SDK/Anthropic SDK/Gemini SDK
For developers

Use the SDK you already know

Inferegator translates protocols transparently. Point your existing OpenAI, Anthropic, or Gemini client at our endpoint — streaming, function calling, and vision work without code changes.

import openai

# Change one line — everything else works
client = openai.OpenAI(
    base_url="https://api.inferegator.com/v1",
    api_key="sk-inf-YOUR_KEY",
)

# Same API, same streaming, same tools
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
How it works

From request to response in milliseconds

Your API call is routed to the best-performing GPU on the network. Every response is cryptographically signed. Every provider is continuously verified.

1
Your SDK call
OpenAI, Anthropic, or Gemini — any client library
2
Smart routing
Weighted by health, latency, capacity, and attestation score
3
Verified GPU
Cryptographically signed response from benchmarked hardware
4
Streamed to you
Per-token billing, full Langfuse tracing if enabled
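The routing step above can be sketched as a weighted score over per-node metrics. The weights, field names, and latency normalization here are illustrative assumptions, not Inferegator's actual scoring function:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    health: float        # 0..1, rolling request success rate
    latency_ms: float    # recent p50 latency
    capacity: float      # 0..1, free slots / total slots
    attestation: float   # 0..1, trust-layer score

def score(n: Node) -> float:
    # Lower latency is better, so it is inverted into a 0..1 term.
    latency_term = 1.0 / (1.0 + n.latency_ms / 100.0)
    return 0.3 * n.health + 0.2 * latency_term + 0.2 * n.capacity + 0.3 * n.attestation

def route(nodes: list[Node]) -> Node:
    # Send the request to the best-scoring node.
    return max(nodes, key=score)

nodes = [
    Node("dc1-a100", health=0.99, latency_ms=80, capacity=0.4, attestation=0.95),
    Node("dc2-h100", health=0.97, latency_ms=40, capacity=0.9, attestation=0.90),
]
print(route(nodes).name)  # → dc2-h100
```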
Model catalog

Run what you want

View all models

Open-weight models from Meta, Mistral, Qwen, DeepSeek, and more — running on verified GPU hardware. Transparent per-token pricing.

Trust layer

Cryptographic verification at every layer

Inferegator verifies every node in the network through adaptive cryptographic probes, hardware fingerprinting, and signed responses — no TEE dependency, no unverified trust assumptions.

Canary probes
Encrypted test prompts injected at adaptive rates. Nodes cannot distinguish canaries from real traffic.
Hardware fingerprint
GPU compute timing signatures verified against a reference database. Hardware spoofing fails the benchmark.
Signed responses
Every inference response is signed with ECDSA secp256k1. Tampering or substitution is cryptographically detectable.
Fleet trust tiers
Verified fleet operators get reduced canary rates, direct HTTP connectivity, and configurable verification policies.
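Signature checks can be reproduced client-side. A minimal sketch with the `cryptography` package, using a locally generated secp256k1 key in place of a node's real key (Inferegator's actual signature encoding and key-distribution scheme are not specified here):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Stand-in for a node's signing key; in practice the node signs and
# you verify against its published public key.
node_key = ec.generate_private_key(ec.SECP256K1())
public_key = node_key.public_key()

response_body = b'{"id":"resp_1","output":"Hello!"}'
signature = node_key.sign(response_body, ec.ECDSA(hashes.SHA256()))

# Verification succeeds for the untouched body...
public_key.verify(signature, response_body, ec.ECDSA(hashes.SHA256()))

# ...and raises InvalidSignature if the body was tampered with.
try:
    public_key.verify(signature, b'{"id":"resp_1","output":"pwned"}',
                      ec.ECDSA(hashes.SHA256()))
    tampering_detected = False
except InvalidSignature:
    tampering_detected = True
print(tampering_detected)  # → True
```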
For data center operators

Monetize idle GPU capacity

Deploy a single lightweight binary across your fleet. Inferegator handles demand routing, billing, and settlement — you control model assignment, rollout cadence, and node operations through the fleet dashboard or API.

Flexible
Revenue share
1 binary
Single universal appliance
Per contract
Digital contract payouts
Bulk operations
Suspend, drain, resume, or assign models to thousands of nodes in a single API call. Filter by GPU model, status, or attestation score.
Staged rollouts
Deploy appliance updates in waves — 5%, 25%, 100% — with automatic rollback if failure rate exceeds 5%.
Direct connectivity
Fleet nodes connect via direct HTTP, skipping tunnel overhead. Configurable canary rates, including full bypass for trusted hardware.
Webhook alerts
Real-time notifications for node down, rollout paused, canary failures, and attestation score changes.
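The staged-rollout behavior above can be sketched in a few lines. The wave fractions and rollback threshold mirror the numbers in this section; the failure-rate callback is a stand-in for real fleet telemetry:

```python
WAVES = [0.05, 0.25, 1.00]   # fraction of fleet updated per wave
MAX_FAILURE_RATE = 0.05      # automatic-rollback threshold

def run_rollout(fleet_size: int, failure_rate_after) -> str:
    """Advance through waves, rolling back if failures exceed the threshold."""
    updated = 0
    for wave in WAVES:
        updated = max(updated, int(fleet_size * wave))
        if failure_rate_after(updated) > MAX_FAILURE_RATE:
            # Revert the whole fleet to the previous appliance version.
            return f"rolled_back_at_{updated}_nodes"
    return "completed"

# Healthy build: failure rate stays under the threshold in every wave.
print(run_rollout(1000, lambda n: 0.01))  # → completed

# Bad build: the 25% wave trips the rollback.
print(run_rollout(1000, lambda n: 0.08 if n >= 250 else 0.01))  # → rolled_back_at_250_nodes
```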

Why Inferegator


Transparent pricing

Per-token pricing on every model. No seat fees, no compute minimums, no hidden charges. Prepaid or postpaid billing.


Universal SDK

OpenAI, Anthropic, Gemini — use whichever client you prefer. Protocol translation is transparent and lossless.


Multi-provider resilience

Requests routed across verified data center fleets. No single point of failure. Capacity backed by enterprise-grade hardware.


Zero lock-in

Standard APIs, standard models. Move workloads on or off with zero code changes. Inference runs on provider hardware, not ours.

Ready to get started?

API consumers — create a free account and get your key in minutes. Data center operators — contact us to onboard your fleet.