Getting started
Inferegator provides an OpenAI-compatible API for GPU inference. Use your existing OpenAI client libraries by changing the base URL and providing your Inferegator API key.
Base URL
https://api.inferegator.com/v1
Authentication
Include your API key in the Authorization header as a Bearer token. Create keys from the API Keys page after signing in.
Authorization: Bearer sk-inf-...
Chat completions
Send a chat completion request with the model ID and a messages array. Streaming is supported via server-sent events (SSE).
Python
import openai

client = openai.OpenAI(
    base_url="https://api.inferegator.com/v1",
    api_key="sk-inf-YOUR_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GPU inference in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
cURL
curl https://api.inferegator.com/v1/chat/completions \
  -H "Authorization: Bearer sk-inf-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Streaming
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Models
List available models with the models endpoint. Each model includes pricing, context length, and capability information. See the full catalog on the Models page.
GET /v1/models
{
  "data": [
    {
      "id": "meta-llama/llama-3.1-70b-instruct",
      "object": "model",
      "owned_by": "meta-llama"
    }
  ]
}
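With the OpenAI Python client, `client.models.list()` hits this same endpoint. When working with the raw JSON instead, a small helper can pull out the model IDs; the sketch below uses an illustrative payload shaped like the response above:

```python
def model_ids(models_response):
    # Each entry in "data" is a model object with an "id" field,
    # matching the /v1/models response shape shown above.
    return [m["id"] for m in models_response["data"]]

# Illustrative payload mirroring the example response above.
sample = {
    "data": [
        {
            "id": "meta-llama/llama-3.1-70b-instruct",
            "object": "model",
            "owned_by": "meta-llama",
        }
    ]
}

print(model_ids(sample))  # ['meta-llama/llama-3.1-70b-instruct']
```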
Rate limits
Rate limits are applied per API key. Default limits are 60 requests per minute and 100,000 tokens per minute. Contact support for higher limits.
Rate limit headers are included in every response:
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 59
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 99744
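A client can read these headers to throttle itself before it ever receives a 429. A minimal sketch (header names come from the docs above; the backoff thresholds are arbitrary examples, not API requirements):

```python
def parse_rate_limits(headers):
    # Header names as documented above; values arrive as strings.
    return {
        "remaining_requests": int(headers["X-RateLimit-Remaining-Requests"]),
        "remaining_tokens": int(headers["X-RateLimit-Remaining-Tokens"]),
    }

def should_back_off(headers, min_requests=5, min_tokens=2000):
    # Illustrative thresholds: pause sending when headroom runs low.
    limits = parse_rate_limits(headers)
    return (limits["remaining_requests"] < min_requests
            or limits["remaining_tokens"] < min_tokens)

headers = {
    "X-RateLimit-Limit-Requests": "60",
    "X-RateLimit-Remaining-Requests": "59",
    "X-RateLimit-Limit-Tokens": "100000",
    "X-RateLimit-Remaining-Tokens": "99744",
}
print(should_back_off(headers))  # False
```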
Errors
Errors follow the OpenAI error format. Common status codes:
400 — Bad request (invalid parameters)
401 — Unauthorized (invalid or missing API key)
429 — Rate limit exceeded
500 — Internal server error
503 — Model temporarily unavailable
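Of these, 429, 500, and 503 are transient, so retrying with exponential backoff is a reasonable default. A sketch (the wrapper and its parameters are assumptions for illustration, not part of the API):

```python
import random
import time

RETRYABLE = {429, 500, 503}  # transient statuses from the list above

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Invoke `call` (a zero-argument function returning (status, body))
    and retry retryable statuses with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
    return status, body
```

Note that the OpenAI Python client raises typed exceptions for these cases (for example, `openai.RateLimitError` for 429 and `openai.AuthenticationError` for 401) and performs some automatic retries itself; the sketch above is aimed at raw HTTP clients.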
Ready to start? Create an account and get your API key.
Get started