Getting started
Inferegator provides an OpenAI-compatible API for GPU inference. Use your existing OpenAI client libraries by changing the base URL and providing your Inferegator API key.
Base URL
https://api.inferegator.com/v1
Authentication
Include your API key in the Authorization header as a Bearer token. Create keys from the API Keys page after signing in.
Authorization: Bearer sk-inf-...
Chat completions
Send a chat completion request with the model ID and a messages array. Streaming is supported via server-sent events (SSE).
Python
import openai

client = openai.OpenAI(
    base_url="https://api.inferegator.com/v1",
    api_key="sk-inf-YOUR_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GPU inference in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
cURL
curl https://api.inferegator.com/v1/chat/completions \
  -H "Authorization: Bearer sk-inf-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Streaming
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Models
List available models with the models endpoint. Each model includes pricing, context length, and capability information. See the full catalog on the Models page.
GET /v1/models
{
  "data": [
    {
      "id": "meta-llama/llama-3.1-70b-instruct",
      "object": "model",
      "owned_by": "meta-llama"
    }
  ]
}
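With the OpenAI Python client, `client.models.list()` hits this same endpoint. When working with the raw JSON instead, a small helper can pull out the model IDs; the sketch below uses an illustrative payload shaped like the response above:

```python
def model_ids(models_response):
    # Each entry in "data" is a model object with an "id" field,
    # matching the /v1/models response shape shown above.
    return [m["id"] for m in models_response["data"]]

# Illustrative payload mirroring the example response above.
sample = {
    "data": [
        {
            "id": "meta-llama/llama-3.1-70b-instruct",
            "object": "model",
            "owned_by": "meta-llama",
        }
    ]
}

print(model_ids(sample))  # ['meta-llama/llama-3.1-70b-instruct']
```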
Rate limits
Rate limits are applied per API key. Default limits are 60 requests per minute and 100,000 tokens per minute. Contact support for higher limits.
Rate limit headers are included in every response:
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 59
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 99744
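A client can read these headers to throttle itself before it ever receives a 429. A minimal sketch (header names come from the docs above; the backoff thresholds are arbitrary examples, not API requirements):

```python
def parse_rate_limits(headers):
    # Header names as documented above; values arrive as strings.
    return {
        "remaining_requests": int(headers["X-RateLimit-Remaining-Requests"]),
        "remaining_tokens": int(headers["X-RateLimit-Remaining-Tokens"]),
    }

def should_back_off(headers, min_requests=5, min_tokens=2000):
    # Illustrative thresholds: pause sending when headroom runs low.
    limits = parse_rate_limits(headers)
    return (limits["remaining_requests"] < min_requests
            or limits["remaining_tokens"] < min_tokens)

headers = {
    "X-RateLimit-Limit-Requests": "60",
    "X-RateLimit-Remaining-Requests": "59",
    "X-RateLimit-Limit-Tokens": "100000",
    "X-RateLimit-Remaining-Tokens": "99744",
}
print(should_back_off(headers))  # False
```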
Errors
Errors follow the OpenAI error format. Common status codes:
400 — Bad request (invalid parameters)
401 — Unauthorized (invalid or missing API key)
429 — Rate limit exceeded
500 — Internal server error
503 — Model temporarily unavailable
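Of these, 429, 500, and 503 are transient, so retrying with exponential backoff is a reasonable default. A sketch (the wrapper and its parameters are assumptions for illustration, not part of the API):

```python
import random
import time

RETRYABLE = {429, 500, 503}  # transient statuses from the list above

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Invoke `call` (a zero-argument function returning (status, body))
    and retry retryable statuses with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
    return status, body
```

Note that the OpenAI Python client raises typed exceptions for these cases (for example, `openai.RateLimitError` for 429 and `openai.AuthenticationError` for 401) and performs some automatic retries itself; the sketch above is aimed at raw HTTP clients.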
Ready to start? Create an account and get your API key.
Get started