Custom Models
Register and manage your own model endpoints for use in consensus cases.
Overview
Custom models (BYOM — Bring Your Own Model) let you connect external model endpoints to Aira. Once registered, custom models can be used in cases alongside built-in models.
By default, custom model endpoints must be OpenAI-compatible: they accept chat completion requests and return structured decisions.
Self-hosted open-weight models now have dedicated support with per-model system prompts and constrained decoding (vLLM guided_json). See Supported Models for the full list and setup instructions.
Register Model
```
POST /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx
```
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name (1-200 chars) |
| model_id | string | Yes | Unique model identifier (1-100 chars) |
| endpoint_url | string | Yes | HTTPS endpoint URL |
| auth_header | string | No | Authorization header value (e.g. Bearer sk-...) |
| request_schema | object | No | Request format (default: `{"type": "openai_compatible"}`) |
| response_schema | object | No | Response parsing (default: `{"type": "openai_compatible", "content_path": "choices[0].message.content"}`) |
| timeout_ms | integer | No | Request timeout in ms (1,000-120,000; default 30,000) |
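Checking these constraints client-side can catch a rejected registration before the API call. A minimal sketch, assuming the field rules in the table above (the helper name is ours, not part of the API):

```python
def validate_custom_model(payload: dict) -> list[str]:
    """Return a list of validation errors for a registration payload."""
    errors = []
    # name, model_id, endpoint_url are required
    for field in ("name", "model_id", "endpoint_url"):
        if not payload.get(field):
            errors.append(f"{field} is required")
    if len(payload.get("name", "")) > 200:
        errors.append("name must be 1-200 chars")
    if len(payload.get("model_id", "")) > 100:
        errors.append("model_id must be 1-100 chars")
    url = payload.get("endpoint_url", "")
    if url and not url.startswith("https://"):
        errors.append("endpoint_url must be HTTPS")
    timeout = payload.get("timeout_ms", 30_000)  # server-side default
    if not 1_000 <= timeout <= 120_000:
        errors.append("timeout_ms must be 1,000-120,000")
    return errors
```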
Example Request
```bash
curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Our Fine-tuned GPT",
    "model_id": "acme/compliance-v2",
    "endpoint_url": "https://api.acme.com/v1/chat/completions",
    "auth_header": "Bearer sk-acme-abc123...",
    "timeout_ms": 45000
  }'
```
Response (201 Created)
```json
{
  "id": "cm_01J8X...",
  "name": "Our Fine-tuned GPT",
  "model_id": "acme/compliance-v2",
  "endpoint_url": "https://api.acme.com/v1/chat/completions",
  "has_auth": true,
  "request_schema": { "type": "openai_compatible" },
  "response_schema": { "type": "openai_compatible", "content_path": "choices[0].message.content" },
  "timeout_ms": 45000,
  "status": "untested",
  "last_tested": null,
  "created_at": "2026-03-14T10:30:00Z",
  "request_id": "req_01J8X..."
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique model registration ID |
| has_auth | boolean | Whether an auth header is configured (never exposes the value) |
| status | string | untested, ok, or error |
| last_tested | string\|null | ISO 8601 timestamp of last test |
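The content_path in response_schema tells Aira where to find the model's reply inside the endpoint's JSON response. A minimal sketch of how such a path could be resolved (the parsing logic here is ours; Aira's actual parser may differ):

```python
import re

def resolve_content_path(data, path: str):
    """Walk a path like 'choices[0].message.content' through nested JSON."""
    # Split into dict keys and [index] accessors,
    # e.g. ['choices', '[0]', 'message', 'content']
    for part in re.findall(r"[^.\[\]]+|\[\d+\]", path):
        if part.startswith("["):
            data = data[int(part[1:-1])]  # list index
        else:
            data = data[part]  # dict key
    return data
```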
List Models
```
GET /api/v1/models/custom
Authorization: Bearer aira_live_xxxxx
```
Returns all custom models registered by your organization.
Get Model
```
GET /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx
```
Returns a single custom model by ID.
Update Model
```
PUT /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx
```
Request Body
All fields are optional — only include fields you want to change.
| Field | Type | Description |
|---|---|---|
| name | string | Updated display name |
| endpoint_url | string | Updated HTTPS endpoint |
| auth_header | string | Updated auth header |
| request_schema | object | Updated request format |
| response_schema | object | Updated response parsing |
| timeout_ms | integer | Updated timeout (1,000-120,000) |
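Since every field is optional, one convenient client pattern is to build the PUT body from keyword arguments and drop anything unset. A small sketch (the helper name is ours):

```python
def build_update_payload(**fields) -> dict:
    """Keep only fields the caller actually set; None means 'leave unchanged'."""
    return {k: v for k, v in fields.items() if v is not None}
```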
Example Request
```bash
curl -X PUT https://api.airaproof.com/api/v1/models/custom/cm_01J8X... \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint_url": "https://api.acme.com/v2/chat/completions",
    "timeout_ms": 60000
  }'
```
Delete Model
```
DELETE /api/v1/models/custom/{id}
Authorization: Bearer aira_live_xxxxx
```
Removes the custom model. Returns 204 No Content.
Test Model
```
POST /api/v1/models/custom/{id}/test
Authorization: Bearer aira_live_xxxxx
```
Sends a test request to the model endpoint and validates the response format.
Response (200 OK)
```json
{
  "status": "ok",
  "latency_ms": 1250,
  "response_valid": true,
  "structured_output": {
    "decision": "APPROVE",
    "confidence": 0.85,
    "key_factors": ["test factor"],
    "reasoning": "Test reasoning response"
  },
  "error": null,
  "request_id": "req_01J8X..."
}
```
If the endpoint is unreachable or returns invalid output:
```json
{
  "status": "error",
  "latency_ms": 30000,
  "response_valid": false,
  "structured_output": null,
  "error": "Request timed out after 30 seconds",
  "request_id": "req_..."
}
```
Test Response Fields
| Field | Type | Description |
|---|---|---|
| status | string | ok if endpoint responded correctly, error otherwise |
| latency_ms | integer | Round-trip time in milliseconds |
| response_valid | boolean | Whether the response matches the expected schema |
| structured_output | object\|null | Parsed decision output (if valid) |
| error | string\|null | Error message (if failed) |
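A caller might gate a model's use in cases on the test result like this (the latency threshold and helper name are ours, not part of the API):

```python
def is_model_ready(test_result: dict, max_latency_ms: int = 10_000) -> bool:
    """Treat a model as ready only if the test passed and latency is acceptable."""
    return (
        test_result["status"] == "ok"
        and test_result["response_valid"]
        and test_result["latency_ms"] <= max_latency_ms
    )
```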
Always test a model after registering or updating it. Models with error status can still be used in cases, but may cause individual model failures.
Supported Self-Hosted Models
The following open-weight models have been tested and have dedicated per-model system prompts optimized for Aira's structured output format:
| Model | Parameters | License |
|---|---|---|
| Llama 3.3 70B | 70B | Llama Community License |
| Llama 4 Scout | 17B active (MoE) | Llama Community License |
| Llama 4 Maverick | 17B active (MoE) | Llama Community License |
| Mistral Large 3 | 41B active (MoE) | Apache 2.0 |
| Qwen 3.5 | 397B (MoE) | Apache 2.0 |
| DeepSeek V3.2 | 37B active (MoE) | MIT |
Other OpenAI-compatible models can still be registered as custom models but will use generic prompting.
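The structured decision shown in the test response maps naturally to a JSON schema for constrained decoding. A sketch of what such a schema could look like, with field names taken from the test response above (the exact schema Aira sends to self-hosted models is not documented here):

```python
# Illustrative JSON schema for the structured decision output
# (decision / confidence / key_factors / reasoning).
DECISION_SCHEMA = {
    "type": "object",
    "properties": {
        # The set of allowed decision strings is an assumption;
        # only "APPROVE" appears in the documented examples.
        "decision": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "key_factors": {"type": "array", "items": {"type": "string"}},
        "reasoning": {"type": "string"},
    },
    "required": ["decision", "confidence", "key_factors", "reasoning"],
}
```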
vLLM Setup Example
vLLM is the recommended serving engine. It supports guided_json constrained decoding, which guarantees that any model's output conforms to the expected JSON schema.
```bash
# Serve a model with vLLM
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice
```
Then register it:
```bash
curl -X POST https://api.airaproof.com/api/v1/models/custom \
  -H "Authorization: Bearer aira_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama 4 Scout (vLLM)",
    "model_id": "llama-4-scout",
    "endpoint_url": "https://your-gpu-server:8000/v1/chat/completions",
    "timeout_ms": 60000
  }'
```
When using constrained decoding, ensure your vLLM version supports guided_json (v0.6.0+). Older versions may fall back to unconstrained generation.
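Against a vLLM server, constrained decoding is requested by including guided_json in the chat completion body. A sketch of such a request body (the prompt and schema are illustrative, not Aira's actual prompts):

```python
import json

# Chat completion request body for vLLM's OpenAI-compatible endpoint.
# "guided_json" is a vLLM extension field; the schema below is illustrative.
request_body = {
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [
        {"role": "system", "content": "Return a structured decision as JSON."},
        {"role": "user", "content": "Should this transaction be approved?"},
    ],
    "guided_json": {
        "type": "object",
        "properties": {
            "decision": {"type": "string"},
            "confidence": {"type": "number"},
            "key_factors": {"type": "array", "items": {"type": "string"}},
            "reasoning": {"type": "string"},
        },
        "required": ["decision", "confidence", "key_factors", "reasoning"],
    },
}

# POST this payload to https://your-gpu-server:8000/v1/chat/completions
payload = json.dumps(request_body)
```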