# Provider-Specific Prompting

How Aira optimizes prompts and structured output enforcement for each AI provider.

## Why Provider-Specific Prompts?

Each AI provider has a different optimal way to enforce structured output. A generic "respond in JSON" prompt is unreliable. Aira uses each provider's native schema enforcement at the decoding level — not just prompt instructions.

This means the JSON structure is guaranteed by the model's token sampling, not by hoping the model follows instructions.
## OpenAI (GPT-5.4, GPT-5-mini)

**Method:** Structured Outputs with `json_schema` + `strict: true`

OpenAI's Structured Outputs feature uses constrained decoding to guarantee the output matches your JSON Schema exactly. The model literally cannot emit tokens that violate the schema.
```python
from openai import AsyncOpenAI

client = AsyncOpenAI()

response = await client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": details},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "decision_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "decision": {
                        "type": "string",
                        "enum": ["APPROVE", "DENY", "REVIEW"]
                    },
                    "confidence": {"type": "number"},
                    "key_factors": {
                        "type": "array",
                        "items": {"type": "string"}
                    },
                    "reasoning": {"type": "string"}
                },
                "required": ["decision", "confidence", "key_factors", "reasoning"],
                "additionalProperties": False
            }
        }
    },
    temperature=0.1,
)
```

**System prompt style:** Markdown headers following OpenAI's recommended structure — Role, Task, Constraints, Output format.
**Key settings:**

- `temperature: 0.1` — near-deterministic for decision tasks
- `max_tokens: 1000` — generous for small decision objects
- Refusal handling: checks the `message.refusal` field for safety filter blocks
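On the consuming side, the refusal check above can be sketched as a small helper. This is illustrative only — the helper name `parse_decision` and the error handling style are assumptions, not part of Aira's codebase:

```python
import json


def parse_decision(response):
    """Extract the structured decision from an OpenAI chat completion,
    surfacing safety-filter refusals explicitly instead of failing on
    an empty content field. (Hypothetical helper for illustration.)"""
    message = response.choices[0].message
    if getattr(message, "refusal", None):
        # A refusal means the safety filter blocked the request; there is
        # no schema-valid content to parse in that case.
        raise ValueError(f"Model refused: {message.refusal}")
    # With strict json_schema output, content is guaranteed-valid JSON.
    return json.loads(message.content)
```

Because `strict: true` guarantees well-formed output, the only failure mode left to handle at parse time is the refusal path.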
## Anthropic (Claude Sonnet 4.6, Claude Opus 4.6)

**Method:** Strict `tool_use` with forced `tool_choice`

Anthropic recommends `tool_use` for guaranteed structured output. Combined with `strict: true` and a forced `tool_choice`, Claude must call the specified tool with schema-valid input. This is more reliable than asking Claude to produce JSON in free text.
```python
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

response = await client.messages.create(
    model="claude-sonnet-4-6-20260114",
    system=system_prompt,
    messages=[{"role": "user", "content": details}],
    tools=[{
        "name": "render_decision",
        "description": "Submit your structured decision assessment.",
        "strict": True,
        "input_schema": {
            "type": "object",
            "properties": {
                "decision": {
                    "type": "string",
                    "enum": ["APPROVE", "DENY", "REVIEW"]
                },
                "confidence": {"type": "number"},
                "key_factors": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "reasoning": {"type": "string"}
            },
            "required": ["decision", "confidence", "key_factors", "reasoning"],
            "additionalProperties": False
        }
    }],
    tool_choice={"type": "tool", "name": "render_decision"},
    temperature=0.1,
)
```

**System prompt style:** XML tags per Anthropic's official recommendation — `<task>`, `<instructions>`, `<decision_criteria>`. Claude parses these unambiguously.
**Key settings:**

- `temperature: 0.1` — deterministic for analytical tasks
- `tool_choice: forced` — Claude must use the tool, cannot free-text respond
- `strict: true` — tool input is schema-validated
- Note: assistant message prefilling (starting the response with `{`) is deprecated on Claude 4.6+
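Reading the decision back out means finding the `tool_use` content block in the response. A minimal sketch, assuming the Anthropic SDK's content-block shape (the helper name `extract_decision` is hypothetical):

```python
def extract_decision(response, tool_name="render_decision"):
    """Pull the schema-validated input out of a forced tool call.
    (Hypothetical helper for illustration.)"""
    for block in response.content:
        # With a forced tool_choice, a tool_use block for our tool
        # should always be present in the content list.
        if block.type == "tool_use" and block.name == tool_name:
            # block.input arrives as a parsed dict, already validated
            # against the tool's input_schema when strict is enabled.
            return block.input
    raise ValueError(f"No tool_use block for {tool_name!r} in response")
```

Note that no JSON parsing is needed here at all: the SDK delivers the tool input as a Python dict.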
## Google (Gemini 3.1 Flash Lite, Gemini 3.1 Pro)

**Method:** Controlled generation with `response_mime_type` + `response_schema`

Gemini's controlled generation constrains token sampling to produce only valid JSON matching your schema. The `propertyOrdering` field (Gemini-specific) ensures fields are generated in a deterministic order.
```python
import google.generativeai as genai

model = genai.GenerativeModel(
    "gemini-3.1-flash-lite",
    system_instruction=system_prompt,
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "decision": {
                    "type": "string",
                    "enum": ["APPROVE", "DENY", "REVIEW"]
                },
                "confidence": {"type": "number"},
                "key_factors": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "reasoning": {"type": "string"}
            },
            "required": ["decision", "confidence", "key_factors", "reasoning"],
            "additionalProperties": False,
            "propertyOrdering": ["decision", "confidence", "key_factors", "reasoning"]
        },
        temperature=0.1,
    ),
)
```

**System prompt style:** Concise system instructions with XML tags for structure. Gemini processes system instructions separately with higher behavioral priority.
**Key settings:**

- `temperature: 0.1` — deterministic for Gemini 3.1 models
- `propertyOrdering` — ensures fields come in the same order every time
- `response_mime_type: "application/json"` — required alongside the schema
- Note: `response_mime_type` and function calling are mutually exclusive in Gemini
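Since controlled generation guarantees `response.text` is schema-valid JSON, the parse step is a plain `json.loads`. A sketch with an extra belt-and-braces enum check (the helper name and the defensive check are illustrative, not from Aira):

```python
import json

VALID_DECISIONS = {"APPROVE", "DENY", "REVIEW"}


def parse_gemini_decision(response):
    """Parse Gemini controlled-generation output into a dict.
    (Hypothetical helper for illustration.) The enum check is redundant
    with the schema guarantee but cheap insurance at a trust boundary."""
    decision = json.loads(response.text)
    if decision["decision"] not in VALID_DECISIONS:
        raise ValueError(f"Unexpected decision value: {decision['decision']!r}")
    return decision
```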
## The Shared Schema

All three providers enforce the same decision schema:

```json
{
  "decision": "APPROVE | DENY | REVIEW",
  "confidence": 0.0 - 1.0,
  "key_factors": ["factor1", "factor2", "..."],
  "reasoning": "Explanation text"
}
```

The schema is defined once in `providers/prompts.py` and adapted to each provider's native format. This ensures consensus scoring works identically regardless of which models are in the case.
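The "define once, adapt per provider" pattern might look like the following. This is a sketch of the idea, not the actual contents of `providers/prompts.py` — the adapter function names are invented for illustration:

```python
# One canonical JSON Schema, wrapped into each provider's native envelope.
DECISION_SCHEMA = {
    "type": "object",
    "properties": {
        "decision": {"type": "string", "enum": ["APPROVE", "DENY", "REVIEW"]},
        "confidence": {"type": "number"},
        "key_factors": {"type": "array", "items": {"type": "string"}},
        "reasoning": {"type": "string"},
    },
    "required": ["decision", "confidence", "key_factors", "reasoning"],
    "additionalProperties": False,
}


def for_openai(schema=DECISION_SCHEMA):
    # OpenAI: response_format with strict json_schema.
    return {"type": "json_schema",
            "json_schema": {"name": "decision_response",
                            "strict": True,
                            "schema": schema}}


def for_anthropic(schema=DECISION_SCHEMA):
    # Anthropic: the schema becomes a strict tool's input_schema.
    return [{"name": "render_decision",
             "description": "Submit your structured decision assessment.",
             "strict": True,
             "input_schema": schema}]


def for_gemini(schema=DECISION_SCHEMA):
    # Gemini: same schema plus an explicit field order for
    # deterministic generation.
    return {**schema, "propertyOrdering": list(schema["properties"])}
```

Because every adapter starts from the same `DECISION_SCHEMA`, a change to the canonical schema propagates to all three providers at once, which is what keeps consensus scoring comparable across models.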
## Why This Matters for Compliance

Using native schema enforcement means:

- **No malformed JSON** — impossible by construction, not by luck
- **No invalid decision values** — `enum` constraints prevent hallucinated categories
- **Deterministic comparison** — all models produce the same field structure
- **Auditable prompts** — the exact prompt and schema for each model is stored in the codebase, not generated dynamically