# Provider-Specific Prompting

How Aira optimizes prompts and structured output enforcement for each AI provider.

## Why Provider-Specific Prompts?

Each AI provider has a different optimal way to enforce structured output. A generic "respond in JSON" prompt is unreliable. Aira uses each provider's native schema enforcement at the decoding level — not just prompt instructions.

This means the JSON structure is guaranteed by the model's token sampling, not by hoping the model follows instructions.
## OpenAI (GPT-5.4, GPT-5-mini)

**Method:** Structured Outputs with `json_schema` + `strict: true`

OpenAI's Structured Outputs feature uses constrained decoding to guarantee the output matches your JSON Schema exactly. The model literally cannot emit tokens that violate the schema.
```python
from openai import AsyncOpenAI

client = AsyncOpenAI()

response = await client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": details},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "decision_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "decision": {
                        "type": "string",
                        "enum": ["APPROVE", "DENY", "REVIEW"]
                    },
                    "confidence": {"type": "number"},
                    "key_factors": {
                        "type": "array",
                        "items": {"type": "string"}
                    },
                    "reasoning": {"type": "string"}
                },
                "required": ["decision", "confidence", "key_factors", "reasoning"],
                "additionalProperties": False
            }
        }
    },
    temperature=0.1,
)
```

**System prompt style:** Markdown headers following OpenAI's recommended structure — Role, Task, Constraints, Output format.
**Key settings:**

- `temperature: 0.1` — near-deterministic for decision tasks
- `max_tokens: 1000` — generous for small decision objects
- Refusal handling: checks the `message.refusal` field for safety filter blocks
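On the consuming side, the refusal check above can be sketched as a small helper. This is illustrative only — the helper name `parse_decision` and the error handling style are assumptions, not part of Aira's codebase:

```python
import json


def parse_decision(response):
    """Extract the structured decision from an OpenAI chat completion,
    surfacing safety-filter refusals explicitly instead of failing on
    an empty content field. (Hypothetical helper for illustration.)"""
    message = response.choices[0].message
    if getattr(message, "refusal", None):
        # A refusal means the safety filter blocked the request; there is
        # no schema-valid content to parse in that case.
        raise ValueError(f"Model refused: {message.refusal}")
    # With strict json_schema output, content is guaranteed-valid JSON.
    return json.loads(message.content)
```

Because `strict: true` guarantees well-formed output, the only failure mode left to handle at parse time is the refusal path.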
## Anthropic (Claude Sonnet 4.6, Claude Opus 4.6)

**Method:** Strict `tool_use` with forced `tool_choice`

Anthropic recommends `tool_use` for guaranteed structured output. Combined with `strict: true` and a forced `tool_choice`, Claude must call the specified tool with schema-valid input. This is more reliable than asking Claude to produce JSON in free text.
```python
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

response = await client.messages.create(
    model="claude-sonnet-4-6-20260114",
    system=system_prompt,
    messages=[{"role": "user", "content": details}],
    tools=[{
        "name": "render_decision",
        "description": "Submit your structured decision assessment.",
        "strict": True,
        "input_schema": {
            "type": "object",
            "properties": {
                "decision": {
                    "type": "string",
                    "enum": ["APPROVE", "DENY", "REVIEW"]
                },
                "confidence": {"type": "number"},
                "key_factors": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "reasoning": {"type": "string"}
            },
            "required": ["decision", "confidence", "key_factors", "reasoning"],
            "additionalProperties": False
        }
    }],
    tool_choice={"type": "tool", "name": "render_decision"},
    temperature=0.1,
)
```

**System prompt style:** XML tags per Anthropic's official recommendation — `<task>`, `<instructions>`, `<decision_criteria>`. Claude parses these unambiguously.
**Key settings:**

- `temperature: 0.1` — deterministic for analytical tasks
- `tool_choice: forced` — Claude must use the tool, cannot free-text respond
- `strict: true` — tool input is schema-validated
- Note: assistant message prefilling (starting the response with `{`) is deprecated on Claude 4.6+
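Reading the decision back out means finding the `tool_use` content block in the response. A minimal sketch, assuming the Anthropic SDK's content-block shape (the helper name `extract_decision` is hypothetical):

```python
def extract_decision(response, tool_name="render_decision"):
    """Pull the schema-validated input out of a forced tool call.
    (Hypothetical helper for illustration.)"""
    for block in response.content:
        # With a forced tool_choice, a tool_use block for our tool
        # should always be present in the content list.
        if block.type == "tool_use" and block.name == tool_name:
            # block.input arrives as a parsed dict, already validated
            # against the tool's input_schema when strict is enabled.
            return block.input
    raise ValueError(f"No tool_use block for {tool_name!r} in response")
```

Note that no JSON parsing is needed here at all: the SDK delivers the tool input as a Python dict.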
## Google (Gemini 3.1 Flash Lite, Gemini 3.1 Pro)

**Method:** Controlled generation with `response_mime_type` + `response_schema`

Gemini's controlled generation constrains token sampling to produce only valid JSON matching your schema. The `propertyOrdering` field (Gemini-specific) ensures fields are generated in a deterministic order.
```python
import google.generativeai as genai

model = genai.GenerativeModel(
    "gemini-3.1-flash-lite",
    system_instruction=system_prompt,
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "decision": {
                    "type": "string",
                    "enum": ["APPROVE", "DENY", "REVIEW"]
                },
                "confidence": {"type": "number"},
                "key_factors": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "reasoning": {"type": "string"}
            },
            "required": ["decision", "confidence", "key_factors", "reasoning"],
            "additionalProperties": False,
            "propertyOrdering": ["decision", "confidence", "key_factors", "reasoning"]
        },
        temperature=0.1,
    ),
)
```

**System prompt style:** Concise system instructions with XML tags for structure. Gemini processes system instructions separately with higher behavioral priority.
**Key settings:**

- `temperature: 0.1` — deterministic for Gemini 3.1 models
- `propertyOrdering` — ensures fields come in the same order every time
- `response_mime_type: "application/json"` — required alongside the schema
- Note: `response_mime_type` and function calling are mutually exclusive in Gemini
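Since controlled generation guarantees `response.text` is schema-valid JSON, the parse step is a plain `json.loads`. A sketch with an extra belt-and-braces enum check (the helper name and the defensive check are illustrative, not from Aira):

```python
import json

VALID_DECISIONS = {"APPROVE", "DENY", "REVIEW"}


def parse_gemini_decision(response):
    """Parse Gemini controlled-generation output into a dict.
    (Hypothetical helper for illustration.) The enum check is redundant
    with the schema guarantee but cheap insurance at a trust boundary."""
    decision = json.loads(response.text)
    if decision["decision"] not in VALID_DECISIONS:
        raise ValueError(f"Unexpected decision value: {decision['decision']!r}")
    return decision
```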
## The Shared Schema

All three providers enforce the same decision schema:

```json
{
  "decision": "APPROVE | DENY | REVIEW",
  "confidence": 0.0 - 1.0,
  "key_factors": ["factor1", "factor2", "..."],
  "reasoning": "Explanation text"
}
```

The schema is defined once in `providers/prompts.py` and adapted to each provider's native format. This ensures consensus scoring works identically regardless of which models are in the case.
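The "define once, adapt per provider" pattern might look like the following. This is a sketch of the idea, not the actual contents of `providers/prompts.py` — the adapter function names are invented for illustration:

```python
# One canonical JSON Schema, wrapped into each provider's native envelope.
DECISION_SCHEMA = {
    "type": "object",
    "properties": {
        "decision": {"type": "string", "enum": ["APPROVE", "DENY", "REVIEW"]},
        "confidence": {"type": "number"},
        "key_factors": {"type": "array", "items": {"type": "string"}},
        "reasoning": {"type": "string"},
    },
    "required": ["decision", "confidence", "key_factors", "reasoning"],
    "additionalProperties": False,
}


def for_openai(schema=DECISION_SCHEMA):
    # OpenAI: response_format with strict json_schema.
    return {"type": "json_schema",
            "json_schema": {"name": "decision_response",
                            "strict": True,
                            "schema": schema}}


def for_anthropic(schema=DECISION_SCHEMA):
    # Anthropic: the schema becomes a strict tool's input_schema.
    return [{"name": "render_decision",
             "description": "Submit your structured decision assessment.",
             "strict": True,
             "input_schema": schema}]


def for_gemini(schema=DECISION_SCHEMA):
    # Gemini: same schema plus an explicit field order for
    # deterministic generation.
    return {**schema, "propertyOrdering": list(schema["properties"])}
```

Because every adapter starts from the same `DECISION_SCHEMA`, a change to the canonical schema propagates to all three providers at once, which is what keeps consensus scoring comparable across models.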
## Why This Matters for Compliance

Using native schema enforcement means:

- **No malformed JSON** — impossible by construction, not by luck
- **No invalid decision values** — `enum` constraints prevent hallucinated categories
- **Deterministic comparison** — all models produce the same field structure
- **Auditable prompts** — the exact prompt and schema for each model is stored in the codebase, not generated dynamically