Consensus Scoring
How Aira scores agreement between AI models. Deterministic, explainable, no embeddings. Used by both the Cases API and the policy engine's consensus mode.
Where Consensus Scoring Is Used
Consensus scoring runs in two places:
- Cases is a standalone API for evaluating a single decision across multiple models. You call it directly when you want a signed record of how several models independently assessed a question.
- Policy engine, consensus mode. The policy engine can run a policy through multiple models at `authorize()` time. This is distinct from the Cases API but uses the same scoring algorithm described on this page.
The Cases name stays the same. The policy consensus mode name stays the same. They share machinery, not purpose.
How It Works
Aira does not use embeddings or semantic similarity to score consensus. Instead, each model is prompted to return structured output:
```json
{
  "decision": "APPROVE",
  "confidence": 0.91,
  "key_factors": ["credit score above threshold", "stable income"],
  "reasoning": "Full explanation text..."
}
```

Agreement is then scored on these structured fields — not on free-text similarity. This makes consensus deterministic, explainable, and auditable.
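Because scoring depends entirely on these fields, each model response needs to be validated before it enters the algorithm. A minimal sketch of such a check, using the field names from the example above (the validator itself is illustrative, not Aira's implementation):

```python
# Illustrative validation of a model's structured output before scoring.
ALLOWED_DECISIONS = {"APPROVE", "DENY", "REVIEW"}

def validate_model_output(output: dict) -> dict:
    """Return the output unchanged if it is scoreable, else raise ValueError."""
    if output.get("decision") not in ALLOWED_DECISIONS:
        raise ValueError(f"decision must be one of {sorted(ALLOWED_DECISIONS)}")
    confidence = output.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a number in [0, 1]")
    if not isinstance(output.get("key_factors"), list):
        raise ValueError("key_factors must be a list")
    return output
```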
The Algorithm
Step 1: Decision Agreement
Count how many models agree on the same decision (APPROVE / DENY / REVIEW):
| Scenario | Disagreement Score Range |
|---|---|
| All models agree | 0.00 |
| Majority agrees (e.g., 2/3) | 0.10 – 0.40 |
| No clear majority | 0.50 – 0.80 |
| Complete disagreement | 0.80 – 1.00 |
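The table above fixes the score ranges; how a given vote split maps to a point inside its range is not specified here, so the interpolation below is an illustrative assumption:

```python
from collections import Counter

def decision_disagreement(decisions: list[str]) -> float:
    """Map a list of model decisions to a disagreement score.

    Range endpoints follow the table above; the interpolation within
    each range is an illustrative choice, not Aira's exact formula.
    """
    counts = Counter(decisions)
    top = counts.most_common(1)[0][1]  # size of the largest voting bloc
    n = len(decisions)
    if top == n:        # all models agree
        return 0.0
    if top == 1:        # every model voted differently
        return 0.9      # complete disagreement: 0.80-1.00
    share = top / n
    if share > 0.5:     # clear majority (e.g. 2/3): 0.10-0.40
        return round(0.10 + 0.60 * (1.0 - share), 2)
    return 0.65         # no clear majority: 0.50-0.80
```

For a 2-of-3 majority this yields 0.30, squarely inside the 0.10–0.40 band from the table.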
Step 2: Confidence Variance
If models agree on the decision but with wildly different confidence levels (e.g., one at 0.95 and another at 0.30), this signals uncertainty. A variance penalty of up to 0.20 is added to the disagreement score.
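One way to turn confidence spread into a penalty capped at 0.20 is to normalize the variance of the reported confidences; the normalization below is an illustrative assumption, not the documented formula:

```python
from statistics import pvariance

def confidence_penalty(confidences: list[float], cap: float = 0.20) -> float:
    """Penalty of up to `cap` (0.20, per the text) for spread-out confidences."""
    # The maximum possible population variance for values in [0, 1] is 0.25
    # (half the models at 0.0, half at 1.0), so normalize by that.
    return min(cap, cap * pvariance(confidences) / 0.25)
```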
Step 3: Key Factor Overlap
If models independently identify the same key factors, this strengthens the consensus signal. Each shared factor reduces the disagreement score by up to 0.03 (max 0.10 reduction).
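A minimal sketch of the overlap reduction, counting only factors reported by every model (whether partial overlaps also count, and how free-text factors are normalized before comparison, is not specified here):

```python
def factor_overlap_reduction(
    factor_sets: list[set[str]],
    per_factor: float = 0.03,  # per-factor reduction, per the text
    cap: float = 0.10,         # maximum total reduction, per the text
) -> float:
    """Reduction earned by key factors that all models independently reported."""
    # Exact string intersection is a simplifying assumption; real factors are
    # free text and would likely need normalization first.
    shared = set.intersection(*factor_sets) if factor_sets else set()
    return min(cap, per_factor * len(shared))
```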
Step 4: Human Review Trigger
Human review is triggered in two cases:
- Models disagree. The disagreement score exceeds the configured threshold (default: 0.4). The decision becomes `MIXED`. For Cases, this flags the case for human review. For the policy engine's consensus mode, this flips the policy decision to `require_approval`, so the action is held and `authorize()` returns `pending_approval`.
- Models vote REVIEW. When the majority decision is `REVIEW`, the models themselves are saying "a human should decide this." The decision stays `REVIEW` but `requires_human_review` is `true`.
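Putting the two triggers together, a minimal sketch (illustrative names, not Aira's API):

```python
def consensus_outcome(majority_decision: str, disagreement: float,
                      threshold: float = 0.4) -> dict:
    """Apply the two human-review triggers described above."""
    # Trigger 1: disagreement above the threshold flips the decision to MIXED.
    if disagreement > threshold:
        return {"decision": "MIXED", "requires_human_review": True}
    # Trigger 2: a REVIEW majority keeps its decision but still flags review.
    return {
        "decision": majority_decision,
        "requires_human_review": majority_decision == "REVIEW",
    }
```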
```json
{
  "consensus": {
    "decision": "MIXED",
    "disagreement_score": 0.72,
    "requires_human_review": true,
    "human_review_reason": "Disagreement score 0.72 exceeds threshold 0.40 — Article 14 human oversight triggered"
  }
}
```

```json
{
  "consensus": {
    "decision": "REVIEW",
    "disagreement_score": 0.00,
    "requires_human_review": true,
    "human_review_reason": "Models assessed this decision as requiring human judgment — Article 14 human oversight triggered"
  }
}
```

Configuring the Threshold
Set `human_review_threshold` per request:

```json
{
  "options": {
    "human_review_threshold": 0.3
  }
}
```

| Threshold | Effect |
|---|---|
| 0.0 | Every decision requires human review |
| 0.2 | Conservative — flags even minor disagreements |
| 0.4 | Default — balanced for most use cases |
| 0.6 | Permissive — only flags significant disagreement |
| 1.0 | Never triggers human review (not recommended) |
For high-stakes domains (credit, medical, legal), use 0.2–0.3. For lower-risk decisions, 0.4–0.6.
Why Not Embeddings?
- Determinism: Structured scoring produces the same result every time. Embedding similarity can vary.
- Explainability: A regulator can audit the scoring logic. "2 out of 3 models said APPROVE" is clearer than "cosine similarity was 0.87."
- No extra costs: No embedding model call needed. Zero additional latency.
- No extra dependencies: No vector database, no embedding API, no model to maintain.
Decision Values
Models must return one of three decisions:
| Decision | Meaning |
|---|---|
| APPROVE | Proceed with the action |
| DENY | Reject / do not proceed |
| REVIEW | Insufficient information or uncertain — needs human judgment |
The consensus decision follows the same values, plus MIXED when human review is triggered.
Human Approval
Hold high-stakes actions for human review at `authorize()` time, before the agent acts. The receipt is only minted after approval and execution.
Cryptographic Receipts
Receipts are minted by `notarize()`, not by `authorize()`. They commit to both the agent's original intent and the real-world outcome.