Batch Processing & Throughput
Patterns for high-volume authorize and notarize calls — concurrency, offline queues, rate limit handling, and code examples.
When you need batch processing
Most agents call Aira once per action — authorize before the LLM call, notarize after. But some workloads generate hundreds or thousands of actions in a short window:
- Nightly batch jobs that process a queue of tasks
- Data pipelines that fan out across many documents
- Load testing or migration scripts
This guide covers patterns to maximize throughput while staying within rate limits.
Concurrent authorize calls
The fastest way to process many actions is to run authorize calls concurrently. Both SDKs are async-friendly.
Python (asyncio.gather)
import asyncio
from aira import AsyncAira
client = AsyncAira(api_key="aira_live_...")
async def authorize_batch(items: list[dict]) -> list:
    """Authorize many actions concurrently."""
    tasks = [
        client.authorize(
            action_type="llm_call",
            model_id="claude-sonnet-4-6",
            details=item,
        )
        for item in items
    ]
    return await asyncio.gather(*tasks)

async def main():
    items = [{"prompt": f"Analyze document {i}"} for i in range(100)]
    actions = await authorize_batch(items)
    print(f"Authorized {len(actions)} actions")

asyncio.run(main())

TypeScript (Promise.all)
import { Aira } from "aira-sdk";
const client = new Aira({ apiKey: "aira_live_..." });

async function authorizeBatch(items: Array<{ prompt: string }>) {
  const promises = items.map((item) =>
    client.authorize({
      actionType: "llm_call",
      modelId: "claude-sonnet-4-6",
      details: item,
    })
  );
  return Promise.all(promises);
}

const items = Array.from({ length: 100 }, (_, i) => ({
  prompt: `Analyze document ${i}`,
}));
const actions = await authorizeBatch(items);
console.log(`Authorized ${actions.length} actions`);

For very large batches (1,000+), chunk your requests into groups of 50-100 to avoid overwhelming your connection pool. See the "Chunked concurrency" section below.
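The chunking advice above can be sketched as follows. This is a minimal illustration, not part of the SDK: `chunked`, `authorize_in_chunks`, and `fake_authorize` are hypothetical names, and `fake_authorize` is a stand-in for a coroutine that would wrap `client.authorize(...)` in real code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def authorize_in_chunks(
    items: list,
    authorize: Callable[[Any], Awaitable[Any]],
    chunk_size: int = 50,
) -> list:
    """Run `authorize` over `items`, one chunk at a time.

    Requests inside a chunk run concurrently; the next chunk starts only
    after the current one has finished. Results keep the input order.
    """
    results: list = []
    for chunk in chunked(items, chunk_size):
        results.extend(await asyncio.gather(*(authorize(item) for item in chunk)))
    return results


async def fake_authorize(item: dict) -> dict:
    """Placeholder for a real call such as client.authorize(details=item)."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return {"authorized": True, "details": item}


if __name__ == "__main__":
    docs = [{"prompt": f"Analyze document {i}"} for i in range(120)]
    actions = asyncio.run(authorize_in_chunks(docs, fake_authorize, chunk_size=50))
    print(f"Authorized {len(actions)} actions")
```

Fixed-size chunks are simple to reason about, but each chunk waits for its slowest request before the next one starts. A concurrency limit, as in the "Chunked concurrency" section, keeps the pipeline full instead.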
Chunked concurrency
When processing thousands of items, limit how many requests are in flight at once:
Python (semaphore)
import asyncio
from aira import AsyncAira
client = AsyncAira(api_key="aira_live_...")
MAX_CONCURRENT = 50
async def authorize_with_limit(sem: asyncio.Semaphore, item: dict):
    async with sem:
        return await client.authorize(
            action_type="llm_call",
            model_id="claude-sonnet-4-6",
            details=item,
        )

async def main():
    items = [{"prompt": f"Process item {i}"} for i in range(2000)]
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    actions = await asyncio.gather(
        *[authorize_with_limit(sem, item) for item in items]
    )
    print(f"Authorized {len(actions)} actions")

asyncio.run(main())

TypeScript (p-limit)
import { Aira } from "aira-sdk";
import pLimit from "p-limit";
const client = new Aira({ apiKey: "aira_live_..." });
const limit = pLimit(50); // max 50 concurrent requests

const items = Array.from({ length: 2000 }, (_, i) => ({
  prompt: `Process item ${i}`,
}));
const actions = await Promise.all(
  items.map((item) =>
    limit(() =>
      client.authorize({
        actionType: "llm_call",
        modelId: "claude-sonnet-4-6",
        details: item,
      })
    )
  )
);
console.log(`Authorized ${actions.length} actions`);

Offline queue mode
Both SDKs support an offline queue for fire-and-forget scenarios. Actions are buffered locally and flushed to the API in the background. This is useful when:
- You do not want authorize latency on the critical path.
- Your agent can tolerate eventual consistency (the receipt arrives later).
- You are running in an environment with intermittent connectivity.
Python
from aira import Aira

client = Aira(
    api_key="aira_live_...",
    offline=True,         # enable local buffering
    flush_interval=5,     # flush every 5 seconds
    max_queue_size=1000,  # buffer up to 1,000 actions
)

# This returns immediately: the action is queued locally
action = client.authorize(
    action_type="llm_call",
    model_id="claude-sonnet-4-6",
    details={"prompt": "Summarize this document"},
)

# When you're done, flush remaining items
client.sync()

TypeScript
import { Aira } from "aira-sdk";
const client = new Aira({
  apiKey: "aira_live_...",
  offline: true,
  flushInterval: 5000, // flush every 5 seconds
  maxQueueSize: 1000,
});

// Returns immediately: queued locally
const action = await client.authorize({
  actionType: "llm_call",
  modelId: "claude-sonnet-4-6",
  details: { prompt: "Summarize this document" },
});

// Flush remaining items before shutdown
await client.sync();

Offline queue mode means your agent proceeds without waiting for Aira's policy decision. If a policy would have denied the action, you will only learn about it after the fact (via webhook or polling). Use this mode only when you accept that trade-off.
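To make the buffering behavior concrete, here is a toy model of a bounded local queue with explicit flushes. It is an illustrative sketch only and does not describe how the SDK is implemented: `BufferedSender` and `send_batch` are hypothetical names, and flushing early when the buffer is full (rather than dropping or blocking) is an assumption of this sketch.

```python
from collections import deque
from typing import Callable, List


class BufferedSender:
    """Toy model: hold actions locally and hand them off in batches."""

    def __init__(self, send_batch: Callable[[List[dict]], None], max_queue_size: int = 1000):
        self._send_batch = send_batch  # hypothetical transport callback
        self._max_queue_size = max_queue_size
        self._queue = deque()

    def enqueue(self, action: dict) -> None:
        """Queue an action without waiting for any network call."""
        # Assumption of this sketch: a full buffer is flushed, not dropped.
        if len(self._queue) >= self._max_queue_size:
            self.flush()
        self._queue.append(action)

    def flush(self) -> int:
        """Send everything currently buffered; return how many were sent."""
        batch = list(self._queue)
        self._queue.clear()
        if batch:
            self._send_batch(batch)
        return len(batch)


if __name__ == "__main__":
    sent_batches: List[List[dict]] = []
    sender = BufferedSender(sent_batches.append, max_queue_size=3)
    for i in range(4):
        sender.enqueue({"prompt": f"Summarize document {i}"})
    sender.flush()  # final flush before shutdown, like client.sync()
    print([len(batch) for batch in sent_batches])  # [3, 1]
```

The model also shows why the trade-off exists: `enqueue` returns before anything is sent, so a policy decision cannot influence the caller at that point.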
Rate limit management
Aira enforces per-key rate limits. When you exceed the limit, the API returns HTTP 429 with a Retry-After header.
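If you handle 429 responses yourself, the wait time has to be derived from the Retry-After header. The HTTP specification allows that header to carry either a number of seconds or an HTTP date; which form Aira sends is not stated here, so the sketch below accepts both. `retry_after_seconds` and its `fallback` parameter are hypothetical helper names, not SDK functions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: Optional[str],
    fallback: float,
    now: Optional[datetime] = None,
) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds.

    Returns `fallback` when the header is missing or cannot be parsed.
    """
    if header_value is None:
        return fallback

    # Form 1: delay in seconds, e.g. "3"
    try:
        return max(0.0, float(header_value))
    except ValueError:
        pass

    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC

    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())


if __name__ == "__main__":
    print(retry_after_seconds("3", fallback=1.0))   # 3.0
    print(retry_after_seconds(None, fallback=1.0))  # 1.0
```

Passing an exponential value such as `2 ** attempt` as `fallback` gives the usual backoff behavior whenever the server does not supply a usable header.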
Built-in SDK retries
Both SDKs automatically retry 429 and 5xx responses with exponential backoff. Use max_retries (Python) or maxRetries (TypeScript) to set the retry count:
Python
client = Aira(
    api_key="aira_live_...",
    max_retries=5,  # up to 5 retries on 429/5xx
)

TypeScript
const client = new Aira({
  apiKey: "aira_live_...",
  maxRetries: 5,
});

Manual backoff (if not using the SDK)
import time
import requests
def authorize_with_backoff(payload: dict, max_retries: int = 5):
    for attempt in range(max_retries):
        resp = requests.post(
            "https://your-domain/api/v1/actions",
            json=payload,
            headers={"Authorization": "Bearer aira_live_..."},
        )
        if resp.status_code != 429:
            resp.raise_for_status()  # surface other HTTP errors instead of returning an error body
            return resp.json()
        # Prefer the server's Retry-After (seconds); otherwise back off exponentially.
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        print(f"Rate limited. Retrying in {wait}s...")
        time.sleep(wait)
    raise RuntimeError("Max retries exceeded")

Performance reference
On a single 4-vCPU Aira instance, expect:
| Metric | Throughput |
|---|---|
| Authorize requests | ~200/s |
| Receipt verifications | ~100/s |
| Full receipt mint (in-process) | ~10,500/s |
For detailed benchmarks, see the Performance guide.
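The table can be turned into a rough lower bound on batch duration. The helper below is a back-of-the-envelope sketch: `estimated_seconds` is a hypothetical name, the optional `rate_limit_per_second` stands in for whatever per-key limit applies to your key (no value is given in this guide), and network latency and client-side overhead are ignored.

```python
from typing import Optional


def estimated_seconds(
    n_requests: int,
    server_rate_per_second: float,
    rate_limit_per_second: Optional[float] = None,
) -> float:
    """Lower-bound estimate of how long a batch takes at a sustained rate."""
    rate = server_rate_per_second
    if rate_limit_per_second is not None:
        # The slower of server throughput and the rate limit is the bottleneck.
        rate = min(rate, rate_limit_per_second)
    if rate <= 0:
        raise ValueError("rate must be positive")
    return n_requests / rate


if __name__ == "__main__":
    # 2,000 authorize requests at the ~200/s figure from the table
    print(estimated_seconds(2000, 200))  # 10.0
    # Same batch under a hypothetical 50 requests/s per-key limit
    print(estimated_seconds(2000, 200, rate_limit_per_second=50))  # 40.0
```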
Summary
| Pattern | Best for | Trade-off |
|---|---|---|
| asyncio.gather / Promise.all | Medium batches (10-100) | Simple, but can spike connections |
| Semaphore / p-limit | Large batches (100-10,000) | Controlled concurrency |
| Offline queue | Fire-and-forget | No pre-flight policy enforcement |
| SDK retries | All workloads | Handles 429 automatically |
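The concurrency patterns in the table can be combined with per-item error collection, so that one failed call does not discard the rest of a large batch. The following is a sketch under assumptions: `run_bounded` and `fake_authorize` are hypothetical names, and `fake_authorize` stands in for a coroutine that would wrap `client.authorize(...)`.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Tuple


async def run_bounded(
    items: List[Any],
    call: Callable[[Any], Awaitable[Any]],
    max_concurrent: int = 50,
) -> Tuple[List[Tuple[Any, Any]], List[Tuple[Any, BaseException]]]:
    """Run `call` over `items` with at most `max_concurrent` in flight.

    Returns (succeeded, failed), each a list of (item, result_or_exception).
    """
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(item: Any) -> Any:
        async with sem:
            return await call(item)

    # return_exceptions=True makes gather return exceptions as values
    # instead of raising the first one it encounters.
    outcomes = await asyncio.gather(
        *(guarded(item) for item in items), return_exceptions=True
    )
    succeeded = [(i, o) for i, o in zip(items, outcomes) if not isinstance(o, BaseException)]
    failed = [(i, o) for i, o in zip(items, outcomes) if isinstance(o, BaseException)]
    return succeeded, failed


async def fake_authorize(item: dict) -> dict:
    """Placeholder for a real authorize call; fails for items marked bad."""
    await asyncio.sleep(0)
    if item.get("bad"):
        raise ValueError("simulated failure")
    return {"authorized": True}


if __name__ == "__main__":
    batch = [{"id": i, "bad": i % 10 == 0} for i in range(30)]
    ok, errors = asyncio.run(run_bounded(batch, fake_authorize, max_concurrent=5))
    print(f"{len(ok)} succeeded, {len(errors)} failed")  # 27 succeeded, 3 failed
```

The failed list keeps each original item next to its exception, which makes it straightforward to retry or log only the items that need attention.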