Batch Processing & Throughput

Patterns for high-volume authorize and notarize calls — concurrency, offline queues, rate limit handling, and code examples.

When you need batch processing

Most agents call Aira once per action — authorize before the LLM call, notarize after. But some workloads generate hundreds or thousands of actions in a short window:

Nightly batch jobs that process a queue of tasks
Data pipelines that fan out across many documents
Load testing or migration scripts

This guide covers patterns to maximize throughput while staying within rate limits.

Concurrent authorize calls

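The fastest way to process many actions is to run authorize calls concurrently. The Python SDK ships an async client (AsyncAira), and the TypeScript client's methods return promises, so both fit naturally into concurrent code.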

Python (asyncio.gather)

import asyncio
from aira import AsyncAira

client = AsyncAira(api_key="aira_live_...")

async def authorize_batch(items: list[dict]) -> list:
    """Authorize many actions concurrently."""
    tasks = [
        client.authorize(
            action_type="llm_call",
            model_id="claude-sonnet-4-6",
            details=item,
        )
        for item in items
    ]
    return await asyncio.gather(*tasks)

async def main():
    items = [{"prompt": f"Analyze document {i}"} for i in range(100)]
    actions = await authorize_batch(items)
    print(f"Authorized {len(actions)} actions")

asyncio.run(main())

TypeScript (Promise.all)

import { Aira } from "aira-sdk";

const client = new Aira({ apiKey: "aira_live_..." });

async function authorizeBatch(items: Array<{ prompt: string }>) {
  const promises = items.map((item) =>
    client.authorize({
      actionType: "llm_call",
      modelId: "claude-sonnet-4-6",
      details: item,
    })
  );
  return Promise.all(promises);
}

const items = Array.from({ length: 100 }, (_, i) => ({
  prompt: `Analyze document ${i}`,
}));

const actions = await authorizeBatch(items);
console.log(`Authorized ${actions.length} actions`);
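By default, asyncio.gather raises the first exception it sees, so one failed authorize call hides the results of every other call in the batch (Promise.all behaves the same way; Promise.allSettled is the TypeScript counterpart of the approach below). A minimal Python sketch of keeping partial results with return_exceptions=True. fake_authorize is a hypothetical stand-in for client.authorize so the example runs without the SDK, and the failure of every tenth item is simulated.

```python
import asyncio

# Hypothetical stand-in for client.authorize(); every tenth item fails on purpose.
async def fake_authorize(item: dict) -> dict:
    await asyncio.sleep(0.001)  # simulated network latency
    if item["id"] % 10 == 0:
        raise RuntimeError(f"authorize failed for item {item['id']}")
    return {"id": item["id"], "status": "authorized"}

async def authorize_batch(items: list[dict]) -> tuple[list[dict], list[tuple[dict, BaseException]]]:
    # return_exceptions=True returns exceptions as values instead of raising the first one
    results = await asyncio.gather(
        *(fake_authorize(item) for item in items), return_exceptions=True
    )
    ok, failed = [], []
    # gather preserves input order, so results line up with items
    for item, result in zip(items, results):
        if isinstance(result, BaseException):
            failed.append((item, result))
        else:
            ok.append(result)
    return ok, failed

ok, failed = asyncio.run(authorize_batch([{"id": i} for i in range(100)]))
print(f"{len(ok)} authorized, {len(failed)} failed")
```

Failed items can then be retried or logged on their own, without re-running the calls that succeeded.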

For very large batches (1,000+), chunk your requests into groups of 50-100 to avoid overwhelming your connection pool. See the "Chunked concurrency" section below.
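Fixed-size chunking, as suggested above, can be sketched like this: each chunk runs concurrently, and the next chunk starts only after the previous one finishes. fake_authorize is a hypothetical placeholder for the SDK call, and the chunk size of 50 is simply the low end of the suggested range.

```python
import asyncio

# Hypothetical stand-in for client.authorize(); replace with the real SDK call.
async def fake_authorize(item: dict) -> dict:
    await asyncio.sleep(0.001)  # simulated network latency
    return {"id": item["id"], "status": "authorized"}

def chunks(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def authorize_in_chunks(items: list[dict], chunk_size: int = 50) -> list[dict]:
    results: list[dict] = []
    for chunk in chunks(items, chunk_size):
        # All calls in this chunk run concurrently; the loop waits for the whole chunk.
        results.extend(await asyncio.gather(*(fake_authorize(i) for i in chunk)))
    return results

items = [{"id": i} for i in range(1_000)]
results = asyncio.run(authorize_in_chunks(items, chunk_size=50))
print(len(results))  # 1000
```

The trade-off compared with a semaphore is that each chunk waits for its slowest request, so slots sit idle near the end of every chunk; the semaphore and p-limit approaches keep a steady number of requests in flight instead.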

Chunked concurrency

When processing thousands of items, limit how many requests are in flight at once:

Python (semaphore)

import asyncio
from aira import AsyncAira

client = AsyncAira(api_key="aira_live_...")
MAX_CONCURRENT = 50

async def authorize_with_limit(sem: asyncio.Semaphore, item: dict):
    async with sem:
        return await client.authorize(
            action_type="llm_call",
            model_id="claude-sonnet-4-6",
            details=item,
        )

async def main():
    items = [{"prompt": f"Process item {i}"} for i in range(2000)]
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    actions = await asyncio.gather(
        *[authorize_with_limit(sem, item) for item in items]
    )
    print(f"Authorized {len(actions)} actions")

asyncio.run(main())

TypeScript (p-limit)

import { Aira } from "aira-sdk";
import pLimit from "p-limit";

const client = new Aira({ apiKey: "aira_live_..." });
const limit = pLimit(50); // max 50 concurrent requests

const items = Array.from({ length: 2000 }, (_, i) => ({
  prompt: `Process item ${i}`,
}));

const actions = await Promise.all(
  items.map((item) =>
    limit(() =>
      client.authorize({
        actionType: "llm_call",
        modelId: "claude-sonnet-4-6",
        details: item,
      })
    )
  )
);

console.log(`Authorized ${actions.length} actions`);
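To confirm that a limiter really caps in-flight requests before pointing it at the API, you can run it against an instrumented stub. This sketch reuses the semaphore pattern from the Python example above with a hypothetical fake_authorize that counts overlapping calls; the 1 ms sleep stands in for network latency.

```python
import asyncio

MAX_CONCURRENT = 50
in_flight = 0  # calls currently running
peak = 0       # highest value in_flight has reached

# Hypothetical stand-in for client.authorize() that records how many calls overlap.
async def fake_authorize(item: dict) -> dict:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.001)  # simulated network latency
    in_flight -= 1
    return {"id": item["id"]}

async def authorize_with_limit(sem: asyncio.Semaphore, item: dict) -> dict:
    async with sem:  # at most MAX_CONCURRENT coroutines get past this line at once
        return await fake_authorize(item)

async def main() -> list[dict]:
    # Create the semaphore inside the running event loop
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    items = [{"id": i} for i in range(2_000)]
    return await asyncio.gather(*(authorize_with_limit(sem, i) for i in items))

actions = asyncio.run(main())
print(f"{len(actions)} actions, peak in flight: {peak}")
```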

Offline queue mode

Both SDKs support an offline queue for fire-and-forget scenarios. Actions are buffered locally and flushed to the API in the background. This is useful when:

You do not want authorize latency on the critical path.
Your agent can tolerate eventual consistency (the receipt arrives later).
You are running in an environment with intermittent connectivity.

Python

from aira import Aira

client = Aira(
    api_key="aira_live_...",
    offline=True,           # enable local buffering
    flush_interval=5,       # flush every 5 seconds
    max_queue_size=1000,    # buffer up to 1,000 actions
)

# This returns immediately — the action is queued locally
action = client.authorize(
    action_type="llm_call",
    model_id="claude-sonnet-4-6",
    details={"prompt": "Summarize this document"},
)

# When you're done, flush remaining items
client.sync()

TypeScript

import { Aira } from "aira-sdk";

const client = new Aira({
  apiKey: "aira_live_...",
  offline: true,
  flushInterval: 5000,   // flush every 5 seconds
  maxQueueSize: 1000,
});

// Returns immediately — queued locally
const action = await client.authorize({
  actionType: "llm_call",
  modelId: "claude-sonnet-4-6",
  details: { prompt: "Summarize this document" },
});

// Flush remaining items before shutdown
await client.sync();

Offline queue mode means your agent proceeds without waiting for Aira's policy decision. If a policy would have denied the action, you will only learn about it after the fact (via webhook or polling). Use this mode only when you accept that trade-off.
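To make the buffering behavior concrete, here is a hypothetical local queue with a size cap and a time-based flush. It is not the Aira SDK's implementation: the class name, the send_batch callback, and the choice to flush when the buffer is full (rather than drop or block) are assumptions for illustration. flush() plays the role that client.sync() plays before shutdown.

```python
import time
from typing import Callable

class LocalActionQueue:
    """Hypothetical buffer with size- and time-based flushing (not the Aira SDK's code)."""

    def __init__(self, send_batch: Callable[[list[dict]], None],
                 flush_interval: float = 5.0, max_queue_size: int = 1000):
        self._send_batch = send_batch
        self._flush_interval = flush_interval
        self._max_queue_size = max_queue_size
        self._buffer: list[dict] = []
        self._last_flush = time.monotonic()

    def enqueue(self, action: dict) -> None:
        # Assumption: a full buffer triggers a synchronous flush instead of dropping data.
        if len(self._buffer) >= self._max_queue_size:
            self.flush()
        self._buffer.append(action)
        # Simplification: the interval is checked only when an action arrives.
        if time.monotonic() - self._last_flush >= self._flush_interval:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far."""
        if self._buffer:
            batch = list(self._buffer)
            self._send_batch(batch)
            self._buffer.clear()  # cleared only after a successful send
        self._last_flush = time.monotonic()

sent: list[list[dict]] = []
queue = LocalActionQueue(sent.append, flush_interval=60, max_queue_size=3)
for i in range(7):
    queue.enqueue({"id": i})
queue.flush()  # drain before shutdown
print([len(batch) for batch in sent])  # [3, 3, 1]
```

A real background flusher would run on a timer or separate thread rather than checking the interval only on enqueue; the sketch keeps it synchronous so the batching is easy to follow.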

Rate limit management

Aira enforces per-key rate limits. When you exceed the limit, the API returns HTTP 429 with a Retry-After header.
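Retry-After can carry either a number of seconds or an HTTP date (RFC 9110 allows both). Which form Aira sends is not stated here, so if you handle the header yourself, a parser that accepts both forms is the safer choice. A sketch using only the standard library; retry_after_seconds is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def retry_after_seconds(value: Optional[str], default: float,
                        now: Optional[datetime] = None) -> float:
    """Seconds to wait according to a Retry-After value, or `default` if absent/unparseable."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "Retry-After: 30"
        return float(value)
    try:
        # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    now = now or datetime.now(timezone.utc)
    # A date already in the past means "retry now"
    return max(0.0, (retry_at - now).total_seconds())
```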

Built-in SDK retries

Both SDKs automatically retry on 429 with exponential backoff. Configure the number of retries with max_retries (Python) or maxRetries (TypeScript):

Python

from aira import Aira

client = Aira(
    api_key="aira_live_...",
    max_retries=5,  # up to 5 retries on 429/5xx
)

TypeScript

import { Aira } from "aira-sdk";

const client = new Aira({
  apiKey: "aira_live_...",
  maxRetries: 5,
});
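The exact backoff schedule the SDKs use (base delay, cap, jitter) is not documented on this page. As an illustration of the general technique, here is exponential backoff with "full jitter": the delay ceiling doubles on each attempt up to a cap, and the actual wait is drawn uniformly below it, so many workers that hit the limit at the same moment do not all retry in lockstep. The 0.5 s base and 30 s cap are assumptions.

```python
import random
from typing import Optional

def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Wait times for each retry using exponential backoff with full jitter."""
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)  # 0.5, 1, 2, 4, 8, ... capped at `cap`
        delays.append(rng.uniform(0, ceiling))   # random point at or below the ceiling
    return delays

for attempt, delay in enumerate(backoff_delays(5), start=1):
    print(f"retry {attempt}: wait {delay:.2f}s")
```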

Manual backoff (if not using the SDK)

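import time
import requests

def authorize_with_backoff(payload: dict, max_retries: int = 5) -> dict:
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        resp = requests.post(
            "https://your-domain/api/v1/actions",
            json=payload,
            headers={"Authorization": "Bearer aira_live_..."},
            timeout=30,
        )
        if resp.status_code != 429:
            resp.raise_for_status()  # surface other errors instead of returning their body
            return resp.json()
        if attempt == max_retries:
            break  # don't sleep after the final attempt

        # Honor Retry-After when it is given in seconds; otherwise back off exponentially.
        retry_after = resp.headers.get("Retry-After", "").strip()
        wait = float(retry_after) if retry_after.isdigit() else float(2 ** attempt)
        print(f"Rate limited. Retrying in {wait}s...")
        time.sleep(wait)

    raise RuntimeError("Max retries exceeded")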
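Retrying handles 429s after they happen; for large batches you can also pace requests on the client so you rarely hit the limit at all. A minimal token-bucket sketch: the bucket refills at `rate` tokens per second up to `capacity`, and each request waits for a token before it is sent. Aira's actual per-key limit is not stated here, so rate=100 and capacity=10 are placeholders, and paced_call is a hypothetical wrapper where the real authorize call would go.

```python
import asyncio
import time

class TokenBucket:
    """Client-side pacer: about `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:  # waiters are served one at a time
            while True:
                now = time.monotonic()
                # Refill in proportion to elapsed time, never above capacity.
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Sleep just long enough for one token to accumulate.
                await asyncio.sleep((1 - self.tokens) / self.rate)

async def main() -> float:
    bucket = TokenBucket(rate=100, capacity=10)  # placeholder limit: 100 requests/s
    start = time.monotonic()

    async def paced_call(i: int) -> int:
        await bucket.acquire()
        return i  # the real authorize call would go here

    await asyncio.gather(*(paced_call(i) for i in range(60)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"60 calls took {elapsed:.2f}s")
```

The first 10 calls use the initial burst and the remaining 50 are released at about 100 per second, so the run takes roughly half a second. Pacing and SDK retries complement each other: the bucket keeps you under the limit in steady state, and retries cover the occasional 429 that still gets through.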

Performance reference

On a single 4-vCPU Aira instance, expect:

Metric	Throughput
Authorize requests	~200/s
Receipt verifications	~100/s
Full receipt mint (in-process)	~10,500/s

For detailed benchmarks, see the Performance guide.
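The throughput figures allow some quick capacity planning. Two rough estimates: the minimum time for a batch when the server is the bottleneck, and, via Little's law, how many requests need to be in flight to sustain a given rate. The 250 ms per-request latency used here is an assumption; substitute what you measure.

```python
def batch_duration_seconds(n_items: int, throughput_per_s: float) -> float:
    """Lower bound on wall-clock time when server throughput is the bottleneck."""
    return n_items / throughput_per_s

def concurrency_needed(throughput_per_s: float, latency_s: float) -> float:
    """Little's law: requests in flight = request rate x time each request takes."""
    return throughput_per_s * latency_s

print(batch_duration_seconds(2_000, 200))  # 10.0 (2,000 authorize calls at ~200/s)
print(concurrency_needed(200, 0.25))       # 50.0 (assumed 250 ms latency)
```

With those assumed numbers, about 50 concurrent requests would be enough to keep one instance near its ~200/s authorize ceiling, which lines up with the limit of 50 used in the examples above. Raising concurrency well beyond that would likely add queueing and make 429s more frequent rather than increase throughput.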

Summary

Pattern	Best for	Trade-off
`asyncio.gather` / `Promise.all`	Medium batches (10-100)	Simple, but can spike connections
Semaphore / p-limit	Large batches (100-10,000)	Controlled concurrency
Offline queue	Fire-and-forget	No pre-flight policy enforcement
SDK retries	All workloads	Handles 429 automatically
