Aira

Content Scan Policies

Regex pattern libraries that catch PII, leaked credentials, and prompt injection attempts before the action executes. No LLM call, no extra latency.

What it is

content_scan is the fourth Aira policy mode, alongside rules, ai, and consensus. It runs in-process before the action executes and matches the action's details field against curated pattern libraries plus optional org-specific custom regex.

Severity decides the verdict:

SeverityVerdict
criticalDeny — the action is blocked
warningRequire approval — the action is held for human review
infoAllow — the hit is logged but the action proceeds

The scanner is regex-based and pure-Python. It never calls an external service and never logs the matched secret in plaintext — every hit is redacted to first/last 4 characters before it touches an audit row, a webhook payload, or a UI surface.


Built-in libraries

pii

PatternSeverity
US Social Security Numbercritical
IBAN bank account numbercritical
US passport numbercritical
Credit card (Luhn-checked)critical
Email addresswarning
International phone numberwarning
IPv4 addressinfo
IPv6 addressinfo

Credit cards are filtered through a Luhn checksum, so 4111111111111112 (a real-looking but invalid number) does not match. This eliminates the most common false-positive in PII scanning.

credentials

PatternSeverity
AWS access key id (AKIA/ASIA/AROA/AIDA...)critical
AWS secret key candidate (40-char base64)critical
GitHub PAT (ghp_/gho_/ghs_/ghu_/ghr_)critical
GitLab PAT (glpat-)critical
Slack token (xoxa-/xoxb-/xoxp-/xoxr-/xoxs-)critical
Stripe secret key (sk_live_/sk_test_/rk_live_/rk_test_)critical
Google API key (AIza...)critical
Azure storage account keycritical
PEM private key headercritical
Basic-auth URL (postgres://user:pass@...)critical
JSON Web Token (eyJ...)warning
Generic api_key/secret/password/token assignmentwarning

prompt_injection

PatternSeverity
Ignore-previous-instructions style overridewarning
Role-switch / jailbreak ("you are now...")warning
Embedded system: / assistant: role markerswarning
DAN / dev-mode / sudo mode invocationscritical
System-prompt or secret exfiltration attemptswarning
Encoded payload smuggling markers ("base64 decode...")info
Tool elevation attempts ("call X as admin")warning

The pattern set is curated to minimize false positives on benign user content. If you need stricter matching, use the custom patterns field on the policy.


Create a content_scan policy

Dashboard

Dashboard → Policies → New policy. Pick the Content scan mode. Toggle the libraries you want, optionally add custom regex rows, then save.

API

curl -X POST https://api.airaproof.com/api/v1/policies \
  -H "Authorization: Bearer aira_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Block PII in customer messages",
    "mode": "content_scan",
    "priority": 100,
    "scan_config": {
      "libraries": ["pii", "credentials"],
      "custom_patterns": [
        {
          "name": "internal_code",
          "regex": "INTERNAL-CODE-\\d+",
          "severity": "critical",
          "description": "Org-internal classification codes"
        }
      ]
    },
    "decision": "deny"
  }'

scan_config requires at least one library OR one custom pattern; an empty config is rejected with HTTP 422.


How it integrates with the policy engine

A content_scan policy is just another row in the priority-ordered policy list. When authorize() runs:

  1. The action's details field (or its JSON serialization, if it's a dict) is fed through the scanner.
  2. Every pattern in every enabled library plus every custom regex is matched.
  3. The worst severity in the hit list decides the verdict (critical → deny, warning → require_approval, info → allow).
  4. The scanner's hit list (with redacted samples) is persisted in the PolicyEvaluation.model_votes JSON column so the dashboard can render the evaluation later without re-scanning.

A deny from a high-priority content_scan policy stops the rest of the policy chain, exactly like a deny from any other mode.


What gets logged

For an SSN match in a customer message, the persisted model_votes looks like:

{
  "scan": [
    {
      "name": "us_ssn",
      "library": "pii",
      "severity": "critical",
      "matches": 1,
      "sample": "123...6789"
    }
  ],
  "worst_severity": "critical"
}

The sample is redacted. The full secret never appears in the database, in audit rows, in webhook payloads, or in the dashboard. There is no flag to disable redaction — if you need the full plaintext for forensics, you have to fetch it from the original action's details_storage_key (which is itself encrypted at rest).


When to use content_scan vs ai vs rules

Use caseRecommended mode
"Block any action that contains an AWS key"content_scan (zero latency, deterministic)
"Block actions involving customer PII or financial data over €5,000"ai (context-dependent judgement)
"Block every wire transfer above 10k"rules (deterministic field match)
"Multi-model vote on a loan decision"consensus

content_scan is the right answer whenever the rule is "this byte sequence must never appear." It is faster and cheaper than ai mode, deterministic, and produces auditable output without an LLM round-trip.

On this page