
Sanitize API

Scan, redact, tokenize, or block sensitive content in text and files. Supports PII, PHI, credentials, and healthcare-specific entities.

The Sanitize API is Aira's standalone content-cleaning pipeline. Pass in text or upload a file (image, PDF, DICOM), choose a policy and mode, and get back clean output with a full findings report.

For a conceptual overview, see the Sanitize guide. For the difference between this and content scan policies, see Sanitize vs Content Scan Policies.

Base URL

https://api.airaproof.com/api/v1/sanitize

All sanitize endpoints require a valid API key or JWT, except the file download endpoint, which is authorized by its one-time token.


Sanitize text

Scan and process a text string.

POST /api/v1/sanitize

Request body

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| content | string | Yes | | Text to sanitize (max 500,000 chars) |
| policy | string | No | "default" | Policy pack: default, hipaa, pci, legal |
| mode | string | No | "redact" | One of redact, tokenize, block, flag |
| ai_model | string | No | null | Model ID for AI-assisted second-pass review |

Modes

| Mode | Behavior |
| --- | --- |
| redact | Replace detected entities with [REDACTED] |
| tokenize | Replace detected entities with reversible tokens like <NER_PERSON_001>, and return a token mapping |
| block | If sensitive content is found, return blocked: true with empty output |
| flag | Scan only: return findings without modifying the content |

Policy packs

| Pack | Libraries | Use case |
| --- | --- | --- |
| default | pii, credentials, prompt_injection | General-purpose scanning |
| hipaa | pii, credentials, prompt_injection, healthcare | PHI: MRNs, diagnoses, dates, providers |
| pci | pii, credentials | Card numbers, account data |
| legal | pii, credentials | Names, emails, case-adjacent PII |
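The fields, modes, and policy packs above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the Aira SDK: `build_sanitize_payload` is a hypothetical helper, and the allowed values and length limit are copied from the tables above. Counting the limit in Python code points is an assumption.

```python
import json

# Allowed values copied from the tables above.
POLICIES = {"default", "hipaa", "pci", "legal"}
MODES = {"redact", "tokenize", "block", "flag"}
MAX_CONTENT_CHARS = 500_000  # assumed to be counted in code points


def build_sanitize_payload(content, policy="default", mode="redact", ai_model=None):
    """Hypothetical helper: validate arguments and return the JSON request body."""
    if not isinstance(content, str):
        raise ValueError("content must be a string")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    body = {"content": content, "policy": policy, "mode": mode}
    if ai_model is not None:
        body["ai_model"] = ai_model  # optional second-pass review model
    return json.dumps(body)
```

Rejecting a bad mode or policy locally avoids a round trip, but the server remains the authority on what it accepts.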

Example

curl -X POST https://api.airaproof.com/api/v1/sanitize \
  -H "Authorization: Bearer $AIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Patient John Smith, SSN 123-45-6789, was admitted on 2024-03-15.",
    "policy": "hipaa",
    "mode": "redact"
  }'

Python

from aira import Aira

client = Aira(api_key="aira_live_...")

result = client.sanitize(
    content="Patient John Smith, SSN 123-45-6789, was admitted on 2024-03-15.",
    policy="hipaa",
    mode="redact",
)

print(result.clean)
# Patient [REDACTED], SSN [REDACTED], was admitted on [REDACTED].

print(result.findings)
# [Finding(entity_type='ner_person', severity='warning', count=1), ...]

TypeScript

import { Aira } from "aira-sdk";

const aira = new Aira({ apiKey: "aira_live_..." });

const result = await aira.sanitize({
  content: "Patient John Smith, SSN 123-45-6789, was admitted on 2024-03-15.",
  policy: "hipaa",
  mode: "redact",
});

console.log(result.clean);
// Patient [REDACTED], SSN [REDACTED], was admitted on [REDACTED].

Response

{
  "clean": "Patient [REDACTED], SSN [REDACTED], was admitted on [REDACTED].",
  "blocked": false,
  "mode": "redact",
  "policy": "hipaa",
  "input_hash": "sha256:abc123...",
  "output_hash": "sha256:def456...",
  "findings": [
    {
      "entity_type": "ner_person",
      "severity": "warning",
      "action_taken": "redacted",
      "library": "pii_ner",
      "description": "NER: PERSON (score 0.95)",
      "count": 1
    },
    {
      "entity_type": "us_ssn",
      "severity": "critical",
      "action_taken": "redacted",
      "library": "pii",
      "description": "US Social Security Number",
      "count": 1
    },
    {
      "entity_type": "ner_date_time",
      "severity": "warning",
      "action_taken": "redacted",
      "library": "pii_ner",
      "description": "NER: DATE_TIME (score 0.85)",
      "count": 1
    }
  ],
  "token_mapping": null,
  "request_id": "req_abc123"
}

Tokenize mode response

When mode is "tokenize", the response includes reversible tokens:

{
  "clean": "Patient <NER_PERSON_001>, SSN <US_SSN_001>, was admitted on <NER_DATE_TIME_001>.",
  "token_mapping": {
    "<NER_PERSON_001>": "John Smith",
    "<US_SSN_001>": "123-45-6789",
    "<NER_DATE_TIME_001>": "2024-03-15"
  },
  "findings": [...]
}
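Before storing a tokenized result, it can be useful to confirm that every token in `clean` has an entry in `token_mapping`. The sketch below assumes tokens follow the `<UPPER_SNAKE_NNN>` shape seen in the example above; that pattern is inferred from the example, not a documented format, and `unmapped_tokens` is a hypothetical helper.

```python
import re

# Assumed token shape, inferred from the example response: <UPPER_SNAKE_NNN>.
TOKEN_PATTERN = re.compile(r"<[A-Z][A-Z0-9_]*_\d{3}>")


def unmapped_tokens(clean, token_mapping):
    """Return tokens that appear in `clean` but have no entry in `token_mapping`."""
    found = TOKEN_PATTERN.findall(clean)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [tok for tok in dict.fromkeys(found) if tok not in token_mapping]
```

An empty result means the mapping is sufficient to reverse the text later.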

Sanitize text (test mode)

Dry-run sanitization with no audit logging, no receipts, no database writes. Use this for testing policies before production.

POST /api/v1/sanitize/test

Same request/response shape as POST /api/v1/sanitize, minus the ai_model field.
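A production payload can be reused for a dry run by switching the path and dropping `ai_model`. This is a small sketch under that reading of the sentence above; `to_test_request` is a hypothetical helper, and whether the test endpoint rejects or silently ignores `ai_model` is not stated here.

```python
SANITIZE_TEST_PATH = "/api/v1/sanitize/test"


def to_test_request(body):
    """Hypothetical helper: return (path, body) for a dry run of `body`."""
    # Build a new dict so the caller's production payload is left untouched.
    test_body = {key: value for key, value in body.items() if key != "ai_model"}
    return SANITIZE_TEST_PATH, test_body
```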


Detokenize

Reverse tokenization — restore original entities from a token mapping returned by a previous tokenize call.

POST /api/v1/sanitize/detokenize

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| content | string | Yes | Tokenized text to reverse |
| token_mapping | object | Yes | Token-to-original mapping from the sanitize response |

Example

curl -X POST https://api.airaproof.com/api/v1/sanitize/detokenize \
  -H "Authorization: Bearer $AIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Patient <NER_PERSON_001>, SSN <US_SSN_001>.",
    "token_mapping": {
      "<NER_PERSON_001>": "John Smith",
      "<US_SSN_001>": "123-45-6789"
    }
  }'

Response

{
  "content": "Patient John Smith, SSN 123-45-6789.",
  "request_id": "req_def456"
}
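For intuition, the same substitution can be approximated locally. This is an illustrative sketch only, not the service's implementation; `detokenize_locally` is a hypothetical name. It replaces all tokens in a single pass so that a restored value containing token-like text is not substituted a second time.

```python
import re


def detokenize_locally(content, token_mapping):
    """Replace each token in `content` with its original value, in one pass."""
    if not token_mapping:
        return content
    # Longest tokens first so a shorter token cannot match inside a longer one.
    tokens = sorted(token_mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(tok) for tok in tokens))
    return pattern.sub(lambda match: token_mapping[match.group(0)], content)
```

Applied to the request above, this yields the same `content` string shown in the response.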

Sanitize file

Upload an image, PDF, or DICOM file for scanning and optional redaction.

POST /api/v1/sanitize/file
Content-Type: multipart/form-data

Form fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file | file | Yes | | Image (JPEG, PNG, GIF, BMP, TIFF), PDF, or DICOM |
| policy | string | No | "default" | Policy pack |
| mode | string | No | "redact" | redact, tokenize, block, flag |
| ai_model | string | No | null | Model ID for AI-assisted review |
| include_pixel_redaction | bool | No | false | Enable pixel-level redaction for images and DICOM |

Max file size: 50 MB

File type detection

Files are identified by magic bytes, not the Content-Type header or extension. This prevents file-type spoofing.

| Type | Magic signature |
| --- | --- |
| JPEG | \xff\xd8\xff |
| PNG | \x89PNG\r\n\x1a\n |
| PDF | %PDF |
| DICOM | DICM at byte offset 128 |
| GIF | GIF87a / GIF89a |
| BMP | BM |
| TIFF | II*\0 / MM\0* |
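The signature table translates directly into a client-side pre-check, which can catch an unsupported file before it is uploaded. A minimal sketch, assuming lowercase labels such as "pdf" and "dicom" (the two that appear in the example responses; the others are guesses) and a hypothetical function name `sniff_file_type`:

```python
def sniff_file_type(data: bytes):
    """Return a file-type label from the leading bytes, or None if unrecognized."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"%PDF"):
        return "pdf"
    # DICOM files carry a 128-byte preamble followed by the ASCII marker "DICM".
    if data[128:132] == b"DICM":
        return "dicom"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # Little-endian ("II") and big-endian ("MM") TIFF headers.
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    # Checked last because the two-byte BMP signature is the weakest match.
    if data.startswith(b"BM"):
        return "bmp"
    return None
```

Only the first 132 bytes are needed, so reading a small prefix of the file is enough.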

Mode behavior per file type

| Mode | Image | PDF | DICOM |
| --- | --- | --- | --- |
| redact | Presidio pixel-level redaction | PyMuPDF in-place redaction | PS3.15 Annex E metadata + optional pixel redaction |
| tokenize | Pixel redaction + token mapping | In-place redaction + token mapping | PS3.15 de-identification + token mapping |
| block | Reject if PII found | Reject if PII found | Reject if PII found |
| flag | Scan only, return findings | Scan only, return findings | Scan only, return findings |

Example

curl -X POST https://api.airaproof.com/api/v1/sanitize/file \
  -H "Authorization: Bearer $AIRA_API_KEY" \
  -F "file=@patient-record.pdf" \
  -F "policy=hipaa" \
  -F "mode=redact"

Response

{
  "file_type": "pdf",
  "original_filename": "patient-record.pdf",
  "findings": [
    {
      "entity_type": "ner_person",
      "severity": "warning",
      "action_taken": "redacted",
      "library": "pii_ner",
      "description": "NER: PERSON (score 0.92)",
      "count": 3
    },
    {
      "entity_type": "us_ssn",
      "severity": "critical",
      "action_taken": "redacted",
      "library": "pii",
      "description": "US Social Security Number",
      "count": 1
    }
  ],
  "blocked": false,
  "mode": "redact",
  "policy": "hipaa",
  "input_hash": "sha256:abc...",
  "output_hash": "sha256:def...",
  "download_token": "a1b2c3d4...",
  "download_url": "https://api.airaproof.com/api/v1/sanitize/file/a1b2c3d4.../download",
  "tokenized_text": "Patient [REDACTED]\nSSN: [REDACTED]\n...",
  "token_mapping": null,
  "dicom_tag_actions": null,
  "pixel_redactions": null,
  "request_id": "req_xyz789"
}

DICOM response (redact mode)

DICOM files include additional metadata about de-identification:

{
  "file_type": "dicom",
  "dicom_tag_actions": [
    { "tag_name": "PatientName", "tag_number": "(0010,0010)", "action": "removed", "original_hash": "sha256:..." },
    { "tag_name": "PatientID", "tag_number": "(0010,0020)", "action": "removed", "original_hash": "sha256:..." },
    { "tag_name": "PatientBirthDate", "tag_number": "(0010,0030)", "action": "removed", "original_hash": "sha256:..." }
  ],
  "pixel_redactions": [
    { "text_found": "John Smith", "bounding_box": [100, 50, 250, 80], "confidence": 0.91 }
  ]
}

Download sanitized file

Download the cleaned file using the one-time token from the sanitize response.

GET /api/v1/sanitize/file/{token}/download

No authentication required — the token itself is the authorization.

  • Tokens expire after 1 hour
  • Tokens are single-use — the file is deleted after the first download
  • The response filename is sanitized_<original_name>.<ext>
  • The Content-Type header matches the original file type
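Because tokens are single-use and short-lived, a client may want to track them locally rather than discover a dead link on download. The sketch below is one way to do that; `DownloadTicket` is a hypothetical class, and measuring the one-hour window from the moment the sanitize response was received is an assumption, since the response carries no expiry field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=1)  # from the list above


@dataclass
class DownloadTicket:
    """Hypothetical client-side record of a one-time download token."""
    token: str
    issued_at: datetime  # assumed: when the sanitize response was received
    used: bool = False

    def is_usable(self, now=None):
        """True if the token has not been spent and is within its TTL."""
        now = now or datetime.now(timezone.utc)
        return not self.used and now - self.issued_at < TOKEN_TTL

    def download_path(self):
        """Path of the download endpoint for this token."""
        return f"/api/v1/sanitize/file/{self.token}/download"
```

Setting `used = True` immediately after a successful download mirrors the server deleting the file.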

Example

curl -o sanitized-record.pdf \
  https://api.airaproof.com/api/v1/sanitize/file/a1b2c3d4.../download

Findings object

Every sanitize response includes a findings array:

| Field | Type | Description |
| --- | --- | --- |
| entity_type | string | What was detected (e.g., us_ssn, ner_person, mrn_pattern) |
| severity | string | critical, warning, or info |
| action_taken | string | redacted, tokenized, blocked, flagged |
| library | string | Which scanner found it (pii, credentials, pii_ner, healthcare) |
| description | string | Human-readable description |
| count | integer | How many instances of this entity type were found |
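Since each finding carries a `count`, a quick per-severity total is often more useful than the raw array. A small sketch, assuming `findings` is the parsed JSON list from any sanitize response; `count_by_severity` is a hypothetical helper.

```python
from collections import Counter


def count_by_severity(findings):
    """Sum each finding's `count` per severity level."""
    totals = Counter()
    for finding in findings:
        totals[finding["severity"]] += finding["count"]
    return dict(totals)
```

A result with any `critical` total might, for example, gate whether the cleaned output is forwarded downstream.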

Severity behavior

| Severity | Redact/Tokenize | Block | Flag |
| --- | --- | --- | --- |
| critical | Replaced | Blocked | Flagged |
| warning | Replaced | Blocked | Flagged |
| info | Not replaced (flagged only) | Blocked | Flagged |

Info-severity entities (IP addresses, URLs, organization names) are reported in findings but never redacted or tokenized. This prevents false positives like "HDL Cholesterol" being blacked out.
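The severity table can be read as a lookup from (severity, mode) to an expected `action_taken` value. The sketch below encodes it that way; mapping "Replaced" to `redacted` or `tokenized` depending on the mode is an interpretation of the table, and `expected_action` is a hypothetical helper.

```python
SEVERITIES = {"critical", "warning", "info"}


def expected_action(severity, mode):
    """Look up the outcome the severity table describes for one finding."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if mode == "block":
        return "blocked"  # every severity blocks in block mode
    if mode == "flag":
        return "flagged"
    if mode in ("redact", "tokenize"):
        if severity == "info":
            return "flagged"  # info findings are reported but never replaced
        return "redacted" if mode == "redact" else "tokenized"
    raise ValueError(f"unknown mode: {mode!r}")
```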


Error responses

| Status | Code | Description |
| --- | --- | --- |
| 413 | | File exceeds 50 MB |
| 415 | | Unsupported file type |
| 422 | | Empty file, corrupt file, or missing dependency |
| 422 | OUTPUT_SCAN_VIOLATION | Block mode triggered |
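A client can turn these rows into readable messages with a simple lookup. This is a sketch only: how the error code is delivered in the response body is not specified here, so `describe_error` (a hypothetical helper) takes the status and code as already-extracted arguments.

```python
# (status, code) pairs copied from the error table; None means no code listed.
ERRORS = {
    (413, None): "File exceeds 50 MB",
    (415, None): "Unsupported file type",
    (422, None): "Empty file, corrupt file, or missing dependency",
    (422, "OUTPUT_SCAN_VIOLATION"): "Block mode triggered",
}


def describe_error(status, code=None):
    """Return the documented description, or a generic fallback message."""
    if (status, code) in ERRORS:
        return ERRORS[(status, code)]
    suffix = f" ({code})" if code else ""
    return f"Unrecognized error: HTTP {status}{suffix}"
```

Keying on both status and code matters because 422 covers two distinct situations.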
