Sanitize API
Scan, redact, tokenize, or block sensitive content in text and files. Supports PII, PHI, credentials, and healthcare-specific entities.
The Sanitize API is Aira's standalone content-cleaning pipeline. Pass in text or upload a file (image, PDF, DICOM), choose a policy and mode, and get back clean output with a full findings report.
For a conceptual overview, see the Sanitize guide. For the difference between this and content scan policies, see Sanitize vs Content Scan Policies.
Base URL
POST /api/v1/sanitize
All sanitize endpoints require a valid API key or JWT.
Sanitize text
Scan and process a text string.
POST /api/v1/sanitize
Request body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| content | string | Yes | | Text to sanitize (max 500,000 chars) |
| policy | string | No | "default" | Policy pack: default, hipaa, pci, legal |
| mode | string | No | "redact" | One of redact, tokenize, block, flag |
| ai_model | string | No | null | Model ID for AI-assisted second-pass review |
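The constraints in this table can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library. `build_sanitize_payload` is a hypothetical helper, not part of the Aira SDK, and it assumes the 500,000-character limit counts Python string characters and that omitting `ai_model` is equivalent to sending `null`.

```python
import json

# Values copied from the request body table.
MAX_CONTENT_CHARS = 500_000
POLICIES = {"default", "hipaa", "pci", "legal"}
MODES = {"redact", "tokenize", "block", "flag"}


def build_sanitize_payload(content, policy="default", mode="redact", ai_model=None):
    """Validate fields locally and return a JSON-serializable request body.

    Hypothetical helper for illustration; the server remains the source of truth.
    """
    if not isinstance(content, str):
        raise ValueError("content must be a string")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"content": content, "policy": policy, "mode": mode}
    # Assumption: leaving ai_model out behaves like the documented default (null).
    if ai_model is not None:
        payload["ai_model"] = ai_model
    return payload


print(json.dumps(build_sanitize_payload("Patient John Smith", policy="hipaa")))
```

Failing fast on an unknown policy or mode avoids a round trip for requests the server would presumably reject anyway.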
Modes
| Mode | Behavior |
|---|---|
| redact | Replace detected entities with [REDACTED] |
| tokenize | Replace with reversible tokens like <NER_PERSON_001>, return a mapping |
| block | If sensitive content is found, return blocked: true with empty output |
| flag | Scan only — return findings without modifying the content |
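To make the four modes concrete, the sketch below applies each one to a list of spans that have already been detected. It is a local illustration only, not the service's implementation. `Span` and `apply_mode` are hypothetical names, detection itself is not modeled, spans are assumed not to overlap, and the token format is assumed from the tokenize response shown in this reference.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int        # inclusive character offset
    end: int          # exclusive character offset
    entity_type: str  # e.g. "us_ssn"


def apply_mode(text, spans, mode):
    """Return (clean, blocked, token_mapping) for one of the four modes."""
    if mode not in ("redact", "tokenize", "block", "flag"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "flag" or not spans:
        return text, False, None      # scan only, or nothing to act on
    if mode == "block":
        return "", True, None         # sensitive content found: empty output
    counters, mapping, pieces, cursor = {}, {}, [], 0
    for span in sorted(spans, key=lambda s: s.start):
        pieces.append(text[cursor:span.start])
        if mode == "redact":
            pieces.append("[REDACTED]")
        else:
            # Number tokens per entity type: <US_SSN_001>, <US_SSN_002>, ...
            n = counters.get(span.entity_type, 0) + 1
            counters[span.entity_type] = n
            token = f"<{span.entity_type.upper()}_{n:03d}>"
            mapping[token] = text[span.start:span.end]
            pieces.append(token)
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces), False, (mapping if mode == "tokenize" else None)


text = "SSN 123-45-6789 for John Smith"
spans = [Span(4, 15, "us_ssn"), Span(20, 30, "ner_person")]
for mode in ("redact", "tokenize", "block", "flag"):
    print(mode, apply_mode(text, spans, mode))
```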
Policy packs
| Pack | Libraries | Use case |
|---|---|---|
| default | pii, credentials, prompt_injection | General-purpose scanning |
| hipaa | pii, credentials, prompt_injection, healthcare | PHI: MRNs, diagnoses, dates, providers |
| pci | pii, credentials | Card numbers, account data |
| legal | pii, credentials | Names, emails, case-adjacent PII |
Example
curl -X POST https://api.airaproof.com/api/v1/sanitize \
  -H "Authorization: Bearer $AIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Patient John Smith, SSN 123-45-6789, was admitted on 2024-03-15.",
    "policy": "hipaa",
    "mode": "redact"
  }'

from aira import Aira
client = Aira(api_key="aira_live_...")
result = client.sanitize(
    content="Patient John Smith, SSN 123-45-6789, was admitted on 2024-03-15.",
    policy="hipaa",
    mode="redact",
)
print(result.clean)
# Patient [REDACTED], SSN [REDACTED], was admitted on [REDACTED].
print(result.findings)
# [Finding(entity_type='ner_person', severity='warning', count=1), ...]

import { Aira } from "aira-sdk";
const aira = new Aira({ apiKey: "aira_live_..." });
const result = await aira.sanitize({
  content: "Patient John Smith, SSN 123-45-6789, was admitted on 2024-03-15.",
  policy: "hipaa",
  mode: "redact",
});
console.log(result.clean);
// Patient [REDACTED], SSN [REDACTED], was admitted on [REDACTED].

Response
{
  "clean": "Patient [REDACTED], SSN [REDACTED], was admitted on [REDACTED].",
  "blocked": false,
  "mode": "redact",
  "policy": "hipaa",
  "input_hash": "sha256:abc123...",
  "output_hash": "sha256:def456...",
  "findings": [
    {
      "entity_type": "ner_person",
      "severity": "warning",
      "action_taken": "redacted",
      "library": "pii_ner",
      "description": "NER: PERSON (score 0.95)",
      "count": 1
    },
    {
      "entity_type": "us_ssn",
      "severity": "critical",
      "action_taken": "redacted",
      "library": "pii",
      "description": "US Social Security Number",
      "count": 1
    },
    {
      "entity_type": "ner_date_time",
      "severity": "warning",
      "action_taken": "redacted",
      "library": "pii_ner",
      "description": "NER: DATE_TIME (score 0.85)",
      "count": 1
    }
  ],
  "token_mapping": null,
  "request_id": "req_abc123"
}
Tokenize mode response
When mode is "tokenize", the response includes reversible tokens:
{
  "clean": "Patient <NER_PERSON_001>, SSN <US_SSN_001>, was admitted on <NER_DATE_TIME_001>.",
  "token_mapping": {
    "<NER_PERSON_001>": "John Smith",
    "<US_SSN_001>": "123-45-6789",
    "<NER_DATE_TIME_001>": "2024-03-15"
  },
  "findings": [...]
}
Sanitize text (test mode)
Dry-run sanitization with no audit logging, no receipts, no database writes. Use this for testing policies before production.
POST /api/v1/sanitize/test
Same request/response shape as POST /api/v1/sanitize, minus the ai_model field.
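Because the test endpoint takes the same body without `ai_model`, a production payload can be adapted mechanically for a dry run. A small sketch, where `to_test_request` is a hypothetical helper and `"example-model"` is a placeholder rather than a real model ID:

```python
TEST_PATH = "/api/v1/sanitize/test"


def to_test_request(payload):
    """Return (path, body) for a dry run of a production sanitize payload.

    The test endpoint does not take ai_model, so that key is dropped.
    The input dict is left unmodified.
    """
    body = {key: value for key, value in payload.items() if key != "ai_model"}
    return TEST_PATH, body


production = {"content": "Patient John Smith", "policy": "hipaa", "ai_model": "example-model"}
print(to_test_request(production))
```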
Detokenize
Reverse tokenization — restore original entities from a token mapping returned by a previous tokenize call.
POST /api/v1/sanitize/detokenize
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | Tokenized text to reverse |
| token_mapping | object | Yes | Token-to-original mapping from the sanitize response |
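Conceptually, detokenization substitutes each token with its original value. The sketch below reproduces that locally for illustration. It is an assumption about the behavior, not the service's implementation, and `detokenize_locally` is a hypothetical name.

```python
import re


def detokenize_locally(content, token_mapping):
    """Replace every token in content with its original value in a single pass."""
    if not token_mapping:
        return content
    # Longest tokens first, so a token that is a prefix of another cannot shadow it.
    tokens = sorted(token_mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(token) for token in tokens))
    # One pass over the text: restored values are never scanned for tokens again.
    return pattern.sub(lambda match: token_mapping[match.group(0)], content)


mapping = {"<NER_PERSON_001>": "John Smith", "<US_SSN_001>": "123-45-6789"}
print(detokenize_locally("Patient <NER_PERSON_001>, SSN <US_SSN_001>.", mapping))
```

A single regex pass is used instead of repeated `str.replace` calls so that an original value that happens to look like a token is not expanded a second time.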
Example
curl -X POST https://api.airaproof.com/api/v1/sanitize/detokenize \
  -H "Authorization: Bearer $AIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Patient <NER_PERSON_001>, SSN <US_SSN_001>.",
    "token_mapping": {
      "<NER_PERSON_001>": "John Smith",
      "<US_SSN_001>": "123-45-6789"
    }
  }'

Response
{
  "content": "Patient John Smith, SSN 123-45-6789.",
  "request_id": "req_def456"
}
Sanitize file
Upload an image, PDF, or DICOM file for scanning and optional redaction.
POST /api/v1/sanitize/file
Content-Type: multipart/form-data
Form fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | | Image (JPEG, PNG, GIF, BMP, TIFF), PDF, or DICOM |
| policy | string | No | "default" | Policy pack |
| mode | string | No | "redact" | redact, tokenize, block, flag |
| ai_model | string | No | null | Model ID for AI-assisted review |
| include_pixel_redaction | bool | No | false | Enable pixel-level redaction for images and DICOM |
Max file size: 50 MB
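A client can reject unusable uploads before sending them. The sketch below makes two assumptions this reference does not confirm: "50 MB" is read as 50 × 1024 × 1024 bytes, and the boolean form field is sent as the string "true" or "false". `check_upload_size` and `build_form_fields` are hypothetical helper names.

```python
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as binary megabytes


def check_upload_size(size_bytes):
    """Raise ValueError if a file of this size would be rejected by the size rules."""
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; the limit is {MAX_FILE_BYTES}")


def build_form_fields(policy="default", mode="redact", ai_model=None,
                      include_pixel_redaction=False):
    """Return the non-file multipart fields as strings."""
    fields = {
        "policy": policy,
        "mode": mode,
        # Assumption: booleans are encoded as "true"/"false" in the form body.
        "include_pixel_redaction": "true" if include_pixel_redaction else "false",
    }
    if ai_model is not None:
        fields["ai_model"] = ai_model
    return fields


check_upload_size(1_204_331)  # passes silently for a file of roughly 1.2 MB
print(build_form_fields(policy="hipaa", include_pixel_redaction=True))
```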
File type detection
Files are identified by magic bytes, not the Content-Type header or extension. This prevents file-type spoofing.
| Type | Magic signature |
|---|---|
| JPEG | \xff\xd8\xff |
| PNG | \x89PNG\r\n\x1a\n |
| PDF | %PDF |
| DICOM | DICM at byte offset 128 |
| GIF | GIF87a / GIF89a |
| BMP | BM |
| TIFF | II*\0 / MM\0* |
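The signatures in this table are enough to reproduce the detection locally, for example to predict a 415 before uploading. The following is a sketch rather than the service's detector: `detect_file_type` is a hypothetical name, and the lowercase return values other than "pdf" and "dicom" are assumed.

```python
def detect_file_type(data):
    """Return a file type name based on leading magic bytes, or None if unrecognized."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"%PDF"):
        return "pdf"
    # DICOM files carry a 128-byte preamble followed by the ASCII marker "DICM".
    if data[128:132] == b"DICM":
        return "dicom"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"BM"):
        return "bmp"
    # TIFF: little-endian "II*\0" or big-endian "MM\0*".
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    return None


print(detect_file_type(b"%PDF-1.7\n"))
print(detect_file_type(b"\x00" * 128 + b"DICM"))
```

Reading the first 132 bytes of a file is sufficient for every signature listed.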
Mode behavior per file type
| Mode | Image | PDF | DICOM |
|---|---|---|---|
| redact | Presidio pixel-level redaction | PyMuPDF in-place redaction | PS3.15 Annex E metadata + optional pixel redaction |
| tokenize | Pixel redaction + token mapping | In-place redaction + token mapping | PS3.15 de-identification + token mapping |
| block | Reject if PII found | Reject if PII found | Reject if PII found |
| flag | Scan only, return findings | Scan only, return findings | Scan only, return findings |
Example
curl -X POST https://api.airaproof.com/api/v1/sanitize/file \
  -H "Authorization: Bearer $AIRA_API_KEY" \
  -F "file=@patient-record.pdf" \
  -F "policy=hipaa" \
  -F "mode=redact"

Response
{
  "file_type": "pdf",
  "original_filename": "patient-record.pdf",
  "findings": [
    {
      "entity_type": "ner_person",
      "severity": "warning",
      "action_taken": "redacted",
      "library": "pii_ner",
      "description": "NER: PERSON (score 0.92)",
      "count": 3
    },
    {
      "entity_type": "us_ssn",
      "severity": "critical",
      "action_taken": "redacted",
      "library": "pii",
      "description": "US Social Security Number",
      "count": 1
    }
  ],
  "blocked": false,
  "mode": "redact",
  "policy": "hipaa",
  "input_hash": "sha256:abc...",
  "output_hash": "sha256:def...",
  "download_token": "a1b2c3d4...",
  "download_url": "https://api.airaproof.com/api/v1/sanitize/file/a1b2c3d4.../download",
  "tokenized_text": "Patient [REDACTED]\\nSSN: [REDACTED]\\n...",
  "token_mapping": null,
  "dicom_tag_actions": null,
  "pixel_redactions": null,
  "request_id": "req_xyz789"
}
DICOM response (redact mode)
DICOM files include additional metadata about de-identification:
{
  "file_type": "dicom",
  "dicom_tag_actions": [
    { "tag_name": "PatientName", "tag_number": "(0010,0010)", "action": "removed", "original_hash": "sha256:..." },
    { "tag_name": "PatientID", "tag_number": "(0010,0020)", "action": "removed", "original_hash": "sha256:..." },
    { "tag_name": "PatientBirthDate", "tag_number": "(0010,0030)", "action": "removed", "original_hash": "sha256:..." }
  ],
  "pixel_redactions": [
    { "text_found": "John Smith", "bounding_box": [100, 50, 250, 80], "confidence": 0.91 }
  ]
}
Download sanitized file
Download the cleaned file using the one-time token from the sanitize response.
GET /api/v1/sanitize/file/{token}/download
No authentication required: the token itself is the authorization.
- Tokens expire after 1 hour
- Tokens are single-use — the file is deleted after the first download
- The response filename is sanitized_<original_name>.<ext>
- The Content-Type header matches the original file type
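Since a token works once and only for an hour, a client may want to track both conditions before attempting a download. The sketch below is client-side bookkeeping only. `DownloadTicket` is a hypothetical class, and it assumes the hour is counted from when the sanitize response was received, which this reference does not specify.

```python
import time
from dataclasses import dataclass, field

TOKEN_TTL_SECONDS = 60 * 60  # tokens expire after 1 hour


@dataclass
class DownloadTicket:
    download_url: str
    # Assumption: the expiry clock starts when the sanitize response arrives.
    received_at: float = field(default_factory=time.time)
    used: bool = False

    def is_usable(self, now=None):
        """True while the token is unused and less than an hour old."""
        now = time.time() if now is None else now
        return not self.used and (now - self.received_at) < TOKEN_TTL_SECONDS

    def mark_used(self):
        """Record the first download; the server deletes the file afterwards."""
        self.used = True


ticket = DownloadTicket("https://example.invalid/download")  # placeholder URL
print(ticket.is_usable())
```

Calling `mark_used()` immediately after a successful download keeps retry logic from issuing a second request that can no longer succeed.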
Example
curl -o sanitized-record.pdf \
  https://api.airaproof.com/api/v1/sanitize/file/a1b2c3d4.../download

Findings object
Every sanitize response includes a findings array:
| Field | Type | Description |
|---|---|---|
| entity_type | string | What was detected (e.g., us_ssn, ner_person, mrn_pattern) |
| severity | string | critical, warning, or info |
| action_taken | string | redacted, tokenized, blocked, flagged |
| library | string | Which scanner found it (pii, credentials, pii_ner, healthcare) |
| description | string | Human-readable description |
| count | integer | How many instances of this entity type were found |
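When working with the raw JSON instead of an SDK, the fields in this table map naturally onto a small record type. The following is a sketch under that assumption. `FindingRecord`, `parse_findings`, and `count_by_severity` are hypothetical names and are not the SDK's own classes.

```python
from collections import Counter
from dataclasses import dataclass

FIELDS = ("entity_type", "severity", "action_taken", "library", "description", "count")


@dataclass(frozen=True)
class FindingRecord:
    entity_type: str
    severity: str
    action_taken: str
    library: str
    description: str
    count: int


def parse_findings(response):
    """Build records from the findings array, ignoring any undocumented keys."""
    return [
        FindingRecord(**{name: item[name] for name in FIELDS})
        for item in response.get("findings", [])
    ]


def count_by_severity(findings):
    """Sum the per-finding counts for each severity level."""
    totals = Counter()
    for finding in findings:
        totals[finding.severity] += finding.count
    return dict(totals)


response = {"findings": [
    {"entity_type": "ner_person", "severity": "warning", "action_taken": "redacted",
     "library": "pii_ner", "description": "NER: PERSON (score 0.92)", "count": 3},
    {"entity_type": "us_ssn", "severity": "critical", "action_taken": "redacted",
     "library": "pii", "description": "US Social Security Number", "count": 1},
]}
print(count_by_severity(parse_findings(response)))
```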
Severity behavior
| Severity | Redact/Tokenize | Block | Flag |
|---|---|---|---|
| critical | Replaced | Blocked | Flagged |
| warning | Replaced | Blocked | Flagged |
| info | Not replaced (flagged only) | Blocked | Flagged |
Info-severity entities (IP addresses, URLs, organization names) are reported in findings but never redacted or tokenized. This prevents false positives like "HDL Cholesterol" being blacked out.
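The severity table can be read as a function of severity and mode. The sketch below encodes it directly. `action_for` is a hypothetical name, and it assumes that an info-severity finding under redact or tokenize is reported with `action_taken` set to "flagged", which the table implies but does not state.

```python
SEVERITIES = ("critical", "warning", "info")


def action_for(severity, mode):
    """Return the expected action_taken value for a finding, per the severity table."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if mode == "flag":
        return "flagged"
    if mode == "block":
        return "blocked"  # any severity, including info, triggers a block
    if mode in ("redact", "tokenize"):
        if severity == "info":
            return "flagged"  # assumption: reported but left in place
        return "redacted" if mode == "redact" else "tokenized"
    raise ValueError(f"unknown mode: {mode!r}")


for severity in SEVERITIES:
    print(severity, [action_for(severity, m) for m in ("redact", "tokenize", "block", "flag")])
```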
Error responses
| Status | Code | Description |
|---|---|---|
| 413 | — | File exceeds 50 MB |
| 415 | — | Unsupported file type |
| 422 | — | Empty file, corrupt file, or missing dependency |
| 422 | OUTPUT_SCAN_VIOLATION | Block mode triggered |
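A client can turn these rows into readable messages with a simple lookup. In the sketch below, `describe_error` is a hypothetical helper. How the code value is delivered in the error body is not documented here, so the caller is assumed to have extracted it already.

```python
# (status, code) pairs from the error table; None stands for rows without a code.
ERROR_DESCRIPTIONS = {
    (413, None): "File exceeds 50 MB",
    (415, None): "Unsupported file type",
    (422, None): "Empty file, corrupt file, or missing dependency",
    (422, "OUTPUT_SCAN_VIOLATION"): "Block mode triggered",
}


def describe_error(status, code=None):
    """Return the documented description, falling back to the code-less row."""
    return (
        ERROR_DESCRIPTIONS.get((status, code))
        or ERROR_DESCRIPTIONS.get((status, None))
        or f"Unexpected HTTP status {status}"
    )


print(describe_error(422, "OUTPUT_SCAN_VIOLATION"))
print(describe_error(413))
```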