Scan API

The Scan API runs STACK's prompt-injection detector against arbitrary content you supply. Use it on retrieved data (emails, documents, web pages, calendar invites, API responses, anything an agent might pull from a non-user source) BEFORE feeding the content into your LLM. It returns a verdict so your agent can refuse to proceed if the content carries an injection payload.

Most production prompt injections are indirect: the user is benign, the agent retrieves a document or email, and that document or email contains the injection. POST /v1/scan exists specifically for this attack class. Available on every tier within the monthly scan allowance.

Make a Scan Request

Send the content to scan. The detector runs the same three-layer chain that fires on /v1/proxy and /v1/skills/:id/invoke:

  • L1 - regex pattern catalog (instruction overrides, role-play breakouts, system-prompt extraction, jailbreak names, safety bypass)
  • L2 - encoding-aware normalization that decodes base64, URL, hex, leetspeak, unicode homoglyphs, zero-width characters, ROT13, separator obfuscation, and reversed text before re-running L1 against each variant
  • L3 - LLM-classifier funnel (Haiku 4.5 via OpenRouter) for semantically novel attacks the regex layer cannot see: paraphrased overrides, authority impersonation, polite-format wrappers, indirect injection in retrieved content, multilingual variants. A heuristic gate fires L3 only when the body is ≥ 60 characters and the L1+L2 verdict is non-critical; a 3-second timeout degrades gracefully to the L1+L2 result on failure
bash
curl -X POST https://api.getstack.run/v1/scan \
  -H "Authorization: Bearer sk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "<the email body, document text, or scraped webpage>",
    "context": "email",
    "source": "sender@external.com"
  }'

Request Fields

  • content (string | object, required) - the content to scan. Strings scanned directly; objects walked one level deep, same as proxy bodies.
  • context (enum, optional) - one of "email" | "document" | "webpage" | "chat_log" | "api_response" | "calendar" | "other". Recorded in security event metadata.
  • source (string, optional) - source identifier (URL / sender / filename). Truncated to 256 chars in event metadata.

Optional Headers

  • X-Passport-Token - if present, the scan is attributed to the passport JWT (agent_id + passport_jti recorded on any security event).

Response

json
{
  "verdict": "critical",
  "scan_id": "ses_abc123def456",
  "duration_ms": 2120,
  "match": {
    "pattern_id": "override_ignore_previous",
    "severity": "critical",
    "field_path": "content",
    "matched_excerpt": "ignore all previous instructions",
    "encoding": "base64_decode"
  },
  "llm": {
    "verdict": "critical",
    "confidence": 92,
    "reasoning": "Authority impersonation attempt with embedded directive...",
    "model": "anthropic/claude-haiku-4.5"
  },
  "l3_degraded": null
}
  • verdict ("clean" | "suspicious" | "critical") - clean = no L1/L2/L3 hit, suspicious = warning-severity, critical = critical-severity
  • scan_id - links to the security event row when verdict is non-clean
  • duration_ms - end-to-end scan time. L1+L2 alone is 1-10ms; full L1+L2+L3 chain is 1-3s when the LLM gate fires
  • match - null when no L1/L2 pattern fired. On a hit: pattern_id, severity, field path, truncated matched text (lowercased), and encoding if L2 normalization revealed the match
  • llm - null when L3 was skipped (body too short, L1/L2 already critical, no OpenRouter credential), or when L3 ran but degraded. On a successful run: verdict, confidence (0-100), one-sentence reasoning, model identifier
  • l3_degraded - null when L3 ran successfully OR was skipped. Set to { reason, duration_ms } when L3 ran but failed (timeout, network, parse error). Callers can lower their trust in a clean verdict accordingly

Encoding Attribution

When L2 normalization reveals a match the raw text didn't expose, the response carries an encoding field on the match. Possible values:

  • base64_decode - found in a base64-encoded substring
  • url_decode - %XX-encoded payload
  • hex_decode - hex-encoded payload
  • leetspeak - digit/symbol substitution
  • homoglyph - Cyrillic / Greek / mathematical-bold lookalikes
  • zero_width_strip / zero_width_as_space - invisible char obfuscation
  • separator_collapse - i.g.n.o.r.e style
  • spaced_letters - i g n o r e style
  • identifier_casing - camelCase / snake_case / kebab-case
  • rot13 - ROT13-rotated text
  • reverse - character-reversed text

Recommended Agent Flow

javascript
// Pseudo-code: scan retrieved content before feeding to the LLM
async function summarizeEmail(emailBody, sender) {
  const result = await stack.scan.scan({
    content: emailBody,
    context: 'email',
    source: sender,
  }, { passportToken: passport });

  if (result.verdict === 'critical') {
    // Refuse — the email contains injection. Surface to the user.
    return { error: 'Email body contains a prompt-injection payload; refusing to summarize.', match: result.match };
  }
  if (result.verdict === 'suspicious') {
    // Warn but proceed with extra caution. Optionally lower trust.
    log.warn('scan suspicious', result.match);
  }
  return await llm.summarize(emailBody);
}

Rate Limiting + Pricing

Scan calls are metered per operator on a monthly basis. Allowances:

  • Free - 10,000 scans / month
  • Developer - 50,000 scans / month
  • Studio - 500,000 scans / month
  • Enterprise - unlimited

Above the monthly allowance, calls are debited from the operator's wallet at $0.0001 per scan (cheaper than proxy because no upstream forward). If the wallet balance is insufficient, the call returns 402 Payment Required.

Check Scan Usage

bash
curl https://api.getstack.run/v1/scan/usage \
  -H "Authorization: Bearer sk_live_your_key"

json
{
  "tier": "developer",
  "limit": 50000,
  "used": 1523,
  "remaining": 48477,
  "period": "2026-04"
}

Security Events

Every match (warning or critical) records a prompt_injection security event with:

  • pattern_id - which L1 pattern caught it
  • field_path - where in the content
  • matched_excerpt - first ~80 chars of the trigger, lowercased (PII protection)
  • encoding - which L2 normalization revealed it (if any)
  • scan_context - the context tag you supplied
  • scan_source - the source identifier you supplied (truncated to 256 chars)

The event is severity-tagged exactly the same as proxy/skill scans: warning for soft phrasing, critical for strong instruction-override or jailbreak-name matches.

Honest Scope Statement

Benchmark numbers on a 1087-sample corpus (deepset/prompt-injections + STACK curated supplement + AgentDojo extracted): full L1+L2+L3 chain achieves F1 0.86, precision 0.98, recall 0.77. L1 alone reaches F1 0.43; L1+L2 reaches 0.49. L3 is the largest single contributor.

The detector is strong but not infallible. Bear in mind: (1) L3 sees the content you scan; it leaves STACK's infrastructure to reach the OpenRouter-hosted Haiku model. For especially sensitive data, consider scanning with L1+L2 only (future per-detector enforcement controls will expose this). (2) Adversarial attacks targeted specifically at LLM classifiers (e.g. carefully crafted prompts designed to confuse the classifier) may evade L3. (3) Indirect attacks whose payload carries no directive surface in the content (e.g. data poisoning that only manifests later) are out of scope for input-side scanning. Use the scan verdict as one strong signal in your overall trust model, not as a complete guarantee.

stack | docs