Universal Extraction

AgentInbox's universal extraction system extracts structured data from email using a cascading approach: built-in extractors first, then heuristic analysis, then an optional LLM fallback.

Extraction Pipeline

The system tries multiple strategies in order: pattern matching, heuristics, and LLM fallback. Each stage is faster than the one that follows it, so the slower stages run only when the earlier ones find no match.

How It Works

When you request an extraction, the system tries multiple strategies in order:

1. Built-in Extractors

Fast, deterministic regex and pattern-based extractors for common types like OTP, magic links, and tracking numbers.

2. Heuristic Analysis

If the built-in extractor doesn't find a match, heuristic analysis examines the email structure and context.

3. LLM Fallback

If the first two stages fail, an LLM (GPT-4o-mini) reads the email and extracts the requested data. This is optional and can be disabled.
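
The ordering above can be sketched as a simple first-match-wins loop. This is an illustrative sketch only, not AgentInbox's internal implementation: the `Stage`, `Extractor`, and `runCascade` names, the result shape, and the toy stage functions are all assumptions made for the example. A real LLM stage would be asynchronous; the sketch keeps everything synchronous for brevity.

```typescript
// Hypothetical types for illustration; not AgentInbox's actual internals.
type Stage = "builtin" | "heuristic" | "llm";

interface StageResult {
  value: string;
  confidence: number;
}

interface Extracted extends StageResult {
  stage: Stage;
}

type Extractor = (body: string) => StageResult | null;

// Try each stage in order and return the first result that is not null.
function runCascade(
  body: string,
  stages: Array<[Stage, Extractor]>,
  useLlm = true,
): Extracted | null {
  for (const [stage, extract] of stages) {
    if (stage === "llm" && !useLlm) continue; // the LLM stage is optional
    const result = extract(body);
    if (result !== null) return { ...result, stage };
  }
  return null;
}

// Toy stages standing in for the real extractors (assumed behavior).
const stages: Array<[Stage, Extractor]> = [
  [
    "builtin",
    (b) => {
      const m = /\b\d{6}\b/.exec(b);
      return m ? { value: m[0], confidence: 0.98 } : null;
    },
  ],
  ["heuristic", () => null],
  ["llm", () => ({ value: "from-llm", confidence: 0.85 })],
];
```

With these toy stages, a body containing a six-digit code is resolved by the first stage, and a body with no match returns `null` when `useLlm` is `false`.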

Built-in Extractors

Built-in extractors are optimized for specific email types and run instantly with no additional cost.

OTP

Matches 4- to 8-digit codes in verification contexts. Handles codes that contain spaces or dashes and codes that carry a prefix.
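
As a rough illustration of this kind of matching, the sketch below looks for a context keyword followed by 4 to 8 digits, allowing single spaces or dashes between digits and a short non-digit prefix such as `G-`. The `extractOtp` function, the keyword list, and the 40-character window are assumptions for this example; the actual built-in patterns are not documented here.

```typescript
// Hypothetical OTP matcher; the real built-in extractor may differ.
function extractOtp(text: string): string | null {
  // Keyword, then up to 40 non-digit characters (covers prefixes like "G-"),
  // then 4-8 digits optionally separated by single spaces or dashes.
  const pattern =
    /(?:code|otp|passcode|pin)\b[^\d]{0,40}?(\d(?:[ -]?\d){3,7})(?!\d)/i;
  const match = pattern.exec(text);
  // Strip separators so "123-456" is returned as "123456".
  return match ? match[1].replace(/[ -]/g, "") : null;
}
```

Requiring a nearby keyword is what keeps unrelated numbers, such as order IDs, from being reported as codes.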

Magic Link

Extracts the primary verification URL from HTML and text bodies.
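
One plausible way to pick a "primary" link is to collect every URL in the body and prefer the one whose text contains the most verification-related keywords. The sketch below does exactly that; `extractMagicLink` and its keyword list are hypothetical, and it does not decode HTML entities such as `&amp;` that may appear inside `href` attributes.

```typescript
// Hypothetical link picker; not the actual AgentInbox extractor.
function extractMagicLink(body: string): string | null {
  // Collect URLs from both plain text and href attributes.
  const urls = body.match(/https?:\/\/[^\s"'<>]+/g) ?? [];
  const keywords = ["verify", "confirm", "magic", "login", "signin", "token"];

  let best: string | null = null;
  let bestScore = 0;
  for (const url of urls) {
    const lower = url.toLowerCase();
    // Score each URL by how many keywords it contains.
    const score = keywords.filter((k) => lower.includes(k)).length;
    if (score > bestScore) {
      best = url;
      bestScore = score;
    }
  }
  return best; // null when no URL contains any keyword
}
```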

Verification Code

Matches alphanumeric codes, which are often longer than OTPs, and supports mixed case.

Invoice Number

Matches common invoice formats (INV-, #12345, etc.) from billing emails.
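
For the two formats named above, a minimal sketch might look like the following. The `extractInvoiceNumber` function and both regular expressions are assumptions for illustration; the built-in extractor likely covers more formats than these.

```typescript
// Hypothetical invoice matcher covering only "INV-..." and "Invoice #12345".
function extractInvoiceNumber(text: string): string | null {
  // Prefixed style, e.g. "INV-2024-0042".
  const prefixed = /\bINV-[A-Z0-9][A-Z0-9-]*/i.exec(text);
  if (prefixed) return prefixed[0];

  // Hash or label style, e.g. "Invoice #12345" or "Invoice number: 12345".
  const labeled = /\binvoice\s*(?:no\.?|number)?\s*:?\s*#?\s*(\d{3,})/i.exec(
    text,
  );
  return labeled ? labeled[1] : null;
}
```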

Tracking Number

Recognizes carrier formats (UPS, FedEx, USPS, DHL) and generic tracking codes.

API Token

Extracts bearer tokens, API keys, and secrets from developer emails.

Coupon Code

Identifies promo codes and discount identifiers from marketing emails.

Password Reset Link

Extracts password reset URLs from account recovery emails.

Confidence Scoring

Every extraction returns a confidence score. Built-in extractors typically return scores above 0.95.

  • Built-in: 0.95 - 1.00 (very high confidence)
  • Heuristic: 0.70 - 0.95 (good confidence)
  • LLM: 0.80 - 0.95 (high confidence, but slower)
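
A caller can use the score to decide whether to trust a value automatically. The sketch below assumes the result exposes `value` and `confidence` fields, which this page does not confirm; `acceptIfConfident` and the default threshold of 0.9 are hypothetical choices for the example.

```typescript
// Assumed result shape; check the API reference for the real field names.
interface ExtractionResult {
  value: string;
  confidence: number;
}

// Return the value only when the score meets the caller's threshold.
function acceptIfConfident(
  result: ExtractionResult | null,
  minConfidence = 0.9,
): string | null {
  if (result === null) return null;
  return result.confidence >= minConfidence ? result.value : null;
}
```

Given the ranges above, a threshold of 0.9 would accept typical built-in results and reject the lower end of the heuristic and LLM ranges; lower it if you want to accept more of those results.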

Disabling LLM Fallback

If you want only fast, deterministic extraction, disable the LLM fallback.

typescript
const extraction = await client.extractions.create({
  messageId: "msg_456",
  type: "otp",
  useLlm: false, // Disable LLM fallback
});

Performance optimization

Disabling LLM fallback ensures sub-100ms extraction times. Use this when you know the email contains standard patterns.
