
How Inbound Email Processing Actually Works

Co-founder, mailbot

Your agent can send beautiful emails. But the moment someone replies, you're dealing with MIME parsing, header extraction, bounce detection, and content types you didn't know existed. Here's what happens between the SMTP handshake and your application code.

Most tutorials about email for AI agents focus on sending. Compose a message, call an API, done. The outbound path is well understood. Inbound is where things get interesting. And by interesting, I mean painful.

Receiving email programmatically means your system becomes a mail server. Even if you're using an abstraction layer, understanding what happens underneath will save you from bugs that only show up in production.

The journey of an inbound email

When someone sends an email to your agent's address, a sequence of events fires that most application developers never think about.

Step 1: DNS resolution. The sender's mail server performs an MX lookup on your domain. The MX records name the servers responsible for accepting email on behalf of that domain, in order of preference. If the records are misconfigured, or if there is no MX record and the fallback to the domain's address record does not reach a mail server, the email bounces before it reaches your infrastructure.

Step 2: SMTP connection. The sender's server opens a TCP connection to your mail server on port 25. What follows is an SMTP conversation: EHLO, MAIL FROM, RCPT TO, DATA. Each command can succeed or fail independently. Your server can reject the connection, the sender address, or the recipient address at any point.

Many inbound email failures happen during this handshake, not after it. A misconfigured TLS certificate, an unrecognized recipient, or a full disk on the receiving server will kill the message before your application code ever sees it.

Step 3: Message transfer. The email content arrives as a stream of bytes following the DATA command. This is where MIME enters the picture. The message is not a simple text blob. It's a structured document with headers, a body in multiple formats, and potentially dozens of attachments.

Step 4: Processing and delivery. Once the message is accepted and parsed, your system needs to route it to the right inbox, match it to an existing conversation thread, extract content, and trigger downstream logic.
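To make the handshake in step 2 concrete, here is a toy sketch in Python of how a receiving server might answer each command. It is not a real SMTP server: the function name `smtp_replies` and the `known_recipients` set are made up for illustration, and only the reply codes (250, 354, 500, 503, 550) come from the SMTP specification.

```python
def smtp_replies(commands, known_recipients):
    """Return the reply code a toy receiving server gives to each command line."""
    greeted = False
    sender = None
    recipients = []
    replies = []
    for line in commands:
        verb = line.upper()
        if verb.startswith("EHLO"):
            # A new EHLO resets any transaction in progress.
            greeted, sender, recipients = True, None, []
            replies.append(250)
        elif verb.startswith("MAIL FROM:"):
            if not greeted:
                replies.append(503)  # bad sequence of commands
            else:
                sender = line[len("MAIL FROM:"):].strip()
                replies.append(250)
        elif verb.startswith("RCPT TO:"):
            if sender is None:
                replies.append(503)  # RCPT before MAIL
            else:
                rcpt = line[len("RCPT TO:"):].strip().strip("<>").lower()
                if rcpt in known_recipients:
                    recipients.append(rcpt)
                    replies.append(250)
                else:
                    replies.append(550)  # mailbox unavailable
        elif verb == "DATA":
            # 354 means "go ahead and send the message content".
            replies.append(354 if recipients else 503)
        else:
            replies.append(500)  # command not recognized
    return replies


session = [
    "EHLO mail.example.org",
    "MAIL FROM:<alice@example.org>",
    "RCPT TO:<nobody@example.com>",
    "RCPT TO:<agent@example.com>",
    "DATA",
]
print(smtp_replies(session, {"agent@example.com"}))
```

One rejected recipient does not end the session: the unknown address gets a 550 while the valid one is still accepted, which is why each command has to be checked on its own.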

MIME parsing is harder than you think

A single email can contain nested multipart structures that look something like this:

multipart/mixed
  multipart/alternative
    text/plain
    multipart/related
      text/html
      image/png (inline)
  application/pdf (attachment)
  message/rfc822 (forwarded email)

That last one is the killer. A forwarded email is a MIME part containing another complete email, with its own headers, body parts, and attachments. Your parser needs to handle arbitrary nesting depth.
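A tree like the one above can be walked recursively. The sketch below uses Python's standard `email` package; `mime_tree` and `build_sample` are illustrative names, and the sample message is synthetic. It relies on the fact that the parser exposes a `message/rfc822` part as a container whose single child is the forwarded message, so the same recursion covers forwarded mail.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def mime_tree(part, depth=0):
    """Return (depth, content_type) pairs for a part and everything nested in it."""
    rows = [(depth, part.get_content_type())]
    if part.is_multipart():
        # True for multipart/* and for message/rfc822 containers.
        for child in part.get_payload():
            rows.extend(mime_tree(child, depth + 1))
    return rows


def build_sample():
    """Build a synthetic forwarded email and return it as raw bytes."""
    inner = EmailMessage()
    inner["From"] = "carol@example.org"
    inner["Subject"] = "Original"
    inner.set_content("Original text")

    outer = EmailMessage()
    outer["From"] = "bob@example.org"
    outer["Subject"] = "Fwd: Original"
    outer.set_content("See below")
    outer.add_alternative("<p>See below</p>", subtype="html")
    outer.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                         subtype="pdf", filename="report.pdf")
    outer.add_attachment(inner)  # attached as message/rfc822
    return outer.as_bytes()


parsed = BytesParser(policy=policy.default).parsebytes(build_sample())
for depth, content_type in mime_tree(parsed):
    print("  " * depth + content_type)
```

A production parser would also want a depth limit, since nesting depth is controlled by whoever sent the message.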

Most developers expect a flat structure. Subject, body, attachments. Real-world email is a tree. And the tree can be malformed. Email clients are inconsistent about how they construct MIME structures. Some embed inline images as multipart/related. Others attach them as multipart/mixed. Some encode text as quoted-printable. Others use base64. Some use UTF-8 consistently. Others mix character encodings within the same message.
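The encoding differences are easiest to see side by side. In this small sketch (Python standard library; `decode_text` and the two raw messages are made up for the example), the same word arrives once as quoted-printable Latin-1 and once as base64 UTF-8. Letting the parser apply both the transfer encoding and the charset yields the same string either way.

```python
from email import policy
from email.parser import BytesParser

# Same text, two different wire formats.
RAW_QP = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Caf=E9\r\n"
)
RAW_B64 = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"Q2Fmw6k=\r\n"
)


def decode_text(raw_bytes):
    """Undo the transfer encoding, then decode with the declared charset."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    return msg.get_content().strip()


print(decode_text(RAW_QP) == decode_text(RAW_B64))
```

Because the charset is declared per part, a multipart message has to be decoded part by part rather than with one encoding for the whole message.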

If your AI agent needs to understand inbound email, your MIME parser is the foundation. Get it wrong, and your agent reads garbled text, misses attachments, or misinterprets forwarded messages as original content.

Headers tell you more than the body

The email body gets all the attention, but headers carry the operational metadata your system needs.

Message-ID is meant to be a globally unique identifier for this email, generated on the sending side. Use it for deduplication. Same Message-ID twice? Already processed.

In-Reply-To contains the Message-ID of the email this message responds to. This is threading at the protocol level.

References is a space-separated list of Message-IDs representing the full thread ancestry. In theory. In practice, email clients truncate, modify, or omit this chain entirely.

Content-Type defines the MIME structure and carries parameters like charset and boundary that are critical for correct parsing. On a multipart message, a missing or wrong boundary parameter leaves the parser with no way to split the body into its parts.

Received headers (usually multiple) trace the path the email took through the internet. Each server prepends its own, so the topmost header is the most recent hop. Reading them from bottom to top gives you a timestamp trail for debugging delivery delays.

Here's how mailbot surfaces these through the SDK:

import { Mailbot } from '@yopiesuryadi/mailbot-sdk';
const mailbot = new Mailbot({ apiKey: 'mb_your_api_key' });

// Fetch one inbox, then list up to ten of its messages.
const inbox = await mailbot.inboxes.get('inbox_abc123');
const messages = await inbox.messages.list({ limit: 10 });

for (const msg of messages) {
  console.log('From:', msg.from);
  console.log('Subject:', msg.subject);
  // Thread assignment is already resolved for you.
  console.log('Thread ID:', msg.threadId);
  // Raw headers stay available, keyed by lowercase name.
  console.log('In-Reply-To:', msg.headers['in-reply-to']);
  console.log('Attachments:', msg.attachments.length);
}
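If you are reading raw messages yourself, the same headers are available from Python's standard `email` package. The sketch below is a heuristic rather than a complete threading algorithm: `thread_root`, `is_duplicate`, and `hops_oldest_first` are hypothetical helper names, the sample message is synthetic, and taking the first entry of References as the thread root assumes the client did not truncate the chain.

```python
from email import policy
from email.parser import BytesParser

RAW = (
    b"Received: from mx2.example.net by mx.example.com; Tue, 02 Jan 2024 10:00:05 +0000\r\n"
    b"Received: from laptop.example.org by mx2.example.net; Tue, 02 Jan 2024 10:00:01 +0000\r\n"
    b"Message-ID: <c3@example.org>\r\n"
    b"In-Reply-To: <b2@example.com>\r\n"
    b"References: <a1@example.org> <b2@example.com>\r\n"
    b"Subject: Re: Quote\r\n"
    b"\r\n"
    b"Thanks!\r\n"
)


def thread_root(msg):
    """Best guess at the thread's first Message-ID."""
    refs = str(msg.get("References", "")).split()
    if refs:
        return refs[0]
    parent = str(msg.get("In-Reply-To", "")).strip()
    # No ancestry at all: the message starts its own thread.
    return parent or str(msg.get("Message-ID", "")).strip()


def is_duplicate(msg, seen_ids):
    """Record the Message-ID and report whether it was seen before."""
    message_id = str(msg.get("Message-ID", "")).strip()
    if not message_id:
        return False  # nothing to deduplicate on
    if message_id in seen_ids:
        return True
    seen_ids.add(message_id)
    return False


def hops_oldest_first(msg):
    """Received headers are prepended, so reverse them for a timeline."""
    return [str(h) for h in reversed(msg.get_all("Received", []))]


msg = BytesParser(policy=policy.default).parsebytes(RAW)
seen = set()
print(thread_root(msg))
print(is_duplicate(msg, seen), is_duplicate(msg, seen))
print(hops_oldest_first(msg)[0])
```

In a real system the `seen` set would live in a database, and a fallback such as subject matching is worth considering for clients that drop In-Reply-To and References entirely.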

The content extraction problem

Getting message content into a format your AI agent can work with is less straightforward than it sounds.

The HTML version of an email is not the plain text version with formatting. They can contain completely different content. Marketing emails often have HTML-only layouts where the plain text fallback is empty or auto-generated gibberish. Even in conversational email, signature blocks and quoted reply chains differ between the two versions.

For AI agents, plain text is cleaner and cheaper in tokens, but loses structural information like tables and links. HTML preserves structure but includes massive noise: styling, tracking pixels, wrapper divs. The practical answer is to use plain text as the primary input, fall back to sanitized HTML-to-text conversion when plain text is absent, and make attachments available as separate references the agent can inspect when needed.
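Here is one way that fallback order could look, as a minimal sketch using only Python's standard library. `agent_text` and `_TextExtractor` are illustrative names, and the HTML-to-text step is deliberately crude: it drops tags plus the contents of script and style blocks, and does nothing about tables, links, or quoted replies.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def agent_text(msg):
    """Prefer the text/plain body; fall back to stripped-down HTML."""
    plain = msg.get_body(preferencelist=("plain",))
    if plain is not None:
        text = plain.get_content().strip()
        if text:
            return text
    html = msg.get_body(preferencelist=("html",))
    if html is None:
        return ""
    extractor = _TextExtractor()
    extractor.feed(html.get_content())
    extractor.close()
    return " ".join(extractor.chunks)


# An HTML-only message, as many marketing tools send.
html_only = EmailMessage()
html_only.set_content(
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello <b>there</b></p><img src='pixel.gif'></body></html>",
    subtype="html",
)
print(agent_text(html_only))
```

Checking that the plain part is non-empty before trusting it covers the case where the text fallback exists but contains nothing useful.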

Bounce handling is its own protocol

When an email your agent sent cannot be delivered, the server that gave up on it (the recipient's server, or your own outbound server after a rejected SMTP attempt) generates a bounce notification. This notification is itself an email, sent to the envelope sender address (which may differ from the visible From header).

Hard bounces mean the address is permanently invalid (a 5.x.x status). Soft bounces mean delivery failed temporarily (a 4.x.x status). The problem is that bounce formats are loosely standardized. RFC 3464 defines Delivery Status Notifications as a structured MIME format, but many servers send free-form text instead. Reliable bounce processing requires parsing structured DSNs when available and pattern matching on common formats when not.
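For the structured case, a hand-rolled parser might look like the sketch below. It uses Python's standard `email` package, which exposes a `message/delivery-status` part as a list of header blocks. `parse_dsn` and the sample bounce are made up for illustration, and the hard/soft split assumes the Status field's first digit (5 for permanent, 4 for transient). Free-form bounces would still need separate pattern matching.

```python
from email import policy
from email.parser import BytesParser

RAW_DSN = b"\r\n".join([
    b"From: MAILER-DAEMON@example.net",
    b"To: agent@example.com",
    b"Subject: Undelivered Mail Returned to Sender",
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/report; report-type=delivery-status; boundary="b1"',
    b"",
    b"--b1",
    b"Content-Type: text/plain",
    b"",
    b"Delivery failed.",
    b"",
    b"--b1",
    b"Content-Type: message/delivery-status",
    b"",
    b"Reporting-MTA: dns; mx.example.net",
    b"",
    b"Final-Recipient: rfc822; nobody@example.org",
    b"Action: failed",
    b"Status: 5.1.1",
    b"",
    b"--b1--",
    b"",
])


def parse_dsn(raw_bytes):
    """Return one dict per recipient found in a structured DSN."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    results = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        if not part.is_multipart():
            continue  # malformed status part, nothing structured to read
        for block in part.get_payload():
            recipient = block.get("Final-Recipient")
            if recipient is None:
                continue  # per-message block, not a per-recipient one
            status = str(block.get("Status", "")).strip()
            results.append({
                # The field looks like "rfc822; user@example.org".
                "recipient": str(recipient).split(";", 1)[-1].strip(),
                "action": str(block.get("Action", "")).strip().lower(),
                "status": status,
                "bounce_type": {"5": "hard", "4": "soft"}.get(status[:1], "unknown"),
            })
    return results


print(parse_dsn(RAW_DSN))
```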

mailbot handles this automatically:

from mailbot import Mailbot

mailbot = Mailbot(api_key="mb_your_api_key")

# Runs whenever a message this account sent comes back as undeliverable.
@mailbot.on("message.bounced")
def handle_bounce(event):
    print(f"Bounced: {event.original_message_id}")
    print(f"Type: {event.bounce_type}")  # "hard" or "soft"
    print(f"Recipient: {event.recipient}")

Why this matters for AI agents

AI agents that process inbound email are only as reliable as their email infrastructure. If the MIME parser chokes on a forwarded message, the agent never sees the content. If bounce handling is missing, the agent keeps sending to dead addresses. If header parsing is incomplete, threading breaks and the agent loses conversation context.

Building this from scratch means maintaining an SMTP server, a MIME parser, a threading engine, a bounce processor, and an event notification system. That's a full engineering team's worth of work before your agent writes a single response.

The alternative is to let the infrastructure handle the protocol complexity and give your agent structured, parsed, threaded messages through an API. That's what mailbot is built to do. Your agent focuses on understanding and responding. The email plumbing stays invisible.