Email Threading Is Broken. And Your Agent Is Making It Worse.
Your agent replied to a customer. The customer sees two separate threads. Nobody knows which one to respond to.
You've built the hard part. The agent reads incoming email, understands intent, generates a reply. Impressive work. Then it sends that reply, and the customer's inbox shows it as a brand new conversation instead of a continuation of the existing one.
The customer is confused. They reply to the wrong thread. Now your agent has two conversations about the same issue with the same person. One has context. The other doesn't.
This happens more often than anyone admits. And the root cause is usually the same: three email headers that most developers have never heard of until something breaks.
The three headers that control everything
Email threading is not a feature of email clients. It's a protocol convention built on three headers defined in RFC 5322: Message-ID, In-Reply-To, and References.
Message-ID is a globally unique identifier assigned to every email when it's created. It looks something like <abc123@mail.example.com>. Every email you send needs one. Every email you receive has one.
In-Reply-To contains the Message-ID of the email you're responding to. When your agent replies to a customer's message, this header points back to that specific message.
References is the full chain. It lists the Message-IDs of the earlier messages in the conversation, oldest first, ending with the message being replied to. This is how email clients reconstruct the entire thread from a single message.
Get all three right, and the reply appears exactly where it should: nested under the message it's responding to, inside the existing conversation. Get any of them wrong, and you create a ghost thread.
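To make the relationship between the three headers concrete, here is a minimal sketch of a three-message conversation and the headers each message would carry. The `ThreadingHeaders` shape, the `parseMessageIds` helper, and the example addresses are illustrative assumptions, not part of any library.

```typescript
// Hypothetical shape holding the three threading headers of one message.
interface ThreadingHeaders {
  messageId: string;
  inReplyTo?: string;
  references?: string; // raw header value, IDs separated by whitespace
}

// Extract every <id@domain> token from a raw header value.
function parseMessageIds(headerValue: string | undefined): string[] {
  if (!headerValue) return [];
  return headerValue.match(/<[^<>\s]+>/g) ?? [];
}

// Customer writes, agent replies, customer replies again.
const conversation: ThreadingHeaders[] = [
  { messageId: '<c1@customer.example>' },
  {
    messageId: '<a1@mail.example.com>',
    inReplyTo: '<c1@customer.example>',
    references: '<c1@customer.example>',
  },
  {
    messageId: '<c2@customer.example>',
    inReplyTo: '<a1@mail.example.com>',
    references: '<c1@customer.example> <a1@mail.example.com>',
  },
];

// The last message alone is enough to recover the whole chain.
const last = conversation[conversation.length - 1];
const chain = [...parseMessageIds(last.references), last.messageId];
```

Note that a message's own Message-ID does not appear in its own References header. The chain is the ancestors plus the message itself.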
Where developers get it wrong
The most common mistake is simple: not setting these headers at all. Your agent receives an inbound email, processes it, and sends a reply using a generic "send email" function. The outbound message gets a fresh Message-ID (good), but In-Reply-To is empty and References is missing. The email client on the other end has no way to connect this reply to the original conversation.
The second mistake is partial threading. You set In-Reply-To to the original message's ID, but you don't build the References chain. This works for simple back-and-forth conversations. It breaks the moment a thread has more than two or three messages, because some email clients use References (not just In-Reply-To) to determine thread membership.
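If you do manage headers yourself, building the full chain is a small amount of code. The sketch below follows the usual RFC 5322 convention: take the parent's References, append the parent's Message-ID, and point In-Reply-To at the parent. `ParentMessage` and `buildReplyHeaders` are hypothetical names used for illustration.

```typescript
// Hypothetical view of the message being replied to.
interface ParentMessage {
  messageId: string;   // e.g. '<abc123@mail.example.com>'
  references?: string; // raw References header of the parent, if present
  inReplyTo?: string;  // raw In-Reply-To header of the parent, if present
}

interface ReplyHeaders {
  'In-Reply-To': string;
  References: string;
}

function buildReplyHeaders(parent: ParentMessage): ReplyHeaders {
  const ids = (value?: string): string[] => value?.match(/<[^<>\s]+>/g) ?? [];

  // Start from the parent's References. If the parent has none, fall back
  // to its In-Reply-To, but only when that holds exactly one ID.
  let inherited = ids(parent.references);
  if (inherited.length === 0) {
    const fallback = ids(parent.inReplyTo);
    inherited = fallback.length === 1 ? fallback : [];
  }

  // Append the parent's own ID, making sure it is not listed twice.
  const chain = [
    ...inherited.filter((id) => id !== parent.messageId),
    parent.messageId,
  ];

  return { 'In-Reply-To': parent.messageId, References: chain.join(' ') };
}
```

Called with a parent whose References is `<c1@x> <a1@y>` and whose Message-ID is `<c2@x>`, this produces a References value containing all three IDs, which is the part that partial threading leaves out.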
The third mistake is the subtle one. You get the headers right, but you use a different From address or Subject line than the original conversation. Clients do not agree on what defines a thread. Gmail requires a matching subject on top of the reply headers. Outlook groups by headers. Apple Mail does its own thing entirely. Change the subject, and some clients split the thread. Send from a different address than the one the customer originally contacted, and other clients get confused.
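One defensive habit is to derive the reply subject from the original instead of letting the agent write a new one. The following sketch strips stacked reply and forward prefixes and re-adds a single "Re:". The helper names are hypothetical, and the prefix list is an assumption: real clients may normalize subjects differently, including localized prefixes.

```typescript
// Strip leading reply/forward prefixes such as "Re:", "RE:", "Fwd:", "FW:".
function baseSubject(subject: string): string {
  return subject.replace(/^(\s*(re|fwd?)\s*:\s*)+/i, '').trim();
}

// Build the subject for a reply so it stays consistent with the thread.
function replySubject(originalSubject: string): string {
  return `Re: ${baseSubject(originalSubject)}`;
}

// Compare two subjects the way a subject-sensitive client might.
function sameConversationSubject(a: string, b: string): boolean {
  return baseSubject(a).toLowerCase() === baseSubject(b).toLowerCase();
}
```

With this in place, the agent's generated text only ever goes into the body, and the subject line is computed from the message it is answering.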
The fourth mistake is not storing Message-IDs at all. Your agent sends a reply, but you don't save the Message-ID that was generated. When the customer responds, you need to reference that ID in your next reply's References chain. If you didn't save it, you can't build the chain. The thread breaks on message three.
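A minimal sketch of what "saving the Message-ID" means in practice, assuming you generate the ID yourself before sending. `generateMessageId`, `MessageIdStore`, and the conversation key are hypothetical, and the in-memory map stands in for whatever database you actually use.

```typescript
// Generate a Message-ID for an outbound message. The random part here is
// only good enough for a sketch; prefer a UUID in production.
function generateMessageId(domain: string): string {
  const unique = `${Date.now().toString(36)}.${Math.random().toString(36).slice(2)}`;
  return `<${unique}@${domain}>`;
}

// Conversation key -> ordered Message-IDs, inbound and outbound alike.
class MessageIdStore {
  private chains = new Map<string, string[]>();

  record(conversationKey: string, messageId: string): void {
    const chain = this.chains.get(conversationKey) ?? [];
    if (!chain.includes(messageId)) chain.push(messageId);
    this.chains.set(conversationKey, chain);
  }

  // Value for the References header of the next outbound reply.
  referencesFor(conversationKey: string): string {
    return (this.chains.get(conversationKey) ?? []).join(' ');
  }
}

const store = new MessageIdStore();
store.record('order-4521', '<c1@customer.example>'); // inbound message
const sentId = generateMessageId('mail.example.com');
store.record('order-4521', sentId);                  // our reply: save it
store.record('order-4521', '<c2@customer.example>'); // customer responds
```

The second `record` call is the one that gets forgotten. Without it, the References value for the third message would skip the agent's own reply.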
Why this is worse for automated systems
A human making a threading mistake creates one broken thread. An AI agent making a threading mistake creates hundreds.
Every conversation the agent handles gets the same bug. If your threading logic is wrong, it's wrong across every conversation. Every customer interaction produces a ghost thread. Every ghost thread creates confusion. Every confused customer is a support ticket that shouldn't exist.
And here's the part that makes it genuinely hard to debug: threading failures are invisible to the sender. Your agent sends the reply. The API returns success. Your logs show the message was delivered. Everything looks fine on your end.
The failure only manifests in the recipient's email client. And you have no way to see what their inbox looks like. You find out when customers start complaining, or worse, when they stop responding because they think your agent is sending them random disconnected emails.
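Because the recipient's inbox is out of reach, the cheapest place to catch this class of bug is just before sending. Below is a sketch of a pre-send check that you could run in tests or log in production. The `OutboundMessage` and `InboundMessage` shapes and the `threadingProblems` name are assumptions for illustration, and the check only covers the headers, not subject or From consistency.

```typescript
interface OutboundMessage {
  headers: Record<string, string | undefined>;
}

interface InboundMessage {
  messageId: string;
}

// Return a list of human-readable problems; an empty list means the reply
// carries the headers a client needs to attach it to the parent.
function threadingProblems(reply: OutboundMessage, parent: InboundMessage): string[] {
  const problems: string[] = [];
  const inReplyTo = reply.headers['In-Reply-To']?.trim();
  const references = reply.headers['References']?.match(/<[^<>\s]+>/g) ?? [];

  if (!reply.headers['Message-ID']) {
    problems.push('missing Message-ID');
  }
  if (inReplyTo !== parent.messageId) {
    problems.push('In-Reply-To does not point at the parent');
  }
  if (references[references.length - 1] !== parent.messageId) {
    problems.push('References does not end with the parent Message-ID');
  }
  return problems;
}
```

A successful API response tells you the message left. A check like this tells you whether it left with the information needed to land in the right thread.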
What server-side threading actually solves
The fix isn't "be more careful with headers." The fix is to stop managing headers manually.
When mailbot handles threading server-side, the conversation model lives in the infrastructure, not in your application code. Here's what that means in practice.
When your agent sends a reply to a thread, you pass the thread ID. That's it.
import { MailBot } from '@yopiesuryadi/mailbot-sdk';

const mailbot = new MailBot({ apiKey: 'mb_your_api_key' });

await mailbot.messages.send({
  inboxId: 'inbox_support',
  threadId: 'thread_xyz789',
  to: [{ email: 'customer@example.com' }],
  subject: 'Re: Order #4521',
  body: 'Hi Sarah, I found your order...'
});
mailbot already knows every message in that thread. It knows every Message-ID. It constructs the correct In-Reply-To and References headers automatically. It uses the same From address and maintains subject consistency.
When you need to read the conversation:
const thread = await mailbot.threads.get('thread_xyz789');
// thread.messages contains every message, in order, with full metadata
No header parsing. No Message-ID storage. No References chain construction. The thread is a first-class object in the API, not something you reconstruct from raw headers.
The edge cases that will ruin your week
Even if you handle the basic headers correctly, email threading has edge cases that will surprise you.
Forwarded messages. A customer forwards your agent's reply to a colleague. The colleague responds. Your agent receives a message from a new sender, with a References chain that points back to the original thread. Do you create a new thread or merge it into the existing one? mailbot handles this by maintaining the thread reference regardless of who sends the next message.
CC additions. A customer CCs someone midway through a conversation. Now there are multiple recipients. If any of them reply, the thread context needs to be maintained. If your agent replies, it needs to include the right recipients.
Client-side header mangling. Some email clients rewrite or truncate the References header. Some strip In-Reply-To entirely. Some add their own threading metadata that conflicts with the standard headers. When your threading logic depends on these headers being pristine, any client that modifies them breaks your system.
Subject changes. A customer replies but changes the subject line. Is this still the same thread? Gmail says no (a changed subject starts a new conversation, even when the headers match). Most other clients say yes (they use headers). Your agent needs to handle both behaviors.
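If you handle these cases in application code, the forwarded-message and header-mangling scenarios both come down to the same lookup: does any Message-ID on the inbound message belong to a thread you already know? The sketch below is one way to do that and makes no claim about how mailbot implements it. `Inbound`, `resolveThread`, and the `knownIds` index are hypothetical.

```typescript
interface Inbound {
  references?: string; // raw header, possibly truncated by a client
  inReplyTo?: string;  // raw header, possibly missing
}

// knownIds maps every Message-ID we have seen to our internal thread key.
function resolveThread(msg: Inbound, knownIds: Map<string, string>): string | undefined {
  const ids = (value?: string): string[] => value?.match(/<[^<>\s]+>/g) ?? [];

  // Try the most specific pointer first (In-Reply-To), then walk
  // References from newest to oldest.
  const candidates = [...ids(msg.inReplyTo), ...ids(msg.references).reverse()];

  for (const id of candidates) {
    const thread = knownIds.get(id);
    if (thread !== undefined) return thread;
  }
  return undefined; // no match: the caller decides whether to start a new thread
}
```

Matching on any known ancestor, not only the immediate parent, is what lets a reply from a forwarded copy or from a client that dropped In-Reply-To still find its thread. It does not sender identity or subject into account, so whether to merge a new participant into an existing thread remains a product decision.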
These aren't theoretical problems. They happen in every production email system. The question is whether your application code handles them or your infrastructure does.
Start with the thread, not the message
If there's one takeaway from this post, it's this: email threading is not a feature you add. It's a foundation you need from the start.
Every time your agent sends a reply, it's either maintaining a conversation or creating a ghost thread. There is no middle ground. And by the time you discover the threading is broken, your customers have already experienced the confusion.
Build on infrastructure that treats threads as first-class objects. Let your application focus on what to say, not on which headers to set.
mailbot is programmable email infrastructure for developers.