

Why Your AI Agent Loses Context After Three Email Replies

· 8 min read
Founder, mailbot

Your AI email agent handles the first two replies flawlessly. It reads the original message, drafts a thoughtful response, and sends it on time. Then reply three arrives and something quietly goes wrong. The agent responds to the wrong topic. It asks a question the customer already answered. It loses the thread entirely.

This is not a reasoning failure. The underlying language model did not forget anything. The problem is infrastructure: the email headers that stitch a conversation together are either missing, truncated, or misread by the receiving client. By the time a thread hits three or more messages deep, the structural glue that email relies on has often already snapped.

How Email Threads Are Actually Built

Every email message carries a unique identifier called a Message-ID. When you reply to that message, your client adds two headers: In-Reply-To, which holds the parent message's Message-ID, and References, which holds the entire ancestry chain.

As defined by RFC 2822 (IETF), and kept in its successor RFC 5322, the References header of a reply should contain the parent message's own References field followed by the parent's Message-ID. Each new reply extends this chain by one entry. So by message four, the References header contains three Message-ID values, each pointing one step further back.
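The rule is mechanical enough to express in a few lines. The sketch below assumes a hypothetical `ParentHeaders` shape (not part of any SDK) that holds the parent's Message-ID, References, and In-Reply-To values already split into individual IDs. It also applies RFC 2822's fallback: if the parent has no References but replies to exactly one message, its In-Reply-To value seeds the chain.

```typescript
// Hypothetical shape: threading headers of the message being replied to,
// with each header already split into individual "<id@host>" values.
interface ParentHeaders {
  messageId: string;
  references: string[];
  inReplyTo: string[];
}

interface ReplyHeaders {
  inReplyTo: string;
  references: string[];
}

// Build In-Reply-To and References for a reply, following RFC 2822 section 3.6.4.
function buildReplyHeaders(parent: ParentHeaders): ReplyHeaders {
  // Start from the parent's References; if it has none but replies to exactly
  // one message, fall back to the parent's In-Reply-To value.
  const ancestry =
    parent.references.length > 0
      ? parent.references
      : parent.inReplyTo.length === 1
        ? parent.inReplyTo
        : [];
  return {
    inReplyTo: parent.messageId,
    // Copy the ancestry and append the parent's own Message-ID at the end.
    references: [...ancestry, parent.messageId],
  };
}

// Walk a four-message thread: each reply is built from the previous message.
const m1: ParentHeaders = { messageId: "<m1@example.com>", references: [], inReplyTo: [] };
const h2 = buildReplyHeaders(m1);
const m2: ParentHeaders = { messageId: "<m2@example.com>", references: h2.references, inReplyTo: [h2.inReplyTo] };
const h3 = buildReplyHeaders(m2);
const m3: ParentHeaders = { messageId: "<m3@example.com>", references: h3.references, inReplyTo: [h3.inReplyTo] };
const h4 = buildReplyHeaders(m3);
console.log(h4.references.join(" "));
// prints "<m1@example.com> <m2@example.com> <m3@example.com>"
```

Message four ends up with three IDs in References and In-Reply-To pointing at message three, which matches the description above.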

This chain is how email clients group messages into a visible thread. Without it, each reply can appear as a disconnected new conversation. Different email clients handle threading differently: Gmail uses References together with In-Reply-To and subject matching, Outlook leans primarily on References, and Apple Mail follows RFC 2822 most closely. Omitting the References header outright tends to break threads after three or more messages across all the major clients.

Why the Break Happens at Reply Three (Not Reply One or Two)

The first reply can usually thread on In-Reply-To alone, and many email clients and APIs set that correctly by default. The second reply needs both In-Reply-To and a short References chain, and most still manage this. But the third reply requires your sending code to read the entire References chain from the previous message, append that message's Message-ID, and set the new In-Reply-To to point at the same previous message.
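When a thread fragments at reply three, it helps to check each outgoing message against its parent before blaming the model. A minimal validator might look like the following. It assumes both messages' headers have already been parsed into arrays of Message-ID strings; the `ThreadHeaders` shape and the function name are hypothetical. For brevity it ignores RFC 2822's In-Reply-To fallback for parents without References.

```typescript
// Hypothetical parsed form of a message's threading headers.
interface ThreadHeaders {
  messageId: string;
  inReplyTo: string[];
  references: string[];
}

// Return a list of problems with `reply`'s threading headers relative to `parentMsg`.
// An empty list means the reply extends the parent's chain as RFC 2822 describes.
function checkReplyChain(parentMsg: ThreadHeaders, reply: ThreadHeaders): string[] {
  const problems: string[] = [];
  if (reply.inReplyTo.length !== 1 || reply.inReplyTo[0] !== parentMsg.messageId) {
    problems.push("In-Reply-To does not point at the parent's Message-ID");
  }
  // Expected References: the parent's References plus the parent's Message-ID.
  const expected = [...parentMsg.references, parentMsg.messageId];
  const actual = reply.references;
  const matches =
    actual.length === expected.length && actual.every((id, i) => id === expected[i]);
  if (!matches) {
    problems.push(
      `References has ${actual.length} IDs, expected ${expected.length}: ${expected.join(" ")}`,
    );
  }
  return problems;
}

const parentMsg: ThreadHeaders = {
  messageId: "<m3@example.com>",
  inReplyTo: ["<m2@example.com>"],
  references: ["<m1@example.com>", "<m2@example.com>"],
};
// A fourth message whose References only holds the thread's first ID.
const badReply: ThreadHeaders = {
  messageId: "<m4@example.com>",
  inReplyTo: ["<m3@example.com>"],
  references: ["<m1@example.com>"],
};
console.log(checkReplyChain(parentMsg, badReply)); // logs one problem, about References
```

In this example In-Reply-To is correct, so the breakage would be invisible if you only checked the most recent link.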

If any step in that chain is wrong, not just the most recent one, clients diverge. According to Alibaba LifeTips research on Outlook threading, Outlook splits 31.4% of threads that Gmail preserves intact. Outlook's threading is folder-bound and more sensitive to subject-line changes, while Gmail treats subject-matching as a fallback rather than a primary signal. A thread that looks continuous in Gmail may already be fragmented in Outlook before your agent sees it.

This is the compound problem. Your agent does not receive a clean, continuous conversation object. It receives fragments: some messages grouped, some orphaned, some duplicated across quoted footers. As one developer put it in a discussion on r/AI_Agents, what begins as simple email context "evolves into a substantial infrastructure project" once you try to use it reliably in production.

The Token Waste and Context Collapse Problem

Before the headers even matter, there is a token-budget problem. When your agent fetches a thread naively, it is likely reading the same content four or five times, because each reply quotes the previous one in full. The new information in message five may be two sentences, while the payload you are feeding your agent may be four thousand tokens of redundant history.

This is a practical, measurable problem. The r/AI_Agents discussion reports that roughly 80% of the tokens in a real email thread are duplicate quotes, signatures, and footers. In that case your agent is spending most of its context window on information it has already seen.
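You can check how much of your own traffic is repeated history with a rough measurement. The sketch below is a heuristic, not a tokenizer. It counts characters on lines that begin with `>` (the common plain-text quoting convention), plus everything from an `On ... wrote:` attribution line onward, relative to the whole body. HTML-only replies and clients that quote differently will be undercounted.

```typescript
// Rough share of a plain-text email body that is quoted history rather than new text.
// Heuristic: lines starting with ">" are quoted, and so is everything from an
// "On <date>, <name> wrote:" attribution line onward.
function quotedShare(body: string): number {
  const lines = body.split(/\r?\n/);
  let quotedChars = 0;
  let totalChars = 0;
  let inQuotedTail = false;
  for (const line of lines) {
    totalChars += line.length;
    if (/^On .+wrote:\s*$/.test(line.trim())) inQuotedTail = true;
    if (inQuotedTail || line.trimStart().startsWith(">")) quotedChars += line.length;
  }
  return totalChars === 0 ? 0 : quotedChars / totalChars;
}

const sample = [
  "Thanks, that works.",
  "On Mon, Jan 6, 2025, Ana wrote:",
  "> Can you check my order?",
  "> It has not shipped.",
].join("\n");
console.log(quotedShare(sample).toFixed(2)); // prints "0.80"
```

Running this over a sample of your real threads gives you a baseline before you add any deduplication.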

Then, when the References chain is broken, your agent cannot reliably determine which message is the true root of the conversation, which messages have already been handled, or what the correct In-Reply-To value for its outgoing reply should be. At that point, even a perfectly capable reasoning engine will produce incoherent replies because the input it received was incoherent. The agent did not fail. The input pipeline failed.

According to Composio's report on why AI agent pilots fail, integration failure rather than language model failure is the primary cause of AI agent production failures. Email threading is a textbook example of this pattern. The model is fine. The plumbing is broken.

The Wrong Way: Manual Header Construction

Many developers, when building email reply logic, reach for client.messages.send() with manually constructed headers. It looks like this:

import { MailbotClient } from '@yopiesuryadi/mailbot-sdk';
const client = new MailbotClient({ apiKey: 'mb_test_xxx' });

// WRONG: manually constructing threading headers
const response = await client.messages.send({
  inboxId: 'inbox_abc123',
  to: [{ email: 'customer@example.com' }],
  subject: 'Re: Your support request',
  bodyText: 'Thank you for your message...',
  headers: {
    'In-Reply-To': '<original-message-id@mail.example.com>',
    'References': '<original-message-id@mail.example.com>',
    // You are now responsible for reading and appending the full References chain.
    // Get this wrong once and Outlook can split the thread for every later reply.
  },
});

The problem with this approach is not that it is hard to write. It is that it requires you to correctly read the full References chain from the previous message, append that message's Message-ID, and keep this logic accurate across every environment that may format or truncate headers differently. Miss it once on reply three, and your agent is now operating on a broken thread for every reply that follows.
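Part of what makes manual construction fragile is that the References value arrives as raw header text. The format allows it to be folded across several lines, and IDs may be separated by any amount of whitespace. If you do need to read the chain yourself, a tolerant parser that extracts each `<...>` identifier is safer than splitting on single spaces. A minimal sketch follows; the function name is ours, not the SDK's.

```typescript
// Extract Message-IDs from a raw References or In-Reply-To header value.
// Handles folded headers (CRLF followed by whitespace) and irregular spacing
// by matching each angle-bracketed identifier instead of splitting on spaces.
function parseMessageIds(headerValue: string | undefined): string[] {
  if (!headerValue) return [];
  const ids = headerValue.match(/<[^<>\s]+>/g) ?? [];
  // Drop duplicates while keeping the first occurrence, so chain order survives.
  return Array.from(new Set(ids));
}

// A folded header with uneven spacing between IDs.
const raw = "<m1@example.com>\r\n <m2@example.com>   <m3@example.com>";
console.log(parseMessageIds(raw));
```

Splitting the same value on `" "` would produce empty strings and an ID with a stray line break, which is exactly the kind of subtle corruption that surfaces on reply three.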

The Right Way: client.messages.reply()

The client.messages.reply() method exists specifically to handle this. It reads the correct Message-ID from the parent message, builds the full References chain by reading the parent's own References header, and sets both In-Reply-To and References correctly before sending. You do not touch headers at all.

import { MailbotClient } from '@yopiesuryadi/mailbot-sdk';
const client = new MailbotClient({ apiKey: 'mb_test_xxx' });

// RIGHT: let the SDK handle threading headers automatically
const reply = await client.messages.reply({
  inboxId: 'inbox_abc123',
  messageId: 'msg_xyz789', // the specific message you are replying to
  bodyText: 'Thank you for following up. Here is what we found...',
});

That single call handles the RFC 2822 threading rules: it builds the References chain and assigns In-Reply-To for you. No manual header management, no risk of chain truncation.

Fetching Full Thread Context Before You Reply

The other half of the problem is making sure your agent actually reads the full thread before composing a reply. The correct pattern is to call client.threads.get() first, which returns the complete message history for the thread as a structured object, then pass that context to your agent, and only then call client.messages.reply().

import { MailbotClient } from '@yopiesuryadi/mailbot-sdk';
const client = new MailbotClient({ apiKey: 'mb_test_xxx' });

const inboxId = 'inbox_abc123';
const threadId = 'thread_def456'; // placeholder: the thread you are responding in

// Step 1: Get the full thread so your agent has complete context
const thread = await client.threads.get(inboxId, threadId);

// Step 2: Extract message bodies in order.
// Note: bodyText may still contain quoted copies of earlier messages;
// strip those here if you want deduplicated input.
const messages = thread.messages.map((msg) => ({
  from: msg.from,
  date: msg.date,
  body: msg.bodyText,
}));

// Step 3: Feed structured thread to your agent, get a reply draft
// (yourAgentLogic is a placeholder for your own model call)
const replyDraft = await yourAgentLogic(messages);

// Step 4: Reply to the most recent message using the SDK so headers are handled correctly
const latest = thread.messages.at(-1);
if (!latest) throw new Error(`Thread ${threadId} has no messages`);

const sent = await client.messages.reply({
  inboxId,
  messageId: latest.id,
  bodyText: replyDraft,
});

This pattern gives your agent structured context, one entry per message in chronological order, rather than a raw chain of quoted bodies. It also attaches the reply to the most recent message in the thread, which is what determines whether Gmail, Outlook, and Apple Mail all show it in the right place. Deduplication matters most, and it has to be done explicitly: mapping bodyText alone still passes along every quoted copy. Once each message is reduced to its new text, your reasoning engine sees clean input instead of four thousand tokens of repeated history. Your agent processes less and produces more accurate output.
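One way to do that deduplication, assuming plain-text bodies, is to cut each message at its first quoted line, its first `On ... wrote:` attribution line, or the conventional `-- ` signature delimiter. The sketch below is a heuristic, not a robust parser: clients localize attribution lines, and replies written inline between quoted lines will lose everything after the first quote. The `ThreadMessage` shape is hypothetical and only mirrors the fields used in the example above.

```typescript
// Keep only the new text of a plain-text reply: stop at the first quoted line,
// the first "On ... wrote:" attribution, or the "-- " signature delimiter.
function stripQuotedReply(body: string): string {
  const kept: string[] = [];
  for (const line of body.split(/\r?\n/)) {
    const trimmed = line.trim();
    const isAttribution = /^On .+wrote:$/.test(trimmed);
    const isQuote = trimmed.startsWith(">");
    const isSignature = line === "-- " || line === "--";
    if (isAttribution || isQuote || isSignature) break;
    kept.push(line);
  }
  return kept.join("\n").trim();
}

// Hypothetical message shape; adapt to what your thread fetch actually returns.
interface ThreadMessage {
  from: string;
  date: string;
  bodyText: string;
}

// Map each message to its new content only, preserving the given order.
function toAgentContext(messages: ThreadMessage[]) {
  return messages.map((msg) => ({
    from: msg.from,
    date: msg.date,
    body: stripQuotedReply(msg.bodyText),
  }));
}

const sampleReply = "Yes, ship it to the new address.\n\nOn Tue, Ana wrote:\n> Where should we ship it?\n-- \nAna";
console.log(stripQuotedReply(sampleReply)); // prints "Yes, ship it to the new address."
```

Check the cut rules against a sample of your own inbound mail before relying on them.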

What Consistent Threading Actually Unlocks

When your agent maintains thread continuity reliably, several things improve at once. The conversation history your agent reads is accurate instead of fragmented. Customers are more likely to reply because responses feel contextually aware rather than generic. Thread-level event history through client.events.list(threadId) also becomes useful for auditing what the agent did and when.

More importantly, you stop debugging ghost threads in Outlook and wondering why a customer says "I already answered that." This much is clear: the clients that handle threading well do so because they construct headers by the spec. The ones that break threads do so because they take shortcuts. Your agent should not take shortcuts either.

Email is still the dominant communication channel for business workflows. An AI agent that loses thread context after three replies is not a production agent. It is a prototype that creates cleanup work for your team. Getting the infrastructure right is not optional, and as the r/AI_Agents community noted, most teams learn this the hard way after shipping.

You do not have to.

Start building thread-aware email agents at getmail.bot/docs/getting-started.


Sources

  1. RFC 2822: Internet Message Format (IETF)
  2. Reddit r/AI_Agents: Email Context for AI Agents Is Way Harder Than It Looks
  3. Composio: Why AI Agent Pilots Fail, 2026 Integration Roadmap
  4. Alibaba LifeTips: Make Outlook Thread Conversations Like Gmail

Building an AI Support Agent That Sends Real Email (Not Just Chat)

· 9 min read
Founder, mailbot

The Problem Is Not the AI

Most teams building AI support agents hit the same wall. The AI classification works fine in testing. The prompt responses look reasonable. But when they try to connect it to actual email, things fall apart fast. The inbox is shared with marketing sends. There is no way to listen for inbound messages without polling. Replies break the thread. Nobody knows whether the automated response was actually delivered.

As one developer put it in a thread on r/AI_Agents: "What begins as simple email context evolves into a substantial infrastructure project." That quote describes the experience of most teams within the first week of building a real support agent, not a demo.

The Composio AI Agent Report is direct about the root cause: integration failure, not model failure, is the number one reason AI agent pilots fail in production. The report identifies "brittle connectors" as a specific trap, where one-off integrations work in isolation but break the moment real email volume hits, or when email clients format messages differently than expected.

This post is a comprehensive walkthrough for building a support agent that avoids those failure modes. It covers everything from creating a dedicated inbox, to listening for inbound messages, to classifying intent, to confirming delivery, to escalating uncertain cases to a human reviewer. If you want the 30-minute quickstart version, the existing "Build an Email AI Agent in 30 Minutes" post covers the basics. This post is for teams who want something production-ready.

Why Dedicated Infrastructure Matters

A support agent needs its own inbox, its own event notification listener, and a reliable threading model. Sharing an inbox with other email processes introduces noise that defeats classification before the AI ever sees a message.

Instantly's email triage research found that 70 to 80 percent of routine support emails can be classified and responded to automatically, but only when the classification system has clean, well-scoped input. Routing all company email through one inbox and asking an agent to sort it out is not a clean input.

It is worth noting that we run mailbot's own support inbox this way. The architecture described in this post is not hypothetical. You can read about it in the mailbot dogfooding post, which covers how we use our own API to handle support at the company level.

Step 1: Create a Dedicated Inbox

Start by initializing the SDK and creating an inbox specifically for support:

import { MailbotClient } from '@yopiesuryadi/mailbot-sdk';

const client = new MailbotClient({ apiKey: 'mb_test_xxx' });

const inbox = await client.inboxes.create({ name: 'support-agent' });
console.log('Inbox created:', inbox.id, inbox.address);

This gives you an isolated address (something like support-agent@yourdomain.getmail.bot) that receives only inbound support email. No newsletter noise, no transactional sends from other systems. Your classifier gets a clean channel.

Step 2: Register an Event Notification Listener

Polling an inbox on an interval is the third failure trap identified in the Composio report, labeled the "Polling Tax." It wastes resources, introduces latency, and adds another surface where things can fail silently.

Register an event notification endpoint instead. The SDK makes this a single call:

const hook = await client.webhooks.create({
  url: 'https://your-agent.example.com/inbound',
  events: ['message.inbound'],
});
// Note: Webhooks fire for all inboxes. Filter by inboxId in your /inbound handler.
console.log('Listener registered:', hook.id);

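Your endpoint at /inbound will now receive a POST payload every time a new message arrives. Because the listener covers every inbox, not just the support inbox, your handler should ignore payloads whose inboxId is not the support inbox. No polling required.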
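Your endpoint may also see the same notification more than once, for example if the sender retries after a timeout. Check the webhook documentation for the actual delivery guarantees. If that can happen, a duplicate check keeps the agent from replying twice to one message. The sketch below combines the inbox filter with that check. It assumes the payload carries `inboxId` and `messageId`, as in the handler in Step 3, and keeps seen IDs in memory, so a production version would persist them in a shared store.

```typescript
// Payload fields this sketch relies on (assumed, matching the Step 3 handler).
interface InboundEvent {
  inboxId: string;
  messageId: string;
  threadId?: string;
}

// Returns a predicate that accepts each support-inbox message exactly once.
// Seen IDs live in memory only: they are lost on restart and not shared
// across multiple server instances.
function createInboundFilter(supportInboxId: string) {
  const seen = new Set<string>();
  return (event: InboundEvent): boolean => {
    if (event.inboxId !== supportInboxId) return false; // another inbox's traffic
    if (seen.has(event.messageId)) return false; // duplicate delivery
    seen.add(event.messageId);
    return true;
  };
}

const shouldHandle = createInboundFilter("inbox_support");
console.log(shouldHandle({ inboxId: "inbox_support", messageId: "msg_1" })); // true
console.log(shouldHandle({ inboxId: "inbox_support", messageId: "msg_1" })); // false (duplicate)
console.log(shouldHandle({ inboxId: "inbox_marketing", messageId: "msg_2" })); // false (other inbox)
```

Returning a closure keeps the seen-ID set private to one filter, so tests and separate inboxes do not share state by accident.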

Step 3: Receive and Read the Inbound Message

When your endpoint receives a notification, the payload includes the inboxId, messageId, and threadId. Use those to fetch the full message and the thread context:

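import express from 'express';

const app = express();
app.use(express.json()); // parse JSON notification bodies into req.body

app.post('/inbound', async (req, res) => {
  const { inboxId, messageId, threadId } = req.body;

  // Webhooks fire for every inbox; ignore anything outside the support inbox.
  // SUPPORT_INBOX_ID is the inbox.id returned in Step 1.
  if (inboxId !== SUPPORT_INBOX_ID) return res.sendStatus(200);

  try {
    // Fetch the individual message
    const message = await client.messages.get(inboxId, messageId);

    // Fetch the full thread for context
    const thread = await client.threads.get(inboxId, threadId);

    // Pass to your classifier
    const intent = await classifyIntent(message.subject, message.bodyText, thread);

    await handleIntent(intent, inboxId, messageId);
    res.sendStatus(200);
  } catch (err) {
    // Express 4 does not catch rejected async handlers, so report failures explicitly
    console.error('Failed to process inbound message', messageId, err);
    res.sendStatus(500);
  }
});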

Fetching the full thread via client.threads.get() is important for repeat customers or ongoing issues. A support ticket about a billing error in the third reply looks very different without the first two messages. Thread context prevents your classifier from treating it as a fresh, unrelated inquiry.

Step 4: Classify Intent and Reply

Your AI classifier receives the message text and thread context and returns an intent label plus a confidence score. The exact implementation of your classifier is up to you. The important part is that this function returns something structured:

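type IntentResult = { intent: string; confidence: number; suggestedReply: string };

async function classifyIntent(subject: string, body: string, thread: unknown): Promise<IntentResult> {
  // Call your AI classification layer here; confidence should be on a 0 to 1 scale
  throw new Error('classifyIntent is not implemented yet');
}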

Instantly's research shows that 70 to 80 percent of routine support emails fall into a small set of intent categories: order status, refund request, account access, and general inquiry. A well-tuned classifier handles the bulk of volume without human review.
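Because classifyIntent sits on a model call, its output is the least trustworthy part of the pipeline. One defensive option is to validate that output and fall back to zero confidence when it is malformed, so a bad response is escalated rather than sent. This assumes your model is prompted to return JSON with `intent`, `confidence`, and `suggestedReply` fields. The intent labels below mirror the categories just listed and are illustrative.

```typescript
type IntentResult = { intent: string; confidence: number; suggestedReply: string };

// Illustrative labels for the categories above; adjust to your own taxonomy.
const KNOWN_INTENTS = new Set(["order_status", "refund_request", "account_access", "general_inquiry"]);

// Parse raw model output into an IntentResult. Anything malformed or out of range
// gets confidence 0, so the caller's threshold check routes it to a human.
function parseIntent(rawOutput: string): IntentResult {
  const fallback: IntentResult = { intent: "unknown", confidence: 0, suggestedReply: "" };
  let data: unknown;
  try {
    data = JSON.parse(rawOutput);
  } catch {
    return fallback; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return fallback;
  const { intent, confidence, suggestedReply } = data as Record<string, unknown>;
  if (
    typeof intent !== "string" || !KNOWN_INTENTS.has(intent) ||
    typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1) ||
    typeof suggestedReply !== "string" || suggestedReply.trim() === ""
  ) {
    return fallback;
  }
  return { intent, confidence, suggestedReply };
}

console.log(parseIntent('{"intent":"order_status","confidence":0.9,"suggestedReply":"It shipped today."}').confidence); // prints 0.9
console.log(parseIntent("Sure! Here is the JSON you asked for").confidence); // prints 0
```

Failing closed like this means a parsing bug shows up as extra escalations rather than as wrong emails sent to customers.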

When confidence is above your threshold, reply in the same thread:

async function handleIntent(intent: any, inboxId: string, messageId: string) {
  if (intent.confidence >= 0.80) {
    await client.messages.reply({
      inboxId,
      messageId,
      bodyText: intent.suggestedReply,
    });
  } else {
    await escalateToHuman(inboxId, messageId, intent);
  }
}

Using client.messages.reply() keeps the response inside the original thread. The customer's email client shows it as a continuation of the same conversation, not a new message. This matters both for the customer experience and for the threading chain that future AI classification will need.

Step 5: Verify Delivery with the Event Timeline

Sending a reply is not the same as delivering it. Network issues, misconfigured DNS, and provider-side throttling can all cause a message to leave your system without reaching the recipient.

Use client.engagement.messageTimeline() to confirm the delivery path after sending:

// Pass the ID of the reply you sent in Step 4, not the inbound customer message
const timeline = await client.engagement.messageTimeline(sentMessageId);

const delivered = timeline.events.some(e => e.type === 'delivered');
const opened = timeline.events.some(e => e.type === 'opened');

if (!delivered) {
  console.warn('Reply not confirmed delivered. Flagging for review.');
  // Trigger retry or alert here
}

This is the kind of operational check that separates a demo agent from a production one. If a customer does not receive the reply, the next message they send will be an escalation in frustration. Catching delivery failures early gives you time to intervene before that happens.
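One caveat about the snippet above: delivery events are reported asynchronously, so a timeline fetched immediately after sending will often not contain a `delivered` event yet, even for a message that arrives normally. A bounded re-check avoids false alarms. The sketch below takes the timeline lookup as a function parameter so it stays independent of the SDK. In practice you would pass something like `() => client.engagement.messageTimeline(id)`, where `id` is the sent reply's ID, assuming the `{ events: [{ type }] }` shape used above. The attempt count and delays are placeholders to tune.

```typescript
interface TimelineEntry { type: string; }
interface MessageTimeline { events: TimelineEntry[]; }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-check a message's timeline a few times before concluding it was not delivered.
// Returns true as soon as a 'delivered' event appears, false after the last attempt.
async function waitForDelivery(
  fetchTimeline: () => Promise<MessageTimeline>,
  attempts = 5,
  delayMs = 30_000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    const timeline = await fetchTimeline();
    if (timeline.events.some((e) => e.type === "delivered")) return true;
    // Exponential backoff between checks: delayMs, 2x, 4x, ...
    if (i < attempts - 1) await sleep(delayMs * 2 ** i);
  }
  return false;
}

// Simulated timeline (event names are illustrative) that reports delivery on the third check.
let calls = 0;
const fakeTimeline = async (): Promise<MessageTimeline> => {
  calls++;
  return { events: calls >= 3 ? [{ type: "sent" }, { type: "delivered" }] : [{ type: "sent" }] };
};
waitForDelivery(fakeTimeline, 5, 10).then((ok) => console.log(ok, calls)); // prints "true 3"
```

Only a `false` result after the final attempt should trigger the retry or alert path.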

Step 6: Escalate to a Human When Confidence Is Low

When the classifier's confidence falls below your threshold, the message should go to a human reviewer rather than being sent an automated reply that may be wrong or tone-deaf.

The pattern has two parts: label the message so it appears in the escalation queue, then notify a human agent via a separate inbox.

async function escalateToHuman(inboxId: string, messageId: string, intent: any) {
  // Label the message in the support inbox
  await client.messages.updateLabels({
    inboxId,
    messageId,
    labels: ['escalated'],
  });

  // Send notification to human agent inbox
  // (HUMAN_AGENT_INBOX_ID is a second inbox, created the same way as in Step 1)
  await client.messages.send({
    inboxId: HUMAN_AGENT_INBOX_ID,
    to: 'support-team@yourcompany.com',
    subject: 'Escalation Required: Low Confidence Classification',
    bodyText: `Message ID ${messageId} was classified as "${intent.intent}" with confidence ${intent.confidence}. Please review and respond manually.`,
  });
}

This pattern is consistent with findings from Eesel AI's analysis of human handoff best practices, which identifies confidence thresholds and intent-specific triggers as the most reliable escalation signals. Messages that mention keywords like "refund," "cancel," or "legal" warrant a stricter bar before any automated reply goes out, so they reach a human more readily even when the classifier's overall confidence is high.
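The keyword rule can be layered on top of the single threshold from Step 4. In the minimal sketch below, the keyword list, the 0.80 default, and the stricter 0.95 bar for sensitive topics are illustrative values, not recommendations from the cited analysis. Matching is a plain case-insensitive substring check, so "cancel" also matches "cancellation".

```typescript
// Illustrative values; tune them against your own escalation history.
const DEFAULT_THRESHOLD = 0.8;
const SENSITIVE_THRESHOLD = 0.95;
const SENSITIVE_KEYWORDS = ["refund", "cancel", "legal"];

// Decide whether a classified message should go to a human instead of an auto-reply.
// Messages that mention a sensitive keyword must clear a stricter confidence bar.
function needsHuman(messageText: string, confidence: number): boolean {
  const text = messageText.toLowerCase();
  const sensitive = SENSITIVE_KEYWORDS.some((kw) => text.includes(kw));
  const threshold = sensitive ? SENSITIVE_THRESHOLD : DEFAULT_THRESHOLD;
  return confidence < threshold;
}

console.log(needsHuman("Where is my order?", 0.85)); // false: auto-reply
console.log(needsHuman("I want to cancel my order", 0.85)); // true: escalate
```

In handleIntent, this check would replace the single `>= 0.80` comparison, with the message body passed in alongside the classifier result.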

The label approach keeps your support inbox organized. Messages labeled escalated appear separately from those the agent handled autonomously. You get a natural audit trail without building a separate database.

Step 7: Check Compliance Readiness Before Going Live

Before routing real customer email through the agent, run a compliance readiness check on the inbox:

const readiness = await client.compliance.readiness(inbox.id);
console.log('Compliance status:', readiness);

This checks that the inbox has proper configuration for unsubscribe handling, opt-out tracking, and other requirements that apply to automated email senders. Running this before go-live avoids situations where a compliance gap surfaces only after you have been sending at volume.

Putting It Together

The full architecture looks like this:

  1. A dedicated support inbox receives inbound email cleanly.
  2. An event notification listener fires your handler on each new message.
  3. Your handler fetches the message and full thread context.
  4. Your AI classifier returns an intent and confidence score.
  5. High-confidence intents trigger an automated reply via client.messages.reply().
  6. The event timeline confirms delivery after each send.
  7. Low-confidence intents are labeled escalated and routed to a human agent via a second inbox.
  8. Compliance readiness is verified before production launch.

We built and run this exact pattern for mailbot's own support. The dogfooding post goes into detail on how the live system handles real volume and where we had to adjust our confidence thresholds over time.

The Infrastructure Is the Product

The AI classifier is the part that gets the most attention in conversations about AI support agents. But as the r/AI_Agents community has found directly, the classifier is rarely where things break. The email infrastructure underneath it is where fragility lives: brittle polling loops, lost thread context, unconfirmed delivery, no human fallback.

The steps in this guide address each of those failure points specifically. A dedicated inbox eliminates noise. Event notifications replace polling. client.threads.get() preserves context. client.engagement.messageTimeline() confirms delivery. Labels and a second inbox create a human escalation path. Compliance readiness checks prevent surprises at go-live.

Ready to start building? The full SDK reference is at getmail.bot/docs/getting-started.


Sources

  1. r/AI_Agents: Email context for AI agents is way harder than it looks
  2. Composio: Why AI Agent Pilots Fail in 2026 (Integration Roadmap)
  3. Instantly: Automate Email Triage Classification with AI
  4. Eesel AI: Best Practices for Human Handoff in Chat Support
  5. mailbot: We Run Our Own Support on Our Own API