2 posts tagged with "developer-tools"

When Your AI Agent Should Stop Sending Email and Ask a Human

· 9 min read
Founder, mailbot

The Agent That Kept Apologizing

Imagine an AI agent handling your customer support inbox. A customer writes in, frustrated, mentioning a potential refund dispute. The agent replies with a calm, professional response. The customer replies again, angrier. The agent replies again, still composed. By the fourth exchange, the agent has sent four apology emails to a customer who needed a human to make a judgment call two emails ago.

This is not a hallucination problem. The agent understood the situation. It just had no mechanism to know when it was no longer the right tool for the job.

The Problem With Autonomous Email Agents

Autonomous agents handle routine tasks well. They can parse inbound emails, look up order status, send confirmations, and follow up on open threads. But real inboxes are not neat. They contain sensitive topics, emotionally loaded language, ambiguous requests, and situations where the wrong reply carries legal or reputational risk.

The established playbook for handling this is called human-in-the-loop (HITL), and most of the literature around it focuses on chat. Chat handoff is well-understood: a bot loses confidence, a session is live, a human joins the conversation. The handoff is synchronous. Both parties are present.

Email handoff is a different problem. There is no live session. The customer sent their message and walked away. The agent's reply may sit in their inbox for hours. If the agent escalates incorrectly and a human also replies, you now have two conflicting responses in the same thread. And if the escalation is not properly tracked, the human operator may not even know they need to act.

Little of the HITL literature addresses this. That gap is exactly what this post covers.

The Insight: Email Handoff Requires Async-Safe Traceability

In chat, a handoff is an event: a session is transferred, a new agent joins, the conversation continues. In email, a handoff is a state change on a thread. The thread must be marked. The human operator must be notified through a separate channel. The agent must stop sending until the human resolves or re-delegates.

This requires three things to work correctly:

  1. Trigger logic that recognizes when escalation is warranted
  2. Notification routing that alerts a human without polluting the customer thread
  3. Thread state management that prevents the agent from continuing to reply

Get any one of these wrong and you get missed escalations, duplicate replies, or a human who does not realize they are on the hook.

When to Escalate: The Triggers That Matter

Not every uncertain situation warrants a handoff. According to Elementum AI, a reasonable target is a 10 to 15 percent escalation rate. Too low, and your agent is overconfident. Too high, and human operators are overwhelmed and the system defeats itself.
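As a rough illustration, the escalation rate can be computed per reporting window and compared against that band. The helper names `escalationRate` and `classifyRate` below are hypothetical and not part of any SDK; the default bounds simply mirror the 10 to 15 percent figure cited above.

```typescript
// Hypothetical helpers for illustration; not part of the mailbot SDK.
type RateVerdict = 'too-low' | 'in-range' | 'too-high';

// Fraction of handled threads that were escalated to a human.
function escalationRate(escalated: number, total: number): number {
  if (total <= 0) return 0; // no traffic yet, nothing to report
  return escalated / total;
}

// Compare a rate against a target band (defaults follow the 10 to 15 percent target).
function classifyRate(rate: number, low = 0.10, high = 0.15): RateVerdict {
  if (rate < low) return 'too-low';   // agent may be overconfident
  if (rate > high) return 'too-high'; // operators may be overwhelmed
  return 'in-range';
}

console.log(classifyRate(escalationRate(12, 100)));
```

A verdict outside the band is a prompt to revisit thresholds, not an automatic failure; the right band depends on your inbox.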

The triggers worth implementing fall into three categories.

Confidence threshold breach. When the agent's confidence score for its intended reply drops below a defined threshold, it should not send. Anyreach sets this threshold at 85 percent; below that, human intervention is triggered. Anyreach reports 99.8 percent accuracy with HITL active, higher than without it.

Keyword and topic detection. Certain words in an inbound message should immediately flag for review regardless of confidence score. Eesel AI identifies the most common triggers in support contexts: refund, cancel, legal, complaint, and explicit requests to speak with a human. In email, this detection runs on the inbound message body before the agent drafts a reply.

Loop and failure detection. When the same thread has cycled through multiple agent replies without resolution, the agent is probably stuck. Replicant identifies conversation loops, repeated fallback responses, and backend failures as AI-initiated escalation triggers. In email, a loop looks like an increasing reply count on a thread with no status change. Practitioners building agent systems also tie escalation to tool failure events and low evaluation scores, not just confidence on the reply itself.
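The three categories can be combined into a single check that runs before the agent drafts a reply. The following is a minimal sketch under stated assumptions: the `TriggerInput` shape, the term list, and the default thresholds (0.85 confidence, three agent replies) are illustrative choices drawn from the figures above, not a vendor API.

```typescript
// Hypothetical types and names for illustration; none of this is a vendor API.
interface TriggerInput {
  confidence: number;           // agent's confidence in its draft reply, 0 to 1
  inboundBody: string;          // body of the latest customer message
  agentRepliesOnThread: number; // agent replies sent so far without resolution
}

type TriggerReason = 'low-confidence' | 'sensitive-keyword' | 'reply-loop';

// Assumed term list, based on the common support triggers named above.
const SENSITIVE_TERMS = ['refund', 'cancel', 'legal', 'complaint', 'speak to a human', 'speak with a human'];

// Returns every reason that applies; an empty array means the agent may proceed.
function escalationReasons(
  input: TriggerInput,
  minConfidence = 0.85,
  maxAgentReplies = 3,
): TriggerReason[] {
  const reasons: TriggerReason[] = [];
  if (input.confidence < minConfidence) reasons.push('low-confidence');
  const body = input.inboundBody.toLowerCase();
  if (SENSITIVE_TERMS.some(term => body.includes(term))) reasons.push('sensitive-keyword');
  if (input.agentRepliesOnThread >= maxAgentReplies) reasons.push('reply-loop');
  return reasons;
}
```

Returning all matching reasons, rather than the first one, gives the human operator more context in the escalation notice. Plain substring matching is crude (it will also match "cancellation" or "illegal"), so treat it as a starting point.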

How Event Notifications Become Escalation Triggers

Every email thread carries an event timeline: message received, agent replied, customer opened, customer replied again, bounce detected. These events are the raw material for escalation logic.

The right architecture treats event notifications as the nervous system of the escalation pipeline. Instead of polling for thread state on a schedule, the agent registers a listener for specific events and acts when those events arrive. A bounce on a reply, a sentiment shift in a new inbound message, or a third reply from the same sender within 24 hours can each serve as a trigger signal.

Here is how to wire that up with the mailbot SDK:

import { MailbotClient } from '@yopiesuryadi/mailbot-sdk';

const client = new MailbotClient({ apiKey: 'mb_test_xxx' });

// Register an event notification listener for inbound messages
// Note: Webhooks fire for all inboxes. Filter by inbox in your handler if needed.
await client.webhooks.create({
  url: 'https://your-agent.example.com/hooks/inbound',
  events: ['message.received', 'message.bounced']
});

When the event arrives at your handler, you check the thread timeline to assess the escalation signal:

// In your event handler
async function handleInbound(payload: { threadId: string; messageId: string }) {
  const events = await client.events.list(payload.threadId);
  // Count agent replies already sent on this thread
  const replyCount = events.filter(e => e.type === 'message.sent').length;
  const hasBounce = events.some(e => e.type === 'message.bounced');

  // Escalate on a likely reply loop (three or more agent replies) or any bounce
  if (replyCount >= 3 || hasBounce) {
    await escalateToHuman(payload.threadId, payload.messageId);
  }
}
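The handler above checks only reply count and bounces. The other signal mentioned earlier, a third reply from the same sender within 24 hours, could be sketched as a pure function over the same timeline. The `ThreadEvent` shape here (with `from` and `timestamp` fields) is an assumption for illustration; the real event payload may use different field names.

```typescript
// Hypothetical event shape; the real payload fields may differ.
interface ThreadEvent {
  type: string;      // e.g. 'message.received'
  from: string;      // sender address
  timestamp: string; // ISO 8601
}

const DAY_MS = 24 * 60 * 60 * 1000;

// True when `sender` has at least `limit` inbound messages in the window ending at `now`.
function isRapidFollowUp(
  events: ThreadEvent[],
  sender: string,
  now: Date,
  limit = 3,
  windowMs = DAY_MS,
): boolean {
  const end = now.getTime();
  const start = end - windowMs;
  const recent = events.filter(e => {
    if (e.type !== 'message.received' || e.from !== sender) return false;
    const t = Date.parse(e.timestamp);
    return t >= start && t <= end;
  });
  return recent.length >= limit;
}
```

Passing `now` as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to test.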

Building the Async-Safe Handoff

Once the escalation decision is made, you need to do three things in sequence: mark the thread, notify the human, and confirm the notification was delivered. The label set in the first step is what stops the agent.

Step 1: Mark the thread as escalated.

async function escalateToHuman(threadId: string, messageId: string) {
  // Mark the message so the agent pipeline knows to skip this thread
  await client.messages.updateLabels(messageId, {
    add: ['escalated', 'awaiting-human'],
    remove: ['agent-active']
  });

Step 2: Notify the human operator through a separate inbox.

The escalation notice goes to your internal operator inbox, not the customer thread. This is critical. A reply to the customer thread at this point would be a second response the customer was not expecting, and could conflict with the human's eventual reply.

  // Notify the human operator via a separate internal inbox.
  // Assumption: send() resolves to the sent message, including its id.
  const notice = await client.messages.send({
    inboxId: 'inbox_operator_alerts',
    to: 'support-lead@yourcompany.com',
    subject: `[Escalation Required] Thread ${threadId}`,
    body: `A customer thread requires human review.\n\nThread ID: ${threadId}\nMessage ID: ${messageId}\n\nReason: Reply loop detected or bounce received.\n\nReview and reply directly to the customer thread.`
  });

Step 3: Confirm delivery of the escalation notice.

Before the function exits, confirm the escalation message actually reached the operator. A failed escalation notification is as bad as no escalation at all. Check the timeline of the notice itself, not of the customer's message. Delivery is asynchronous, so a check made right after sending may not show a delivered event yet; treat a miss as a cue to re-check later or alert through a fallback channel.

  // Verify the escalation notice (not the customer's message) was delivered
  const timeline = await client.engagement.messageTimeline(notice.id);
  const delivered = timeline.events.some(e => e.type === 'delivered');

  if (!delivered) {
    // Log for retry or fallback alerting
    console.error(`Escalation notification not delivered for thread ${threadId}`);
  }
}

Your agent's main reply loop must check for the escalated label before drafting any response. If the label is present, the agent skips that thread entirely until a human resolves and removes the label.
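Because the labels are set on a message rather than on the thread, the gate has to look across every message in the thread. A minimal sketch of that check, assuming you have already fetched the thread's messages with their labels (the `LabeledMessage` shape and `agentMayReply` name are hypothetical):

```typescript
// Hypothetical shape: a thread's messages with their labels attached.
interface LabeledMessage {
  id: string;
  labels: string[];
}

// Labels that mean a human owns the thread (matching the labels added in Step 1).
const BLOCKING_LABELS = ['escalated', 'awaiting-human'];

// The agent may reply only if no message on the thread carries a blocking label.
function agentMayReply(threadMessages: LabeledMessage[]): boolean {
  return !threadMessages.some(m => m.labels.some(l => BLOCKING_LABELS.includes(l)));
}

// Example: the second message was escalated, so the whole thread is off limits.
console.log(agentMayReply([
  { id: 'msg_1', labels: ['agent-active'] },
  { id: 'msg_2', labels: ['escalated', 'awaiting-human'] },
]));
```

Running this check as the very first step of the reply pipeline, before any drafting, means a new customer reply on an escalated thread never reaches the model.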

Why This Architecture Matters

The difference between a good HITL system and a bad one in email contexts is not the trigger logic. Teams spend most of their time on that. The real failure mode is what happens after the decision is made.

In chat, the session transfer is enforced by the platform. The agent is literally removed from the conversation. In email, you must enforce that boundary yourself. The agent will keep replying if you let it. The escalated label combined with a label check at the start of the reply pipeline creates the boundary. Without it, the escalation remains a notification rather than a state change, and the agent keeps going.

Elementum AI frames HITL as a continuous feedback loop rather than a one-time gate. That framing applies here: after the human resolves the thread, removing the escalated label re-enables the agent on future inbound messages. The thread history becomes part of the agent's training signal. Each escalation is also a data point on where your confidence thresholds need adjustment.

The Broader Pattern

Email handoff is harder than chat handoff because it forces you to treat escalation as a durable state, not a transient event. The thread exists in perpetuity. The customer will reply again. The agent will see that reply. If your system treats escalation as a notification and not a state change, the agent will respond to that next reply as if the escalation never happened.

The architecture described here (event-triggered listeners, timeline-based loop detection, label-enforced agent gating, and human notification through a separate channel) is the pattern that makes email HITL actually work. What carries the weight is the state management, not the detection logic.

If you are building an email agent and your HITL plan is to log escalations to a Slack channel and hope someone notices, you are one busy support queue away from a problem.


Build your first escalation pipeline on mailbot.


Sources

  1. Elementum AI, "Human-in-the-Loop Agentic AI" (2026-03-12): https://www.elementum.ai/blog/human-in-the-loop-agentic-ai
  2. Eesel AI, "Best Practices for Human Handoff in Chat Support" (2025-10-22): https://www.eesel.ai/blog/best-practices-for-human-handoff-in-chat-support
  3. Replicant, "When to Hand Off to a Human: How to Set Effective AI Escalation Rules" (2025-06-23): https://www.replicant.com/blog/when-to-hand-off-to-a-human-how-to-set-effective-ai-escalation-rules
  4. Reddit r/AI_Agents, "Anyone building agent systems with human-in-the-loop?": https://www.reddit.com/r/AI_Agents/comments/1m5q6h1/anyone_building_agent_systems_with_humanintheloop/
  5. Anyreach, "What Is Human-in-the-Loop in Agentic AI: Building Trust Through Intelligent Fallback" (2025-08-04): https://blog.anyreach.ai/what-is-human-in-the-loop-in-agentic-ai-building-trust-through-intelligent-fallback/

How to Give Your AI Agent a Real Email Inbox with MCP

· 7 min read
Founder, mailbot

Most email MCP servers let your AI client send email. That is the easy half. The harder half is letting it receive replies, track delivery events, and maintain conversation context across a thread. This tutorial shows you how to wire both halves together using the mailbot MCP server.

If you have searched for "MCP email server" or "email MCP server" and landed on tutorials that only cover outbound, you already know the gap. MailerCheck's roundup of 6 email MCP servers confirms that the only two-way option in the list is a Gmail relay through Zapier. For developers who want a purpose-built inbox that their AI agent can own end to end, that is a meaningful gap.

This tutorial fills it. By the end, your MCP-compatible AI client will be able to create an inbox, send email from it, read replies, and check delivery events.


What You Will Build

An AI agent workflow backed by a real mailbot inbox. The mailbot MCP server exposes 13 tools to your AI client, and they map directly to mailbot's API surface: inbox management, message sending, reply handling, thread reading, and delivery event inspection. You type a natural language instruction, and the client calls the right tool.

This is useful for agentic tasks like: send a follow-up to anyone who replied to yesterday's campaign, check whether my outbound message was delivered, or create a throwaway inbox for this test scenario and clean it up when done.


Prerequisites

Before you start:

  • Node.js 18 or later installed on your machine (the MCP server runs via npx)
  • An MCP-compatible AI desktop client that supports external MCP servers via a JSON config file
  • A mailbot account and API key from getmail.bot

No local build step required. The package ships prebuilt to npm.


Step 1: Understand How MCP Servers Work

MCP (Model Context Protocol) lets an AI client call external tools in the same way a developer calls an API. According to the official MCP documentation, servers expose tools as typed functions. When you send a message to your AI client, it inspects the available tools, decides which one matches your intent, and executes it. The result comes back as context for the next response.
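To make the idea concrete, here is a toy model of tool dispatch: a list of named tools and a function that looks one up and runs it. This is not the MCP wire protocol or the official SDK, and the handlers return stubbed strings rather than calling any real API; only the tool names `create_inbox` and `send_message` come from this tutorial.

```typescript
// Toy model of tool dispatch. This is NOT the MCP protocol or SDK.
type ToolArgs = Record<string, string>;

interface Tool {
  name: string;
  description: string; // what the AI client reads when choosing a tool
  run: (args: ToolArgs) => string;
}

const tools: Tool[] = [
  {
    name: 'create_inbox',
    description: 'Create a new inbox with the given name',
    run: args => `created inbox "${args.name}" (stubbed result)`,
  },
  {
    name: 'send_message',
    description: 'Send an email from an inbox to a recipient',
    run: args => `sent "${args.subject}" to ${args.to} (stubbed result)`,
  },
];

// The client picks a tool by name, passes arguments, and feeds the result back as context.
function callTool(name: string, args: ToolArgs): string {
  const tool = tools.find(t => t.name === name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.run(args);
}

console.log(callTool('create_inbox', { name: 'support-test' }));
```

In a real MCP setup the tool list and argument schemas come from the server, and the model, not your code, decides which tool to call.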

For email, this means your AI client becomes a first-class email actor rather than a text generator that happens to mention email addresses. It can actually create inboxes, send messages, and read what comes back.


Step 2: Install the mailbot MCP Server

No manual install is required. The package runs on demand via npx, so your AI client fetches and executes it automatically on first launch.

The package is published at @yopiesuryadi/mailbot-mcp on npm. If you want to confirm it resolves before wiring it into your client, you can run it manually:

npx @yopiesuryadi/mailbot-mcp --help

This confirms the package resolves and prints the available tool list.


Step 3: Configure the MCP Server in Your AI Client

Your MCP-compatible AI client reads a JSON config file to discover external servers. The exact file location varies by client. Common locations:

OS        Typical config path
macOS     ~/Library/Application Support/<ClientName>/config.json
Windows   %APPDATA%\<ClientName>\config.json

Add the following block to your client's MCP servers config:

{
  "mcpServers": {
    "mailbot": {
      "command": "npx",
      "args": ["-y", "@yopiesuryadi/mailbot-mcp"],
      "env": {
        "MAILBOT_API_KEY": "mb_test_xxx"
      }
    }
  }
}

Replace mb_test_xxx with your actual mailbot API key from your account dashboard.

Save the file and restart your AI client. If the client has a tools or connectors panel, you should see "mailbot" listed with its 13 available tools. That confirms the server is running and connected.

Note: the MCP server is at v1 and has not been tested across every AI client configuration. If your client does not surface the tools after restart, check that the config JSON is valid and that Node.js is accessible on your system PATH.
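When the tools do not appear, a quick structural check of the config text can rule out the most common mistakes. The helper below is a hypothetical sketch, not something shipped with mailbot; it only checks the fields shown in the config block above and does not verify that the API key itself is valid.

```typescript
// Hypothetical helper: returns a list of problems; an empty list means the config looks usable.
function checkMcpConfig(jsonText: string, serverName = 'mailbot'): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return ['config is not valid JSON'];
  }

  const servers = (parsed as { mcpServers?: Record<string, unknown> } | null)?.mcpServers;
  if (typeof servers !== 'object' || servers === null) return ['missing "mcpServers" object'];

  const server = servers[serverName] as
    | { command?: unknown; args?: unknown; env?: Record<string, unknown> }
    | undefined;
  if (!server) return [`missing "${serverName}" entry`];

  const problems: string[] = [];
  if (typeof server.command !== 'string') problems.push('"command" must be a string');
  if (!Array.isArray(server.args)) problems.push('"args" must be an array');
  const key = server.env?.MAILBOT_API_KEY;
  if (typeof key !== 'string' || key.length === 0) problems.push('"env.MAILBOT_API_KEY" is missing');
  return problems;
}
```

You could run it against the contents of your client's config file, read with whatever file API your environment provides, and print the returned problems.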


Step 4: Create an Inbox via MCP

Once the server is connected, you can talk to your AI client in plain language. To create a new inbox, try a prompt like:

Create a new mailbot inbox named "support-test"

Your AI client will call the create_inbox tool, which maps to client.inboxes.create in the mailbot SDK. The tool returns the inbox details including its assigned email address.

You can list existing inboxes with:

List my mailbot inboxes

And retrieve details for a specific one with:

Get the inbox with ID inbox_abc123


Step 5: Send Email via MCP

With an inbox created, sending is one instruction away:

Send an email from my support-test inbox to recipient@example.com with the subject "Hello from mailbot MCP" and a plain text body saying "This was sent by my AI agent."

The client calls the send_message tool under the hood. This is meaningfully different from send-only email MCP servers like Mailtrap's MCP integration, which only expose a single outbound send tool. With mailbot, the same session that sends can also receive and inspect.

You can also send HTML:

Send an HTML email from support-test to recipient@example.com. Subject: "Welcome". Body: a simple HTML welcome message with a bold heading.


Step 6: Receive and Read Email via MCP

When a reply arrives at your mailbot inbox, your AI client can read it:

List the latest messages in my support-test inbox

This calls list_messages and returns subject, sender, snippet, and thread ID for each message. To read a full message:

Get the full content of message msg_xyz789

To search across messages:

Search my support-test inbox for messages from sender@example.com

The search_messages tool accepts sender, subject keywords, date ranges, and label filters, so your agent can do targeted retrieval without reading the entire inbox.

If you are building an automated flow and need to wait for a reply before proceeding, the wait_for_message tool (backed by client.messages.waitFor) polls until a matching message arrives or a timeout is reached. This is useful for test flows where you send a message and need to assert on the reply.
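The poll-until-match-or-timeout pattern itself is simple to picture. The sketch below is a generic illustration and makes no claim about how wait_for_message or client.messages.waitFor is implemented; `pollUntil` and its parameters are hypothetical names.

```typescript
// Generic poll-until-match helper. Hypothetical; not the mailbot implementation.
async function pollUntil<T>(
  fetchOnce: () => Promise<T | undefined>, // resolves to a match, or undefined if none yet
  timeoutMs: number,
  intervalMs: number,
): Promise<T | undefined> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await fetchOnce();
    if (result !== undefined) return result;
    // Give up if the next attempt would start after the deadline.
    if (Date.now() + intervalMs > deadline) return undefined;
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Returning `undefined` on timeout, instead of throwing, lets a test flow decide for itself whether a missing reply is a failure.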


Step 7: Check Delivery Events via MCP

Sending a message is the start, not the end. Your AI client can also inspect what happened to each message after it was sent.

Check the delivery events for thread thread_abc123

This calls list_events for the thread, returning a timeline of events (queued, delivered, opened, bounced, and so on). You can also retrieve a single event:

Get event details for event evt_123

This is useful for agentic tasks like: "Send a follow-up only if the first message was delivered but not opened." Your agent can check the event timeline, make a conditional decision, and act without you writing any conditional logic manually.
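If you ever want the same rule in code rather than in a prompt, the decision reduces to a small predicate over the event timeline. This is a sketch under assumptions: the `DeliveryEvent` shape and the exact event type strings ('delivered', 'opened', 'bounced') are taken from the timeline description above and may not match the real payload.

```typescript
// Hypothetical event shape; type strings follow the timeline described above.
interface DeliveryEvent {
  type: string;
}

// Follow up only when the message was delivered, never opened, and did not bounce.
function shouldFollowUp(events: DeliveryEvent[]): boolean {
  const has = (t: string) => events.some(e => e.type === t);
  return has('delivered') && !has('opened') && !has('bounced');
}

console.log(shouldFollowUp([{ type: 'queued' }, { type: 'delivered' }]));
```

Keep in mind that open tracking is often unreliable, since many mail clients block or prefetch tracking pixels, so "not opened" is a weak signal on its own.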


Step 8: Organize with Labels and Threads via MCP

The 13 mailbot MCP tools also cover thread reading and label management. To view a full conversation thread:

Show me the full thread for thread_abc123

To label a message for downstream filtering:

Add the label "needs-followup" to message msg_xyz789

Labels work as lightweight state markers that persist on the message, so other tools or agents in your workflow can filter by them later.


What Is Next

This tutorial covered the core loop: create inbox, send, receive, inspect events. The mailbot MCP server exposes the same API surface as the SDK, so everything in the mailbot documentation applies to what your AI client can do.

A few directions to explore from here:

  • Event notifications: Set up a webhook to push inbound messages to your own endpoint, so your agent reacts in real time rather than polling.
  • Domain verification: Verify a custom sending domain so outbound messages use your own address.
  • Compliance checks: Use the compliance tools to run readiness checks before sending to a new list.

The MCP integration is v1. Feedback from real usage is how it improves. If you run into edge cases with your specific AI client configuration, the documentation is the right place to start: getmail.bot/docs/getting-started.


Sources