
Why Your Email Tests Are the First Thing You Skip in CI

· 5 min read
Co-founder, mailbot

You have a CI pipeline that runs in under four minutes. Unit tests, integration tests, linting, type checks. Everything passes. Deploys go out multiple times a day.

Except for the email tests. Those were disabled three months ago because they kept timing out.

Nobody remembers who disabled them. Nobody has turned them back on. And nobody has noticed the two email bugs that shipped since then.

The anatomy of a flaky email test

Email tests fail differently than other tests. A database test either connects or it does not. An API test either returns the expected response or it throws. The failure is immediate, clear, and reproducible.

Email tests fail in ways that make you question your sanity.

The test sends a message, then polls an inbox waiting for it to arrive. Sometimes it arrives in one second. Sometimes in eight. Sometimes in thirty. Sometimes never. The test times out. You re-run the pipeline. It passes. No code changed. Nothing was fixed. It just worked the second time.

This is not a test problem. This is an email problem. Real SMTP delivery involves DNS lookups, connection negotiation, relay queues, receiving server processing, and content filtering. Any of those steps can introduce variable latency. When your test expects deterministic behavior from a system that is fundamentally nondeterministic, flakiness is guaranteed.
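The polling pattern described above can be sketched in a few lines. This is an illustration only: `pollInbox`, the simulated clock, and the latency values are hypothetical stand-ins, and no real mail is sent. It shows the same test code passing or failing depending solely on how long delivery happens to take.

```typescript
// Hypothetical sketch: a polling email test run against simulated delivery latency.
// Nothing here talks to a mail server; time is a plain counter in milliseconds.

type PollResult = { arrived: boolean; waitedMs: number };

// Poll every `intervalMs` until the message has arrived or `timeoutMs` has elapsed.
// `deliveryLatencyMs` stands in for the time real delivery would take
// (Infinity models a message that never arrives).
function pollInbox(
  deliveryLatencyMs: number,
  timeoutMs: number,
  intervalMs: number
): PollResult {
  for (let elapsed = 0; elapsed <= timeoutMs; elapsed += intervalMs) {
    if (elapsed >= deliveryLatencyMs) {
      return { arrived: true, waitedMs: elapsed };
    }
  }
  return { arrived: false, waitedMs: timeoutMs };
}

// The latencies from the text: one second, eight, thirty, and never.
const latenciesMs = [1000, 8000, 30000, Infinity];

// The same test every time: a 10 second timeout, polling once per second.
const outcomes = latenciesMs.map((latency) => pollInbox(latency, 10000, 1000));

outcomes.forEach((outcome, i) => {
  console.log(`latency=${latenciesMs[i]}ms ->`, outcome.arrived ? "pass" : "timeout");
});
```

The first two runs pass and the last two time out, even though the code under test is identical in all four.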

Why teams give up

The progression is predictable. Week one: the email tests are slow but they pass. Week two: one test fails intermittently. The developer re-runs CI and it passes. Week three: the test fails again. Someone increases the timeout from 10 seconds to 30. Week four: the test still fails occasionally. Someone adds a retry. Week six: the retries mask real failures and the test suite takes twice as long. Week eight: someone adds a skip to the email tests with a comment that says "TODO: fix flaky email tests."

That TODO will never be resolved.
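A little arithmetic shows why the retry in that progression hides real failures. The sketch below assumes each attempt fails independently with a fixed probability, which is a simplification, and the rates are illustrative numbers rather than measurements.

```typescript
// Illustrative arithmetic only: assumes every attempt fails independently
// with the same probability, which real pipelines only approximate.

// Probability that a test passes at least once within `attempts` tries.
function passProbabilityWithRetries(failureRate: number, attempts: number): number {
  return 1 - Math.pow(failureRate, attempts);
}

// Probability that a pipeline goes green when it contains `testCount`
// tests that each pass independently with probability `perTestPass`.
function pipelinePassProbability(perTestPass: number, testCount: number): number {
  return Math.pow(perTestPass, testCount);
}

// A real bug that breaks half of all sends, retried up to 3 times.
const buggyButGreen = passProbabilityWithRetries(0.5, 3); // 1 - 0.5^3 = 0.875

// Ten email tests that each flake 2% of the time, with no retries.
const greenWithoutRetries = pipelinePassProbability(0.98, 10); // roughly 0.817

console.log(buggyButGreen.toFixed(3), greenWithoutRetries.toFixed(3));
```

Under these assumptions, a bug that breaks every second send still produces a green build about seven times out of eight, while an honest suite with no retries fails almost one run in five from flakiness alone.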

The team now ships email changes without automated verification. They test manually by sending an email and checking their inbox. If it looks right, they merge. The CI pipeline is green because it no longer tests email.

This is not a failure of discipline. It is a failure of tooling. The testing infrastructure was never built to handle the unique constraints of email delivery.

What makes email tests different

Email tests break the assumptions that make other tests reliable.

No instant feedback. HTTP request tests get a response in milliseconds. Email delivery can take seconds or minutes. There is no synchronous confirmation that the message was received, rendered correctly, and threaded properly.

External dependencies everywhere. A real email test depends on DNS, SMTP servers, relay infrastructure, and the recipient mail server. If any of those are slow, unreachable, or rate limiting, the test fails for reasons completely unrelated to your code.

State leaks between tests. Send ten test emails to the same address and you might trigger spam filters, rate limits, or greylisting. The first test passes. The tenth fails. Not because anything is wrong with the tenth test, but because the system has accumulated state that your test suite does not manage.

No assertion on delivery quality. Even if the email "arrives," you cannot easily assert that it rendered correctly across clients, that the headers are properly formed, that the threading works, or that the message was not flagged. Your test asserts "email sent." The real question is "email received and usable."
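As one example of asserting "received and usable" rather than "sent," the sketch below checks that a reply threads under the original message. `In-Reply-To` and `References` are the standard threading headers defined in RFC 5322. The `ReceivedMessage` shape and the `isThreadedReply` helper are hypothetical and not part of any particular SDK.

```typescript
// Hypothetical message shape; a real inbox API may expose headers differently.
interface ReceivedMessage {
  subject: string;
  headers: Record<string, string>; // header names stored in lowercase here
}

// Split a header value such as References into its individual message IDs.
function messageIds(headerValue: string | undefined): string[] {
  return (headerValue || "").split(/\s+/).filter((id) => id.length > 0);
}

// A reply threads under its parent when both In-Reply-To and References
// contain the parent's Message-ID (RFC 5322, section 3.6.4).
function isThreadedReply(parent: ReceivedMessage, reply: ReceivedMessage): boolean {
  const parentId = (parent.headers["message-id"] || "").trim();
  if (parentId === "") return false;
  const inReplyTo = messageIds(reply.headers["in-reply-to"]);
  const references = messageIds(reply.headers["references"]);
  return inReplyTo.indexOf(parentId) !== -1 && references.indexOf(parentId) !== -1;
}

const original: ReceivedMessage = {
  subject: "Order Confirmation #12345",
  headers: { "message-id": "<order-12345@example.test>" },
};

const reply: ReceivedMessage = {
  subject: "Re: Order Confirmation #12345",
  headers: {
    "message-id": "<reply-1@example.test>",
    "in-reply-to": "<order-12345@example.test>",
    "references": "<order-12345@example.test>",
  },
};

// Same subject line, but no threading headers: most clients will not group it.
const orphan: ReceivedMessage = {
  subject: "Re: Order Confirmation #12345",
  headers: { "message-id": "<reply-2@example.test>" },
};

console.log(isThreadedReply(original, reply)); // true
console.log(isThreadedReply(original, orphan)); // false
```

A check like this only becomes practical once the test can read the full received message, headers included, instead of just confirming that a send call returned.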

The sandbox approach

The solution is not to make flaky tests less flaky. It is to remove the external dependencies that cause flakiness in the first place.

A sandbox inbox is an email address that exists purely for testing. It accepts messages right away, without real SMTP delivery, DNS resolution, or content filtering. The message is available for inspection as soon as the API confirms acceptance.

import assert from 'node:assert';
import { MailBot } from '@yopiesuryadi/mailbot-sdk';

const mailbot = new MailBot({ apiKey: process.env.MAILBOT_API_KEY });

// Create a test inbox for this CI run
const testInbox = await mailbot.inboxes.create({
  username: `ci-test-${Date.now()}`,
  display_name: 'CI Test Inbox'
});

// Send a test message
await mailbot.messages.send({
  inboxId: testInbox.id,
  to: 'recipient@test.mailbot.id',
  subject: 'Order Confirmation #12345',
  body: '<h1>Your order is confirmed</h1><p>Tracking: ABC123</p>'
});

// Assert immediately: no polling, no waiting
const messages = await mailbot.inboxes.getMessages(testInbox.id);
assert(messages.length === 1);
assert(messages[0].subject.includes('#12345'));
No SMTP relay delays. No MX lookups. No greylisting. The message is available through the API the moment it is sent. Your test is bounded by an API round trip, not by mail delivery.

Each CI run gets its own inbox. No state leaks between test suites. No shared inbox where messages from different branches collide. No "was that message from my test or yours?"
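One detail worth attention when giving each run its own inbox is how unique the name really is. A timestamp alone can collide when parallel jobs start in the same millisecond, so the sketch below combines a run identifier, a job identifier, and a random suffix. The helper name and its parameters are hypothetical. `GITHUB_RUN_ID` and `GITHUB_JOB` are variables that GitHub Actions sets; other CI systems expose different names.

```typescript
// Hypothetical helper: builds an inbox username that is unlikely to collide
// across CI runs, parallel jobs, or re-runs. Not part of any SDK.
type Env = Record<string, string | undefined>;

function ciInboxUsername(
  env: Env,
  // Injectable so tests can make the result predictable.
  randomSuffix: () => string = () => Math.random().toString(36).slice(2, 10)
): string {
  // Fall back to fixed labels when running outside CI.
  const runId = env["GITHUB_RUN_ID"] || "local";
  const job = env["GITHUB_JOB"] || "job";
  // Lowercase, and collapse anything other than letters and digits into a hyphen.
  const sanitize = (s: string) => s.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  return ["ci-test", sanitize(runId), sanitize(job), randomSuffix()].join("-");
}

// Example with explicit values so the result is predictable.
const username = ciInboxUsername(
  { GITHUB_RUN_ID: "8412", GITHUB_JOB: "Email_Tests" },
  () => "k3f9"
);
console.log(username); // ci-test-8412-email-tests-k3f9
```

In a pipeline, the first argument would be `process.env` and the suffix argument would be left at its default.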

Fast tests get run. Slow tests get skipped.

There is a simple truth about test suites: developers keep the tests that give them confidence and remove the tests that waste their time.

Email tests are not the first to be disabled because they matter less. In many products, email is the most critical user-facing channel. They are disabled first because they are slower and less reliable than everything else in the pipeline. They are the weakest link, so they get cut first.

Make them fast and deterministic and they stay enabled. Keep them dependent on real SMTP infrastructure and they will be disabled within a month. Not because the team does not care about email quality, but because a four-minute pipeline cannot afford a thirty-second coin flip.

Your CI pipeline should test email the same way it tests everything else: quickly, reliably, and on every commit. If your current setup cannot do that, the problem is not your tests. It is your test infrastructure.