Stop Sharing a Test Inbox
Your CI pipeline tests the API, the database, the auth flow, and the payment integration. Then it skips email because nobody wants to deal with the test inbox.
Here's what email testing looks like at most companies. There's one shared inbox. Maybe it's a Mailtrap sandbox. Maybe it's test@company.com routed to a group mailbox. Maybe it's a MailHog instance someone spun up two years ago that nobody remembers how to restart.
Every test that triggers an email sends to this inbox. Password resets. Welcome emails. Invoice notifications. Onboarding sequences. All of them, from every developer, from every branch, from every CI run, landing in one place.
The shared inbox problem
If you've worked on a team that tests email, you already know how this goes.
Developer A triggers a password reset email as part of an integration test. Developer B triggers a welcome email from a different branch. Both emails arrive in the shared sandbox. Developer A's test picks up Developer B's email because it arrived first. The assertion fails. The test is flaky. Nobody knows why.
This isn't a rare edge case. This is what happens every time two email tests run concurrently against the same inbox.
The usual workaround is to add waits and filters. Poll the inbox. Check the subject line. Check the sender. Check the timestamp. If the email doesn't match, wait and poll again. If it still doesn't match, fail with a timeout. You've now turned a simple assertion ("did the email arrive?") into a polling loop with string matching and race conditions.
Some teams solve this by running email tests sequentially instead of in parallel. That works until your test suite takes forty minutes because email tests have become the bottleneck. Other teams give up and skip email tests entirely, relying on manual QA before releases.
Neither of these is an actual solution.
What email testing should look like
Think about how you test every other integration in your stack.
Database tests get their own test database, or at least an isolated transaction that rolls back after the test. API tests mock external services or spin up test containers. Auth tests use dedicated test credentials.
The principle is the same in every case: each test gets its own isolated environment. No shared state. No collisions. No flaky failures from concurrent access.
Email should work the same way.
Each test should create its own inbox. Trigger the email. Assert on the result. Tear down the inbox. No coordination with other tests. No shared sandbox. No filtering by subject line to figure out which email belongs to which test.
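The create, trigger, assert, tear-down lifecycle fits naturally into a small wrapper. Here is a minimal sketch, assuming a client object that exposes `inboxes.create(options)` and `inboxes.delete(id)` (the same shape as the `mailbot` client in the test example later in this article). The `withInbox` helper itself is hypothetical, not part of any SDK.

```javascript
// Hypothetical helper: give one test its own inbox and always tear it down.
// `client` is assumed to expose inboxes.create(options) and inboxes.delete(id).
async function withInbox(client, options, fn) {
  const inbox = await client.inboxes.create(options);
  try {
    // The test body only ever sees its own inbox.
    return await fn(inbox);
  } finally {
    // Runs even when an assertion inside fn throws, so no inbox leaks.
    await client.inboxes.delete(inbox.id);
  }
}
```

Using `try`/`finally` means a failing assertion still deletes the inbox, so a red test never leaves state behind for the next run.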
The reason this hasn't been standard practice is that creating an email inbox used to require DNS configuration, SMTP credentials, and manual setup. You don't spin up a new Gmail account per test run. So teams settled for shared sandboxes: creating isolated inboxes cost too much in setup time and operational overhead.
Programmable inboxes change the math.
What this looks like in practice
In mailbot, an inbox is a resource you create and destroy through the API. No DNS changes. No admin console. One call, one inbox.
Here's how a CI/CD email test works:
Before the test: create an inbox.
```shell
curl -X POST https://api.mailbot.id/v1/inboxes \
  -H "Authorization: Bearer mb_live_xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "test-run-8472",
    "display_name": "CI Test Run #8472"
  }'
```
The response gives you a real email address. Something like test-run-8472@agent.yourdomain.com. This inbox can send and receive. It belongs to this test run and nothing else.
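The same call from a JavaScript test suite might look like the sketch below, which uses Node 18+'s built-in `fetch`. The endpoint and the `username`/`display_name` fields come from the curl example; everything about the response body (such as `id` and `address` fields) is an assumption to check against the API docs. The `fetchImpl` parameter is only there so the function can be exercised without a network.

```javascript
// Sketch: create an inbox with the endpoint and fields from the curl example.
// Assumes Node 18+ (global fetch); the JSON response shape is an assumption.
async function createInbox(apiKey, username, displayName, fetchImpl = fetch) {
  const res = await fetchImpl("https://api.mailbot.id/v1/inboxes", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ username, display_name: displayName }),
  });
  if (!res.ok) {
    // Fail the setup loudly instead of running tests against no inbox.
    throw new Error(`Inbox creation failed: HTTP ${res.status}`);
  }
  return res.json();
}
```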
During the test: trigger the email and verify.
Your application sends a password reset email to that inbox address. Then your test queries the inbox for messages:
```shell
curl "https://api.mailbot.id/v1/messages?inbox_id=inbox_01hrz..." \
  -H "Authorization: Bearer mb_live_xxxxxxxxxxxx"
```
The messages endpoint returns every email received by that inbox. In an isolated test, the only message there is the one this test triggered. No filtering by subject. No sifting through a shared mailbox. No collision with another test's emails.
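A JavaScript version of that query is sketched below, again assuming Node 18+'s global `fetch`. Building the query string with `URLSearchParams` avoids hand-encoding the inbox ID. The article doesn't show the response format, so the final normalization (accepting either a bare array or an object with a `data` array) is an assumption to adjust against the real API.

```javascript
// Sketch: list the messages in one inbox via GET /v1/messages?inbox_id=...
// The response shape is an assumption; see the normalization at the end.
async function listMessages(apiKey, inboxId, fetchImpl = fetch) {
  const url = new URL("https://api.mailbot.id/v1/messages");
  url.searchParams.set("inbox_id", inboxId); // encodes the ID safely
  const res = await fetchImpl(url.toString(), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Listing messages failed: HTTP ${res.status}`);
  const body = await res.json();
  // Accept either a bare array or an object wrapping the list in `data`.
  if (Array.isArray(body)) return body;
  return body && Array.isArray(body.data) ? body.data : [];
}
```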
After the test: clean up.
The inbox can be deleted, or left to expire. Either way, it doesn't accumulate test emails that pollute the next run.
The entire flow in a test file:
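```javascript
// `mailbot` (API client), `app` (your application under test), and
// `pollForMessages` are assumed to be set up elsewhere in the suite.
test('password reset sends email with valid link', async () => {
  // Create an isolated inbox for this test. The random suffix keeps
  // usernames unique even if two tests start in the same millisecond.
  const inbox = await mailbot.inboxes.create({
    username: `test-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`,
    display_name: 'Password Reset Test'
  });

  try {
    // Trigger the password reset in your application
    await app.requestPasswordReset(inbox.address);

    // Wait for the email to arrive
    const messages = await pollForMessages(inbox.id, { timeout: 10000 });

    // Assert
    expect(messages).toHaveLength(1);
    expect(messages[0].subject).toContain('Reset your password');
    expect(messages[0].body_html).toMatch(/https:\/\/app\.example\.com\/reset\?token=/);
  } finally {
    // Clean up even when an assertion fails
    await mailbot.inboxes.delete(inbox.id);
  }
}, 15000); // Jest and Vitest default to 5 s; leave room for the 10 s poll
```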
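The test relies on a `pollForMessages` helper that the article doesn't define. One possible shape is sketched below. It is hypothetical: `makePollForMessages` takes any function that returns an inbox's messages as an array (for example, a thin wrapper around the `/v1/messages` endpoint) and returns a `pollForMessages(inboxId, { timeout })` matching the call in the test.

```javascript
// Hypothetical factory for the pollForMessages helper used in the test.
// `listMessages(inboxId)` must resolve to an array of messages.
function makePollForMessages(listMessages) {
  return async function pollForMessages(
    inboxId,
    { timeout = 10000, interval = 500, minCount = 1 } = {}
  ) {
    const deadline = Date.now() + timeout;
    for (;;) {
      const messages = await listMessages(inboxId);
      // Return as soon as the expected number of messages has arrived.
      if (messages.length >= minCount) return messages;
      if (Date.now() >= deadline) {
        throw new Error(
          `Timed out after ${timeout} ms waiting for ${minCount} message(s) in ${inboxId}`
        );
      }
      // Back off briefly before asking again.
      await new Promise((resolve) => setTimeout(resolve, interval));
    }
  };
}
```

Because the inbox belongs to one test, the loop only has to count messages; there is no subject or sender matching to get wrong.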
No shared state. No race conditions. No flaky tests from concurrent runs. Every test is independent.
Beyond "did the email arrive?"
An isolated inbox unlocks assertions that are difficult or impossible with a shared sandbox.
Threading verification. If your application sends multi-step email sequences (welcome email, then follow-up, then onboarding prompt), you can verify that the entire thread is intact. Query the thread endpoint and check that all messages are in the right order, with correct threading headers.
```shell
curl "https://api.mailbot.id/v1/threads?inbox_id=inbox_01hrz..." \
  -H "Authorization: Bearer mb_live_xxxxxxxxxxxx"
```
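What "correct threading headers" means can be checked mechanically. The sketch below assumes a linear chain where each message replies to the one before it, and assumes you have mapped the thread response into objects of the form `{ subject, messageId, inReplyTo, references }`. Those field names are illustrative, not mailbot's format. Following RFC 5322, each reply's In-Reply-To should name its parent's Message-ID, and References should carry the earlier IDs in the chain.

```javascript
// Sketch: verify an ordered list of messages forms one intact linear thread.
// Messages are assumed normalized to { subject, messageId, inReplyTo, references }.
// Returns a list of problems; an empty list means the thread looks intact.
function checkThread(messages, expectedSubjects) {
  const problems = [];
  if (messages.length !== expectedSubjects.length) {
    problems.push(`expected ${expectedSubjects.length} messages, got ${messages.length}`);
  }
  messages.forEach((msg, i) => {
    const want = expectedSubjects[i];
    if (want !== undefined && !msg.subject.includes(want)) {
      problems.push(`message ${i}: subject "${msg.subject}" lacks "${want}"`);
    }
    if (i === 0) return; // the first message starts the thread
    const prev = messages[i - 1];
    if (msg.inReplyTo !== prev.messageId) {
      problems.push(`message ${i}: In-Reply-To does not point at message ${i - 1}`);
    }
    // In a linear chain, References should include every earlier Message-ID.
    const refs = msg.references || [];
    for (const earlier of messages.slice(0, i)) {
      if (!refs.includes(earlier.messageId)) {
        problems.push(`message ${i}: References is missing ${earlier.messageId}`);
      }
    }
  });
  return problems;
}
```

Returning a list of problems rather than throwing on the first one makes a failing assertion report everything wrong with the thread at once.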
Reply testing. Send a reply back into your application and verify the response. Your test inbox isn't just a receiver. It's a participant. It can send email into your system the same way a real user would.
```shell
curl -X POST https://api.mailbot.id/v1/messages/send \
  -H "Authorization: Bearer mb_live_xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "inbox_id": "inbox_01hrz...",
    "to": ["support@yourapp.com"],
    "subject": "Re: Your password has been reset",
    "body_text": "I did not request this reset."
  }'
```
Now your test can verify that the application handles this reply correctly. Escalation triggered? Ticket created? Automated response sent? All testable because the inbox works in both directions.
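From a test suite, the reply can be sent with a sketch like the one below (Node 18+ global `fetch`). It sends only the fields shown in the curl example. Whether the endpoint accepts threading headers such as In-Reply-To isn't covered here, so none are set. The `replySubject` helper is a hypothetical convenience that adds a single "Re: " prefix.

```javascript
// Sketch: send a reply into the application using the curl example's fields.
const SEND_URL = "https://api.mailbot.id/v1/messages/send";

// Add "Re: " only if the subject doesn't already start with it.
function replySubject(subject) {
  return /^re:/i.test(subject) ? subject : `Re: ${subject}`;
}

async function sendReply(apiKey, { inboxId, to, subject, bodyText }, fetchImpl = fetch) {
  const res = await fetchImpl(SEND_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      inbox_id: inboxId,
      to,
      subject: replySubject(subject),
      body_text: bodyText,
    }),
  });
  if (!res.ok) throw new Error(`Send failed: HTTP ${res.status}`);
  return res.json();
}
```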
Engagement tracking. If your test needs to verify that delivery tracking is functioning, the event timeline on each message shows whether the email was delivered, bounced, or accepted by the recipient server. This is useful for testing that your delivery monitoring actually works, not just that the email was sent.
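An assertion on that timeline could reduce the events to a single outcome. The sketch below is built on assumptions: events look like `{ type, timestamp }` with ISO-8601 timestamps, and the terminal types are named `"delivered"` and `"bounced"`. Map these to whatever the API actually returns.

```javascript
// Sketch: reduce a message's event timeline to its latest delivery outcome.
// Event shape and type names are assumptions; adapt them to the real API.
const TERMINAL_EVENTS = new Set(["delivered", "bounced"]);

function deliveryOutcome(events) {
  const terminal = events
    .filter((e) => TERMINAL_EVENTS.has(e.type))
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  // No terminal event yet means the outcome is still unknown.
  return terminal.length ? terminal[terminal.length - 1].type : "pending";
}
```

A test could then expect `deliveryOutcome(events)` to equal `"delivered"`, retrying while it is still `"pending"`.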
The CI/CD integration
The most practical pattern for CI/CD is to create inboxes as part of the test setup and delete them in teardown. This maps directly to how test databases and test containers work.
For teams using GitHub Actions, GitLab CI, or any pipeline tool:
- Store the mailbot API key as a CI secret
- In your test setup, create inboxes for each test that needs email
- Run your tests against those inboxes
- In teardown, delete the inboxes
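Those steps can be sketched as a small suite-level helper. The assumptions: the client exposes `inboxes.create(options)` and `inboxes.delete(id)` as in the test example, `MAILBOT_API_KEY` is a hypothetical name for the CI secret, and `runId` is whatever your pipeline provides (GitHub Actions, for instance, sets `GITHUB_RUN_ID`). `InboxPool` is illustrative, not an SDK class.

```javascript
// Fail fast if the CI secret is missing. MAILBOT_API_KEY is a hypothetical name.
function requireApiKey(env = process.env) {
  const key = env.MAILBOT_API_KEY;
  if (!key) throw new Error("MAILBOT_API_KEY is not set; add it as a CI secret");
  return key;
}

// Hypothetical helper that tracks every inbox a test run creates.
class InboxPool {
  constructor(client, runId) {
    this.client = client; // assumed: inboxes.create(opts), inboxes.delete(id)
    this.runId = runId;
    this.created = [];
  }

  async create(label) {
    // Run ID plus a random suffix: unique across pipeline runs and
    // across parallel test workers within the same run.
    const suffix = Math.random().toString(36).slice(2, 10);
    const inbox = await this.client.inboxes.create({
      username: `ci-${this.runId}-${suffix}`,
      display_name: `CI ${this.runId}: ${label}`,
    });
    this.created.push(inbox.id);
    return inbox;
  }

  async cleanupAll() {
    // allSettled: one failed delete doesn't stop the others from running.
    const results = await Promise.allSettled(
      this.created.map((id) => this.client.inboxes.delete(id))
    );
    this.created = [];
    // Report how many deletions failed so teardown can log or retry them.
    return results.filter((r) => r.status === "rejected").length;
  }
}
```

In Jest or Vitest, `pool.cleanupAll()` would go in an `afterAll` hook, mirroring how a test database is dropped after the suite.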
No special infrastructure. No MailHog container to maintain. No Mailtrap sandbox limits to worry about. No shared state between pipeline runs.
The inboxes are real. They send and receive through production-grade infrastructure. Your email tests aren't simulating email behavior in a sandbox. They're testing actual email delivery through the same system your production emails use. The difference is that each test gets its own isolated piece of that system.
The cost of not testing email
If you've decided that email testing isn't worth the effort, consider what happens when you don't test.
A developer changes a template. The password reset link breaks. Nobody catches it because the email isn't tested in CI. The change ships to production. Customers who need to reset their passwords get an email with a broken link. The support team gets flooded. Someone eventually traces it back to the template change, but by then it's been live for three days.
Or: a code change accidentally removes the Reply-To header from outbound emails. Customer replies stop routing to the right place. It takes a week for anyone to notice, because nobody is testing that replies work.
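A regression like this is cheap to catch once each test has its own inbox. The sketch below assumes the received message exposes its headers as a plain object of name to value (the real field name and shape may differ) and handles a single address. Header field names are case-insensitive under RFC 5322, so the lookup ignores case.

```javascript
// Sketch: read the Reply-To address from a received message's headers.
// Assumes headers arrive as a plain { name: value } object.
function getHeader(headers, name) {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers || {})) {
    // Header field names are case-insensitive.
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

function replyToAddress(headers) {
  const raw = getHeader(headers, "Reply-To");
  if (raw === undefined) return undefined;
  // Accept both "support@yourapp.com" and "Support <support@yourapp.com>".
  const angle = String(raw).match(/<([^>]+)>/);
  return (angle ? angle[1] : String(raw)).trim();
}
```

A test would then expect `replyToAddress(message.headers)` to equal the support address, failing the build the moment the header disappears.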
These aren't hypothetical scenarios. They happen because email is the one integration that most CI pipelines skip. And the reason they skip it is that the tooling has historically made it painful.
It doesn't have to be.
The principle
The reason shared test inboxes cause problems is the same reason shared test databases cause problems. Shared mutable state in a concurrent system produces unpredictable results.
The solution is the same too: isolation. One inbox per test. Created programmatically. Torn down automatically. No coordination required.
If your email infrastructure doesn't support programmatic inbox creation, you're stuck with workarounds. If it does, email testing becomes as routine as every other integration test in your pipeline.
That's the whole idea.
mailbot is programmable email infrastructure.