Your Email Tests Pass. Your Emails Don't.
Your CI pipeline is green. Every email test passes. Coverage looks solid.
Then a user reports they never got their password reset. Another says the welcome email landed in spam. A third says the magic link expired before it arrived.
Your tests said everything was fine. Your tests were wrong.
What your mock actually tests
Most email tests follow the same pattern. You stub the email client, call the function that sends a message, and assert that the stub was invoked with the right arguments. Template rendered? Check. Recipient correct? Check. Subject line matches? Check.
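In code, the pattern looks like this. Both send_password_reset and the client stub are hypothetical names for illustration; the shape is what matters:

```python
# A typical mocked email test. send_password_reset is a hypothetical
# application function; the client is a stub that records calls.
from unittest.mock import MagicMock

def send_password_reset(client, to_address):
    # Application code under test: builds a message and hands it to the client.
    body = "Click here to reset your password: https://example.com/reset"
    client.send(to=to_address, subject="Reset your password", body=body)

def test_send_password_reset():
    client = MagicMock()
    send_password_reset(client, "user@example.com")
    # These assertions verify only that a function was called with the
    # right arguments. Nothing about delivery is tested.
    client.send.assert_called_once()
    _, kwargs = client.send.call_args
    assert kwargs["to"] == "user@example.com"
    assert kwargs["subject"] == "Reset your password"

test_send_password_reset()
```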
The test passes. You move on.
But here is the problem. That test verified your code called a function. It did not verify that an email was composed, transmitted over SMTP, accepted by a receiving server, passed authentication checks, survived content filters, and landed in a human inbox. Those are different things entirely.
Mocking the send call is like testing a restaurant by confirming the kitchen received the order. You have no idea whether the food arrived at the table, whether it arrived cold, or whether the waiter delivered it to the wrong diner entirely.
The failures mocks cannot catch
Real email failures happen in the space between your API call and the recipient's inbox. That space is vast, unpredictable, and entirely absent from your test environment.
DNS resolution failures. Your code sends to user@company.com. In production, the MX lookup for company.com returns nothing. Or it returns a server that refuses connections. Your mock never performs DNS resolution, so this failure is invisible.
Authentication rejections. SPF requires the sending IP to be authorized in the domain's published DNS records. DKIM requires a valid cryptographic signature over the message. DMARC ties both to the visible From domain and tells receivers what to do when they fail. If your sending infrastructure changes, or your DNS records drift, messages get rejected or quarantined. A mock knows nothing about SPF, DKIM, or DMARC.
Content filtering. Spam filters evaluate the entire message: headers, body structure, link reputation, sending domain age, engagement history. A test that checks whether your template renders correctly tells you nothing about how Gmail or Outlook will score that message.
Greylisting. Many receiving servers reject the first delivery attempt from an unknown sender, expecting the sender to retry. If your infrastructure retries from a different IP, the greylist never clears. The email arrives hours late or never. Your mock returned success right away.
Rate limiting. Production email providers enforce sending limits per minute, per hour, per domain. Hit the ceiling at 2 AM when a batch job fires, and half your emails queue indefinitely. Your mock has no concept of throughput constraints.
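A client-side throttle at least keeps batch jobs under a known ceiling. A minimal sketch, assuming a limit of N sends per rolling window; the numbers are illustrative, since real providers publish their own quotas:

```python
# Sliding-window throttle sketch: caps sends per rolling time window.
# The limit and window below are assumptions, not any provider's quota.
import time
from collections import deque

class SendThrottle:
    def __init__(self, max_sends, per_seconds):
        self.max_sends = max_sends
        self.per_seconds = per_seconds
        self.timestamps = deque()

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.per_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_sends:
            self.timestamps.append(now)
            return True
        return False  # over the limit: caller should queue and retry later

throttle = SendThrottle(max_sends=2, per_seconds=60)
assert throttle.try_acquire(now=0.0)
assert throttle.try_acquire(now=1.0)
assert not throttle.try_acquire(now=2.0)   # third send inside the window is held
assert throttle.try_acquire(now=61.0)      # window has rolled past the first send
```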
Bounce handling. Hard bounces, soft bounces, deferred delivery, mailbox full responses. Each requires different retry logic. Your mock returns void.
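A sketch of that three-way split, mapping SMTP reply codes to a retry decision. The retry/defer/suppress categories are a common convention, not a standard API:

```python
# Classify an SMTP reply code into a retry decision. 4xx codes are
# temporary failures, 5xx are permanent; 552 (over quota) is the one
# permanent-looking code usually worth retrying much later.
def classify_bounce(smtp_code):
    if 400 <= smtp_code < 500:
        return "retry"      # soft bounce: back off and retry
    if smtp_code == 552:
        return "defer"      # mailbox full: retry on a much longer schedule
    if 500 <= smtp_code < 600:
        return "suppress"   # hard bounce: stop sending to this address
    return "unknown"

assert classify_bounce(451) == "retry"     # greylisted / try again later
assert classify_bounce(550) == "suppress"  # mailbox does not exist
assert classify_bounce(552) == "defer"     # over quota
```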
None of these produce a test failure. Every one of them produces a user who never got your email.
Why the feedback loop is broken
The deeper issue is not that mocks are bad. It is that email has no fast feedback loop.
When a web request fails, you get a stack trace. When a database query is slow, you see it in your metrics. When an email fails, you see nothing. The API returns 200. The provider says "accepted." And then silence.
This creates a dangerous confidence gap. Your tests pass because the mock cooperates. Your monitoring shows no errors because the send call succeeded. The failure happens outside your system, in SMTP relays, receiving servers, and spam filters that you do not control and cannot observe.
By the time someone reports "I never got the email," you have no trail to follow. You check the logs. Sent. You check the provider. Delivered. The user checks spam. Not there.
Without visibility into what happens after the send, you are debugging with a blindfold.
What real email testing looks like
Testing email properly means testing the actual delivery path, not a simulation of it.
Use real inboxes for integration tests. Instead of mocking, send real messages to inboxes you control and verify they arrive. This catches DNS issues, authentication failures, and formatting problems that mocks hide.
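A sketch of that approach using only the standard library, with message construction split out so it can be tested without a network. The host names, credentials, and unique-token convention are all assumptions; adapt them to a test mailbox you control:

```python
# Integration-test sketch: send a real message over SMTP, then poll a real
# inbox over IMAP until it arrives. Hosts and credentials are placeholders.
import imaplib
import smtplib
import time
import uuid
from email.message import EmailMessage

def build_test_message(sender, recipient):
    # A unique token in the subject lets us find this exact message later.
    token = uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"delivery-test {token}"
    msg.set_content("Integration test message.")
    return msg, token

def send_test_message(smtp_host, sender, recipient):
    msg, token = build_test_message(sender, recipient)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
    return token

def wait_for_delivery(imap_host, user, password, token, timeout=120):
    deadline = time.time() + timeout
    while time.time() < deadline:
        with imaplib.IMAP4_SSL(imap_host) as imap:
            imap.login(user, password)
            imap.select("INBOX")
            _, data = imap.search(None, "SUBJECT", f'"{token}"')
            if data[0].split():
                return True   # the message arrived in a real inbox
        time.sleep(10)        # poll; real delivery can legitimately take a while
    return False              # timed out: "sent" but never received
```

A passing run proves the full path worked: DNS resolved, the receiving server accepted, authentication passed, and the message is findable in the inbox.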
Inspect the full event timeline. A message that was "sent" can still bounce, get deferred, or land in spam. You need visibility into every state transition: accepted, delivered, opened, bounced, complained. mailbot exposes this as a timeline of events per message, so you can assert against what actually happened, not just what your code intended.
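The event shape below is an assumption for illustration, not mailbot's actual API; the point is that the assertion targets what happened after the send, not the send call itself:

```python
# Assert against a per-message event timeline rather than a mocked send.
# Each event is assumed to be a dict with a state and a timestamp offset.
def assert_delivered(events, max_seconds=300):
    states = [e["state"] for e in events]
    assert "bounced" not in states, f"message bounced: {events}"
    assert "complained" not in states, "recipient marked the message as spam"
    assert "delivered" in states, f"never delivered, saw only {states}"
    accepted = next(e["t"] for e in events if e["state"] == "accepted")
    delivered = next(e["t"] for e in events if e["state"] == "delivered")
    assert delivered - accepted <= max_seconds, "delivered, but too late"

timeline = [
    {"state": "accepted", "t": 0},
    {"state": "delivered", "t": 42},
]
assert_delivered(timeline)  # passes: delivered within five minutes
```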
Test authentication in staging. SPF and DKIM are infrastructure concerns, not application concerns. But they break in application-visible ways. If your staging environment sends from a different IP or domain than production, your authentication tests are worthless. Run SPF and DKIM validation as part of your deployment pipeline, not just once during setup.
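A sketch of one such pipeline check, validating the SPF TXT record your domain publishes. Fetching the record over DNS is omitted here; the expected include mechanism is an assumption that depends on your provider:

```python
# Sanity-check a published SPF record string in the deployment pipeline.
# The include value below is a placeholder for your provider's SPF host.
def check_spf_record(txt_record, expected_include):
    terms = txt_record.split()
    if not terms or terms[0] != "v=spf1":
        return "not an SPF record"
    if f"include:{expected_include}" not in terms:
        return f"missing include:{expected_include}"
    if terms[-1] not in ("-all", "~all"):
        return "record does not end with a restrictive 'all' mechanism"
    return "ok"

record = "v=spf1 include:_spf.example-provider.com ~all"
assert check_spf_record(record, "_spf.example-provider.com") == "ok"
assert check_spf_record("v=spf1 +all", "_spf.example-provider.com") != "ok"
```

Running this on every deploy catches the drift case: someone edits DNS, the record stops naming your provider, and authentication quietly starts failing.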
Simulate adversarial conditions. Real email environments include greylisting, rate limiting, and temporary failures. Your test suite should include scenarios where delivery is delayed, where the first attempt fails, where the recipient server is slow. If your tests only cover the happy path, they only prove the happy path works.
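A sketch of one adversarial scenario: a fake receiving server that greylists, paired with a sender that retries with a stable identity. All names here are hypothetical:

```python
# Fake greylisting server: rejects the first attempt from an unknown
# (sender IP, message) pair with a 451, accepts the retry.
class GreylistingServer:
    def __init__(self):
        self.seen = set()

    def accept(self, sender_ip, message_id):
        key = (sender_ip, message_id)
        if key not in self.seen:
            self.seen.add(key)
            return 451  # temporary failure: "try again later"
        return 250      # retry from the same identity is accepted

def send_with_retry(server, sender_ip, message_id, max_attempts=3):
    for _ in range(max_attempts):
        if server.accept(sender_ip, message_id) == 250:
            return True
    return False

# Retrying from the same IP clears the greylist on the second attempt.
assert send_with_retry(GreylistingServer(), "203.0.113.7", "msg-1")

# A sender that rotates IPs on every attempt never clears it.
rotating = GreylistingServer()
results = [rotating.accept(ip, "msg-2")
           for ip in ("203.0.113.1", "203.0.113.2", "203.0.113.3")]
assert results == [451, 451, 451]
```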
Monitor delivery in production, continuously. Testing is not only a pre-deploy activity. Email deliverability changes over time. Domain reputation shifts. Spam filter rules update. Content that passed last month might trigger filters today. Continuous monitoring of delivery events is the only way to catch regression after deploy.
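A minimal sketch of such a check: compute the delivery rate over recent events and alert when it drops below a baseline. The 95% threshold and the event shape are assumptions to tune against your own history:

```python
# Delivery-rate monitor sketch over a window of recent message events.
def delivery_rate(events):
    sent = sum(1 for e in events if e["state"] == "accepted")
    delivered = sum(1 for e in events if e["state"] == "delivered")
    return delivered / sent if sent else 1.0

def should_alert(events, threshold=0.95):
    # Alert when the observed rate falls below the assumed baseline.
    return delivery_rate(events) < threshold

recent = [{"state": "accepted"}] * 100 + [{"state": "delivered"}] * 90
assert round(delivery_rate(recent), 2) == 0.90
assert should_alert(recent)  # 90% delivered: below the 95% baseline
```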
The gap is between "sent" and "received"
Most teams test email the same way they test any other API call: mock the dependency, verify the contract, move on. This works for services where the contract is the entire interaction. But email is different. The contract between your app and the email API is a tiny fraction of the real interaction.
The real test is: did the intended human get the intended message at the intended time?
If your testing strategy cannot answer that question, your tests are passing but your emails are not.