In a previous post we explored how our application tracks sent emails. We set things up to update or insert a record for each email with the current state (delivered, dropped, etc). In this post we’ll look at an alternative data model where we’ll store each event as its own record and then roll up the events to get the current state of a particular email.
With this different modeling we’ll be able to understand the full lifecycle of an email allowing us to debug issues better as well as have a better understanding of how our users interact with our emails. In our original setup we could only know about the most recent email event. We’ll use the concept of event sourcing to maintain this history.
Our Setup
Here’s what our table will look like:
CREATE TABLE emails (
id SERIAL PRIMARY KEY,
message_id varchar NOT NULL,
status varchar NOT NULL,
created_at timestamp DEFAULT current_timestamp,
updated_at timestamp DEFAULT current_timestamp
);
CREATE INDEX ON emails (message_id);
Here is some data to get us started.
INSERT INTO emails VALUES (
DEFAULT, 'first-message', 'sent', now(), now()
);
INSERT INTO emails VALUES (
DEFAULT, 'first-message', 'delivered', now(), now()
);
INSERT INTO emails VALUES (
DEFAULT, 'first-message', 'opened', now(), now()
);
INSERT INTO emails VALUES (
DEFAULT, 'second-message', 'sent', now(), now()
);
We have two emails being tracked. One has been sent, delivered, and opened; the other has only been sent, but not yet delivered or opened.
Check out this SQLFiddle if you’d like to explore the data or queries yourself.
Query for a Single Message
Given we have a single message_id
('first-message'
, for example) and we want to find its current state, we could write a query like this one:
SELECT
message_id,
status
FROM emails
WHERE message_id = 'first-message'
ORDER BY created_at DESC
LIMIT 1;
Our output looks like:
message_id | status
-----------------------
first-message | opened
Query Across All Messages
If we wanted to get the current state of all the emails, we would need to write a different query than we’re used to. We still want the most recent event but now we want it across a list of emails. As a result we want to get a single event (the most recent one) for each message. We can use Postgres’ DISTINCT ON
notation to do just that.
SELECT DISTINCT ON (message_id)
message_id,
status
FROM emails
ORDER BY message_id, created_at DESC;
Note: this query has been updated from the original post to be simpler as a result of feedback on Hacker News.
Our output looks like:
message_id | status
-----------------------
first-message | opened
second-message | sent
With this kind of data model and queries, we can now understand the current state of the world as well as interrogate our system to understand all the statuses through which an email went.
Further Reading
- Martin Fowler’s event sourcing post
- Explore refactoring this query into an ORM layer
- Learn about using
EXPLAIN ANALYZE
to understand performance of this kind of query - If this stuff is interesting to you and you’d like to chat, drop us a line: engineering@backerkit.com