Fresh or stale data?
Here at Unacast we have spent the last 6 months building a self-service data delivery management system for our Sales and Client Success people. This has become a necessity because of our skyrocketing data volumes and the steady increase in both data samples and client integrations. The frontend we have built to solve this problem is only one part of a much larger microservice puzzle, where several independent apps handle the different steps in the data delivery pipeline. So how do we make the frontend reflect the state changes (both successes and errors) that happen as the data passes through the pipe?
We released a first version where this was handled in the simplest way imaginable: with a refresh button, like this. This was far from optimal, as it raised a lot of "why hasn't this order been executed for the last 3 days" questions when people forgot to press the button to get an updated state. It soon became clear that we had to make this refresh automatic, so we tested another simple solution: triggering the refresh function on an interval. With an increasing number of users, an increasing number of Orders, and shorter intervals between each Order execution, this wasn't viable either.
So then we decided to do this the proper way: refresh the UI only when a downstream system had altered the Order's state or an error had occurred.
How Does The Data Flow?
Here's a simplified diagram showing the different parts involved, how they communicate, and how the data flows.
As you can see, it is all Firebase and Google Cloud components. Let me explain in a bit more detail what happens here:
Firebase
The frontend is a React app hosted on Firebase, and it uses a Firestore NoSQL database to store specifications of what we call Orders. An Order is a specification of what data should be shipped where, and at what interval.
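The post doesn't show the actual Order schema, but as a rough sketch (every field name here is an assumption), an Order document might look something like this:

```typescript
// Hypothetical sketch of an Order document; the actual Unacast schema
// is not shown in the post, so all field names here are assumptions.
interface Order {
  id: string;
  dataset: string;       // which data sample to ship
  destination: string;   // where to ship it, e.g. a bucket path
  scheduleCron: string;  // delivery interval as a cron expression
  state: "CREATED" | "RUNNING" | "COMPLETED" | "FAILED";
}

// Minimal sanity check before writing the document to Firestore.
function isValidOrder(o: Order): boolean {
  return o.id.length > 0 && o.dataset.length > 0 && o.destination.length > 0;
}
```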
Backend
When an Order is registered it is posted to a backend consisting of several services running on Google Kubernetes Engine. From this point on, the Order is handled and executed by these systems. The backend uses a Cloud SQL database to store the Order itself, along with Events tied to the Order as it changes state on its way through the pipeline. It is these Events that we want to trigger an automatic refresh in the web frontend, so the user who issued the Order can follow its progress.
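The Event schema isn't shown in the post either, but conceptually each Event records one state transition of an Order. A hedged sketch of what a row in that Cloud SQL table might map to:

```typescript
// Hypothetical shape of an Event; field names are assumptions, not the
// actual Cloud SQL schema. Each Event captures one state transition
// (or error) for a given Order.
interface OrderEvent {
  orderId: string;
  state: string;     // e.g. "RUNNING", "COMPLETED", "FAILED"
  detail?: string;   // optional context, e.g. an error message
  createdAt: Date;
}
```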
Pub/Sub
Every time the data delivery pipeline changes the state of an Order, or an error occurs, a message is published to a Topic on Pub/Sub. This way, any system interested in reacting to a state change in an Order can have its own Subscription and receive messages with relevant data about these Events.
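Pub/Sub transports raw bytes, so a state change message like this would typically be serialized to JSON and base64-encoded before publishing. A minimal sketch, with assumed field names; the actual publish call (via the @google-cloud/pubsub client) is omitted:

```typescript
// Sketch of the message body the pipeline might publish on the state
// change Topic. Field names (orderId, newState, occurredAt) are
// assumptions, not the actual Unacast payload.
interface StateChange {
  orderId: string;
  newState: string;
  occurredAt: string; // ISO 8601 timestamp
}

// Pub/Sub message data is raw bytes, so the JSON body is base64-encoded.
function encodeStateChange(msg: StateChange): string {
  return Buffer.from(JSON.stringify(msg)).toString("base64");
}
```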
Cloud Functions
For our particular use case, the subscriber to the state change Topic is a Cloud Function (the Firebase variant) that leaps into action and grabs the Order's ID from the message body. It then uses the ID to fetch an up-to-date representation of the Order from the Cloud SQL database, and uses this to update the frontend's representation of the Order in Firestore.
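The first thing such a function does is decode the incoming message and pull out the Order ID. A sketch of just that step, assuming the pipeline publishes a base64-encoded JSON body with an orderId field; the trigger wiring (functions.pubsub.topic(...).onPublish) and the follow-up Cloud SQL fetch and Firestore write are omitted:

```typescript
// Sketch of the decoding step inside the Cloud Function, assuming a
// base64-encoded JSON body with an `orderId` field (an assumption --
// the actual payload format is not shown in the post).
function extractOrderId(base64Data: string): string {
  const body = JSON.parse(Buffer.from(base64Data, "base64").toString("utf8"));
  return body.orderId;
}
```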
Firestore -> Frontend sync
Finally, we arrive at the really neat part of this whole setup. The React components in our frontend that are bound to documents in the Firestore database are automatically refreshed whenever those documents are updated. No need for that pesky refresh button anymore!
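On the React side, this live sync comes from Firestore's snapshot listeners. A hedged sketch of a hook that keeps a component in sync with an Order document; it is not runnable standalone, it uses the current modular firebase SDK (the original app may have used the older namespaced API), and the collection name is an assumption:

```typescript
import { useEffect, useState } from "react";
import { doc, onSnapshot, Firestore, DocumentData } from "firebase/firestore";

// Subscribes to a single Order document. onSnapshot fires on the
// initial read and on every subsequent update to the document, so the
// component re-renders whenever the Cloud Function writes a new state.
function useOrder(db: Firestore, orderId: string): DocumentData | null {
  const [order, setOrder] = useState<DocumentData | null>(null);
  useEffect(() => {
    const ref = doc(db, "orders", orderId); // collection name is assumed
    // onSnapshot returns an unsubscribe function; returning it from
    // useEffect cleans up the listener when the component unmounts.
    return onSnapshot(ref, (snap) => setOrder(snap.data() ?? null));
  }, [db, orderId]);
  return order;
}
```

A component can then simply render whatever this hook returns, with no refresh logic of its own.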
Demo Time
Here is a short demo of how this looks in action, using a dummy Order that is scheduled to trigger every two minutes. As you can see, we can follow it through all of its states in real time, from creation to completion. No more wondering whether the Order you are looking at is up to date and showing its most current state.