After building in-house data observability solutions at our previous jobs, we decided to build one for analytics environments.
The first principle was to make it ‘analytics engineer first’. We wanted it to integrate as tightly as possible with existing dbt workflows, and to be part of the jobs, configuration, and alerting already in place.
Building a dbt package was the obvious solution: it is easy to deploy, runs as part of the existing operation, and writes all of its outputs directly to the data warehouse. The established method for testing data after it is built is dbt tests. The hard question was how to implement stateful monitoring as dbt tests, when “one of the greatest underlying assumptions about dbt is that its operations should be **stateless** and **idempotent**” (dbt documentation / understanding state).
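One way to square stateful monitoring with stateless dbt runs is to keep the state in the warehouse itself: each run reads a history table written by previous runs, so any single dbt invocation remains idempotent. The sketch below is hypothetical and illustrative only (the test name, the `row_count_history` table, and the 3-sigma threshold are all assumptions, not the package's actual implementation); it uses dbt's standard generic test syntax, where a test fails if its query returns any rows.

```sql
-- Hypothetical generic test: flag an anomalous row count by comparing the
-- current count against history stored in the warehouse by earlier runs.
{% test row_count_anomaly(model) %}

with current_count as (
    select count(*) as row_count from {{ model }}
),

-- the "state" lives in a warehouse table, not in dbt itself
baseline as (
    select avg(row_count) as avg_count, stddev(row_count) as std_count
    from {{ ref('row_count_history') }}
    where table_name = '{{ model.name }}'
)

-- return rows (i.e. fail) when the count deviates by more than 3 standard deviations
select c.row_count
from current_count c, baseline b
where abs(c.row_count - b.avg_count) > 3 * coalesce(b.std_count, 0)

{% endtest %}
```

A companion model or hook would append each run's measured counts to `row_count_history`, so the baseline keeps learning as the pipeline runs.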
We will talk about some of the creative solutions we came up with, features we added along the way such as operational monitoring, and what we have learned from users since the release.
We will also share some practices that made our development more reliable, such as unit tests for dbt macros and end-to-end testing with mock data.
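Since dbt macros are Jinja templates, one way to unit-test them without spinning up dbt at all is to render them with plain Jinja and assert on the generated SQL. This is a minimal sketch of that idea; the macro name and SQL shape are illustrative, not taken from the actual package.

```python
# Unit-testing a dbt macro outside dbt by rendering it with Jinja directly.
# The macro below is a hypothetical example, not the package's real code.
from jinja2 import Environment

MACRO_SOURCE = """
{% macro duplicate_rows_query(table_name, column_name) -%}
select {{ column_name }}, count(*) as occurrences
from {{ table_name }}
group by {{ column_name }}
having count(*) > 1
{%- endmacro %}
"""


def render_macro(name, *args):
    """Render a named macro from MACRO_SOURCE with the given arguments."""
    module = Environment().from_string(MACRO_SOURCE).module
    return str(getattr(module, name)(*args))


def test_duplicate_rows_query():
    sql = render_macro("duplicate_rows_query", "orders", "order_id")
    # Assert on the compiled SQL instead of running it against a warehouse.
    assert "group by order_id" in sql
    assert "having count(*) > 1" in sql
```

Running this with pytest gives fast feedback on macro logic; end-to-end tests with mock data then cover the behavior of the compiled SQL against a real warehouse.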