A simple demo of setting up metriql
Here’s a situation I run into often as a Tableau practitioner: I whip up a great and clever analysis in SQL that uses very precise steps to create a particular metric. Let’s just say it’s counting memberships across segments (hey, counting stuff is surprisingly hard in analytics). I use Tableau to create some great views and dashboards tracking the metric and put it online for users. But then the marketing department asks for the same metric in another tool. Or someone else wants to build additional member-count metrics on top of what I created. Here’s the problem: all my custom metric logic is stuck in the initial Tableau dashboard. Any time a slight change occurs, I’m on the hook for updating it in every single place the metric is defined, or else I risk inconsistent numbers. This is a nightmare scenario with clear tech-debt headaches, and it’s exactly the one headless BI aims to solve.
Metriql offers a solution. Metriql lets you define metrics centrally so that they can be used by multiple tools without being redefined every time. It provides a metadata layer over your data and is an extension of dbt Core; essentially, it is an API layer that multiple tools can access. Just as the data warehouse is the single source of truth for data, metriql acts as the single source of definition for metrics. You can find out more by going to the website and joining the Slack channel. Here, I will describe how I got it working.
Setting up the tool took a bit of playing around, so I’m going to describe my process and how I got it to work. There is a starter tutorial for metriql, but I’ll describe my own setup experience in more depth.
Prerequisites
I needed to install a few things before I could get metriql to work. (Currently using an M1 MacBook Air 2020.)
- Docker
- OpenJDK 11 (instructions on a Mac)
- dbt (I installed a local version for this example)
- A database or BI tool (I used DBeaver and Google Data Studio)
- A data warehouse like BigQuery or Snowflake
- A dataset to work with (I reused the sample dataset from the dbt Learning tutorial, i.e. the jaffle shop dataset)
Setting up BigQuery
Setting up the dataset in the data warehouse and connecting to it locally took a few steps, so I’ll specify what I did in BigQuery. If you’re using some other data warehouse, the steps will be very similar.
First, I needed data in the warehouse to access. The jaffle shop data is actually public data in GCP, so you can access it directly without importing anything. I made a BigQuery project called dbt-training-314601 with a dataset under it called dbt_tutorial. Then I created tables there using these queries:
create table `dbt-training-314601.dbt_tutorial.customers` as
select * from `dbt-tutorial.jaffle_shop.customers`;

create table `dbt-training-314601.dbt_tutorial.orders` as
select * from `dbt-tutorial.jaffle_shop.orders`;
Next, I had to install gcloud on my machine so that these tables could be accessed from my demo project. gcloud is part of the Cloud SDK and needs to be initialized on the machine after installation using > gcloud init. This will create a google-cloud-sdk directory in the home folder and add paths to the .zshrc file. Always remember to test the connection. I also had to ensure the IAM privileges were correct to be able to connect to my projects.
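For reference, the commands I ran looked roughly like this (the bq check at the end is optional but makes a nice sanity test; swap in your own project ID):

# initialize the SDK and pick a default project
gcloud init
# create application-default credentials that local tools can pick up
gcloud auth application-default login
# sanity check: list the datasets visible in the project
bq ls --project_id dbt-training-314601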
Setting up dbt
Setting up dbt was also important to outline since I did it locally. dbt needs to be initialized, be able to connect to BigQuery correctly, and has to be run to create models before metriql builds on top of it. This is part of the profile setup outlined in the tutorial, but essentially, I needed to initialize dbt (using > dbt init) and then go into the profiles.yml file in ~/.dbt to point at the correct target environment (and of course test it!). An explanation of profile configuration is here. I then ran > dbt run to create the initial models and ensure that the environment could run and connect to the database correctly. Now, I was finally ready to start using metriql!
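For reference, my ~/.dbt/profiles.yml ended up looking roughly like this. The profile name and project ID are specific to my setup, and method: oauth reuses the gcloud credentials from the previous step:

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: dbt-training-314601
      dataset: dbt_tutorial
      threads: 1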
Metriql Demo Project
Metriql has to be installed, and it can then be used through the command line. There are three ways to install it: using Docker, via binary, or from source. The recommended way is Docker, so I downloaded the latest image from Docker Hub. I had some issues using BigQuery, but it turned out that I needed to specify the volumes when using the serve command to run the HTTP server and access the REST API locally (I even got a note added to the relevant docs page!). Successfully running the image brings up metriql’s welcome page in the browser.
The actual command I used to run the server is:
export DBT_PROJECT_DIR=${PWD}
export DBT_PROFILES_DIR=${HOME}/.dbt
export METRIQL_PORT=5656

docker run -it -p "${METRIQL_PORT}:5656" \
  -v "${DBT_PROJECT_DIR}:/root/app" \
  -v "${DBT_PROFILES_DIR}:/root/.dbt" \
  -v "${HOME}/.config/gcloud:/root/.config/gcloud" \
  -e METRIQL_RUN_HOST=0.0.0.0 \
  -e DBT_PROJECT_DIR=/root/app \
  buremba/metriql \
  serve
Metriql already has quite a few connectors, so there are plenty of choices for showcasing it in BI tools. But wait, I haven’t defined anything yet! Getting the server to run on an empty project was just the first step. I needed to build a data model and define some metrics that can be used by multiple BI tools.
Measures & Dimensions
I started out simple and defined some common properties using metriql. Measures and dimensions are standard properties when dealing with metrics (shameless plug for my article on this topic: I defined what they mean in Tableau, but the basic definitions apply to any tool that has measures and dimensions). To define them, I had to go into the /models directory and create some YAML files.
In my example project (GitHub link), I have two .sql files and two .yml files that I added to /models. The .sql files create models for the dbt project. In this case, the models will be tables sourced from the dbt_tutorial tables created above. The models should be created and run first to ensure that they exist before adding metriql functionality.
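A model file can be as simple as a staging select. This sketch mirrors the jaffle shop tutorial; my actual files may name columns slightly differently:

-- models/stg_customers.sql: stage the raw customers table
select
    id as customer_id,
    first_name,
    last_name
from `dbt-training-314601.dbt_tutorial.customers`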
The stg_jaffle_shop.yml file further defines the configuration of the models. The names of the models match up to the SQL filenames. This file allows you to put in a description for both the table and the columns and also set up standard dbt testing. Running dbt test and dbt docs generate at this point would execute those tests and build the corresponding documentation.
The meta property in this .yml file is what unlocks metriql’s features. In my example, I have one measure and one dimension defined. The measure is just the total row count (which should be 100 in this case), while the dimension is the concatenation of the first and last name. We’ll check both using a database tool. Metriql allows much more customized definitions, of course, but as a rule of thumb, start simple before you add complexity.
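The meta block looked something like this. I’m reconstructing from the metriql docs’ measure/dimension syntax, so treat it as a sketch rather than copy-paste:

models:
  - name: stg_customers
    meta:
      metriql:
        measures:
          total_rows:
            aggregation: count
        dimensions:
          full_name:
            sql: "CONCAT({{TABLE}}.first_name, ' ', {{TABLE}}.last_name)"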
Aggregates
I also decided to try out the aggregation feature of metriql. Aggregates help summarize data by creating roll-up tables. This helps with huge datasets that are slow and/or expensive to query when you don’t need all the details, just a summary of the data. Aggregate roll-up tables are not exposed to the end user, so users can query against them but not overwrite their definitions. Metriql rewrites incoming queries itself to take advantage of the roll-ups. The generated models can be seen in the /metriql directory within /models after running the generate command in the CLI.
To generate aggregates, just run the same command used to spin up the local server but with “generate” in place of “serve”.
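In other words:

docker run -it -p "${METRIQL_PORT}:5656" \
  -v "${DBT_PROJECT_DIR}:/root/app" \
  -v "${DBT_PROFILES_DIR}:/root/.dbt" \
  -v "${HOME}/.config/gcloud:/root/.config/gcloud" \
  -e METRIQL_RUN_HOST=0.0.0.0 \
  -e DBT_PROJECT_DIR=/root/app \
  buremba/metriql \
  generate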
Aggregations are defined in the sources, models, or seeds inside the dbt project. I defined them in a source file here.
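As a rough illustration of the shape (the aggregate name is hypothetical, and I’d defer to the metriql aggregates docs for the exact keys), a roll-up pairing the measure and dimension from earlier might look like:

meta:
  metriql:
    aggregates:
      customer_rollup:
        measures: [total_rows]
        dimensions: [full_name]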
Querying metriql using DBeaver
I was now ready to start checking the metrics I had created using metriql. After making sure the server ran correctly, I used a database tool, DBeaver, to query the metrics and check for expected results. According to the documentation, I can connect through the Trino/Presto integration. Once it was working, I could query my definitions with plain SQL.
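A check query along these lines should return the concatenated names alongside the row count (model and field names follow my example definitions, and metriql’s Trino interface handles rewriting the query against the warehouse):

-- query the dimension and measure defined earlier
select full_name, total_rows
from stg_customers
group by 1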
Querying metriql using Google Data Studio
To really showcase the benefit of metriql, though, I needed to use another BI tool to ensure that the metrics stay the same no matter what tool I use. I decided to use Google Data Studio since anyone with GCP also has access to it. Data Studio is a little more involved than DBeaver, though, since I needed to expose my local port to the internet so that Data Studio could access it. For that, I needed a tunneling service. I used ngrok since I was already somewhat familiar with it as the current gold standard of tunneling tools. So I downloaded it, followed the steps in the documentation, and was able to verify that Data Studio could access my defined metrics and that they exactly matched what I found in DBeaver! I’ll outline the steps below (metriql documentation here, as there are some limitations to be aware of).
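The tunneling step itself is a one-liner, pointing ngrok at the METRIQL_PORT from earlier; Data Studio then connects to the public URL that ngrok prints:

# expose the local metriql server over a public URL
ngrok http 5656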
Final Thoughts
Well, that was a wild ride! Setting up and defining metrics using metriql is of course very involved and cobbles together several different skills and tools. It’s not as simple as just starting up a Tableau workbook, connecting to a database, and pasting in some custom SQL code. But with that simplicity comes potential vendor lock-in. You could use Tableau’s Extract API to access Hyper extracts but…you still have to pay to use Tableau for that haha.
Metriql offers future flexibility. What if someday I wanted to stop paying for Tableau or Looker licenses because the organization wants to switch to some other tool? Moving that custom SQL code for every single workbook and recreating the analysis sounds like an absolute nightmare to keep track of! But if the metrics were already defined elsewhere and just accessed through an API, that switch becomes orders of magnitude easier.
My metric definitions here are very simplistic. This was intentional, as the focus was on getting the basic functionality of metriql up and running first. Metriql originated from Rakam, a product analytics tool that specializes in providing segmentation, funneling, retention, and more. As a result, metriql natively provides that functionality, and you can define very sophisticated metrics based on those capabilities.
The metrics metadata layer is an idea picking up steam. In addition to metriql, other forms of it are popping up, such as Minerva at Airbnb and Lightdash, and even dbt itself is looking at metric definition support. Headless BI tools seem to be gaining traction as metrics get optimized and defined globally rather than locally in individual tools. As another addition to the modern data stack, metriql is worth keeping an eye on!