Tool Evaluation Series: Superset

Chris Nguyen
5 min read · Jan 6, 2024

Apache Superset is an open-source data exploration and visualization platform. It is maintained by the Apache Software Foundation and is well-regarded for its ease of use and rich visuals. I will evaluate Superset in terms of expanding data analytics capabilities to a more general audience, defined as people outside of data teams who may not be highly trained in data but want to use it to answer their own business questions without waiting for the data team to answer them. In other words: “Is Superset a good self-service tool?”

I will do this by defining a few criteria that I care about and using Superset to see how well it fulfills them. The criteria will be:

  1. Setup & Maintenance: is the tool easy to set up, maintain, and learn?
  2. Useful Features: what are 3+ things that make this tool stand out?
  3. Cost: how expensive do I think this tool is? I define expensive as “Does this cost me as much as Tableau?”
  4. Audience Fit: do I think the tool fits my intended general audience?

1. Setup & Maintenance

Honestly, Superset was a nightmare to set up on my own! I couldn’t even get it running on my personal machine for testing (M1 MacBook Air 2020 with 8GB RAM, 6GB dedicated to Docker). I tried to set it up locally using the latest Docker image (version 3.0) but it would always crash immediately. Going back to previous versions had the same result. When I looked up online videos about installation, I found a lot of complaints about this instability, and about Apache apparently not keeping its repo clean. Superset also seems rather resource-heavy just to get up and running.

What I ended up doing was using Restack to run a stable version of Superset just to test it out. I’m only testing features for this review so a trial version was fine to use. In practice, I would rather host Superset on our own servers than go through a paid service just to get it running, but now I doubt the stability of the tool. Users won’t like an unstable tool that potentially goes down often, even if the visualizations are beautiful and useful. Windows is also not officially supported, which could be a major blocker for a general audience (although I think that only applies to installation).

1 out of 5 points

[Image caption: Oh my god, why won’t you work???]

2. Useful Features

  • Rich Visualizations: Superset has a huge gallery of interactive visualizations by default. It also seems pretty extensible, since there appear to be plugins that let you use any JavaScript library you want.
    [Image caption: So many advanced charts included by default]
  • SQL Lab: The SQL Lab is included for running queries before making visualizations out of them, so users can inspect their results. There’s even a handy runtime counter there.
    [Image caption: Run SQL to inspect results and see how long it takes]
  • Customizability: Superset’s charts are highly customizable, and you can arrange them in tile format.
    [Image caption: Lots of customizability]
    [Image caption: Tile format works great]
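
To give a feel for what SQL Lab's "run it and see how long it took" workflow amounts to, here is a minimal sketch in plain Python. It is not Superset code: it uses the standard-library `sqlite3` module, and the `orders` table, its rows, and the `run_timed` helper are all made up for illustration.

```python
import sqlite3
import time


def run_timed(conn, sql):
    """Run a query and return (rows, elapsed_seconds).

    Hypothetical helper: mimics inspecting results alongside a runtime counter.
    """
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed


# Made-up sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

rows, elapsed = run_timed(
    conn,
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
)
print(rows)  # inspect the result before charting it
print(f"query took {elapsed:.4f}s")
```

The point of the feature is the same as the point of this sketch: a user can check that a query returns what they expect, and roughly how slow it is, before building a chart on top of it.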

Superset has a huge range of capabilities and customizations. I would rate it very highly in terms of features (as long as it works).

5 out of 5 points

3. Cost

Superset is open-source so you can freely fork it or run it on your own servers. You can’t really beat free in terms of cost.

5 out of 5 points

4. Audience Fit

Do I think Superset fits my general audience? It’s a tool that seems made by and for engineers, and it shows. By that, I mean that Superset has a ton of features out of the box, and I think it’s pretty impressive how many rich chart types are included, BUT the UI seems very simple and minimal. To me, that just screams “an engineer made this”: tons of functionality but barebones design. That means it’s an engineering tool at heart and not for a general non-technical audience. It lacks the polish of other tools I evaluated. If I were focusing only on the data and engineering teams, I would rate this fit pretty high. But I’m not, so it won’t be rated that high.

In addition, there are some complaints about a lack of support, a small community, and bad or outdated documentation. Superset isn’t Tableau or Power BI, so it doesn’t have as large an audience (which means that community help may be harder to come by). Connectors also depend on Python database drivers (DB-API) and SQLAlchemy. I’m not sure how optimized that is compared to a native driver.
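
For context on what that dependency looks like in practice: SQLAlchemy connection strings follow the pattern `dialect+driver://user:password@host:port/database`, where the driver is the Python package that actually talks to the database. The sketch below only pulls such a string apart with the standard library so the pieces are visible. It does not use SQLAlchemy or Superset, and the host, credentials, database name, and the `describe_uri` helper are hypothetical.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a SQLAlchemy-style URI into its parts (illustrative helper only)."""
    parts = urlsplit(uri)
    # The scheme is "dialect" or "dialect+driver", e.g. "postgresql+psycopg2".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or "(dialect default)",
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Hypothetical connection string; every value here is made up.
example = "postgresql+psycopg2://analyst:secret@db.example.com:5432/sales"
info = describe_uri(example)
for key, value in info.items():
    print(f"{key}: {value}")
```

So each database you want to connect needs both a SQLAlchemy dialect and a matching Python driver installed alongside Superset, which is where my question about performance compared to a native driver comes from.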

3 out of 5 points

Verdict

Superset feels like a very rich tool to me…as long as it’s working. I’m clearly still bitter over how many hours I spent just trying to get it to work locally (good thing I was severely jetlagged on holiday while I was playing around with these tools), while the other tools I was evaluating took a few minutes to set up. But if my target audience were more limited and I had a stable version of Superset, I would absolutely recommend it for all its capabilities. Even better if you have a team that can support and expand those features (and make up for a potential lack of official support).

14 out of 20 points
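
For anyone who wants to reuse the rubric, the total is just an unweighted sum of the four scores above. A minimal sketch (the variable names are mine, and weighting every criterion equally is an assumption you may want to change):

```python
# Scores from each section of this review, each out of 5.
scores = {
    "Setup & Maintenance": 1,
    "Useful Features": 5,
    "Cost": 5,
    "Audience Fit": 3,
}
MAX_PER_CRITERION = 5

total = sum(scores.values())
max_total = MAX_PER_CRITERION * len(scores)
print(f"{total} out of {max_total} points")
```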
