-
Notifications
You must be signed in to change notification settings - Fork 24
Data Visualization with Superset
Apache Superset is a free, kick-ass tool to visualize and explore data. This guide will get you started with a bare-bones setup. Superset is awesome in that it is 'infinity scalable'. However, in this tutorial we're going with local hosting for ease of use. Superset comes with a Postgres database by default, so we're going to lean on that. With that in mind, this tutorial will work for an external Postgres / MySQL database too. We're using version 1.0.1 (released 02/2021), so this might need to be updated for future versions of Superset.
This guide assumes you have Docker and docker-compose
installed. I've tested this on Linux and OSX. I could not get this functioning with Windows. From the Superset docs:
Superset is not officially supported on Windows unfortunately.
The best option for Windows users to try out Superset locally is to install an Ubuntu Desktop VM
- Follow the Superset docker documentation to run the docker-compose locally.
- Test everything looks good by navigating to
localhost:8088
in your browser. The username / pass isadmin
. - Create the database. Run this from the terminal.
docker exec -i superset_db psql -U superset -c "CREATE DATABASE nba;"
- Create a connection for Superset.
- Go to the
Databases
tab in the browser. - Read these docs.
- Use this connection string:
postgresql+psycopg2://superset:superset@superset_db:5432/nba
.
- Go to the
- Load our data.
- Clone this repo (or download the .zip).
- Modify the
scripts/create_postgres.sh
file to change the following environment variables.DB_NAME="nba"
DB_HOST="localhost"
DB_USER=superset
DB_PASSWORD=superset
- Run the script! It'll take a while but we've included some progress bars to show that the script isn't being rate limited (a serious issue). If rate limiting is happening to you (it hangs indefinitely) you need to destroy the database and increase the
--time_between_requests
parameter.
- Follow the regular Superset documentation on how to setup databases and datasets, following the schema provided.
- Go to the
SQL Lab
->SQL Editor
tab. Enter the following query:
SELECT season_id, td3
FROM player_game_log
LEFT JOIN player ON player.player_id = player_game_log.player_id
WHERE player.player_name = 'Russell Westbrook';
- Click Run then Explore Data. You should then be able to chart Russell Westbrook's Triple-Doubles for each season he has played. 🎉
- Change Visualization Type from
Table
toBar Chart
. Metric should beSUM(td3)
and SERIES should beseason_id
. Remove the row limit at the bottom, then hit Run. A bar chart should appear.
Keep in mind, when you build queries in Superset you shouldn't 'pre-aggregate'. Superset basically accepts a query as a view that it saves outside of our Postgres db, then does it's own aggregation. So make general queries that fetch a ton of rows, then do the SUM
, AVG
, or whatever inside of Superset.
Happy visualizing!