Data Visualization with Superset
Apache Superset is a free, kick-ass tool to visualize and explore data. This guide will get you started with a bare-bones setup. Superset is awesome in that it is 'infinitely scalable'. However, in this tutorial we're going with local hosting for ease of use. Superset comes with a Postgres database by default, so we're going to lean on that. That said, this tutorial will work for an external Postgres / MySQL database too. We're using version 1.0.1 (released 02/2021), so this guide might need to be updated for future versions of Superset.
This guide assumes you have Docker and docker-compose installed. I've tested this on Linux and OSX. I could not get this functioning with Windows. From the Superset docs:
Superset is not officially supported on Windows unfortunately.
The best option for Windows users to try out Superset locally is to install an Ubuntu Desktop VM
- Follow the Superset docker documentation to run the docker-compose locally.
- Create the database. Run this from the terminal.
docker exec -i superset_db psql -U superset -c "CREATE DATABASE nba;"
- Create a connection for Superset.
- Follow these docs.
- Use this connection string:
postgresql+psycopg2://superset:superset@superset_db:5432/nba
- Load our data.
- Clone this repo (or download the .zip).
- Modify the scripts/create_postgres.sh file to change the following environment variables:
DB_NAME="nba"
DB_HOST="localhost"
DB_USER=superset
DB_PASSWORD=superset
- Run the script!
- Follow the regular Superset documentation on how to set up databases and datasets, following the schema provided.
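If the connection string from the steps above looks like line noise, here is a rough sketch of how it is put together. The helper name `build_sqlalchemy_uri` is made up for this guide and is not part of Superset or SQLAlchemy. It just follows the usual `dialect+driver://user:password@host:port/database` layout and percent-encodes the credentials, which only matters if you use a password with special characters.

```python
from urllib.parse import quote


def build_sqlalchemy_uri(user, password, host, port, database,
                         dialect="postgresql", driver="psycopg2"):
    """Hypothetical helper: assemble dialect+driver://user:password@host:port/database."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # in a password are not mistaken for URI delimiters.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{dialect}+{driver}://{safe_user}:{safe_password}@{host}:{port}/{database}"


# Values from this guide. Superset uses the container name (superset_db) as host,
# while the load script run from your own machine uses localhost.
uri = build_sqlalchemy_uri("superset", "superset", "superset_db", 5432, "nba")
print(uri)  # postgresql+psycopg2://superset:superset@superset_db:5432/nba
```

With the default credentials nothing gets encoded, so the result is the same string you paste into Superset.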
Keep in mind, when you build queries in Superset you shouldn't 'pre-aggregate'. Superset basically accepts a query as a view that it saves outside of our Postgres db, then does its own aggregation. So make general queries that fetch a ton of rows, then do the SUM, AVG, or whatever inside of Superset.
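Here's a small sketch of one way pre-aggregating can bite you. The assumptions: the table `player_game_stats` and its numbers are made up (they are not from the repo's schema), and in-memory SQLite stands in for Postgres so the snippet runs anywhere. If your query already averages per player and the chart then averages again, you get an average of averages, which is not the same as the average over the raw rows.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_game_stats (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO player_game_stats VALUES (?, ?)",
    [("A", 30), ("A", 10), ("A", 20), ("B", 40)],
)

# Row-level data aggregated once: what you get when the dataset returns
# raw rows and the chart applies AVG itself.
raw_avg = conn.execute(
    "SELECT AVG(points) FROM player_game_stats"
).fetchone()[0]

# Pre-aggregated dataset (per-player averages) averaged again by the chart.
nested_avg = conn.execute(
    "SELECT AVG(avg_points) FROM ("
    "  SELECT player, AVG(points) AS avg_points"
    "  FROM player_game_stats GROUP BY player"
    ")"
).fetchone()[0]

print(raw_avg)     # 25.0
print(nested_avg)  # 30.0
conn.close()
```

Player B's single game counts as much as player A's three games in the second query, so the number drifts. Handing Superset the row-level data avoids that.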
Happy visualizing!