
Data Visualization with Superset

Matthew Pope edited this page Mar 7, 2021 · 10 revisions

Superset Setup Guide

Apache Superset is a free, kick-ass tool to visualize and explore data. This build will get you started with a bare-bones setup. Superset is awesome in that it is 'infinitely scalable'. However, in this tutorial we're going with local hosting for ease of use.

This guide assumes you have Docker and docker-compose installed. I've tested this on Linux and macOS, but I could not get it working on Windows. From the Superset docs:

Superset is not officially supported on Windows unfortunately. The best option for Windows users to try out Superset locally is to install an Ubuntu Desktop VM

Setup Superset

  1. Follow the Superset Docker documentation to run the docker-compose setup locally.
  2. Create the database. Run this from the terminal.
    1. docker exec -i superset_db psql -U superset -c "CREATE DATABASE nba;"
  3. Create a connection for Superset.
    1. Follow these docs.
    2. Use this connection string: postgresql+psycopg2://superset:superset@superset_db:5432/nba.
  4. Load our data.
    1. Clone this repo (or download the .zip).
    2. Modify the scripts/create_postgres.sh file to change the following environment variables.
      1. DB_NAME="nba"
      2. DB_HOST="superset_db"
      3. DB_USER=superset
      4. DB_PASSWORD=superset
    3. Run the script!
  5. Follow the regular Superset documentation on how to set up databases and datasets, following the schema provided.
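To see how the pieces in steps 2 through 4 fit together, here is a minimal sketch that rebuilds the connection string from the same four variables. `DB_PORT`, `SQLALCHEMY_URI`, and the `check_nba_database` helper are names I made up for illustration; they are not part of the repo's script. The helper assumes the Postgres container is named `superset_db`, as in step 2.

```shell
# Values from step 4.2 above.
DB_NAME="nba"
DB_HOST="superset_db"
DB_USER="superset"
DB_PASSWORD="superset"
# Assumption: the port is taken from the connection string in step 3.
DB_PORT=5432

# Hypothetical variable: the same URI you paste into Superset in step 3.
SQLALCHEMY_URI="postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$SQLALCHEMY_URI"

# Hypothetical helper: succeeds if the database from step 2 exists.
# Lists databases (-l) in quiet, tuples-only form (-q, -t), keeps the
# name column, and looks for an exact word match.
check_nba_database() {
  docker exec -i superset_db psql -U "$DB_USER" -lqt \
    | cut -d '|' -f 1 \
    | grep -qw "$DB_NAME"
}
```

Once the containers are up, something like `check_nba_database && echo "nba is there"` should confirm step 2 worked before you run the load script.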

Keep in mind, when you build queries in Superset you shouldn't 'pre-aggregate'. Superset basically treats a query as a view that it saves outside of our Postgres db, then does its own aggregation on top of it. So make general queries that fetch a ton of rows, then do the SUM, AVG, or whatever inside of Superset.
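As a rough illustration of the difference, here are two queries side by side. The table and column names (`player_game_stats`, `player_id`, `game_id`, `game_date`, `points`) are hypothetical placeholders, not the actual schema, so swap in real ones. The `preview_rows` helper is also made up for this sketch.

```shell
# Hypothetical table and column names; replace with ones from the real schema.

# Pre-aggregated: one row per player. Superset can't re-slice this
# by game or date because that detail is already gone.
PRE_AGGREGATED_SQL='SELECT player_id, SUM(points) AS total_points FROM player_game_stats GROUP BY player_id'

# Row-level: one row per player per game. Leave the SUM or AVG to Superset.
ROW_LEVEL_SQL='SELECT player_id, game_id, game_date, points FROM player_game_stats'

printf '%s\n' "$ROW_LEVEL_SQL"

# Hypothetical helper: peek at a few rows in Postgres before using the
# query in Superset. Assumes the superset_db container from the setup steps.
preview_rows() {
  docker exec -i superset_db psql -U superset -d nba -c "${ROW_LEVEL_SQL} LIMIT 5;"
}
```

The row-level query is the kind you would save as a dataset, then pick the aggregate for `points` when building the chart.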

Happy visualizing!
