This repository contains Docker files that set up the Passive Data Kit server.
Online and mobile news consumption leaves digital traces that are used to personalize news supply, possibly creating filter bubbles where people are exposed to a low diversity of issues and perspectives that match their preferences. The JEDS Filter Bubble project aims to understand the filter bubble effect by performing deep semantic analyses on mobile news consumption traces. This project is a collaboration between the VU, the UvA and NLeSC, led by Wouter van Atteveldt.
Part of this project makes use of the Passive Data Kit server, developed by Chris Karr. This server gathers the browsing history of each participant in a database. The Docker files contained in this repository help set up the server as a Python Django app. See the Passive Data Kit documentation for additional information.
- Clone the repository and its submodule using `git clone --recurse-submodules`.
- On your server, install `docker`, `docker-compose` and a mail application (e.g. `mailutils` or `sendmail`, for Cron emails).
- Make sure nothing is running on the Postgres port 5432 (or kill it using `sudo kill $(sudo lsof -t -i:5432)`).
- Rename `variables.env.template` to `variables.env` and configure the variables.
- Run `chmod +x backuppostgres.sh` to allow Cron to execute the backup script.
- Run `sudo docker-compose up -d` to build the Docker image and run the container.
- Run `sudo docker-compose exec django python manage.py createsuperuser` to set up administrative access.
- Run `sudo crontab crontab` to load the `crontab` file (the last argument is the filename).
- Run `sudo service cron start` to start the Cron service.
- Run `sudo docker ps` to list all containers which are running.
- Run `sudo docker exec -it [container-id] bash` to enter a container and execute bash commands.
- Run `sudo docker-compose down` to stop containers which are running.
- Run `sudo docker rm $(sudo docker ps -a -q)` to delete all Docker containers.
- Run `sudo docker rmi $(sudo docker images -q)` to delete all Docker images.
- Run `sudo service [name] status` to see if a service (e.g. `cron` or `sendmail`) is running.
- Run `sudo service [name] stop` to stop the service (e.g. `cron` or `sendmail`).
- Run `sudo crontab -l` to list all current Cron jobs.
- Run `sudo crontab -e` to edit the current Cron file.
- Run `sudo crontab -r` to delete/reset all Cron jobs.
- Run `git submodule foreach git pull origin master` to pull the latest commit for each submodule.
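
The setup steps ask you to make sure nothing is listening on port 5432 before the containers are started. The following is a minimal sketch of that check, meant to be run with Python 3 on the host. It assumes the port would be reachable on `127.0.0.1`; the helper name `port_in_use` is hypothetical and not part of this repository.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(5432):
        print("Port 5432 is busy; stop whatever uses it before starting the containers.")
    else:
        print("Port 5432 is free.")
```

This only detects listeners on the given host address; `sudo lsof -i:5432` remains the more thorough check.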
This directory contains the Web Historian Server submodule which contains the Passive Data Kit configured for use with the Web Historian browser extension.
The `backuppostgres.sh` script contains a command which exports the content of the Postgres database to a file on the server. It starts overwriting old files after one month.
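
One common way to get this kind of monthly overwrite is to name each backup after the day of the month, so that a file written on the 5th replaces the one from the 5th of the previous month. The sketch below illustrates that idea in Python 3. It is an assumption about the rotation scheme, not a description of what `backuppostgres.sh` actually does, and the function names, file prefix and `pg_dump` arguments shown are hypothetical.

```python
from datetime import date


def backup_filename(day, prefix="pdk_backup"):
    """Build a file name that repeats every month (hypothetical naming scheme)."""
    # %d is the zero-padded day of the month (01..31), so names recur monthly
    return f"{prefix}_{day:%d}.sql"


def pg_dump_command(user, dbname, outfile):
    """Build (but do not run) a pg_dump invocation writing to outfile."""
    # -U selects the database user, -f the output file; dbname comes last
    return ["pg_dump", "-U", user, "-f", outfile, dbname]
```

With this scheme at most 31 backup files exist at any time, and no separate cleanup job is needed.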
The `crontab` file contains a list of Cron jobs to be executed regularly. See the Passive Data Kit documentation for additional information. If any errors occur, they are emailed to the configured email address.
The `Dockerfile` describes the Docker image that will be created for this application. The Passive Data Kit's development targets the latest Long Term Support (LTS) versions of Django. This Docker image contains Python 2.7. Passive Data Kit uses GeoDjango and the Postgres database's native GIS features to implement some location-based features used to support data quality monitoring and other analysis tools. This requires the `python-pdal` package. When a data export is requested in the Passive Data Kit server dashboard, a file is generated and e-mailed to the requester. This requires a mail application such as the `sendmail` package. The local settings, which are not included in the Passive Data Kit submodule and which contain information specific to the current setup, are copied into the main Django directory. Finally, all requirements, including Django version 1.11.20, are installed.
The `docker-compose.yml` file describes the structure of the Docker containers. This includes the Django app, the Postgres database, the nginx web server and the Let's Encrypt SSL certificates service. When the Django app is first run, a command is executed which, among other things, runs the Django migrations.
The local settings contain Django settings that are specific to the current setup. The file works in conjunction with `variables.env`, which contains the actual variables.
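
As an illustration of how a settings file and an environment file can work together, the sketch below builds a Django `DATABASES` dictionary from environment variables. The variable names (`POSTGRES_DB`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`), the default host `db` and the function name are assumptions chosen for the example; the names actually used in this repository's template may differ.

```python
import os


def database_settings(env=os.environ):
    """Build a Django DATABASES dict from environment variables (names assumed)."""
    return {
        "default": {
            # PostGIS backend shipped with GeoDjango
            "ENGINE": "django.contrib.gis.db.backends.postgis",
            # Required values: a missing variable raises KeyError early
            "NAME": env["POSTGRES_DB"],
            "USER": env["POSTGRES_USER"],
            "PASSWORD": env["POSTGRES_PASSWORD"],
            # Optional values with hypothetical defaults
            "HOST": env.get("POSTGRES_HOST", "db"),
            "PORT": env.get("POSTGRES_PORT", "5432"),
        }
    }
```

In a settings module this would be used as `DATABASES = database_settings()`. Keeping credentials in the environment means the settings file itself carries no secrets.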
This configuration file tells nginx how to set up the server for the Django app.
This template should be renamed to `variables.env` and edited to include the relevant user names, host names and passwords.
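
A quick way to catch entries that were left blank after editing is to scan the environment file for keys without a value. The following Python 3 sketch does this for simple `KEY=VALUE` lines. It assumes the file uses that plain format with `#` comments; the function name and the keys in the usage example are hypothetical.

```python
def missing_values(text):
    """Return the keys in KEY=VALUE text whose value is empty."""
    missing = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if not value.strip():
            missing.append(key.strip())
    return missing
```

For example, `missing_values(open("variables.env").read())` would list the variables that still need to be filled in before starting the containers.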