
Standard reduction pipeline


Action summary (for the impatient)

Note: This procedure currently creates approximately 5 GB of data in addition to the downloaded file size.

  • Download the data dump xxx.csv.tar.gz from the email you receive every Sunday (if you do not receive it, contact Meg).
  • On Mac, the Archive Utility should unpack it correctly; on Linux:
    • tar zxvf yyyy_mm_dd_xxxx.csv.tar.gz
  • Clone this repository (see here for how).
  • cd into P4_sandbox/planet4.
  • Run python reduction.py path_to_csv_file, where the argument is the full path to the unpacked CSV file. A minimal sketch of such an entry point follows this list.
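
For illustration only, here is a minimal sketch of what such a command-line entry point could look like. This is not the repository's actual reduction.py; the main() function and its messages are assumptions, shown only to clarify the expected invocation.

```python
# Hypothetical sketch of a reduction.py entry point -- NOT the project's
# actual code. It only illustrates the invocation described above.
import os.path
import sys

def main():
    # Expect exactly one argument: the full path to the unpacked CSV file.
    if len(sys.argv) != 2:
        sys.exit("Usage: python reduction.py path_to_csv_file")
    csv_path = sys.argv[1]
    if not os.path.exists(csv_path):
        sys.exit("CSV file not found: " + csv_path)
    # The actual reduction (CSV -> HDF5 conversion) would start here.

if __name__ == "__main__":
    main()
```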

Additional help

You need a current Python environment that includes the following modules:

  • pandas
  • PyTables (tables)
  • scipy / numpy
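
A quick import check like the one below confirms the environment is complete before you run the pipeline; all four packages expose a __version__ attribute, so it also reports what is installed.

```python
# Verify that the required modules are importable and report their versions.
import pandas
import tables  # the PyTables package imports as "tables"
import numpy
import scipy

for mod in (pandas, tables, numpy, scipy):
    print(mod.__name__, mod.__version__)
```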

I recommend the Anaconda distribution from Continuum Analytics; it includes extra features for academic users. I have also used Enthought's Canopy successfully for years, although on Linux I dislike the hoops one has to jump through for a multi-user installation.

What it does

It creates both the queryable and the fast-read HDF5 database files in the same folder as the given CSV file.
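
As a rough illustration of that output, the pandas calls below write one queryable ('table' format) and one fast-read ('fixed' format) HDF5 file from a CSV. The file names and the store key "df" are made-up placeholders; this is a sketch of the general technique, not the repository's actual code.

```python
# Illustrative sketch, NOT the project's reduction code: writing both a
# queryable and a fast-read HDF5 file from one CSV with pandas/PyTables.
import pandas as pd

df = pd.read_csv("path_to_csv_file")  # placeholder path

# 'table' format is slower to write but supports on-disk queries,
# e.g. pd.read_hdf("queryable.h5", "df", where="...").
df.to_hdf("queryable.h5", "df", format="table", data_columns=True)

# 'fixed' format cannot be queried but reads back considerably faster.
df.to_hdf("fast_read.h5", "df", format="fixed")
```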