This R repo is a development branch; the actively developed repo is in Python at https://github.com/neurodata/hyppo.
- Overview
- Repo Contents
- System Requirements
- Installation Guide
- Instructions for Use
- License
- Issues
- Citation
- Reproducibility
In modern scientific discovery, it is becoming increasingly critical to uncover whether one property of a dataset is related to another. MGC (pronounced "magic"), or Multiscale Graph Correlation, provides a framework for investigating the relationships between properties of a dataset and the underlying geometries of those relationships, all while requiring sample sizes feasible in real data scenarios.
- R: R package code.
- docs: package documentation.
- man: package manual for help in R session.
- tests: R unit tests written using the testthat package.
- vignettes: R vignettes for R session html help pages.
The MGC package requires only a standard computer with enough RAM to support the operations defined by a user. For minimal performance, this will be a computer with about 2 GB of RAM. For optimal performance, we recommend a computer with the following specs:
RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core
The runtimes below were generated using a computer with the recommended specs (16 GB RAM, 4 cores @ 3.3 GHz) and an internet connection speed of 25 Mbps.
This package is supported for Linux operating systems. The package has been tested on the following systems:
Linux: Ubuntu 20.04, 18.04
Mac OSX:
Windows:
Before setting up the MGC package, users should have R version 3.4.0 or higher, and several packages set up from CRAN.
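As a quick sanity check before going further, the running R version can be compared against the 3.4.0 minimum stated above. This is a minimal sketch using only base R; the variable names `min_r_version` and `r_ok` are illustrative and not part of the package.

```r
# Minimum R version stated in this README.
min_r_version <- "3.4.0"

# getRversion() returns the running R version as a comparable version object.
r_ok <- getRversion() >= min_r_version

if (r_ok) {
  message("R ", as.character(getRversion()),
          " meets the minimum requirement (", min_r_version, ").")
} else {
  warning("R ", as.character(getRversion()), " is older than ",
          min_r_version, "; please upgrade before installing.")
}
```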
The latest version of R can be installed by adding the latest repository to apt:
echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base r-base-dev
which should install in about 20 seconds.
Users should install the following packages prior to installing mgc, from an R terminal:
install.packages(c('ggplot2', 'reshape2', 'Rmisc', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'latex2exp', 'MASS'))
which will install in about 80 seconds on a recommended machine.
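If some of these packages may already be present, it can save time to check which ones are missing first. The following is a minimal sketch, not part of the package: `missing_packages` is a hypothetical helper name, and the actual install call is left commented out so the snippet has no side effects.

```r
# Dependencies listed in this README.
deps <- c("ggplot2", "reshape2", "Rmisc", "devtools", "testthat",
          "knitr", "rmarkdown", "latex2exp", "MASS")

# Hypothetical helper: names in `pkgs` that are not found in any
# library on .libPaths().
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(deps)

if (length(to_install) == 0) {
  message("All dependencies are already installed.")
} else {
  message("Missing: ", paste(to_install, collapse = ", "))
  # Uncomment to install only the missing packages from CRAN:
  # install.packages(to_install)
}
```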
The mgc package functions with all packages in their latest versions as they appear on CRAN on October 15, 2017. Users can check CRAN snapshot for details. The versions of software are, specifically:
ggplot2: 2.2.1
reshape2: 1.4.2
Rmisc: 1.5
devtools: 1.13.3
testthat: 0.2.0
knitr: 1.17
rmarkdown: 1.6
latex2exp: 0.4.0
MASS: 7.3
If you are having an issue that you believe to be tied to software versioning issues, please drop us an Issue.
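When reporting a suspected versioning problem, it helps to include the versions actually installed on your machine. The sketch below collects them with base R; `installed_version` is a hypothetical helper name, and packages that are not installed are reported as NA.

```r
# Dependencies listed in this README.
deps <- c("ggplot2", "reshape2", "Rmisc", "devtools", "testthat",
          "knitr", "rmarkdown", "latex2exp", "MASS")

# Hypothetical helper: installed version of `pkg` as a string,
# or NA if the package cannot be found.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(deps, installed_version, character(1))

# Print the R version and a small table suitable for pasting into an Issue.
print(R.version.string)
print(data.frame(package = names(versions),
                 installed = unname(versions)))
```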
From an R session, type:
require(devtools)
install_github('neurodata/r-mgc', build_vignettes=TRUE) # install mgc with the vignettes
require(mgc) # source the package now that it is set up
vignette("MGC", package="mgc") # view one of the basic vignettes
The package should take approximately 20 seconds to install with vignettes on a recommended computer.
Please see the vignettes for help using the package:
vignette("MGC", package="mgc")
vignette("Discriminability", package="mgc")
vignette("simulations", package="mgc")
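To confirm which vignettes were actually built on your machine, the installed vignettes can be listed from R. This is a minimal sketch that assumes the package was installed under the name mgc as described above; if it is not installed, or was installed without `build_vignettes=TRUE`, it only prints a message.

```r
# Check whether the mgc package is installed before querying its vignettes.
mgc_installed <- nzchar(system.file(package = "mgc"))

if (mgc_installed) {
  # vignette(package = ...) returns an object whose `results` matrix
  # has one row per vignette, with the vignette name in the "Item" column.
  available <- vignette(package = "mgc")$results[, "Item"]
  if (length(available) > 0) {
    message("Available vignettes: ", paste(available, collapse = ", "))
  } else {
    message("mgc is installed, but no vignettes were built.")
  }
} else {
  message("mgc is not installed; see the installation steps first.")
}
```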
Pseudocode for the methods employed in the mgc package can be found in the MGC paper on arXiv, in Appendix C (starting on page 30).
For citing code or the paper, please use the citations found in citation.bib.
All the code to reproduce the figures from https://arxiv.org/abs/1609.05148 is available at https://github.com/neurodata/mgc-paper.
Here, we describe how to reproduce the manuscript figures from the discriminability paper. To begin, clone this repository locally:
git clone https://github.com/neurodata/r-mgc.git
We assume that the directory r-mgc placed locally on the system is <package_root>. Note that all figures were stylized using Adobe Photoshop prior to submission.
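Rather than editing <package_root> by hand in each command, the script directories can be built from a single variable. This is a sketch under assumptions: `package_root` and `sim_dir` are illustrative names, and `~/r-mgc` is a placeholder to be replaced with wherever the repository was actually cloned.

```r
# Placeholder location of the local clone; replace with your own path.
package_root <- file.path("~", "r-mgc")

# Directory holding the simulation scripts used for the figures.
sim_dir <- file.path(package_root, "docs", "discriminability",
                     "paper", "simulations")

# Only change directory if the path actually exists on this machine.
if (dir.exists(sim_dir)) {
  setwd(sim_dir)
} else {
  message("Directory not found: ", sim_dir)
}
```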
- Figure 1: Mini Sims Figure. This figure demonstrates the behavior of discriminability, Fingerprinting, ICC/I2C2, and Kernel methods under a range of basic simulation settings in 1 dimension.
- Figure 2: Multisim Figure. This figure demonstrates the behavior of discriminability, ICC, and I2C2 under a variety of simulation benchmark settings. To execute the script with fresh data:
setwd('<package_root>/docs/discriminability/paper/simulations')
source('shared_scripts.R')
Note: the scripts will automatically multithread; however, the simulation benchmarks take quite a while to execute (1.5 days on a 96 core machine with 1 TB of RAM).
Using the included bound, one sample, and two sample data, you can proceed to duplicate the figure by opening the R notebook simulation plots, and executing the script.
- Figure 3: 64 pipelines figure. To regenerate the source data for this portion of the manuscript, users can use the following two scripts from an R terminal:
setwd('<package_root>/docs/discriminability/paper/discr_computation')
# edit lines 17 and 18, and lines 210 and 211, and set to your local path where
# preprocessed brains are located
source('./real_data_driver.R') # runs the discriminability calculations
# edit lines 17 and 18, and lines 108 and 109, to the location of the
# preprocessed brains
source('./realdat_perm_testing.R') # runs the two sample testing
Again, the scripts will multithread, but can be expected to take approximately 3 days on a 96 core, 1 TB RAM machine.
To regenerate Figure 3 from the manuscript, users can execute the 64 Pipelines Figure notebook.
- Figure 4: Marginalized Options Comparison. Users can regenerate the figure by using the notebook Multi Modal Opts.
- Figure 5: Effect Size Investigation. Users can reproduce the data collected with:
setwd('<package_root>/docs/discriminability/paper/dcor_fig')
source('./dep_wt_driver.R')
Results can be expected to take 2 days on a 96 core, 1 TB RAM machine.
To reproduce the figure, users can use the Effect Size Investigation notebook.