Skip to content

A package to analyse simulation study data and summarise the results

License

Notifications You must be signed in to change notification settings

audrey-b/simanalyse

Repository files navigation

simanalyse

Lifecycle: experimental R build status Codecov test coverage License: GPL3 CRAN status CRAN downloads

simanalyse is an R package to facilitate model comparisons and simulation studies.

To install the latest development version from GitHub

#install.packages("remotes")
remotes::install_github("audrey-b/simanalyse")

Demonstration

Simulate the Datasets

Simulate 100 datasets using the sims package.

library(simanalyse)
set.seed(10L)
params <- list(sigma = 2)
constants <- list(mu = 0)
code <- "for(i in 1:10){
          y[i] ~ dnorm(mu, 1/sigma^2)}"
sims <- sims::sims_simulate(code, 
                            parameters = params, 
                            constants = constants,
                            nsims = 100,
                            silent = TRUE)

For example, the first dataset is

sims[1]
#> $y
#>  [1]  1.50009615 -1.04870803 -0.09224249 -1.55116546 -1.26316050  2.48549233
#>  [7]  0.42241980  2.70828118  0.21842600 -0.78109651
#> 
#> $mu
#> [1] 0
#> 
#> an nlists object of an nlist object with 2 numeric elements

Analyse the Datasets

Conduct a Bayesian analysis of each dataset. Let’s analyze each dataset in “report” mode. By default, this mode runs 3 chains until the following two criteria are met: convergence based on r.hat <1.1 and a minimum effective sample size of 400. The first half of the chains is discarded as burnin. Chains are thinned to 4000 iterations each to preserve disk and memory usage. See ?sma_set_mode for other choices of modes and customization.

prior <- "sigma ~ dunif(0, 6)"
results <- sma_analyse(sims = sims,
                       code = code,
                       code.add = prior,
                       mode = sma_set_mode("report"))

For example, the posterior distribution obtained from the analysis of the first dataset is pictured below.

plot(results[[1]])

Derive New Parameters (if required)

Derive posterior samples for new parameters.

results.derived <- sma_derive(results, "var=sigma^2", monitor="var")

For example, the derived posterior distribution obtained for the first dataset is pictured below.

plot(results.derived[[1]])

The same transformation must be applied to the true parameter values for eventually evaluating the performance (e.g. bias) of the method for those new parameters,

params.derived <- sma_derive(params, "var=sigma^2", monitor="var")
print(params.derived)
#> $var
#> [1] 4
#> 
#> an nlist object with 1 numeric element

Summarise the Results of the Simulation Study

Evaluate the performance of the model using the 100 analyses:

sma_evaluate(results.derived, parameters=params.derived)
#>   term    bias cpQuantile      mse
#> 1  var 1.76198       0.92 9.410909

Several more performance measures are available and can be specified using the measures argument (see ?sma_evaluate for details). You may also create custom performance measures. The example below shows how to reproduce the results above with custom code.

sma_evaluate(results.derived,
             measures = "", 
             parameters = params.derived, 
             custom_funs = list(estimator = mean,
                                cp.low = function(x) quantile(x, 0.025),
                                cp.upp = function(x) quantile(x, 0.975)),
             custom_expr_b = "bias = estimator - parameters
                              mse = (estimator - parameters)^2
                              cpQuantile = ifelse((parameters >= cp.low) & (parameters <= cp.upp), 1, 0)")
#>   term    bias cpQuantile      mse
#> 1  var 1.76198       0.92 9.410909

Saving to Files

When running lengthy simulation studies, it is often preferable to save all the results to disk. You may do this by using the functions ending in files. By default, results are saved in your working directory unless the path argument is specified.

set.seed(10L)
sims::sims_simulate(code, 
                    parameters = params, 
                    constants = constants,
                    nsims = 100,
                    save = TRUE,
                    exists = NA)

sma_analyse_files(code = code,
                  code.add = prior,
                  mode = sma_set_mode("report"))

sma_derive_files(code="var=sigma^2", monitor="var")

sma_evaluate_files()

You may read specific files using the function sma_read_files. For example,

sma_read_files(sma_evaluate_files)
#>   term    bias cpQuantile      mse
#> 1  var 1.76198       0.92 9.410909

Note that this result is identical to the result we obtained without saving to files, using the same seed. The result is also identical when using parallelization, described below.

Parallelization

Parallelization is achieved using the future package.

To use all available cores on a local machine simply execute the following code before calling the package’s functions.

library(future)
plan(multisession)

Alternatively, if you wish to use a cluster computing environment (e.g. Compute Canada) with a workload manager (e.g. Slurm, Torque/PBS), simply execute the following code before calling the package’s functions.

library(future)
library(future.batchtools)

plan(batchtools_slurm, # for Slurm; see help(batchtools_template) for other options
     workers = 100, # maximum number of cores to be used at any time
     template = "batchtools.slurm.tmpl") # path to configuration file

For example, the following configuration file may be used with the Slurm manager:

#!/bin/sh

#SBATCH --time=00:5:00 ## walltime in hh:mm:ss (per dataset)
#SBATCH --ntasks=1 ## number of cores (per dataset)
#SBATCH --mem-per-cpu=1000 ## min memory per core in MB

## Export value of DEBUGME environment var to slave
export DEBUGME=<%= Sys.getenv("DEBUGME") %>
  
module load nixpkgs/16.09  gcc/7.3.0 r/4.0.2 ## load R
module load jags/4.3.0 ## load JAGS

Rscript -e 'batchtools::doJobCollection("<%= uri %>")'

The configuration file will be called for each dataset. Thus in this example, one new core (ntasks=1) will be requested for each dataset and the 100 datasets will be analyzed in parallel on a total of 100 cores (workers=100).

Contribution

Please report any issues.

Pull requests are always welcome.

Code of Conduct

Please note that the simanalyse project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

About

A package to analyse simulation study data and summarise the results

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages