Draft submission to Journal of Open Source Software #142

Open
wants to merge 3 commits into
base: main
19 changes: 19 additions & 0 deletions .github/workflows/joss.yaml
@@ -0,0 +1,19 @@
name: Compile JOSS paper draft
on: [push]

jobs:
  paper:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          paper-path: doc/joss-2024/paper.md
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          path: doc/joss-2024/paper.pdf
Empty file added doc/joss-2024/paper.bib
Empty file.
55 changes: 55 additions & 0 deletions doc/joss-2024/paper.md
@@ -0,0 +1,55 @@
---
title: 'genno: Efficient, transparent calculation on N-dimensional data'
tags:
  - Python
  - energy
  - transportation
authors:
  - given-names: Paul Natsuo
    surname: Kishimoto
    orcid: 0000-0002-8578-753X
    affiliation: 1
affiliations:
  - name: International Institute for Applied Systems Analysis
    index: 1
    ror: 02wfhk785
date: 5 September 2024
bibliography: paper.bib
---

# Summary

Research in the fields of energy and transport systems, including integrated assessment modeling of energy and climate policies, often requires complicated manipulations of input and output data that are *multi-dimensional*, *labeled*, *sparse*, and represent many measurable quantities with multiple units of measurement.

# Statement of need

Code for handling such data can be fragile and opaque, which makes the validation, reproduction and extension of research difficult.
In particular, adapting to new or revised input data or refining model scenarios can involve extensive refactoring.

`genno` is a Python package that builds on `dask`, `pandas`, and `xarray` to provide an API for transparent description and efficient execution of operations on multi-dimensional data.

# Implementation and usage

`genno` extends the `dask` directed acyclic graph (DAG) data structure, which describes tasks and their inputs using Python types.
While in `dask` these are used to distribute operations across multiple processes and nodes, in `genno` they are used to encode the data flow and manipulations in calculations expressed by the user.
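
To make the idea of a task graph concrete, the following is a minimal sketch in plain Python. It follows the general `dask` convention of a dictionary that maps keys to literal values or to tuples of a callable and its arguments, but the `execute` function, the key names, and the numbers are all illustrative assumptions; this is not the `dask` or `genno` implementation.

```python
from typing import Any, Dict

# A task graph in the style of the dask graph convention: a dict that maps each
# key to either a literal value or a tuple of (callable, *arguments). An argument
# that is itself a key in the graph refers to the result of another task.
Graph = Dict[str, Any]


def execute(graph: Graph, key: str, cache: Dict[str, Any]) -> Any:
    """Compute `key`, first computing (once each) the tasks it depends on."""
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # Arguments that name other keys are resolved recursively;
        # all other arguments are passed through unchanged.
        resolved = [
            execute(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        result = func(*resolved)
    else:
        result = task  # A literal value, e.g. input data
    cache[key] = result
    return result


# "Prepare": describe the tasks. Nothing is computed at this point.
graph: Graph = {
    "distance": 120.0,  # hypothetical input, km
    "time": 1.5,  # hypothetical input, h
    "speed": (lambda d, t: d / t, "distance", "time"),
    "report": (lambda v: f"speed = {v:.1f} km/h", "speed"),
}

# "Execute": request one key; only its ancestors in the graph are computed.
print(execute(graph, "report", {}))  # speed = 80.0 km/h
```

Because the graph is only a description, it can be inspected, extended, or partially executed before any data are loaded, which is the property the paragraph above attributes to encoding data flow as a DAG.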

The user first *prepares* a ‘Computer’ containing descriptions of many tasks, and then *executes* one or more tasks.
`genno`, via `dask`, implements

In preparing calculations, the user may use *keys* that allow concise but unambiguous reference to quantities to be computed.
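
As a rough illustration of what a concise but unambiguous key could look like, the sketch below combines a quantity name, its dimensions, and an optional tag. The class `QuantityKey`, its `drop` method, and the `name:dims:tag` string form are assumptions made for this example; the text above does not specify the actual key syntax used by `genno`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuantityKey:
    """Hypothetical key: a quantity name, its dimensions, and an optional tag."""

    name: str
    dims: Tuple[str, ...] = ()
    tag: Optional[str] = None

    def __str__(self) -> str:
        text = f"{self.name}:{'-'.join(self.dims)}"
        return f"{text}:{self.tag}" if self.tag else text

    def drop(self, *dims: str) -> "QuantityKey":
        """Return the key for the same quantity without the given dimensions."""
        kept = tuple(d for d in self.dims if d not in dims)
        return QuantityKey(self.name, kept, self.tag)


# The same quantity at full resolution and reduced over one dimension
full = QuantityKey("activity", ("node", "year", "mode"))
partial = full.drop("mode")

print(full)  # activity:node-year-mode
print(partial)  # activity:node-year
```

Since the two keys differ, the full-resolution quantity and its reduction over `mode` can coexist in one task graph without ambiguity about which one a later task refers to.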

`genno` includes a large and growing library of fundamental *operators*, from which more complicated operations can be built up.
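
The sketch below illustrates how a more complicated operation might be built up from small operators. It uses plain dictionaries as a stand-in for sparse, labeled data; the functions `mul`, `sum_over_last`, and `weighted_total`, as well as the sample values, are hypothetical and are not taken from the `genno` operator library.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a sparse, labeled quantity: only non-zero entries are
# stored, each indexed by a tuple of labels along the dimensions (node, mode).
Sparse = Dict[Tuple[str, ...], float]


def mul(a: Sparse, b: Sparse) -> Sparse:
    """Elementwise product over the labels present in both operands."""
    return {k: a[k] * b[k] for k in a.keys() & b.keys()}


def sum_over_last(a: Sparse) -> Sparse:
    """Sum over the last dimension, keeping the remaining labels."""
    out: Sparse = {}
    for (*rest, _), value in a.items():
        out[tuple(rest)] = out.get(tuple(rest), 0.0) + value
    return out


def weighted_total(values: Sparse, weights: Sparse) -> Sparse:
    """A more complicated operation built from the two operators above."""
    return sum_over_last(mul(values, weights))


# Hypothetical sample data, indexed by (node, mode)
activity = {("AT", "rail"): 10.0, ("AT", "road"): 30.0, ("DE", "road"): 50.0}
intensity = {("AT", "rail"): 0.5, ("AT", "road"): 2.0, ("DE", "road"): 2.0}

print(weighted_total(activity, intensity))
```

Keeping each operator small and free of side effects means it can be placed in a task graph as a single node and reused across many calculations.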

# Example applications

Two applications are described:

1. Input data preparation for the MESSAGEix-Transport model.

2. Integrated assessment modeling workflows.

# Acknowledgements

Many colleagues contributed to the initial design requirements for `genno`, including…

# References