Separate interior state and boundary forcing to only predict state #93

Draft: wants to merge 168 commits into base: main

168 commits
5df1bff
add datastore_boundary to neural_lam
sadamov Nov 18, 2024
46590ef
complete integration of boundary in weatherDataset
sadamov Nov 18, 2024
b990f49
Add test to check timestep length and spacing
sadamov Nov 18, 2024
3fd1d6b
setting default mdp boundary to 0 gridcells
sadamov Nov 18, 2024
1f2499c
implement time-based slicing
sadamov Nov 18, 2024
1af1481
remove all interior_mask and boundary_mask
sadamov Nov 19, 2024
d545cb7
added gcsfs dependency for era5 weatherbench download
sadamov Nov 19, 2024
5c1a7d7
added new era5 datastore config for boundary
sadamov Nov 19, 2024
30e4f05
removed left-over boundary-mask references
sadamov Nov 19, 2024
6a8c593
make check for existing category in datastore more flexible (for boun…
sadamov Nov 19, 2024
17c920d
implement xarray based (mostly) time slicing and windowing
sadamov Nov 20, 2024
7919995
cleanup analysis based time-slicing
sadamov Nov 21, 2024
9bafcee
implement datastore_boundary in existing tests
sadamov Nov 19, 2024
ce06bbc
allow for grid shape retrieval from forcing data
sadamov Nov 21, 2024
884b5c6
rearrange time slicing, boundary first
sadamov Nov 21, 2024
5904cbe
identified issue, cleanup next
leifdenby Nov 25, 2024
efe0302
use xarray plot only
leifdenby Nov 26, 2024
a489c2e
don't reraise
leifdenby Nov 26, 2024
242d08b
remove debug plot
leifdenby Nov 26, 2024
c1f706c
remove extent calc used in diagnosing issue
leifdenby Nov 26, 2024
cf8e3e4
add type annotation
leifdenby Nov 29, 2024
85160ce
ensure tensor copy to cpu mem before data-array creation
leifdenby Nov 29, 2024
52c4528
apply time-indexing to support ar_steps_val > 1
leifdenby Nov 29, 2024
b96d8eb
renaming test datastores
sadamov Nov 30, 2024
72da25f
adding num_past/future_boundary_step args
sadamov Nov 30, 2024
244f1cc
using combined config file
sadamov Nov 30, 2024
a9cc36e
proper handling of state/forcing/boundary in dataset
sadamov Nov 30, 2024
dcc0b46
datastore_boundars=None introduced
sadamov Nov 30, 2024
a3b3bde
bug fix for file retrieval per member
sadamov Nov 30, 2024
3ffc413
rename datastore for tests
sadamov Nov 30, 2024
85aad66
aligned time with danra for easier boundary testing
sadamov Nov 30, 2024
64f057f
Fixed test for temporal embedding
sadamov Nov 30, 2024
6205dbd
pin dataclass-wizard <0.31.0 to avoid bug in dataclass-wizard
leifdenby Dec 2, 2024
551cd26
allow boundary as input to ar_model.common_step
sadamov Dec 2, 2024
fc95350
linting
sadamov Dec 2, 2024
01fa807
improved docstrings and added some assertions
sadamov Dec 2, 2024
5a749f3
update mdp dependency
sadamov Dec 2, 2024
45ba607
remove boundary datastore from tests that don't need it
sadamov Dec 2, 2024
f36f360
fix scope of _get_slice_time
sadamov Dec 2, 2024
105108e
fix scope of _get_time_step
sadamov Dec 2, 2024
d760145
Merge branch 'feat/boundary_dataloader' of https://github.com/sadamov…
sadamov Dec 2, 2024
ae0cf76
added information about optional boundary datastore
sadamov Dec 2, 2024
9af27e0
add datastore_boundary to neural_lam
sadamov Nov 18, 2024
c25fb30
complete integration of boundary in weatherDataset
sadamov Nov 18, 2024
505ceeb
Add test to check timestep length and spacing
sadamov Nov 18, 2024
e733066
setting default mdp boundary to 0 gridcells
sadamov Nov 18, 2024
d8349a4
implement time-based slicing
sadamov Nov 18, 2024
fd791bf
remove all interior_mask and boundary_mask
sadamov Nov 19, 2024
ae82cdb
added gcsfs dependency for era5 weatherbench download
sadamov Nov 19, 2024
34a6cc7
added new era5 datastore config for boundary
sadamov Nov 19, 2024
2dc67a0
removed left-over boundary-mask references
sadamov Nov 19, 2024
9f8628e
make check for existing category in datastore more flexible (for boun…
sadamov Nov 19, 2024
388c79d
implement xarray based (mostly) time slicing and windowing
sadamov Nov 20, 2024
2529969
cleanup analysis based time-slicing
sadamov Nov 21, 2024
179a035
implement datastore_boundary in existing tests
sadamov Nov 19, 2024
2daeb16
allow for grid shape retrieval from forcing data
sadamov Nov 21, 2024
cbcdcae
rearrange time slicing, boundary first
sadamov Nov 21, 2024
e6ace27
renaming test datastores
sadamov Nov 30, 2024
42818f0
adding num_past/future_boundary_step args
sadamov Nov 30, 2024
0103b6e
using combined config file
sadamov Nov 30, 2024
0896344
proper handling of state/forcing/boundary in dataset
sadamov Nov 30, 2024
355423c
datastore_boundars=None introduced
sadamov Nov 30, 2024
121d460
bug fix for file retrieval per member
sadamov Nov 30, 2024
7e82eef
rename datastore for tests
sadamov Nov 30, 2024
320d7c4
aligned time with danra for easier boundary testing
sadamov Nov 30, 2024
f18dcc2
Fixed test for temporal embedding
sadamov Nov 30, 2024
e6327d8
allow boundary as input to ar_model.common_step
sadamov Dec 2, 2024
1374a19
linting
sadamov Dec 2, 2024
779f3e9
improved docstrings and added some assertions
sadamov Dec 2, 2024
f126ec2
remove boundary datastore from tests that don't need it
sadamov Dec 2, 2024
4b656da
fix scope of _get_time_step
sadamov Dec 2, 2024
75db4b8
added information about optional boundary datastore
sadamov Dec 2, 2024
58b4af6
Merge branch 'feat/boundary_dataloader' of https://github.com/sadamov…
sadamov Dec 2, 2024
4c17545
moved gcsfs to dev group
sadamov Dec 3, 2024
a700350
linting
sadamov Dec 3, 2024
315aa0f
Propagate separation of state and boundary change through training loop
joeloskarsson Oct 28, 2024
1967221
Start building graphs with wmg
joeloskarsson Nov 4, 2024
cb74e3f
Change forward pass to concat according to enforced node ordering
joeloskarsson Nov 11, 2024
9715ed8
wip to make tests pass
joeloskarsson Nov 11, 2024
336fba9
Fix edge index manipulation to make training work again
joeloskarsson Nov 12, 2024
ce3ea6d
Work on fixing plotting functionality
joeloskarsson Nov 12, 2024
a520505
Linting
joeloskarsson Nov 13, 2024
793e6c0
Add optional separate grid embedder for boundary
joeloskarsson Nov 13, 2024
3515460
Make new graph creation script main and only one
joeloskarsson Nov 13, 2024
05d91f1
Fix some typos and forgot code
joeloskarsson Nov 13, 2024
3eba43c
Correct handling of node indices for m2g when using decode_mask
joeloskarsson Nov 27, 2024
f1b7359
Linting and bugfixes
joeloskarsson Nov 28, 2024
fa6c9e3
Make graph creation and plotting work with datastores
joeloskarsson Dec 2, 2024
4d85384
Fix graph loading and boundary mask
joeloskarsson Dec 2, 2024
9edfec3
Fix boundary masking bug for static features
joeloskarsson Dec 2, 2024
6e1c53c
Add flag making boundary forcing optional in models
joeloskarsson Dec 3, 2024
4bcaa4b
Linting
joeloskarsson Dec 3, 2024
16d5d04
Fixed issue with temporal encoding dimensions
sadamov Dec 3, 2024
f1f3f73
format docstrings
sadamov Dec 3, 2024
8fd7a10
introduced time slicing test for forecast type data
sadamov Dec 3, 2024
252a33c
bugfix temporal embedding dimension
sadamov Dec 3, 2024
8a9114a
linting
sadamov Dec 3, 2024
6afc50c
Get boundary static features from second datastore
joeloskarsson Dec 3, 2024
b062df4
Merge branch 'feat/boundary_dataloader' into boundary_forcing
joeloskarsson Dec 3, 2024
deb3338
Compute boundary forcing dimensions separately
joeloskarsson Dec 3, 2024
8c7709a
switched to low-res data
sadamov Dec 3, 2024
24cbf13
add datastore_boundary as explicit attribute
sadamov Dec 3, 2024
556b24b
Make graph creation and plotting work with dual datastore setup
joeloskarsson Dec 3, 2024
d802852
Merge branch 'feat/boundary_dataloader' into boundary_forcing
joeloskarsson Dec 4, 2024
7c382b8
Use lat-lons + crs for graph construction
joeloskarsson Dec 4, 2024
14d3912
Fix model constructor signatures
joeloskarsson Dec 4, 2024
ebfd0bd
Fix dataset issue in npy stat script
joeloskarsson Dec 4, 2024
698991f
Fix Inets not figuring out number of receiver nodes for g2m and m2g
joeloskarsson Dec 5, 2024
1d53ce7
fixing up forecast type data tests,
sadamov Dec 5, 2024
cfe1e27
time step can and should be retrieved in __init__
sadamov Dec 5, 2024
e4e4e37
Fix dataset issue in npy stat script
joeloskarsson Dec 4, 2024
3df3fcb
Merge remote-tracking branch 'mllam/main' into feat/boundary_dataloader
sadamov Dec 5, 2024
a0f229b
Merge branch 'feat/boundary_dataloader' into boundary_forcing
joeloskarsson Dec 5, 2024
29063c8
Adjust forcing dimensionalities after fix
joeloskarsson Dec 5, 2024
f8613da
added static feature to era5 boundary test datastore
sadamov Dec 5, 2024
c482a4d
Merge branch 'feat/boundary_dataloader' into boundary_forcing
joeloskarsson Dec 5, 2024
48558b5
Expand graph creation script with flexible python interface to wmg
joeloskarsson Dec 5, 2024
f0a7046
Merge remote-tracking branch 'mllam/main' into feat/boundary_dataloader
sadamov Dec 6, 2024
f48d2b0
Change graph creation test to use new script
joeloskarsson Dec 6, 2024
abc0a3f
Merge branch 'feat/boundary_dataloader' into boundary_forcing
joeloskarsson Dec 6, 2024
4044c09
Remove networkx dependency
joeloskarsson Dec 7, 2024
cb3787d
Use python 3.10 to be compatible with wmg
joeloskarsson Dec 7, 2024
a1f0f62
Start fixing tests
joeloskarsson Dec 7, 2024
797b867
Wrap up first version of new graph tests
joeloskarsson Dec 7, 2024
9a9bf91
Fix graph creation tests
joeloskarsson Dec 7, 2024
e77c87b
Save grpahs in temporary dir during testing
joeloskarsson Dec 7, 2024
b6949d3
Rescale static mesh node features with maximum grid coordinate again
joeloskarsson Dec 7, 2024
5ed304a
Change var names and comments to clarify difference between interior …
joeloskarsson Dec 7, 2024
e61bdfe
Make dir_save_path default to None
joeloskarsson Dec 10, 2024
ff0c8e0
Use in-place division for BufferList containing mesh graph node features
joeloskarsson Dec 17, 2024
8cc608d
rename function to represent multiple datastores
sadamov Dec 20, 2024
857f748
streamline da_grid_reference variable naming
sadamov Dec 20, 2024
d0a6f24
updated docstring of WeatherDataset
sadamov Dec 20, 2024
ef40a39
renamed da_boundary -> da_boundary_forcing
sadamov Dec 20, 2024
71b52b2
updated docstrings of get_dataarray()
sadamov Dec 20, 2024
b690563
check times in stateless functions from utils.py
sadamov Dec 20, 2024
a37dc3c
add num_ensemble_members property to BaseDatastore
sadamov Dec 20, 2024
8d1bec6
Update neural_lam/weather_dataset.py
sadamov Dec 20, 2024
47370f9
renaming time_diff_steps to time_deltas
sadamov Dec 20, 2024
7e1a246
Merge branch 'feat/boundary_dataloader' of https://github.com/sadamov…
sadamov Dec 20, 2024
d524377
add num_ensemble_members to mdp store
sadamov Dec 20, 2024
98c54d9
Rename temporal embeddings and diffs to time deltas
sadamov Dec 20, 2024
4a278fd
Adding some comments about analysis_time indexing
sadamov Dec 20, 2024
c82d22b
moved comments around
sadamov Dec 20, 2024
6e3f3bd
Make hotfix to make boundary dataset created with mdp work
joeloskarsson Dec 19, 2024
20ca263
Bugfixes
sadamov Dec 20, 2024
c0c50d5
sadamov Dec 20, 2024
94de240
Add missing check if boundary_forcing is None
sadamov Dec 20, 2024
1d14a15
bugfix typo in time check
sadamov Dec 20, 2024
7e5797e
introduce crop_time_if_needed to align interior with boundary data
sadamov Dec 20, 2024
5d94325
Merge branch 'feat/boundary_dataloader' into boundary_forcing
joeloskarsson Jan 8, 2025
b296095
Fix bug in datastore loading in graph creation script
joeloskarsson Jan 10, 2025
2f6515d
Return None from mdp datastore when no forcing is present, to not bre…
joeloskarsson Jan 13, 2025
1c75fe1
Fix time cropping for both start and end of interval
joeloskarsson Jan 13, 2025
44c8284
Do not icount forcing time delta as input dim if no forcing is used
joeloskarsson Jan 13, 2025
bba94a5
linter
sadamov Jan 13, 2025
a33b33d
fixed missing boundary_datastore arg
sadamov Jan 13, 2025
4e7bd9a
improve cpu capabilities
sadamov Jan 13, 2025
0ad05b0
Merge branch 'boundary_forcing' of https://github.com/joeloskarsson/n…
sadamov Jan 13, 2025
7316a00
format
sadamov Jan 13, 2025
6672756
bugfix indexing batch-index for time
sadamov Jan 13, 2025
4d6bbed
Do not force plot extent to be global
joeloskarsson Jan 13, 2025
ecf05e0
Fix bug making time deltas not be multiple of state time step
joeloskarsson Jan 14, 2025
27061ec
Do not add time delta features for interior forcing
joeloskarsson Jan 14, 2025
64a28c3
Compute forcing time deltas per sample to accurately represent shift …
joeloskarsson Jan 14, 2025
284a954
Implement time delta encodings
joeloskarsson Jan 14, 2025
eb31f32
implemented correct delta_times for forecasts
sadamov Jan 15, 2025
a3a548c
Add docstring for encoding function
joeloskarsson Jan 16, 2025
4 changes: 2 additions & 2 deletions .github/workflows/ci-pdm-install-and-test-gpu.yml
```diff
@@ -13,10 +13,10 @@ jobs:
 - name: Checkout
   uses: actions/checkout@v2

-- name: Set up Python 3.9
+- name: Set up Python 3.10
   uses: actions/setup-python@v2
   with:
-    python-version: 3.9
+    python-version: 3.10

 - name: Install pdm
   run: |
```
4 changes: 2 additions & 2 deletions .github/workflows/ci-pip-install-and-test-gpu.yml
```diff
@@ -13,10 +13,10 @@ jobs:
 - name: Checkout
   uses: actions/checkout@v2

-- name: Set up Python 3.9
+- name: Set up Python 3.10
   uses: actions/setup-python@v2
   with:
-    python-version: 3.9
+    python-version: 3.10

 - name: Install torch (GPU CUDA 12.1)
   run: |
```
2 changes: 1 addition & 1 deletion .github/workflows/pre-commit.yml
```diff
@@ -13,7 +13,7 @@ jobs:
 runs-on: ubuntu-latest
 strategy:
   matrix:
-    python-version: ["3.9", "3.10", "3.11"]
+    python-version: ["3.10", "3.11"]
 steps:
   - uses: actions/checkout@v2
   - name: Set up Python
```
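Both GPU workflow diffs above write the new version as an unquoted `3.10`. A YAML loader resolves an unquoted `3.10` as a number, and the number 3.10 is the same as 3.1, so a tool that later reads the value as text can see `3.1` instead of `3.10`. Quoting the value, as the `pre-commit.yml` matrix does with `["3.10", "3.11"]`, keeps it a string. The sketch below is a simplified model of that resolution step using the YAML 1.2 core-schema number patterns; `resolve_scalar` is a hypothetical helper, not part of any YAML library, and it does not reproduce GitHub Actions' own parser.

```python
import re

# YAML 1.2 core-schema patterns for plain (unquoted) integer and float scalars.
_INT_RE = re.compile(r"^[-+]?[0-9]+$")
_FLOAT_RE = re.compile(r"^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$")


def resolve_scalar(text: str) -> str:
    """Return the text a consumer sees after YAML resolves a scalar.

    Simplified model (hypothetical helper): quoted scalars stay strings;
    unquoted scalars that look like numbers become int/float and are then
    converted back to text.
    """
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]  # quoted: kept verbatim as a string
    if _INT_RE.match(text):
        return str(int(text))
    if _FLOAT_RE.match(text):
        return str(float(text))  # the trailing zero of 3.10 is lost here
    return text


for raw in ["3.9", "3.10", '"3.10"', "3.11"]:
    print(f"{raw} -> {resolve_scalar(raw)}")
```

Only the unquoted `3.10` changes in this model; `3.9` and `3.11` happen to survive the round trip, which is why the problem tends to surface only when moving to 3.10.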
22 changes: 13 additions & 9 deletions README.md
```diff
@@ -108,7 +108,9 @@ Once `neural-lam` is installed you will be able to train/evaluate models. For th
 interface that provides the data in a data-structure that can be used within
 neural-lam. A datastore is used to create a `pytorch.Dataset`-derived
 class that samples the data in time to create individual samples for
-training, validation and testing.
+training, validation and testing. A secondary datastore can be provided
+for the boundary data. Currently, boundary datastore must be of type `mdp`
+and only contain forcing features. This can easily be expanded in the future.

 2. **The graph structure** is used to define message-passing GNN layers,
 that are trained to emulate fluid flow in the atmosphere over time. The
```
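The restriction on the secondary datastore stated in this hunk (type `mdp`, forcing features only) could be checked when the config is loaded. Below is a minimal sketch over an already-parsed config dictionary, assuming the `datastore_boundary` key shown in the `config.yaml` example further down. The function name `validate_boundary_datastore` and the `boundary_categories` argument are hypothetical; neural-lam's own config classes may enforce this differently or not at all.

```python
from typing import Any, Iterable, Mapping, Optional


def validate_boundary_datastore(
    config: Mapping[str, Any],
    boundary_categories: Optional[Iterable[str]] = None,
) -> bool:
    """Check the optional `datastore_boundary` entry of a parsed config.

    Returns False when no boundary datastore is configured (it is optional)
    and True when one is configured and passes the checks. Raises ValueError
    if it is not of kind "mdp" or, when `boundary_categories` is given, if it
    provides any data category other than "forcing".
    """
    boundary = config.get("datastore_boundary")
    if boundary is None:
        return False  # boundary datastore is optional

    kind = boundary.get("kind")
    if kind != "mdp":
        raise ValueError(f"boundary datastore must be of kind 'mdp', got {kind!r}")

    if boundary_categories is not None:
        # Hypothetical input: the data categories the boundary store exposes.
        unsupported = sorted(set(boundary_categories) - {"forcing"})
        if unsupported:
            raise ValueError(
                f"boundary datastore may only contain forcing, found {unsupported}"
            )
    return True


config = {
    "datastore": {"kind": "mdp", "config_path": "danra.datastore.yaml"},
    "datastore_boundary": {"kind": "mdp", "config_path": "era5.datastore.yaml"},
}
print(validate_boundary_datastore(config, boundary_categories=["forcing"]))
```

Returning a boolean rather than raising when the entry is absent keeps the boundary datastore optional, matching the "can be provided" wording above.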
```diff
@@ -121,7 +123,7 @@ different aspects about the training and evaluation of the model.

 The path you provide to the neural-lam config (`config.yaml`) also sets the
 root directory relative to which all other paths are resolved, as in the parent
-directory of the config becomes the root directory. Both the datastore and
+directory of the config becomes the root directory. Both the datastores and
 graphs you generate are then stored in subdirectories of this root directory.
 Exactly how and where a specific datastore expects its source data to be stored
 and where it stores its derived data is up to the implementation of the
```
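The root-directory rule in this hunk amounts to joining every relative path in the config onto the config file's parent directory. A minimal sketch with `pathlib`, assuming paths are given as plain strings; `resolve_relative_to_config` is a hypothetical name used for illustration, not a neural-lam function.

```python
from pathlib import Path


def resolve_relative_to_config(config_path: str, relative: str) -> Path:
    """Resolve `relative` against the directory that contains the config file.

    The config's parent directory acts as the root directory. An absolute
    `relative` is returned unchanged, because joining with `/` discards the
    left operand when the right operand is absolute.
    """
    root = Path(config_path).parent
    return root / relative


config_file = "data/config.yaml"
for entry in ["danra.datastore.yaml", "era5.datastore.yaml", "graphs"]:
    print(resolve_relative_to_config(config_file, entry))
```

With `config.yaml` placed in `data/`, both datastore configs and the `graphs` directory end up under `data/`, which is the layout shown in the directory tree that follows.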
````diff
@@ -134,6 +136,7 @@ assume you placed `config.yaml` in a folder called `data`):
 data/
 ├── config.yaml - Configuration file for neural-lam
 ├── danra.datastore.yaml - Configuration file for the datastore, referred to from config.yaml
+├── era5.datastore.zarr/ - Optional configuration file for the boundary datastore, referred to from config.yaml
 └── graphs/ - Directory containing graphs for training
 ```

````
````diff
@@ -142,18 +145,20 @@ And the content of `config.yaml` could in this case look like:
 datastore:
   kind: mdp
   config_path: danra.datastore.yaml
+datastore_boundary:
+  kind: mdp
+  config_path: era5.datastore.yaml
 training:
   state_feature_weighting:
     __config_class__: ManualStateFeatureWeighting
-    values:
+    weights:
       u100m: 1.0
       v100m: 1.0
 ```

-For now the neural-lam config only defines two things: 1) the kind of data
-store and the path to its config, and 2) the weighting of different features in
-the loss function. If you don't define the state feature weighting it will default
-to weighting all features equally.
+For now the neural-lam config only defines two things:
+1) the kind of datastores and the path to their config
+2) the weighting of different features in the loss function. If you don't define the state feature weighting it will default to weighting all features equally.

 (This example is taken from the `tests/datastore_examples/mdp` directory.)

````
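The fallback to equal weighting described in this hunk can be pictured as a lookup with a default: without `training.state_feature_weighting` every state feature gets the same weight, otherwise the listed `weights` are used. This is a minimal sketch over a parsed config dictionary; `get_state_feature_weights` is hypothetical, the default value of 1.0 is an assumption (only "equal" is documented), and raising on features missing from `weights` is an assumed policy rather than documented behaviour.

```python
from typing import Any, Dict, List, Mapping


def get_state_feature_weights(
    config: Mapping[str, Any], state_features: List[str]
) -> Dict[str, float]:
    """Return a loss weight per state feature from a parsed config dict.

    Without `training.state_feature_weighting`, every feature gets the same
    weight (1.0 here, an assumed value). With the manual weighting class from
    the example config, the listed `weights` are used.
    """
    training = config.get("training") or {}
    weighting = training.get("state_feature_weighting")
    if weighting is None:
        # Default: weight all features equally.
        return {name: 1.0 for name in state_features}

    config_class = weighting.get("__config_class__")
    if config_class != "ManualStateFeatureWeighting":
        raise NotImplementedError(f"unsupported weighting class {config_class!r}")

    weights = weighting["weights"]
    missing = [name for name in state_features if name not in weights]
    if missing:
        # Assumption: every state feature must be listed explicitly.
        raise KeyError(f"no weight given for state features: {missing}")
    return {name: float(weights[name]) for name in state_features}


example_config = {
    "training": {
        "state_feature_weighting": {
            "__config_class__": "ManualStateFeatureWeighting",
            "weights": {"u100m": 1.0, "v100m": 1.0},
        }
    }
}
print(get_state_feature_weights(example_config, ["u100m", "v100m"]))
print(get_state_feature_weights({}, ["u100m", "v100m"]))
```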

````diff
@@ -525,5 +530,4 @@ Furthermore, all tests in the ```tests``` directory will be run upon pushing cha

 # Contact
 If you are interested in machine learning models for LAM, have questions about the implementation or ideas for extending it, feel free to get in touch.
-There is an open [mllam slack channel](https://join.slack.com/t/ml-lam/shared_invite/zt-2t112zvm8-Vt6aBvhX7nYa6Kbj_LkCBQ) that anyone can join (after following the link you have to request to join, this is to avoid spam bots).
-You can also open a github issue on this page, or (if more suitable) send an email to [[email protected]](mailto:[email protected]).
+There is an open [mllam slack channel](https://join.slack.com/t/ml-lam/shared_invite/zt-2t112zvm8-Vt6aBvhX7nYa6Kbj_LkCBQ) that anyone can join. You can also open a github issue on this page, or (if more suitable) send an email to [[email protected]](mailto:[email protected]).
````