Compressing model log files #527

Open
blimlim opened this issue Oct 11, 2024 · 1 comment
blimlim commented Oct 11, 2024

Model log files can take up a significant amount of storage. For a single year of ESM1.5 simulation, the sub-models produce:

  • Atmosphere: ~50M
  • Ocean: ~60M
  • Ice: ~900M

These can add up over long simulations.
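
As a rough illustration of how these figures scale, here is a minimal sketch. It assumes "M" means megabytes, that 1000 MB = 1 GB, and that log volume per year stays constant. The function name `log_storage_gb` and the 500-year run length are hypothetical, chosen only for the example.

```python
# Approximate log volume per simulated year, in MB (figures quoted above).
log_mb_per_year = {"atmosphere": 50, "ocean": 60, "ice": 900}

# Total across the three sub-models: 1010 MB per simulated year.
total_mb_per_year = sum(log_mb_per_year.values())


def log_storage_gb(years):
    """Estimate uncompressed log storage in GB for a run of `years` years.

    Assumes constant log volume per year and 1000 MB per GB.
    """
    return total_mb_per_year * years / 1000


# A hypothetical 500-year run:
print(log_storage_gb(500))  # 505.0
```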

Previous discussions raised the idea of compressing the log files during the archive step. @aidanheerdegen mentioned a good place to start would be the CICE driver, and if successful it could be expanded to a general (perhaps optional) step across models.

A couple of implementation questions/concerns:

  • Would it be preferable to compress each log file separately, or combine all the logs for a single sub-model into a tar.gz?
  • People may have workflows set up for extracting information from the logs (e.g. subroutine timing information) which would be affected.

aidanheerdegen (Collaborator) commented:

It is possible to grep into a compressed tar archive file:

$ ls
cice_in.nml    ice_diag.d    iceout085  iceout087  iceout089  iceout091  iceout093  iceout095  input_ice.nml
debug.root.03  ice_diag_out  iceout086  iceout088  iceout090  iceout092  iceout094  iceout096
$ grep 'Timer  15:  from_ocn' *
ice_diag.d:Timer  15:  from_ocn       7.83 seconds
$ tar -zcvf logs.tar.gz *
cice_in.nml
debug.root.03
ice_diag.d
ice_diag_out
iceout085
iceout086
iceout087
iceout088
iceout089
iceout090
iceout091
iceout092
iceout093
iceout094
iceout095
iceout096
input_ice.nml
$ zgrep -a 'Timer  15:  from_ocn' logs.tar.gz
Timer  15:  from_ocn       7.83 seconds

So I think a compressed tar archive makes most sense.
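
One thing the zgrep output above does not show is which file inside the archive the match came from. For workflows that need that, a minimal Python sketch using the standard library `tarfile` module could look like the following. The function name `grep_tar` is hypothetical, and the sketch assumes the logs are text that can be decoded as UTF-8 (undecodable bytes are replaced).

```python
import tarfile


def grep_tar(archive_path, needle):
    """Yield (member_name, line) for every line containing `needle`
    in the regular files of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            fileobj = tar.extractfile(member)
            for raw in fileobj:
                # Log files are assumed to be text; replace bad bytes
                line = raw.decode("utf-8", errors="replace").rstrip("\n")
                if needle in line:
                    yield member.name, line
```

Calling `list(grep_tar("logs.tar.gz", "Timer  15:  from_ocn"))` on the archive created above should return the matching line paired with the name `ice_diag.d`, without extracting anything to disk.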

That leaves the question of what to archive. It is probably best to keep a list of filename patterns in the model driver. The simple case would be to match any file whose name begins with one of a list of strings.

For cice5 it would be

log_files = ['iceout', 'ice_diag', 'debug.root']

Which isn't too bad.
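
A minimal sketch of the prefix matching, assuming the logs sit directly in a single work directory. The function name `find_log_files` is hypothetical and not part of any existing driver.

```python
from pathlib import Path

# Prefixes from the cice5 list above
log_files = ['iceout', 'ice_diag', 'debug.root']


def find_log_files(work_dir, prefixes):
    """Return the sorted paths of regular files in `work_dir` whose
    names begin with any of `prefixes`."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return sorted(
        p for p in Path(work_dir).iterdir()
        if p.is_file() and p.name.startswith(prefixes)
    )
```

With the directory listing shown earlier, this would pick up `ice_diag.d`, `ice_diag_out`, the `iceout*` files and `debug.root.03`, and leave `cice_in.nml` and `input_ice.nml` alone.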

For mom5 (in ESM1.5)

log_files = ['fort', 'logfile', 'debug.root', 'oceout']

For atmosphere (in ESM1.5)

log_files = ['atm.fort', 'fort', 'nout', 'debug.root']
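
Given per-model prefix lists like these, the archive step itself could be sketched roughly as below. This is only an illustration under stated assumptions: the function name `compress_logs`, the default archive name `logs.tar.gz`, and the choice to delete the originals after archiving are all hypothetical, not existing driver behaviour.

```python
import tarfile
from pathlib import Path


def compress_logs(work_dir, prefixes, archive_name="logs.tar.gz"):
    """Bundle files in `work_dir` whose names start with any of
    `prefixes` into one gzip-compressed tar archive, then delete the
    originals. Returns the archive path, or None if nothing matched."""
    work_dir = Path(work_dir)
    archive_path = work_dir / archive_name
    prefixes = tuple(prefixes)
    matches = sorted(
        p for p in work_dir.iterdir()
        if p.is_file() and p != archive_path and p.name.startswith(prefixes)
    )
    if not matches:
        return None
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in matches:
            # Store bare file names so the archive has no directory prefix
            tar.add(path, arcname=path.name)
    # Remove the originals only after the archive has been written and closed
    for path in matches:
        path.unlink()
    return archive_path
```

A driver could then call something like `compress_logs(work_path, log_files)` during the archive step, with `work_path` standing in for whatever directory holds that sub-model's logs.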

Are the various hist files also log files? (xhist, thist, ihist)

blimlim self-assigned this Oct 11, 2024