-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZStd with Netcdf #2444
base: develop
Are you sure you want to change the base?
ZStd with Netcdf #2444
Conversation
I've generated new baselines with the test_changes.list and they generated OK and passed using those in comparison (-c and -m). @junwang-noaa I'm trying to recall, were baselines changes acceptable, or do I need to make tests/tests file changes to ensure no baselines are changing in the end? |
I don't expect those tests to change results. Would you please check which files are changed in cpld_control_p8 intel? It seems these tests have gocart, but the zstd Netcdf should not impact those tests. |
|
Thanks, Brain. @lipan-NOAA @bbakernoaa @weiyuan-jiang @tclune may I ask if you have any idea why the GOCART results are changed when the netcdf library is built with zstandard? |
@junwang-noaa I honestly have no idea why as zstandard should be lossless. I'm currently doing a test build of GEOS with preliminary zstandard support. Once I get that working, I'll see what I can see with a run of GEOS+GOCART. (I'll go with level 5 as I think that is what you are using...) NOTE: There is no zstandard support in MAPL yet, so whatever you are producing is either compressed offline or not compressed at all with zstandard, I suppose. |
Update. I was able to get zstandard support into MAPL. And I do not see any difference in the output. I had history output a 3d GOCART collection as uncompressed (
Now, comparing with
Now, technically if you turn on comparison of global metadata there is one difference:
but that's just the |
@mathomp4 Thanks for the testing. In our test case, we just write out the fields without doing any compression: ideflate= 0 Also I do see the aerosol fields, e.g. dms, are different:
Any clue? |
I don't know your system but does this:
mean you are not quantizing? In MAPL, I don't think we allow for a mode to be set without an nsd also set. But beyond that, I can't see how MAPL would care about zstandard in your case since I only wrote in zstandard support today! Are you reading any zstandard compressed files? Even then, ExtData shouldn't care since at that point we depend on netCDF to read in files correctly. And I looked and MAPL hasn't done anything in recent times that should change answers. The last non-zero-diff change was a bug fix bit-shaved binary output which was a weird odd case someone reported (we don't do binary output much). |
Well, one note is that you are running with Intel 19 it looks like. We haven't used that in years (5, 6 years?) so it's possible MAPL is interacting badly with it? We would never test it. (Heck our latest machines don't have anything newer than Intel 2022...and even that was a "we'll install for you this time") I know @AlexanderRichert-NOAA also was having issues with Intel 19 and MAPL, but that was at unit test time and it was a test about reading in CS data. Are you ingesting cubed-sphere input data? |
…STANDARD_LEVEL already
Yes, we do ingest cubed sphere grid. Since there is only last digit change in GOCART fields and we won't be able to move to higher version Intel compiler with this PR, and we have a list of tests with results change, I think we can move on with this PR. |
@jkbk2004 all RDHPCS spack installs have netcdf with zstd, correct? |
@RatkoVasic-NOAA @ulmononian Please, make sure the netcdf with zstd is available in the current version of the spack stack on RDHPCS machines. |
I'm getting this error:
on these tests:
line 424 of module_write_netcdf is: |
@BrianCurtis-NOAA which machine? |
WCOSS2 |
@BrianCurtis-NOAA where is your run directory? |
/lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224 |
Thanks, Brain. Have you run these tests before? I saw your comments: "I've generated new baselines with the test_changes.list and they generated OK and passed using those in comparison (-c and -m)" @DusanJovic-NOAA would you please take a look? Is it OK to set quantize_nsd 0? I see in control_wrtGauss_netcdf_parallel_intel, we have:
|
I checked on 5 machines, it looks OK:
|
@junwang-noaa The first run was for seeing what was goign to change before adjusting IDEFLATE to ZSTANDARD_LEVEL, These next failed tests are issues arising after switching IDEFLATE to ZSTANDARD_LEVEL. |
I see,thanks. |
@BrianCurtis-NOAA yes, spack-stack installs netcdf with zstd support (including on acorn). |
Setting |
NetCDF: Filter error: undefined filter encountered This error means that the netcdf library does not support zstd filter. |
@DusanJovic-NOAA does it make sense only for those tests? There are many others that use ZSTANDARD_LEVEL. |
Where is your run directory? |
/lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224 |
This module:
does not set |
@BrianCurtis-NOAA I double check the old installation on acorn and cactus. ===========acorn==(Dusan test last year)==================== ===========cactus====================== ====================================== |
Curl has nothing to with zstd. You need to export HDF5_PLUGIN_PATH to point to location of filter dynamic libraries. |
Our libraries should not depend or use curl at all, but that's a separate issue. |
@DusanJovic-NOAA and @BrianCurtis-NOAA I have added "HDF5_PLUGIN_PATH" into the hdf5 modulefile: hang.lei@clogin06:/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.hang.lei@clogin06:/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12> module show hdf5/1.14.0 /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12/hdf5/1.14.0.lua: help([[]]) |
OK, I'll give it a whirl. |
All tests ran to completion. But not all tests are getting zstandard_level into model_configure so I imagine there's still some work for the different model_configure files to make sure they get added where needed. |
After modifying the model_configures to add zstandard_level and running the full suite, all tests were able to run to completion with the following list of tests failing in comparison:
all the "UNABLE TO START TEST" were explainable with their parent tests failing comparison. Onto running the full suite on Hera to get the official test_changes.list |
@Hang-Lei-NOAA Are these in official locations already? If not let me know and i'll re-run a full suite to confirm we still see the ancitipated changes. |
@BrianCurtis-NOAA they are not, and still there as before. We have not heard of a confirmation from ufs team for if the difference are acceptable. |
OK Thanks! My re-test for WCOSS2 is almost finished. |
@Hang-Lei-NOAA go ahead and tell NCO to install officially. Please let me know if the official location is any different than normal. |
Hold off please if you can. Running a couple more things may have some issue. |
Okay, waiting for further instructions.
…On Wed, Oct 16, 2024 at 5:00 PM Brian Curtis ***@***.***> wrote:
@Hang-Lei-NOAA <https://github.com/Hang-Lei-NOAA> go ahead and tell NCO
to install officially. Please let me know if the official location is any
different than normal.
Hold off please if you can. Running a couple more things may have some
issue.
—
Reply to this email directly, view it on GitHub
<#2444 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKWSMFFOOL6PWU6HB6CCFWLZ33HWLAVCNFSM6AAAAABOWYHPDOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJXHE2TGOBXGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Compared atmf021.tile1.nc manually:
Do we expect those differences to be of concern? |
@Hang-Lei-NOAA unfortunately, netcdf is still causing issues:
|
check out /lfs/h2/emc/nems/noscrub/brian.curtis/git/BrianCurtis-NOAA/ufs-weather-model/zstd_netcdf/tests/run_dir/hafs_regional_atm_intel |
I'm going to try the solution mentioned here: #2015 (comment) |
I am testing with your code, and will get back to you. @brian Curtis
…On Wed, Jan 15, 2025 at 10:49 AM Brian Curtis ***@***.***> wrote:
I'm going to try the solution mentioned here: #2015 (comment)
<#2015 (comment)>
—
Reply to this email directly, view it on GitHub
<#2444 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKWSMFHX5YJPET6CZGVYHOD2KZ7PZAVCNFSM6AAAAABOWYHPDOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKOJTGI4DSMJRGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I checked the runs. It looks like that the model still uses wrong
netcdf/4.7.4 and wrong nccmp/1.8.9.0 which we did not require.
vi
/lfs/h2/emc/nems/noscrub/brian.curtis/git/BrianCurtis-NOAA/ufs-weather-model/zstd_netcdf/tests/logs/log_wcoss2/run_hafs_regional_atm_intel.log
==================
libfabric/1.20.1:1\;craype-network-ofi:1\;envvar/1.0:1\;ecflow/5.6.0.13:1
\;intel/19.1.3.304:1\;hdf5/1.10.6:1\;netcdf/4.7.4:1\;nccmp/1.8.9.0:1;
On Wed, Jan 15, 2025 at 10:51 AM Hang Lei - NOAA Affiliate <
***@***.***> wrote:
… I am testing with your code, and will get back to you. @brian Curtis
On Wed, Jan 15, 2025 at 10:49 AM Brian Curtis ***@***.***>
wrote:
> I'm going to try the solution mentioned here: #2015 (comment)
> <#2015 (comment)>
>
> —
> Reply to this email directly, view it on GitHub
> <#2444 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AKWSMFHX5YJPET6CZGVYHOD2KZ7PZAVCNFSM6AAAAABOWYHPDOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKOJTGI4DSMJRGA>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
@Hang-Lei-NOAA nothing in my build would hint towards an obvious answer to this. Would something in the hpc-stack be causing this? |
found netcdf/4.7.4 in run_test.sh, fixed and will run again. |
…ion before final testing
Commit Queue Requirements:
Description:
This PR brings in the zstd library and with it the netcdf support for it.
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: