NetCDF global attributes vs data variable local attributes #3325

Closed
bjlittle opened this issue Jun 5, 2019 · 32 comments · Fixed by #5152

Comments

@bjlittle
Member

bjlittle commented Jun 5, 2019

Update Mon 3rd July:
this effort is now handled in its own project
please see there for existing task breakdown + progress


At the moment iris takes a rational but naive approach to dealing with the local attributes of a NetCDF variable (a variable that becomes a cube) and the global attributes of the NetCDF file that the said variable comes from.

That is, the resultant cube.attributes will be a combination of both the local and global attributes, where the local attributes will take precedence, and overwrite, common global attributes.
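
A minimal sketch of the merge described above, using plain Python dicts rather than iris itself. The function name `merged_attributes` and the sample attribute values are hypothetical and only illustrate the "local overrides global" rule:

```python
def merged_attributes(global_attrs, local_attrs):
    """Combine file-level and variable-level attributes; local values win."""
    merged = dict(global_attrs)   # start from the file's global attributes
    merged.update(local_attrs)    # local attributes overwrite common keys
    return merged


# Hypothetical attributes read from one NetCDF file and one of its variables.
file_globals = {"Conventions": "CF-1.7", "comment": "whole-file comment"}
variable_locals = {"comment": "variable comment", "my_flag": 1}

cube_attributes = merged_attributes(file_globals, variable_locals)
print(cube_attributes)
```

After the merge, only one `comment` survives, and nothing in the result records which keys were global and which were local.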

At the inception of iris, and in the absence of use cases, this seemed like a reasonable thing to do. However, this approach makes it impossible to preserve the distinction between local and global attribute metadata. This is a major issue for many users, who need to preserve all attribute metadata.

We need to resolve this issue in iris now, once and for all 😄

Note that, if a solution to this issue were implemented, it would most likely be a breaking change, so caution is needed here.

This is somewhat tangentially related to #2352

@bjlittle
Member Author

bjlittle commented Jun 5, 2019

Ping @zklaus 👍

@zklaus

zklaus commented Jun 5, 2019

A non-breaking way to implement that may be to introduce two new class members, global_attributes and local_attributes (or var_attributes), each responsible for the respective attributes. A property would then replace the current attributes: it can mimic the current behaviour for reading, while a write overwrites an existing entry in whichever dictionary holds it and otherwise defaults to the local one.

@bjlittle
Member Author

bjlittle commented Jun 6, 2019

@zklaus I was thinking along the same lines. In summary:

  • cube.local_attributes is a mutable dictionary for defining the attributes that are local to the associated NetCDF data variable of the cube when it's written to a NetCDF file
  • cube.global_attributes is a mutable dictionary for defining the attributes that are global to the associated NetCDF file that the data variable of the cube is written to
  • cube.attributes is a (stateless) combination of the cube.global_attributes and cube.local_attributes, akin to now, where the cube.local_attributes have priority over the cube.global_attributes

So for the case where there is a common attribute shared between cube.global_attributes and cube.local_attributes, the local value is shown in the cube.attributes. When saving such a cube to NetCDF, both values of the common attribute are preserved (hoo-rah), i.e. the local one on the NetCDF data variable of the cube, and the global one in the global scope of the NetCDF file.

To ensure non-breaking behaviour here, I think that I'm right in saying that if a user writes to the cube.attributes then this state is captured in the cube.global_attributes. Note that cube.attributes is stateless, in the sense that it is simply derived on the fly at run-time from both cube.local_attributes and cube.global_attributes, with local having priority over global. This means that if a user wants to associate an attribute with the NetCDF data variable of the cube, then they must explicitly add the attribute to the cube.local_attributes, and not the cube.attributes.

Hmmm.... this makes sense to me. Thoughts?
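
The semantics proposed here could be sketched roughly as follows. This is not iris code: `SketchCube` and `CombinedAttributes` are hypothetical names, and the deletion behaviour (remove the key from both dictionaries) is an assumption, since the proposal does not cover it.

```python
from collections.abc import MutableMapping


class CombinedAttributes(MutableMapping):
    """Stateless view over two dicts: reads prefer local, writes go to global."""

    def __init__(self, global_attrs, local_attrs):
        self._globals = global_attrs
        self._locals = local_attrs

    def __getitem__(self, key):
        # Local values take priority over global ones.
        if key in self._locals:
            return self._locals[key]
        return self._globals[key]

    def __setitem__(self, key, value):
        # As proposed: writes through the combined view land in the globals.
        self._globals[key] = value

    def __delitem__(self, key):
        # Assumption (not part of the proposal): delete from both dicts.
        if key not in self._locals and key not in self._globals:
            raise KeyError(key)
        self._locals.pop(key, None)
        self._globals.pop(key, None)

    def __iter__(self):
        yield from self._locals
        yield from (k for k in self._globals if k not in self._locals)

    def __len__(self):
        return len(set(self._globals) | set(self._locals))


class SketchCube:
    """Hypothetical stand-in for a cube holding the two attribute dicts."""

    def __init__(self):
        self.global_attributes = {}
        self.local_attributes = {}

    @property
    def attributes(self):
        # Derived on the fly; holds no state of its own.
        return CombinedAttributes(self.global_attributes, self.local_attributes)


cube = SketchCube()
cube.attributes["history"] = "created by sketch"      # stored as a global
cube.global_attributes["comment"] = "file comment"
cube.local_attributes["comment"] = "variable comment"  # must be set explicitly
print(dict(cube.attributes))
```

One consequence worth noting under these rules: writing a key through `cube.attributes` when the same key already exists in `cube.local_attributes` updates the global value, but a subsequent read still returns the local one.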

@zklaus

zklaus commented Jun 11, 2019

Exactly what I was thinking!

@jonseddon
Contributor

We bumped into this ticket while looking at a related project. Can I check what would happen when you save a cubelist rather than a cube? If cubes in the cubelist had different values of the same global attribute, how would this be saved? If this saved netCDF file was loaded back into Iris would we get the same cubelist and if we didn't, would this matter?

@bjlittle
Member Author

bjlittle commented Aug 1, 2019

@jonseddon Good question...

Clearly, if there is a global attribute conflict across Cubes in the CubeList, then it wouldn't be possible to save any such conflicted attribute to the netCDF file.

However, it raises the question of whether a CubeList should also have state for overriding attributes on save. Again, careful consideration is required here to understand the overall behaviour and whether that's appropriate. In particular, consideration is also required for loading from multiple different netCDF files...

To be honest, I'd opt to separate concerns here. I'd see the debate about CubeList having attributes state as an extension to this proposal - but there should certainly be clarity for the behaviour when there is a conflict for global attributes of Cubes in a CubeList as it stands here.
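
One way the conflict case could be made explicit is sketched below with plain dicts. The function name `split_global_attributes` is hypothetical, and treating an attribute that is missing from some cubes as a conflict is an assumption, not something decided in this thread.

```python
def split_global_attributes(cube_globals):
    """Split per-cube global attributes into (shared, conflicting).

    cube_globals is a list with one global-attribute dict per cube.
    """
    shared, conflicting = {}, set()
    all_keys = set().union(*cube_globals)
    for key in sorted(all_keys):
        values = [attrs[key] for attrs in cube_globals if key in attrs]
        present_everywhere = len(values) == len(cube_globals)
        if present_everywhere and all(v == values[0] for v in values):
            shared[key] = values[0]   # safe to write once, at file level
        else:
            conflicting.add(key)      # cannot be a single file-level value
    return shared, conflicting


# Hypothetical global attributes of two cubes in one CubeList.
globals_a = {"title": "run A", "institution": "X"}
globals_b = {"title": "run B", "institution": "X"}

shared, conflicting = split_global_attributes([globals_a, globals_b])
print(shared, conflicting)
```

What a saver should then do with the conflicting names (drop them, demote them to variable attributes, or raise an error) is exactly the behaviour that needs to be specified.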

@ehogan
Contributor

ehogan commented Aug 1, 2019

I had a question about this :) Does it make sense to add global attributes and variable attributes, which I would argue are netCDF-specific concepts, to the cube, which is meant to be format-agnostic?

@zklaus

zklaus commented Aug 1, 2019

Re cubelists: The question is certainly a good one. Note that right now there is no guarantee that loading a single cube from a netcdf file and saving it again will give you the same file. For example,

  • if there is a global attribute comment and a local attribute comment, the local attribute will essentially overwrite the global one and end up as the only comment attribute in the final file in the global section
  • a global attribute that is one of the special local attributes in iris will end up in the variable section
  • a local attribute that is not recognized as special will end up in the global section.
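
The three cases above can be imitated with a toy load/save pair over plain dicts. This is only an illustration of the described behaviour, not the iris loader or saver; `SPECIAL_LOCAL_NAMES` is a hypothetical stand-in for whatever set of names is treated as variable-level, and the attribute names and values are made up.

```python
# Hypothetical stand-in for the names that are written as variable attributes.
SPECIAL_LOCAL_NAMES = {"flag_values"}


def load(global_attrs, local_attrs):
    """Merge into one dict, as on load: local values overwrite global ones."""
    merged = dict(global_attrs)
    merged.update(local_attrs)
    return merged


def save(cube_attrs):
    """Split one merged dict back into (globals, locals) by name only."""
    local = {k: v for k, v in cube_attrs.items() if k in SPECIAL_LOCAL_NAMES}
    global_ = {k: v for k, v in cube_attrs.items() if k not in SPECIAL_LOCAL_NAMES}
    return global_, local


original_globals = {"comment": "file comment", "flag_values": "1 2"}
original_locals = {"comment": "variable comment", "custom_note": "local only"}

new_globals, new_locals = save(load(original_globals, original_locals))
print(new_globals)
print(new_locals)
```

In this toy round trip the global `comment` is lost, `flag_values` moves from the global to the variable section, and `custom_note` moves from the variable to the global section.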

Since it seems to me that a lot of (most?) data has one variable per file (notably, of course, all CMIP and CORDEX data), I am not sure I would worry so much about consistency when storing cubelists, at least until we have better consistency when storing single cubes. Still, it is certainly a good idea to keep this in mind, so as not to needlessly make outright contradictory decisions.

Re format agnosticism: That is certainly a nice goal, but maybe it needs a better definition? It seems that attributes in and of themselves are unsupportable in, e.g., GRIB and derived formats. Surely we don't want to abandon them completely. So, is there a format that has attribute support, is supported by iris, and could not be made to work with this model?

@bjlittle
Member Author

bjlittle commented Aug 1, 2019

@ehogan From a purely idealistic perspective, I'd agree with you. A Cube should be format agnostic. However, in reality, that's not really the case.

For me it's an intention at best, rather than a hard and fast rule. Consider the special way that we handle PP STASH and NetCDF var_name. These file-format specifics have crept into the way that we deal with cubes and coordinates, along with other CF-isms that may only make sense for NetCDF. The reason this has happened is that there is tangible utility or benefit behind it, so it's a common-sense compromise in my opinion. However, we do try hard not to dilute our attempts to be as agnostic as possible.

I don't know if this helps answer your question...

@pp-mo
Member

pp-mo commented Aug 2, 2019

> Consider the special way that we handle PP STASH and NetCDF var_name

Actually we should have something in GRIB space too, but we don't.
Just plugged a suggestion here : SciTools/iris-grib#153
