Currently we support passing the gdal and filetype arguments to writeRaster(), but there are other arguments that are important or could be useful. These include datatype, NAflag, scale, and offset, to name a few.
I think it is possible to pass additional arguments in ..., which is not currently used in tar_terra_rast(), or to add the arguments that are most needed one at a time. We might just need to iterate over them and evaluate them to get constant values we can pass down through tar_resources_custom_format().
Perhaps the most important to add is datatype, which matters when writing to file for geotargets use cases. Unlike NAflag, scale, and offset, datatype does not have a setter, because SpatRaster objects in memory do not really have a concept of "data type" with the same granularity as the GDAL band data types unless they have just been read from file or until they are written to file.
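The idea of evaluating the arguments in ... down to constants can be sketched in base R. This is only an illustration under assumptions: capture_write_args() and build_write_call() are hypothetical helpers, not geotargets functions, and the symbols object and path are placeholders for whatever the format's write function receives. How the result would be stored (for example through tar_resources_custom_format()) is not shown.

```r
# Hypothetical helper (not part of geotargets): force every argument passed
# through `...` to a constant value at the time the target is defined.
capture_write_args <- function(...) {
  args <- list(...)  # list(...) evaluates each argument in the caller's frame
  if (length(args) > 0 && (is.null(names(args)) || any(names(args) == ""))) {
    stop("all arguments passed through `...` must be named")
  }
  args
}

# Hypothetical helper: build an unevaluated writeRaster() call with the
# constants inlined, so it no longer depends on the calling environment.
# `object` and `path` are placeholder symbols for illustration.
build_write_call <- function(...) {
  args <- capture_write_args(...)
  as.call(c(
    list(quote(terra::writeRaster), quote(object), quote(path)),
    args
  ))
}

dt <- "INT2U"
write_call <- build_write_call(datatype = dt, NAflag = 65535)
print(write_call)
```

Because the values are inlined when the call is built, the variable dt does not need to exist when the call is later evaluated.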
The band type selected can dramatically affect the total size of target data files, read/write/compression/processing times, precision, etc.
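To make the file size point concrete, here is a small sketch that writes the same values as byte and as 64-bit float and compares the sizes on disk. It assumes terra is installed; the raster dimensions and value range are arbitrary choices for illustration, and compression is turned off with the GDAL creation option COMPRESS=NONE so that only the band type differs.

```r
library(terra)

# Arbitrary test raster: 500 x 500 cells holding small non-negative integers
set.seed(1)
r <- rast(nrows = 500, ncols = 500)
values(r) <- sample(0:200, ncell(r), replace = TRUE)

f_byte <- tempfile(fileext = ".tif")
f_dbl  <- tempfile(fileext = ".tif")

# Same data, two band types; compression disabled to isolate the type's effect
writeRaster(r, f_byte, datatype = "INT1U", gdal = "COMPRESS=NONE")
writeRaster(r, f_dbl,  datatype = "FLT8S", gdal = "COMPRESS=NONE")

size_byte <- file.size(f_byte)
size_dbl  <- file.size(f_dbl)
print(c(INT1U = size_byte, FLT8S = size_dbl))
```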
@hansvancalster raised the following issue in #127 (comment):
the datatype was changed. In the function get_map(), I used something like m <- terra::writeRaster(..., datatype = "INT2U"); return(m) to make things work nicely with raster files extracted from an ArcGIS file geodatabase. Running this outside the pipeline returns the expected datatype, however in the pipeline it is changed to "INT4U". This is probably a separate issue: allow passing datatype to terra::writeRaster() from geotargets::tar_terra_rast(). I also tried passing gdal = c("DATATYPE = INT2U") but that did not work. EDIT: I have removed the targets store and built everything again and I now get INT2U as datatype. Totally not clear to me why the problem disappeared... EDIT2: the returned datatype seems to change unpredictably between different runs and may even differ between branches (it is always either INT4U or INT2U, whereas I would prefer it to be always INT2U for my use case).
The GDAL "DATATYPE" option does not exist, so users have no way to control the raster band type in geotargets, leading to unexpected results if they do something specific to their raster to set a more detailed data type beyond what the terra heuristics do.
The inconsistent behavior in @hansvancalster's case was due to a SpatRaster having been written to file manually within a target using datatype = "INT2U", then written again using tar_terra_rast() which converts the integer data to "INT4U". The datatype the user sees would depend on whether the object being checked was the one created by the previous target (preserving INT2U from manual write), or one loaded from the target store (preserving INT4U from target write).
For example, "elev.tif" is stored as INT2S, so the first SpatRaster read from that file has that data type. If we do any operation on it, e.g. multiply by 1L, the data type becomes "" and subsequent writes of it will produce a raster with data type INT4U, since the data stored are positive integers.
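The round trip described above can be checked with a short sketch. It assumes "elev.tif" refers to the example file bundled with terra at ex/elev.tif; the comments repeat what this issue reports rather than output verified here, and results may differ between terra versions.

```r
library(terra)

# Assumption: "elev.tif" is the example raster shipped with terra
f <- system.file("ex/elev.tif", package = "terra")
r <- rast(f)
dt_file <- datatype(r)      # reported in this issue as "INT2S"

# Any operation yields a new SpatRaster that is no longer tied to the file
r2 <- r * 1L
dt_mem <- datatype(r2)      # reported in this issue as ""

# Write without a datatype argument and read the band type back
out <- tempfile(fileext = ".tif")
writeRaster(r2, out)
dt_rewritten <- datatype(rast(out))   # reported in this issue as "INT4U"

print(c(file = dt_file, memory = dt_mem, rewritten = dt_rewritten))
```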
This difference in integer representation is because terra will write different data types depending on the type of values in the raster grid--but it does not choose from a full set of options. For example:
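The original example is not preserved in this copy of the issue. As a stand-in, the following sketch writes three small rasters without a datatype argument and reads back the band type terra chose. written_type() is a hypothetical helper for illustration, and the values are arbitrary; no output is shown because the result depends on the terra heuristics being discussed.

```r
library(terra)

# Hypothetical helper: write values with writeRaster() defaults and return
# the data type found in the resulting file.
written_type <- function(v) {
  r <- rast(nrows = 10, ncols = 10)
  values(r) <- v
  f <- tempfile(fileext = ".tif")
  writeRaster(r, f)
  datatype(rast(f))
}

types <- c(
  positive_integer = written_type(1:100),                  # R integer, all > 0
  negative_integer = written_type(-50:49),                 # R integer, some < 0
  double           = written_type(seq(0.5, 50, by = 0.5))  # R double
)
print(types)
```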
Note that the data have to be an integer R type to get INT4U or INT4S. If there are negative values it is a signed integer, if all positive, unsigned. There is no heuristic that selects the other varieties of integer, even with all small integers less than 255 (INT1U) or 65534 (INT2U). All other numeric R types are written as FLT4S. We need the datatype argument to be able to write byte, smaller integers, and 64-bit floating point (FLT8S).
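For comparison, calling terra::writeRaster() directly with an explicit datatype gives the narrower or wider types that the heuristics never select. This is a sketch of the behavior we would want tar_terra_rast() to expose, not of any geotargets code; write_as() is a hypothetical helper.

```r
library(terra)

r <- rast(nrows = 10, ncols = 10)
values(r) <- 1:100   # small positive integers that fit in a byte

# Hypothetical helper: write with an explicit band type and read it back
write_as <- function(x, dt) {
  f <- tempfile(fileext = ".tif")
  writeRaster(x, f, datatype = dt)
  datatype(rast(f))
}

requested <- c("INT1U", "INT2U", "FLT8S")
obtained <- vapply(requested, function(dt) write_as(r, dt), character(1))
print(obtained)
```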
This default behavior of writeRaster() can be a problem for people who are working with heterogeneous source data that might be read as integer or numeric depending on factors specific to each layer. It would be helpful to be able to be explicit and guarantee the data value type when a target is written, and to have the full selection of options available for workflows that can benefit from a simpler representation of numeric values.
brownag changed the title from "Pass datatype to terra::writeRaster() from geotargets::tar_terra_rast()" to "Pass datatype to writeRaster() from tar_terra_rast()" on Dec 23, 2024
I wonder if we can just use the currently unused ... in tar_terra_rast() to pass arguments to the ... of terra::writeRaster()? The challenge is using ... in the substitute argument of tar_format(). I'm not sure how that works, or if it works yet.
@Aariq I am pretty sure I have this working in #137. AFAICT so far it is pretty seamless, but I have not tested it extensively.