Pass datatype to writeRaster() from tar_terra_rast() #132

Open
Tracked by #2374
brownag opened this issue Dec 23, 2024 · 2 comments

Comments

@brownag
Contributor

brownag commented Dec 23, 2024

Currently we support passing gdal and filetype arguments to writeRaster(), but there are other arguments that are important or could be useful. These include datatype, NAflag, scale, offset, to name a few.

I think it is possible to pass additional arguments in ..., which is not currently used in tar_terra_rast(), or to add the most needed arguments one at a time. We might just need to iterate over them and evaluate them to get constant values we can pass down through tar_resources_custom_format().
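As a minimal sketch of the `...` idea, and not the geotargets implementation: the helper below (`write_with_dots()` is a hypothetical name, not part of geotargets or terra) evaluates whatever is supplied in `...` to constant values and forwards them to `terra::writeRaster()`. How those values would travel through tar_resources_custom_format() is not shown here.

```r
library(terra)

# Hypothetical helper for illustration only; not part of geotargets or terra.
write_with_dots <- function(x, filename, ...) {
  # list(...) forces evaluation, so only constant values are passed on
  extra_args <- list(...)
  do.call(
    terra::writeRaster,
    c(list(x = x, filename = filename, overwrite = TRUE), extra_args)
  )
}

r <- rast(matrix(1:100, nrow = 10))
out <- write_with_dots(
  r,
  tempfile(fileext = ".tif"),
  datatype = "INT2U",
  NAflag = 0
)
datatype(out)
```

Forwarding a plain list of evaluated values keeps the helper agnostic about which writeRaster() arguments are supported, which is the appeal of using `...` over adding arguments one at a time.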

Perhaps the most important to add is datatype, which matters when writing to file for geotargets use cases. datatype does not have a setter like NAflag, scale, offset, etc. do, because SpatRaster objects in memory do not really have a concept of "data type" with the same granularity as the GDAL band data types unless they have just been read from file or until they are written to file.

The band type selected can dramatically affect the total size of target data files, read/write/compression/processing times, precision, etc.
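To illustrate the size effect, here is a small sketch that writes the same values as byte and as 64-bit float and compares the resulting files. The raster dimensions and values are made up for the example, and compression is disabled with the GDAL creation option COMPRESS=NONE so that the difference comes from the band type alone.

```r
library(terra)

set.seed(1)
# Small integer values (0-254) that fit in a single byte
r <- rast(matrix(sample(0:254, 250000, replace = TRUE), nrow = 500))

f_byte <- tempfile(fileext = ".tif")
f_dbl <- tempfile(fileext = ".tif")

# Compression is turned off so the sizes reflect the band type alone
writeRaster(r, f_byte, datatype = "INT1U", gdal = "COMPRESS=NONE")
writeRaster(r, f_dbl, datatype = "FLT8S", gdal = "COMPRESS=NONE")

# Compare bytes on disk for the two band types
file.size(f_byte)
file.size(f_dbl)
```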

@hansvancalster raised the following issue in #127 (comment):

  • the datatype was changed. In the function get_map(), I used something like m <- terra::writeRaster(..., datatype = "INT2U"); return(m) to make things work nicely with raster files extracted from an ArcGIS file geodatabase. Running this outside the pipeline returns the expected datatype, however in the pipeline it is changed to "INT4U". This is probably a separate issue: allow passing datatype to terra::writeRaster() from geotargets::tar_terra_rast(). I also tried passing gdal = c("DATATYPE = INT2U") but that did not work. EDIT: I have removed the targets store and built everything again and I now get INT2U as datatype. Totally not clear to me why the problem disappeared... EDIT2: the returned datatype seems to change unpredictably between different runs and may even differ between branches (it is always either INT4U or INT2U, whereas I would prefer it to be always INT2U for my use case).

The GDAL "DATATYPE" option does not exist, so users have no way to control the raster band type in geotargets. This leads to unexpected results when they deliberately set a more specific data type than the terra heuristics would choose.

The inconsistent behavior in @hansvancalster's case was due to a SpatRaster having been written to file manually within a target using datatype = "INT2U", then written again using tar_terra_rast() which converts the integer data to "INT4U". The datatype the user sees would depend on whether the object being checked was the one created by the previous target (preserving INT2U from manual write), or one loaded from the target store (preserving INT4U from target write).

For example, "elev.tif" is stored as INT2S, so the first SpatRaster read from that file has that data type. If we do any operation on it, e.g. multiply by 1L, the data type becomes "" and a subsequent write returns a raster with data type INT4U, since the stored data are positive integers.

library(terra)
#> terra 1.8.7

x <- rast(system.file("ex", "elev.tif", package = "terra"))
datatype(x)
#> [1] "INT2S"

x <- x*1L
datatype(x)
#> [1] ""

y <- writeRaster(x, "test.tif", overwrite = TRUE)
datatype(y)
#> [1] "INT4U"

This difference in integer representation is because terra will write different data types depending on the type of values in the raster grid--but it does not choose from a full set of options. For example:

library(terra)
#> terra 1.8.7

# Double value: written as 32-bit float
r <- rast(matrix(65535))
r2 <- writeRaster(r, "test.tif", overwrite = TRUE)
datatype(r2)
#> [1] "FLT4S"

# Positive integer: written as unsigned 32-bit integer
r <- rast(matrix(65533L))
r2 <- writeRaster(r, "test.tif", overwrite = TRUE)
datatype(r2)
#> [1] "INT4U"

# Negative integer: written as signed 32-bit integer
r <- rast(matrix(-65533L))
r2 <- writeRaster(r, "test.tif", overwrite = TRUE)
datatype(r2)
#> [1] "INT4S"

# Small positive integer: still unsigned 32-bit integer
r <- rast(matrix(30L))
r2 <- writeRaster(r, "test.tif", overwrite = TRUE)
datatype(r2)
#> [1] "INT4U"

# Small double value: 32-bit float
r <- rast(matrix(30))
r2 <- writeRaster(r, "test.tif", overwrite = TRUE)
datatype(r2)
#> [1] "FLT4S"

# Large double value: 32-bit float
r <- rast(matrix(2147483648))
r2 <- writeRaster(r, "test.tif", overwrite = TRUE)
datatype(r2)
#> [1] "FLT4S"

Note that the data have to be an integer R type to get INT4U or INT4S. If there are negative values it is a signed integer, if all positive, unsigned. There is no heuristic that selects the other varieties of integer, even with all small integers less than 255 (INT1U) or 65534 (INT2U). All other numeric R types are written as FLT4S. We need the datatype argument to be able to write byte, smaller integers, and 64-bit floating point (FLT8S).
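For comparison, a short sketch of what the datatype argument makes possible when calling terra::writeRaster() directly, outside of geotargets. The input raster is made up for the example; the datatype codes are the ones named above.

```r
library(terra)

r <- rast(matrix(1:100, nrow = 10))

# Request specific band types instead of relying on the default heuristic
r_byte <- writeRaster(r, tempfile(fileext = ".tif"), datatype = "INT1U")
r_int2 <- writeRaster(r, tempfile(fileext = ".tif"), datatype = "INT2U")
r_dbl <- writeRaster(r, tempfile(fileext = ".tif"), datatype = "FLT8S")

# Inspect the band type recorded for each file-backed result
c(datatype(r_byte), datatype(r_int2), datatype(r_dbl))
```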

This default behavior of writeRaster() can be a problem for people who are working with heterogeneous source data that might be read as integer or numeric depending on factors specific to each layer. It would be helpful to be able to be explicit and guarantee the data value type when the target is written, and to have the full selection of options available for workflows that can benefit from a simpler representation of numeric values.

@brownag changed the title from "Pass datatype to terra::writeRaster() from geotargets::tar_terra_rast()" to "Pass datatype to writeRaster() from tar_terra_rast()" on Dec 23, 2024
@Aariq
Collaborator

Aariq commented Jan 9, 2025

I wonder if we can just use the currently unused ... in tar_terra_rast() to pass arguments to the ... of terra::writeRaster()? The challenge is using ... in the substitute argument of tar_format(); I'm not sure how that works, or if it works yet.

@brownag
Contributor Author

brownag commented Jan 9, 2025

> I wonder if we can just use the currently unused ... in tar_terra_rast() to pass arguments to the ... of terra::writeRaster()? The challenge is using ... in the substitute argument of tar_format(); I'm not sure how that works, or if it works yet.

@Aariq I am pretty sure I have this working in #137. AFAICT it is pretty seamless so far, but I have not tested it extensively.
