multiApply issues
https://earth.bsc.es/gitlab/ces/multiApply/-/issues
2023-06-08T16:23:52+02:00

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/15
Should Apply() return attributes?
2023-06-08T16:23:52+02:00
aho

Apply() does not return attributes, even when the input data has attributes and the parameter `use_attributes` is used. The documentation does not make clear whether the attributes are expected to be returned. The definition of `use_attributes` is as follows:
> #' @param use_attributes List of vectors of character strings with names of attributes of each object in 'data' to be propagated to the subsets of data sent as inputs to the function specified in 'fun'. If this parameter is not specified (NULL), all attributes are dropped. This parameter can be specified as a named list (then the names of this list must match those of the names of parameter 'data'), or as an unnamed list (then the vectors of attribute names will be assigned in order to the input arrays in 'data').
So we know that if `use_attributes = NULL`, the attributes are not taken by Apply() so it makes sense to not have attributes along with the returned array. However, with `use_attributes` defined, the attributes are still lost.
Here is a minimal example. From the message printed in func(), we can see that `use_attributes` works as described, passing the attributes on to the function. But none of the results from Apply() have attributes. If we call func() directly, the attributes are preserved. The attributes are lost **after [iteration()](https://earth.bsc.es/gitlab/ces/multiApply/-/blob/master/R/Apply.R#L536), an inner function in Apply().
```r
library(multiApply)
arr <- array(1:60, dim = c(sdate = 10, time = 2, region = 3))
attributes(arr)$units <- 'K'
time_attr <- c(paste0(1961:1970, "-11-01 12:00:00"), paste0(1961:1970, "-12-01 12:00:00"))
time_attr <- as.POSIXct(time_attr, tz = 'UTC')
dim(time_attr) <- dim(arr)[1:2]
attributes(arr)$time <- time_attr
func <- function(x) {
  # str() prints the structure of x, including any attributes it received
  str(x)
  attributes(x)$new_attr <- 'A new attribute!'
  return(x)
}
# res1 and res2 are the same; with res3, the attributes are not passed to Apply()
res1 <- Apply(list(data = arr), func, target_dims = 'sdate', output_dims = 'sdate', use_attributes = list(data = c('units', 'time')))
res2 <- Apply(arr, func, target_dims = 'sdate', output_dims = 'sdate', use_attributes = list(c('units', 'time')))
res3 <- Apply(arr, func, target_dims = 'sdate', output_dims = 'sdate')
# Call func directly. Attributes are returned
res4 <- func(arr)
```
There are three types of attributes: (1) all the attributes of the input data; (2) the ones listed in `use_attributes`; (3) the ones returned by `fun`, i.e., `$new_attr` in the example above. (1) and (2) are doable, since the function just needs to attach the original attributes to the returned object; however, (1) doesn't make much sense to me, since not all the attributes are wanted. (3) sounds reasonable, but it is in fact difficult, because I don't know how to combine the attributes of all the chunks.
As I understand it, @vagudets, you want (3) to facilitate the Compute() case in SUNSET. For the "normal" Apply() usage and the startR case, I would say that manually saving the attributes and attaching them to the result should be enough, but I understand that it is difficult to generalize the code this way.
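The manual approach mentioned above could look roughly like the sketch below. It is only an illustration: `reattach_attributes()` is a hypothetical helper, not part of multiApply, and base `apply()` is used as a stand-in for the Apply() call because it also drops custom attributes.

```r
# Hypothetical helper (not part of multiApply): copy selected attributes
# from 'source' onto 'result'.
reattach_attributes <- function(result, source, attr_names) {
  for (nm in attr_names) {
    attr(result, nm) <- attr(source, nm, exact = TRUE)
  }
  result
}

arr <- array(1:60, dim = c(sdate = 10, time = 2, region = 3))
attr(arr, "units") <- "K"

# Stand-in for an Apply() call: base apply() also drops custom attributes.
res <- apply(arr, c(2, 3), function(x) x * 2)
units_lost <- is.null(attr(res, "units"))

# Attach the saved attribute to the result by hand.
res <- reattach_attributes(res, arr, "units")
print(attr(res, "units"))
```

This only suits attributes that remain valid for the output as a whole (such as `units`); attributes tied to the array shape, like the `time` array in the example, would need to be checked against the output dimensions first.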
This is all I have now. @nperez I tag you in case you have some insight about this issue. Please let me know what you think, thanks!
Best,
An-Chi
**To be specific, in iteration(), `result` is the result of `fun` applied to `data`, and it has attributes. `sub_arrays_of_results` is the object finally returned by iteration(), and it no longer has attributes.

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/14
Any type of foreach adaptator
2024-03-14T10:52:16+01:00
Nuria Pérez-Zanón

Hi @aho
I have received this suggestion by email from an external user (see below). Could you consider it, please?
(I'll send you the email too)
Cheers,
Núria
Hi,
would you consider supporting other types of foreach parallel backends
than the currently hard-coded doParallel package, e.g.
from https://earth.bsc.es/gitlab/ces/multiApply/-/blob/master/R/Apply.R#L696-700:
```
# Execute in parallel if needed
parallel <- ncores > 1
if (parallel) registerDoParallel(ncores)
result <- llply(1:length(chunk_sizes), iteration, .parallel = parallel)
if (parallel) registerDoSEQ()
```
One way to do this is to support whatever foreach adaptor is
currently set when ncores = NA. Please see attached patch, but the
gist is:
```
if (is.null(ncores)) {
  ncores <- 1
  parallel <- FALSE
} else if (is.na(ncores)) {
  # Use whatever foreach adaptor is already registered
  parallel <- NA
  ncores <- getDoParWorkers()  # number of parallel workers
} else if (is.numeric(ncores)) {
  ncores <- round(ncores)
  parallel <- (ncores > 1)
} else {
  stop("Parameter 'ncores' must be numeric or NA.")
}
...
# Execute in parallel if needed
if (is.na(parallel)) {
  parallel <- TRUE
} else if (parallel) {
  registerDoParallel(ncores)
  on.exit(registerDoSEQ())
}
result <- llply(1:length(chunk_sizes), iteration, .parallel = parallel)
```
This would allow users to use, for instance, any parallel backend
supported by the futureverse, e.g.
```
library(multiApply)
data <- list(array(1:4, dim = c(A = 1, B = 2, C = 2)),
             array(1:6, dim = c(a = 2, b = 3)))
test_fun <- function(x, y) {
  str(list(x = x, y = y))
  sum(x) / sum(y)
}
message("*** Sequential")
test <- Apply(data, target_dims = list(3, 2), test_fun)
test0 <- test
message("*** Parallel")
test <- Apply(data, target_dims = list(3, 2), test_fun, ncores = 2L)
stopifnot(identical(test, test0))
message("*** doMC with two forked workers")
library(doMC)
registerDoMC(2L)
test <- Apply(data, target_dims = list(3, 2), test_fun, ncores = NA)
stopifnot(identical(test, test0))
message("*** doFuture with sequential processing")
library(doFuture)
registerDoFuture()
plan(sequential)
test <- Apply(data, target_dims = list(3, 2), test_fun, ncores = NA)
stopifnot(identical(test, test0))
message("*** doFuture with two local workers")
library(doFuture)
registerDoFuture()
plan(multisession, workers = 2L)
test <- Apply(data, target_dims = list(3, 2), test_fun, ncores = NA)
stopifnot(identical(test, test0))
message("*** doFuture with parallel workers on two other machines")
library(doFuture)
registerDoFuture()
plan(cluster, workers = c("m1.example.org", "m2.example.org"))
test <- Apply(data, target_dims = list(3, 2), test_fun, ncores = NA)
stopifnot(identical(test, test0))
```
I've verified that this works.
Henrik

aho

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/7
Problem when arrays have dimnames
2022-09-01T09:35:27+02:00
Nuria Pérez-Zanón

As reported by @bertvs, multiApply doesn't seem able to handle arrays with dimnames attributes.
Below is a very simple test that gives an error in multiApply:
```
mod <- seq(1, 2 * 3)
obs <- seq(1, 2 * 3)
dim(mod) <- c(dataset = 2, member = 3)
dim(obs) <- c(dataset = 2, member = 3)
dimnames(mod)[[1]] <- c("MF", "UKMO")
dimnames(obs)[[1]] <- c("MF", "UKMO")
test.fun <- function(obs, mod) {return(obs == mod)}
outp <- Apply(data = list(obs = obs, mod = mod),
              target_dims = list(obs = c("member"), mod = c("member")),
              fun = test.fun
)
Error in attributes(x) <- c(attributes(x), attr_bk) :
length of 'dimnames' [1] not equal to array extent
```
The problem also occurs when dimnames are set on only one of the arrays (either mod or obs), but the error disappears when no dimnames are used.
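Since the error goes away without dimnames, one possible workaround is to set them aside before calling Apply(). The sketch below shows only that step in base R; `strip_dimnames()` is a hypothetical helper, not part of multiApply, and restoring the dimnames on an Apply() result is left out because it depends on the output dimensions.

```r
# Hypothetical helper (not part of multiApply): remove dimnames from an
# array and return them alongside the stripped array.
strip_dimnames <- function(x) {
  saved <- dimnames(x)
  dimnames(x) <- NULL
  list(data = x, dimnames = saved)
}

mod <- array(1:6, dim = c(dataset = 2, member = 3))
dimnames(mod) <- list(c("MF", "UKMO"), NULL)

stripped <- strip_dimnames(mod)

# The dimension names that Apply() relies on are kept; only dimnames are gone.
print(names(dim(stripped$data)))

# The saved dimnames can be put back on an array with the same extents.
restored <- stripped$data
dimnames(restored) <- stripped$dimnames
```

`stripped$data` would then be passed to Apply() in place of `mod`.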
Using the `use_attributes` argument (e.g. `use_attributes = list(mod = c("dimnames"))`) also causes problems.

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/3
Apply() not as fast as apply() when simple functions are applied to a single array
2021-02-19T12:02:28+01:00
Nicolau Manubens Gil

As reported by @ncortesi, in a system with 10 cores, Apply() using all of these cores is only as fast as apply() using a single core when a simple, fast function is applied.
The case reported is the following:
```r
library(multiApply)
my.array <- array(rnorm(10000000), c(1000,1000,100))
f <- function(x) max(x + 5 * x * x)
system.time({apply(my.array, c(1, 2), f)})
#~8 seconds
system.time({Apply(my.array, 3, f)})
#~40 seconds
system.time({Apply(my.array, 3, f, ncores = 10)})
#~9 seconds
```
The apply() code has also been tested on a similar system with only 1 core, and the wall-clock time was again approx. 8 seconds (i.e. apply() is not using implicit multi-core).
This could possibly be improved by using apply() inside Apply() in the cases where only one input array is provided.
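A minimal sketch of that idea, assuming a single input array and numeric `target_dims`: the margins passed to base apply() are simply the dimensions that are not target dimensions. `apply_single()` is a hypothetical name, not an existing multiApply function, and it ignores everything else Apply() handles (named dimensions, multiple inputs, `output_dims`, attributes).

```r
# Hypothetical dispatcher: for one input array, translate target_dims into
# the complementary margins and delegate to base apply().
apply_single <- function(x, target_dims, fun, ...) {
  margins <- setdiff(seq_along(dim(x)), target_dims)
  apply(x, margins, fun, ...)
}

x <- array(rnorm(24), c(2, 3, 4))
f <- function(v) max(v + 5 * v * v)

# Equivalent to apply(x, c(1, 2), f): f receives vectors along dimension 3.
res <- apply_single(x, 3, f)
print(dim(res))
```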
In cases where the function to be applied takes longer, Apply() can still be useful and improve the wall-clock time by using multi-core.
In conclusion, apply() should be recommended over Apply() for cases where functions are to be applied over large margins of a single data array. If the function to be applied is complex/slow, using Apply() with multiple cores can lead to a reduced wall-clock time (at the expense of greater computing resource usage) compared to the apply() implementation.

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/2
Renaming Apply()'s arguments
2018-11-20T18:09:25+01:00
Nicolau Manubens Gil

As suggested by @ahunter and @ncortesi, Apply()'s arguments could be renamed to match those of base apply().
data -> X
fun -> FUN
margins -> MARGIN
To be reconsidered for version v3.0.0.