multiApply issues
https://earth.bsc.es/gitlab/ces/multiApply/-/issues
2021-02-19T10:14:18+01:00

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/7
Problem when arrays have dimnames
2021-02-19T10:14:18+01:00
Nuria Pérez-Zanón

As reported by @bertvs, multiApply doesn't seem able to handle arrays with dimnames attributes.
Below is a very simple test that gives an error in multiApply:
```r
library(multiApply)

# Two identical 2 x 3 arrays with named dimensions
mod <- seq(1, 2 * 3)
obs <- seq(1, 2 * 3)
dim(mod) <- c(dataset = 2, member = 3)
dim(obs) <- c(dataset = 2, member = 3)
# Attach dimnames to the first ('dataset') dimension only
dimnames(mod)[[1]] <- c("MF", "UKMO")
dimnames(obs)[[1]] <- c("MF", "UKMO")
test.fun <- function(obs, mod) {return(obs == mod)}
outp <- Apply(data = list(obs = obs, mod = mod),
              target_dims = list(obs = c("member"), mod = c("member")),
              fun = test.fun
)
# Error in attributes(x) <- c(attributes(x), attr_bk) :
#   length of 'dimnames' [1] not equal to array extent
```
The problem also occurs when dimnames are set on only one of the arrays (either mod or obs); the error disappears when no dimnames are used.
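Since the error disappears when no dimnames are present, one possible workaround until this is fixed is to strip the dimnames before calling Apply(). The sketch below is only an illustration: strip_dimnames() is a hypothetical helper, not part of multiApply, and it assumes that the named dim attribute alone is enough for Apply() to match dimensions.

```r
# Hypothetical helper (not part of multiApply): remove the dimnames attribute
# and return it separately so it can be reattached later if needed.
strip_dimnames <- function(x) {
  saved <- dimnames(x)
  dimnames(x) <- NULL   # the named 'dim' attribute is left untouched
  list(data = x, dimnames = saved)
}

mod <- seq(1, 2 * 3)
dim(mod) <- c(dataset = 2, member = 3)
dimnames(mod) <- list(c("MF", "UKMO"), NULL)
obs <- mod

mod_s <- strip_dimnames(mod)
obs_s <- strip_dimnames(obs)

test.fun <- function(obs, mod) {return(obs == mod)}
# The Apply() call is only attempted when multiApply is installed
if (requireNamespace("multiApply", quietly = TRUE)) {
  outp <- multiApply::Apply(data = list(obs = obs_s$data, mod = mod_s$data),
                            target_dims = list(obs = c("member"), mod = c("member")),
                            fun = test.fun)
}
```

The saved dimnames stay available in mod_s$dimnames and obs_s$dimnames; how to map them back onto the output depends on the dimension order Apply() returns, so that step is left out here.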
Using the use_attributes argument (e.g. use_attributes = list(mod = c("dimnames"))) also causes problems.

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/3
Apply() not as fast as apply() when simple functions are applied to a single array
2021-02-19T12:02:28+01:00
Nicolau Manubens Gil

As reported by @ncortesi, on a system with 10 cores, Apply() using all of these cores is only as fast as apply() using a single core when a simple, fast function is applied.
The case reported is the following:
```r
library(multiApply)
my.array <- array(rnorm(10000000), c(1000,1000,100))
f <- function(x) max(x + 5 * x * x)
system.time({apply(my.array, c(1, 2), f)})
#~8 seconds
system.time({Apply(my.array, 3, f)})
#~40 seconds
system.time({Apply(my.array, 3, f, ncores = 10)})
#~9 seconds
```
The apply() code was also tested on a similar system with only 1 core, and the wall-clock time was again approx. 8 seconds (i.e. apply() is not implicitly using multiple cores).
This could possibly be improved by using apply() inside Apply() when only one input array is provided.
In cases where the function to be applied takes longer, Apply() can still be useful and improve the wall-clock time by using multiple cores.
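The idea of falling back to apply() for a single input array could look roughly like the sketch below. apply_single() is a hypothetical wrapper, not something multiApply provides, and it assumes target dimensions are given as numeric indices. It also returns a plain array as apply() does, not the list that Apply() returns.

```r
# Hypothetical wrapper (not part of multiApply): for a single input array,
# hand the work to base apply() over every dimension that is not a target one.
apply_single <- function(x, target_dims, fun, ...) {
  stopifnot(is.array(x), is.numeric(target_dims))
  # Margins are all dimensions except the target dimensions
  margins <- setdiff(seq_along(dim(x)), target_dims)
  apply(x, margins, fun, ...)
}

# Small array so the example runs quickly
small <- array(rnorm(4 * 5 * 6), c(4, 5, 6))
f <- function(x) max(x + 5 * x * x)
res <- apply_single(small, 3, f)
```

Here res holds one value per combination of the first two dimensions, the same as calling apply(small, c(1, 2), f) directly.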
In conclusion, apply() should be recommended over Apply() when a function is applied over large margins of a single data array. If the function to be applied is complex or slow, using Apply() with multiple cores can reduce the wall-clock time (at the expense of greater computing resource usage) compared to apply().

https://earth.bsc.es/gitlab/ces/multiApply/-/issues/2
Renaming Apply()'s arguments
2018-11-20T18:09:25+01:00
Nicolau Manubens Gil

As suggested by @ahunter and @ncortesi, Apply()'s arguments could be renamed to match those of base apply().
data -> X
fun -> FUN
margins -> MARGIN
To be reconsidered for version v3.0.0.