diff --git a/DESCRIPTION b/DESCRIPTION
index d514a064fa31a9d388854a4f08d88379b355c95c..31dcd942f3bb2a29681724c40d917ebc33b7c09a 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -6,7 +6,7 @@ Authors@R: c(
     person("Nicolau", "Manubens", , "nicolau.manubens@bsc.es", role = "aut"),
     person("Alasdair", "Hunter", , "alasdair.hunter@bsc.es", role = "aut"),
     person("Nuria", "Perez", , "nuria.perez@bsc.es", role = "cre"))
-Description: The base apply function and its variants, as well as the related functions in the 'plyr' package, typically apply user-defined functions to a single argument (or a list of vectorized arguments in the case of mapply). The 'multiApply' package extends this paradigm to efficiently apply functions taking one or a list of multiple unidimensional or multidimensional arguments (or combinations thereof) as input, which can have different numbers of dimensions as well as different dimension lengths, and returning one or a list of unidimensional or multidimensional arrays as output. This saves development time by preventing the R user from writing error-prone and memory-unefficient loops dealing with multiple complex arrays. In contrast to apply and variants, this package suggests the use of 'target dimensions' as opposite to the 'margins' for specifying the dimensions relevant to the function to be applied. Also, two remarkable features of multiApply are the support for functions returning multiple array outputs and the transparent use of multi-core.
+Description: The base apply function and its variants, as well as the related functions in the 'plyr' package, typically apply user-defined functions to a single argument (or a list of vectorized arguments in the case of mapply). The 'multiApply' package extends this paradigm with its only function, Apply, which efficiently applies functions taking one or a list of multiple unidimensional or multidimensional numeric arrays (or combinations thereof) as input. The input arrays can have different numbers of dimensions as well as different dimension lengths, and the applied function can return one or a list of unidimensional or multidimensional arrays as output. This saves development time by preventing the R user from writing often error-prone and memory-inefficient loops dealing with multiple complex arrays. Also, a remarkable feature of Apply is the transparent use of multi-core through its parameter 'ncores'. In contrast to the base apply function, this package suggests the use of 'target dimensions' as opposed to the 'margins' for specifying the dimensions relevant to the function to be applied.
 Depends:
     R (>= 3.2.0)
 Imports:
diff --git a/R/Apply.R b/R/Apply.R
index 335d437dc39d6b472ee9fe128ebb4b3807d50849..876c8f44044be9e08af37690558f0ab878ac3caf 100644
--- a/R/Apply.R
+++ b/R/Apply.R
@@ -101,14 +101,17 @@ Apply <- function(data, target_dims = NULL, fun, ...,
             "found across different inputs in 'data'. Please check ",
             "carefully the assumed names below are correct, or provide ",
             "dimension names for safety, or disable the parameter ",
-            "'guess_dimension_names'.", dim_names_string)
+            "'guess_dim_names'.", dim_names_string)
   }
 
   # Check fun
   if (is.character(fun)) {
-    try({fun <- get(fun)}, silent = TRUE)
+    fun_name <- fun
+    err <- try({
+      fun <- get(fun)
+    }, silent = TRUE)
     if (!is.function(fun)) {
-      stop("Could not find the function '", fun, "'.")
+      stop("Could not find the function '", fun_name, "'.")
     }
   }
   if (!is.function(fun)) {
diff --git a/README.md b/README.md
index df71716eaf3f4867dd0beeec8d3ff30bb4ad9ae5..99d3eab088df8157893d0ebba69966af6c8269aa 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 ## multiApply [![build status](https://earth.bsc.es/gitlab/ces/multiApply/badges/master/build.svg)](https://earth.bsc.es/gitlab/ces/multiApply/commits/master) [![CRAN version](http://www.r-pkg.org/badges/version/multiApply)](https://cran.r-project.org/package=multiApply) [![coverage report](https://earth.bsc.es/gitlab/ces/multiApply/badges/master/coverage.svg)](https://earth.bsc.es/gitlab/ces/multiApply/commits/master) [![License: LGPL v3](https://img.shields.io/badge/License-LGPL%20v3-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0) [![CRAN RStudio Downloads](https://cranlogs.r-pkg.org/badges/multiApply)](https://cran.rstudio.com/web/packages/multiApply/index.html)
 
-This package includes the function `Apply` as its only function. It extends the `apply` function to applications in which a function needs to be applied simultaneously over multiple input arrays. Although this can be done manually with for loops and calls to the base apply function, in many cases it can be a challenging task which can very easily result in error-prone or memory-unefficient code.
+This package includes the function `Apply` as its only function. It extends the `apply` function to applications in which a function needs to be applied simultaneously over multiple input arrays. Although this can be done manually with for loops and calls to the base `apply` function, it can often be a challenging task which can easily result in error-prone or memory-inefficient code.
 
-A very simple example follows showing the kind of situation where `Apply` can be useful: imagine you have two arrays, each containing five 2x2 matrices, and you want to perform the multiplication of each of the five pairs of matrices. Next, one of the best ways to do this with base R:
+A very simple example follows showing the kind of situation where `Apply` can be useful: imagine you have two arrays, each containing five 2x2 matrices, and you want to perform the multiplication of each of the five pairs of matrices. Next, one of the best ways to do this with base R (plus some helper libraries):
 
 ```r
 library(plyr)
@@ -16,7 +16,7 @@ D <- aaply(X = abind(A, B, along = 4),
            FUN = function(x) x[,,1] %*% x[,,2])
 ```
 
-Although it is not excessively complex, the choosen example is very simple and the complexity would increase as the function to apply required additional dimensions or inputs, and would be unapplicable if multiple outputs were to be returned. In addition, the function to apply (matrix multiplication) had to be redefined for this particular case (multiplication of the first matrix by the second).
+Since the chosen use case is very simple, this solution is not excessively complex, but the complexity would increase as the function to apply required additional dimensions or inputs, and would be inapplicable if multiple outputs were to be returned. In addition, the function to apply (matrix multiplication) had to be redefined for this particular case (multiplication of the first matrix along the third dimension by the second along the third dimension).
 
 Next, an example of how to reach the same results using `Apply`:
 
@@ -31,7 +31,16 @@ D <- Apply(data = list(A, B),
            fun = "%*%")$output1
 ```
 
-This solution takes half the time to complete, and is cleaner and extensible to functions receiving any number of inputs with any number of dimensions, or returning any number of outputs. Although the peak RAM usage (as measured with `peakRAM`) of both solutions in this example is about the same, it is challenging to avoid memory duplications when using custom code in more complex applications, and can usually require hours of dedication. `Apply` scales well to large inputs and has been designed to be fast and avoid memory duplications.
+This solution takes half the time to complete (as measured with `microbenchmark` with inputs of different sizes), and is cleaner and extensible to functions receiving any number of inputs with any number of dimensions, or returning any number of outputs. Although the peak RAM usage (as measured with `peakRAM`) of both solutions in this example is about the same, it is challenging to avoid memory duplications when using custom code in more complex applications, and can usually require hours of dedication. `Apply` scales well to large inputs and has been designed to be fast and avoid memory duplications.
+
+Additionally, multi-core computation can be enabled via the `ncores` parameter, as shown next. Although in this minimalist example using multi-core would make the execution slower, in applications where the inputs are larger the wall-clock time is reduced dramatically.
+
+```r
+D <- Apply(data = list(A, B),
+           target_dims = c(2, 3),
+           fun = "%*%",
+           ncores = 4)$output1
+```
 
 In contrast to `apply` and variants, this package suggests the use of 'target dimensions' as opposite to the 'margins' for specifying the dimensions relevant to the function to be applied. Additionally, it supports functions returning multiple vector or arrays, and can transparently uses multi-core.
 
@@ -44,7 +53,7 @@ install.packages('multiApply')
 library(multiApply)
 ```
 
-Also, you can install the latest stable version from this GitHub repository as follows:
+Also, you can install the latest stable version from the GitHub repository as follows:
 
 ```r
 devtools::install_git('https://earth.bsc.es/gitlab/ces/multiApply')
diff --git a/multiApply-manual.pdf b/multiApply-manual.pdf
index a1f26396885bcaa03cdf279c9f38176bab2d08b2..c1ab41cb14670fb7d596548390f3070814cc0963 100644
Binary files a/multiApply-manual.pdf and b/multiApply-manual.pdf differ
diff --git a/tests/testthat/test-use-cases.R b/tests/testthat/test-use-cases.R
index e686250ed245857f5ab3070df23c0c0e2ab98d94..22aa8cad62affb72f0622a9a54c5db5fa933229a 100644
--- a/tests/testthat/test-use-cases.R
+++ b/tests/testthat/test-use-cases.R
@@ -1197,6 +1197,74 @@ test_that("in1: 2 dim; in2: 1 dim; targ. dims: 0-2, 0-1; out1: 1 dim; out2: 1 va
 ## not shared target dims
 #})
 
+# Real cases
+test_that("real use case - standardization", {
+  standardization <- function(x, mean, deviation){
+    (x - mean) / deviation
+  }
+
+  x <- array(1:(2*3*4), dim = c(mod = 2, lon = 3, lat = 4))
+  y <- array(1:12, dim = c(lon = 3, lat = 4))
+  z <- array(1:12, dim = c(lon = 3, lat = 4))
+
+  expected_result <- array(c(0:11 / z, rep(1, 3 * 4)), dim = c(3, 4, mod = 2))
+
+  expect_equal(
+    Apply(data = list(x,y,z),
+          target_dims = list(c('lon', 'lat'),
+                             c('lon', 'lat'),
+                             c('lon', 'lat')),
+          fun = standardization)$output1,
+    expected_result
+  )
+
+  names(dim(expected_result)) <- c('lon', 'lat', 'mod')
+
+  expect_equal(
+    Apply(data = list(x,y,z),
+          margins = list('mod', NULL, NULL),
+          fun = standardization,
+          output_dims = c('lon', 'lat')
+          )$output1,
+    expected_result
+  )
+
+  expect_equal(
+    Apply(data = list(x,y,z),
+          margins = list(c('mod', 'lat'), 'lat', 'lat'),
+          fun = standardization,
+          output_dims = c('lon')
+          )$output1,
+    multiApply:::.aperm2(expected_result, c(1, 3, 2))
+  )
+
+  x <- multiApply:::.aperm2(x, c(3, 2, 1))
+
+  expect_equal(
+    Apply(data = list(x,y,z),
+          target_dims = list(c('lon', 'lat'),
+                             c('lon', 'lat'),
+                             c('lon', 'lat')),
+          fun = standardization,
+          output_dims = c('lon', 'lat')
+          )$output1,
+    expected_result
+  )
+
+})
+
+# Test .aperm2
+test_that(".aperm2", {
+  data <- seq(as.POSIXct('1990-11-01'),
+              length.out = 6,
+              by = as.difftime(1, units = 'days'))
+  dim(data) <- c(3, 2)
+  expect_equal(
+    class(multiApply:::.aperm2(data, c(2, 1))),
+    c('POSIXct', 'POSIXt')
+  )
+})
+
 # TODOS:
 # TESTS FOR MARGINS
 # TESTS FOR DISORDERED TARGET_DIMS