Earth Sciences / s2dv / Issues / #15
Issue created Oct 22, 2020 by aho (@aho, Maintainer)

MeanDims: slower than Mean1Dim

This issue was reported by @llledo. MeanDims() combines and refines s2dverification::Mean1Dim() and MeanListDim(). However, in some circumstances MeanDims() appears to be slower than Mean1Dim().

I used the rbenchmark package to run the benchmarks. I created a random array, data <- array(rnorm(1:1000), dim = c(dat = 1, time = 12, member = 50, lon = 360, lat = 180)), and averaged it over 1, 2, and 3 dimensions. Here are the results.
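A minimal sketch of how such a comparison could be set up, using only base R so that it runs without s2dv or s2dverification installed. `mean_over_dims()` is a hypothetical stand-in, not the implementation of MeanDims(), and the array is deliberately much smaller than the one benchmarked in this issue.

```r
# Hypothetical helper (not part of s2dv): average an array over the
# named dimensions by applying mean() across the remaining ones.
mean_over_dims <- function(x, dims) {
  keep <- setdiff(names(dim(x)), dims)
  apply(x, match(keep, names(dim(x))), mean)
}

set.seed(1)
# Same dimension names as in the issue, with fewer members and a coarser grid.
small_dims <- c(dat = 1, time = 12, member = 5, lon = 36, lat = 18)
data <- array(rnorm(prod(small_dims)), dim = small_dims)

# Average over 1, 2, and 3 dimensions and time each call.
cases <- list("member", c("time", "member"), c("time", "member", "lon"))
for (dims in cases) {
  elapsed <- system.time(res <- mean_over_dims(data, dims))[["elapsed"]]
  cat(sprintf("%-20s elapsed = %.3f s, result dims = %s\n",
              paste(dims, collapse = "+"), elapsed,
              paste(dim(res), collapse = "x")))
}
```

With rbenchmark installed, the `system.time()` calls could be swapped for `rbenchmark::benchmark(..., replications = 10)` to obtain tables in the format shown in this issue.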

# 1) Average over 'member'
         test replications elapsed relative user.self sys.self
2    Mean1Dim           10  64.622    1.000    61.024    3.168
1    MeanDims           10  91.476    1.416    88.897    1.912
3 MeanListDim           10  93.579    1.448    90.890    2.045

# 2) Average over 'time' and 'member'
         test replications elapsed relative user.self sys.self
2    Mean1Dim           10  67.951    2.761    64.912    2.572
1    MeanDims           10  24.764    1.006    22.214    2.360
3 MeanListDim           10  24.608    1.000    23.121    1.308

# 3) Average over 'time', 'member', and 'lon'
         test replications elapsed relative user.self sys.self
2    Mean1Dim           10  68.918    4.929    64.688    3.756
1    MeanDims           10  13.983    1.000    12.093    1.780
3 MeanListDim           10  16.335    1.168    12.725    3.457

The results show that:

  1. MeanDims() and MeanListDim() have similar performance.
  2. Mean1Dim() is about 1.4x faster than the other two when averaging over a single dimension, but it is about 2.8x slower when averaging over two dimensions and about 4.9x slower when averaging over three.
  3. MeanDims() has the best performance in the 3rd case.
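For reference, the "relative" column reported by rbenchmark is each elapsed time divided by the fastest elapsed time in the same test. A short check of that arithmetic, with the elapsed values copied from the three tables (the row names `one_dim`, `two_dims`, and `three_dims` are labels introduced here for illustration):

```r
# Elapsed times (seconds, 10 replications) copied from the benchmark tables.
elapsed <- rbind(
  one_dim    = c(Mean1Dim = 64.622, MeanDims = 91.476, MeanListDim = 93.579),
  two_dims   = c(Mean1Dim = 67.951, MeanDims = 24.764, MeanListDim = 24.608),
  three_dims = c(Mean1Dim = 68.918, MeanDims = 13.983, MeanListDim = 16.335)
)

# Divide every row by its own minimum (the fastest function in that case).
relative <- round(elapsed / apply(elapsed, 1, min), 3)
print(relative)
```

The Mean1Dim column of `relative` gives the 1x, roughly 2.8x, and roughly 4.9x factors quoted in the list.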

One of the major differences between MeanDims() and the other two functions is that MeanDims() checks its inputs beforehand, so I removed those checks and ran the first case (i.e., averaging over one dimension) again.

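# 1) Average over 'member', with the input checks removed from MeanDims()
         test replications elapsed relative user.self sys.self
2    Mean1Dim           10  65.036    1.000    60.980    3.596
1    MeanDims           10  70.094    1.078    68.348    1.304
3 MeanListDim           10  92.020    1.415    89.706    1.716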

The result shows that MeanDims() becomes faster (70.094 s instead of 91.476 s, about 23% less) and is now within about 8% of Mean1Dim().

Note that although most s2dv functions use multiApply, MeanDims() does not, so the overhead cannot come from multiApply. The input checks therefore appear to have a considerable impact on efficiency.
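One way to isolate the cost of input checks in general is to time the same computation with the checks switched on and off. The sketch below shows that pattern in base R. `mean_dims_sketch()` and its `check` argument are hypothetical, and the checks inside are illustrative only; they are not the checks that MeanDims() performs.

```r
# Hypothetical function (not MeanDims() itself): average over named
# dimensions, with optional input checks in front of the computation.
mean_dims_sketch <- function(data, dims, check = TRUE) {
  if (check) {
    # Illustrative checks only; the real checks in MeanDims() may differ.
    stopifnot(is.numeric(data), !is.null(names(dim(data))),
              is.character(dims), all(dims %in% names(dim(data))))
  }
  keep <- which(!names(dim(data)) %in% dims)
  apply(data, keep, mean)
}

set.seed(1)
small_dims <- c(dat = 1, time = 12, member = 5, lon = 36, lat = 18)
data <- array(rnorm(prod(small_dims)), dim = small_dims)

# Time the same call with and without checks; repeat to reduce noise.
time_call <- function(check, n = 5) {
  system.time(
    for (i in seq_len(n)) mean_dims_sketch(data, "member", check = check)
  )[["elapsed"]]
}
with_checks    <- time_call(TRUE)
without_checks <- time_call(FALSE)
cat("with checks:", with_checks, "s; without checks:", without_checks, "s\n")
```

In this toy version the checks are cheap, so the two timings will probably be close; the point is the measurement pattern, not the numbers.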

@nperez, do you have any ideas on how to improve this?

Cheers,
An-Chi
