This document intends to be the first reference for any doubts that you may have regarding startR. If you do not find the information you need, please open an issue for your problem.
1. [Choose the number of chunks/jobs/cores in Compute()](#1-choose-the-number-of-chunksjobscores-in-compute)
2. [Merge/Reorder dimension in Start() (using parameter 'xxx_across' and 'merge_across_dims')](#2-mergereorder-dimension-in-start-using-parameter-xxx_across-and-merge_across_dims)
3. [Use self-defined function in Compute()](#3-use-self-defined-function-in-compute)
4. [Use package function in Compute()](#4-use-package-function-in-compute)
5. [Do interpolation in Start() (using parameter 'transform')](#5-do-interpolation-in-start-using-parameter-transform)
6. [Get data attributes without retrieving data to workstation](#6-get-data-attributes-without-retrieving-data-to-workstation)
7. [Avoid or specify a node from cluster in Compute()](#7-avoid-or-specify-a-node-from-cluster-in-compute)
8. [Define a path with multiple dependencies](#8-define-a-path-with-multiple-dependencies)
1. [No space left on device](#1-no-space-left-on-device)
2. [ecFlow UI remains blue and does not update status](#2-ecflow-ui-remains-blue-and-does-not-update-status)
3. [Compute() successfully but then killed on R session](#3-compute-successfully-but-then-killed-on-r-session)
4. [My jobs work well in workstation and fatnodes but not on Power9 (or vice versa)](#4-my-jobs-work-well-in-workstation-and-fatnodes-but-not-on-power9-or-vice-versa)
## 1. How to
### 1. Choose the number of chunks/jobs/cores in Compute()
Run the Start() call to see the total size of the data you read in (remember to set `retrieve = FALSE`).
Divide the data into chunks according to the size of the machine's memory module (32GB on Power9; 8GB on MN4). The data size per chunk should be 1/3 to 1/2 of the memory module.
Find more details in practical_guide.md: [How to choose the number of chunks, jobs and cores](inst/doc/practical_guide.md#how-to-choose-the-number-of-chunks-jobs-and-cores)
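As a rough sketch of the rule above, the number of chunks can be estimated from the array dimensions that Start() reports. The helper name `suggest_n_chunks`, the assumption of 8 bytes per value (R doubles), the default fraction of 1/3 and the example dimensions are illustrative choices, not part of startR.

```r
# Hypothetical helper (not part of startR): estimate how many chunks are
# needed so that each chunk uses at most `fraction` of one memory module.
suggest_n_chunks <- function(dims, mem_gb, fraction = 1/3, bytes_per_value = 8) {
  total_gb <- prod(dims) * bytes_per_value / 1024^3  # size of the full array in GB
  n_chunks <- ceiling(total_gb / (mem_gb * fraction))
  list(total_gb = total_gb, n_chunks = max(1, n_chunks))
}

# Example dimensions, made up for illustration
dims <- c(dat = 1, var = 1, sdate = 20, member = 25,
          time = 120, lat = 256, lon = 512)

res <- suggest_n_chunks(dims, mem_gb = 32)  # 32GB memory module (Power9)
res$total_gb
res$n_chunks
```

The result is only a lower bound on the total number of chunks. How to split it among dimensions (e.g., `chunks = list(lat = 2, lon = 3)`) still depends on which dimensions your function does not need as `target_dims`.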
### 2. Merge/Reorder dimension in Start() (using parameter 'xxx_across' and 'merge_across_dims')
The parameter `'xxx_across = yyy'` indicates that the inner dimension 'xxx' is continuous along the file dimension 'yyy'. A common example is 'time_across = chunk', when the experiment runs through many years and the result is saved in several chunk files. Find more details in startR documentation.
If you define this parameter, you can specify 'xxx' with indices that run through all the 'yyy' files, not only within one file. In Example 1 below, 'time = indices(1:24)' is possible because 'time_across = chunk' is specified. Without it, 'time' could be 12 at most.
One example that takes advantage of 'xxx_across' is extracting a climate event that spans several years, like El Niño. If the event runs from Nov 2014 to May 2016 (19 months in total), simply specify 'time = indices(11:29)' (Example 2 applies the same approach to a different period).
Bear in mind the structure of the returned data when using this parameter. First, **the length of the returned xxx dimension is the length of the longest xxx among all files**. Take the El Niño event above as an example. The first chunk has 2 months, the second chunk has 12 months, and the third chunk has 5 months. Therefore, the length of the time dimension will be 12, and the length of the chunk dimension will be 3.
Second, Start() stores data by **putting it at the left-most positions**. Take the El Niño event above as an example again. The first chunk has only 2 months, so positions 1 and 2 have values (Nov and Dec 2014), while positions 3 to 12 will be NA. The second chunk has 12 months, so all positions have values (Jan to Dec 2015). The third chunk has 5 months, so positions 1 to 5 have values (Jan to May 2016), while positions 6 to 12 will be NA.
It might seem more reasonable to put the NAs at positions 1 to 10 in the first chunk (Jan to Oct 2014) and at positions 6 to 12 in the third chunk (June to Dec 2016). But if the data is not continuous or is picked irregularly, it is hard to judge the correct NA positions (see Example 3).
Since Start() is flexible enough to read in data in many possible ways, it is difficult to cover all the possibilities and make the output data structure intuitive all the time. Therefore, it is recommended to first understand how Start() works, so that you know what to expect from the output and are not confused by what it returns.
The parameter 'merge_across_dims' decides whether to connect all 'xxx' together along 'yyy' or not. See Example 1. If 'merge_across_dims = TRUE', the chunk dimension will disappear. 'merge_across_dims' simply attaches the data one after another, so the NA values (if any) will be in the same places as in the unmerged array (see Example 2).
Example 1
```r
data <- Start(dat = repos,
              var = 'tas',
              time = indices(1:24), # each file has 12 months; read 24 months in total
              chunk = indices(1:2), # two years, each with 12 months
              lat = 'all',
              lon = 'all',
              time_across = 'chunk',
              merge_across_dims = FALSE, # or TRUE
              return_vars = list(lat = NULL, lon = NULL),
              retrieve = TRUE)

# returned dimensions (merge_across_dims = FALSE)
#  dat   var  time chunk   lat   lon
#    1     1    12     2   256   512

# returned dimensions (merge_across_dims = TRUE)
#  dat   var  time   lat   lon
#    1     1    24   256   512
```
Example 2: El Niño event
```r
repos <- '/esarchive/exp/ecearth/a1tr/cmorfiles/CMIP/EC-Earth-Consortium/EC-Earth3/historical/$memb$/Omon/$var$/gr/v20190312/$var$_Omon_EC-Earth3_historical_$memb$_gr_$chunk$.nc'
data <- Start(dat = repos,
              var = 'tos',
              memb = 'r24i1p1f1',
              time = indices(4:27), # Apr 1957 to Mar 1959
              chunk = c('195701-195712', '195801-195812', '195901-195912'),
              lat = 'all',
              lon = 'all',
              time_across = 'chunk',
              merge_across_dims = FALSE,
              return_vars = list(lat = NULL, lon = NULL),
              retrieve = TRUE)
> dim(data)
dat var memb time chunk lat lon
1 1 1 12 3 256 512
> data[1,1,1,,,100,100]
[,1] [,2] [,3]
[1,] 300.7398 300.7659 301.7128
[2,] 299.6569 301.8241 301.4781
[3,] 298.3954 301.6472 301.3807
[4,] 297.1931 301.0621 NA
[5,] 295.9608 299.1324 NA
[6,] 295.4735 297.4028 NA
[7,] 295.8538 296.1619 NA
[8,] 297.9998 295.2794 NA
[9,] 299.4571 295.0474 NA
[10,] NA 295.4571 NA
[11,] NA 296.8002 NA
[12,] NA 299.0254 NA
#To move the NAs in the first year to Jan to Mar
> asd <- Subset(data, c(5), list(1))
> qwe <- asd[, , , c(10:12, 1:9), , ,]
> data[, , , , 1, ,] <- qwe
> data[1, 1, 1, , , 100, 100]
[,1] [,2] [,3]
[1,] NA 300.7659 301.7128
[2,] NA 301.8241 301.4781
[3,] NA 301.6472 301.3807
[4,] 300.7398 301.0621 NA
[5,] 299.6569 299.1324 NA
[6,] 298.3954 297.4028 NA
[7,] 297.1931 296.1619 NA
[8,] 295.9608 295.2794 NA
[9,] 295.4735 295.0474 NA
[10,] 295.8538 295.4571 NA
[11,] 297.9998 296.8002 NA
[12,] 299.4571 299.0254 NA
```
Example 3: Read in three winters (DJF)
```r
repos <- '/esarchive/exp/ecearth/a1tr/cmorfiles/CMIP/EC-Earth-Consortium/EC-Earth3/historical/$memb$/Omon/$var$/gr/v20190312/$var$_Omon_EC-Earth3_historical_$memb$_gr_$chunk$.nc'
data <- Start(dat = repos,
              var = 'tos',
              memb = 'r24i1p1f1',
              time = c(12:14, 24:26, 36:38), # DJF, Dec 1999 to Feb 2002
              chunk = c('199901-199912', '200001-200012', '200101-200112', '200201-200212'),
              lat = 'all',
              lon = 'all',
              time_across = 'chunk',
              merge_across_dims = TRUE,
              return_vars = list(lat = NULL, lon = NULL),
              retrieve = TRUE)
> dim(data)
dat var memb time lat lon
1 1 1 12 256 512
> data[1, 1, 1, , 100, 100]
[1] 300.0381 NA NA 301.3340 302.0320 300.3575 301.0930 301.4149
[9] 299.3486 300.7203 301.6695 NA
# Remove NAs and rearrange DJF
> qwe <- Subset(data, c(4), list(c(1, 4:11)))
> zxc <- InsertDim(InsertDim(qwe, 5, 3), 6, 3)
> zxc <- Subset(zxc, 'time', list(1), drop = 'selected')
> zxc[, , , 1:3, 1, ,] <- qwe[, , , 1:3, ,]
> zxc[, , , 1:3, 2, ,] <- qwe[, , , 4:6, ,]
> zxc[, , , 1:3, 3, ,] <- qwe[, , , 7:9, ,]
> names(dim(zxc))[4] <- c('month')
> names(dim(zxc))[5] <- c('year')
> dim(zxc)
dat var memb month year lat lon
1 1 1 3 3 256 512
> zxc[1, 1, 1, , , 100, 100]
[,1] [,2] [,3]
[1,] 300.0381 300.3575 299.3486
[2,] 301.3340 301.0930 300.7203
[3,] 302.0320 301.4149 301.6695
```
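The left-most storage rule described in this section can be mimicked with plain base R, which may help to predict where the NAs will land before calling Start(). The sketch below uses the El Niño case from the text (Nov 2014 to May 2016) with month numbers as stand-in values. It does not call startR, and the variable names are made up for illustration.

```r
# Months requested from each yearly file when time = indices(11:29)
# and time_across = 'chunk' (month numbers are used as stand-in values)
months_per_chunk <- list(`2014` = 11:12, `2015` = 1:12, `2016` = 1:5)
n_time <- max(lengths(months_per_chunk))  # longest 'time' among the files

# Pad each chunk with NA on the right: the left-aligned layout
unmerged <- sapply(months_per_chunk,
                   function(m) c(m, rep(NA, n_time - length(m))))
dim(unmerged)  # time x chunk

# Attaching the chunks one after another keeps the NAs where they were
merged <- as.vector(unmerged)
which(is.na(merged))

# Dropping the NAs recovers the continuous months
continuous <- merged[!is.na(merged)]
length(continuous)
```

The same idea applies to Example 3: the merged 'time' dimension has length 12 (3 positions for each of the 4 files), of which only 9 hold values.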
### 3. Use self-defined function in Compute()
The workflow to use Compute() is: 'define the function' -> 'use Step() to assign the target/output dimension' -> 'use AddStep() to build up workflow' -> 'use Compute() to launch jobs on either local workstation or fatnodes/Power9'.
This works without problems when you only have a simple function defined directly in your script (like the example in the [practical guide](https://earth.bsc.es/gitlab/es/startR/blob/master/inst/doc/practical_guide.md#step-and-addstep)). However, if the function is more complicated, you may want to save it as an independent file. In this case, the machines (Power 9 or fatnodes) cannot find your function, so the jobs will fail (if you use Compute() on your own local workstation, the problem does not exist).
The solution is simple. First, put your function file somewhere on the machine. For example, on Power 9, put own_func.R at `/esarchive/scratch/<your_user_name>`. Second, in the script, source the file inside the function definition (see the example below), so that the machine can find your function.
```r
data <- Start(...,
              retrieve = FALSE)

func <- function(x) {
  source("/esarchive/scratch/aho/own_func.R") # the path in Power 9
  y <- own_func(x, posdim = 'time')
  return(y)
}

step <- Step(fun = func,
             target_dims = c('time'),
             output_dims = c('time'))
wf <- AddStep(data, step)

res <- Compute(wf, ...)
```
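The pattern of sourcing a file inside the function body can be tried locally without a cluster. The sketch below is only an illustration under a few assumptions: the file is written to a temporary directory, `own_func` is a toy function, and `local = TRUE` is used so that the sourced function is defined inside `func` rather than in the global environment.

```r
# Write a stand-in for own_func.R to a temporary file. On the cluster this
# would be a real file stored on the remote machine.
func_file <- file.path(tempdir(), "own_func.R")
writeLines("own_func <- function(x) x - mean(x)", func_file)

# The function sources the file itself, so whoever evaluates func() loads
# own_func() at that moment instead of relying on the calling session.
# In a real workflow the path would be written literally inside the function.
func <- function(x) {
  source(func_file, local = TRUE)  # defines own_func() inside this function
  own_func(x)
}

func(c(1, 2, 3))
```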
### 4. Use package function in Compute()
In the workflow for Compute(), the first step is to define the function. If you want to use a function from a certain R package, you need to check that the package is included in the R module (`r_module`) or in the library (`lib_dir`). Then, specify the package name before the function name (see the example below) so that the machine knows which function you refer to.
```r
data <- Start(...,
              retrieve = FALSE)

func <- function(x) {
  y <- s2dverification::Season(x, posdim = 'time') # specify package name
  return(y)
}

step <- Step(fun = func,
             target_dims = c('time'),
             output_dims = c('time'))
wf <- AddStep(data, step)

res <- Compute(wf,
               chunks = list(latitude = 2,
                             longitude = 2),
               threads_load = 2,
               threads_compute = 4,
               cluster = list(queue_host = 'p1', # your alias for power9
                              queue_type = 'slurm',
                              temp_dir = '/gpfs/scratch/bsc32/bsc32734/startR_hpc/',
                              lib_dir = '/gpfs/projects/bsc32/share/R_libs/3.5/', # s2dverification is included here, so the machine can find Season()
                              r_module = 'startR/0.1.2-foss-2018b-R-3.5.0',
                              job_wallclock = '00:10:00',
                              cores_per_job = 4,
                              max_jobs = 4,
                              bidirectional = FALSE,
                              polling_period = 50
                              ),
               ecflow_suite_dir = '/home/Earth/aho/startR_local/',
               wait = TRUE
               )
```
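Before building the workflow, it can help to confirm in an R session on the target machine that the package is actually visible there. The following is a minimal sketch in base R. The package 'stats' is used only as a stand-in because it ships with R, and the check reflects the library paths of the session where it runs, not the `r_module` or `lib_dir` of a remote job.

```r
# Stand-in package name; replace it with the package you need
pkg <- "stats"

# Stop early if the package cannot be found in the current library paths
if (!requireNamespace(pkg, quietly = TRUE)) {
  stop("Package '", pkg, "' was not found in: ",
       paste(.libPaths(), collapse = ", "))
}

# Call the function with an explicit package prefix, so that it does not
# depend on library() having been called in the session that evaluates it
func <- function(x) {
  stats::sd(x)
}

func(c(2, 4, 4, 4, 5, 5, 7, 9))
```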
### 5. Do interpolation in Start() (using parameter 'transform')
If you want to do the interpolation within Start(), you can use the following four parameters:
1. **`transform`**: The interpolation function. It is recommended to use `startR::CDORemapper`, the wrapper function of s2dverification::CDORemap().
2. **`transform_params`**: A list of the required inputs for `transform`. Taking `transform = CDORemapper` as an example, the common items are:
   - `grid`: A character string specifying either the name of a target grid (recognized by CDO, e.g., 'r256x128', 't106grid') or a path to another NetCDF file with the target grid (a single grid must be defined in such a file).
   - `method`: A character string specifying an interpolation method (recognized by CDO, e.g., 'con', 'bil', 'bic', 'dis'). The following long names are also supported: 'conservative', 'bilinear', 'bicubic', and 'distance-weighted'.
   - `crop`: Whether to crop the data after interpolation with 'cdo sellonlatbox' (TRUE) or to extend the interpolated data to the whole region as CDO does by default (FALSE).
     If crop = TRUE, the longitude and latitude borders to be cropped at are taken as the limits of the cells at the borders ('lons' and 'lats' are perceived as cell centers), i.e., the resulting array will contain data that covers the same area as the input array. This is equivalent to specifying crop = 'preserve', i.e., preserving area.
     If crop = 'tight', the borders to be cropped at are taken as the minimum and maximum cell centers in 'lons' and 'lats', i.e., the area covered by the resulting array may be smaller if interpolating from a coarse grid to a fine grid.
     The parameter 'crop' also accepts a numeric vector of custom borders: c(western border, eastern border, southern border, northern border).
3. **`transform_vars`**: A character vector of the inner dimensions to be transformed, e.g., c('latitude', 'longitude').
4. **`transform_extra_cells`**: A number indicating how many grid cells to extend from the borders if the interpolating region is a subset of the whole region. The default is 2, which is consistent with the method in s2dverification::Load().
You can find an example script here [ex1_1_tranform.R](/inst/doc/usecase/ex1_1_tranform.R)
You can see more information in s2dverification::CDORemap documentation [here](https://earth.bsc.es/gitlab/es/s2dverification/blob/master/man/CDORemap.Rd).
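To make the four parameters more concrete, the values could be prepared as plain R objects before the Start() call. This is only a sketch: the grid name, the method and the crop borders are illustrative values, and the object names simply mirror the parameter names described above.

```r
# Illustrative values only; adapt the grid, method and borders to your case
transform_params <- list(
  grid   = 'r360x181',         # target grid name in CDO notation
  method = 'conservative',     # long name of 'con'
  crop   = c(-20, 60, 20, 80)  # custom borders: west, east, south, north
)
transform_vars <- c('latitude', 'longitude')  # inner dimensions to transform
transform_extra_cells <- 2                    # the default value

# Basic sanity checks before handing the values to Start()
stopifnot(is.character(transform_params$grid),
          is.character(transform_params$method),
          length(transform_params$crop) == 4,
          length(transform_vars) == 2)
```

These objects would then be passed to Start() as `transform_params`, `transform_vars` and `transform_extra_cells`, together with `transform = CDORemapper`.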
### 6. Get data attributes without retrieving data to workstation
One of the most useful functionalities of Start() is the parameter `retrieve = FALSE`. It creates a pointer to the data repository and tells you the data information without occupying your workstation memory. Even better, although the data is not actually retrieved, you can still use its attributes:
```r
header <- Start(dat = repos,
                ...,
                retrieve = FALSE)
class(header)
#[1] "startR_cube"
# check attributes
str(attr(header, 'Variables'))
# Get longitude and latitude
lons <- attr(header, 'Variables')$common$lon
lats <- attr(header, 'Variables')$common$lat
# Get dimension
dim <- attr(header, 'Dimensions')
```
And if you want to retrieve the data to the workstation afterward, you can use `eval()`:
```r
data <- eval(header)
class(data)
#[1] "startR_array"
# Get dimension
dim(data)
```
Find examples at [usecase.md](/inst/doc/usecase.md), ex1_1 and ex1_3.
### 7. Avoid or specify a node from cluster in Compute()
When submitting a job to the fatnodes using Compute(), the parameter `extra_queue_params` can be used to restrict the job to run on a specific node as follows:
```
extra_queue_params = list('#SBATCH -w moore'),
```
or to exclude a specific node from the job:
```
extra_queue_params = list('#SBATCH -x moore'),
```
See the position of the `extra_queue_params` parameter in a full call of Compute():
```
res <- Compute(wf1,
               chunks = list(ensemble = 20,
                             sdate = 2),
               threads_load = 2,
               threads_compute = 4,
               cluster = list(queue_host = queue_host,
                              queue_type = 'slurm',
                              extra_queue_params = list('#SBATCH -x moore'),
                              cores_per_job = 2,
                              temp_dir = temp_dir,
                              r_module = 'R/3.5.0-foss-2018b',
                              polling_period = 10,
                              job_wallclock = '01:00:00',
                              max_jobs = 40,
                              bidirectional = FALSE),
               ecflow_suite_dir = ecflow_suite_dir,
               wait = TRUE)
```
### 8. Define a path with multiple dependencies
The structure of the BSC Earth data repository 'esarchive' allows us to create a path pattern to the data by using different variables (enclosed in dollar signs), such as `$var$` for the variable name, or `$sdate$` for the start date of the simulation. Here is an example for loading monthly simulations of system4_m1 data:
`path <- '/esarchive/exp/ecmwf/system4_m1/monthly_mean/$var$_f6h/$var$_$sdate$.nc'`
The function Start() will require two parameters 'var' and 'sdate' to load the desired data.
In some cases, creating the path can be a little more complicated. Some researchers create their own EC-Earth experiments, which are identified by an experiment ID (`$expid$`) and have different model versions (`$version$`), and even different members (`$member$`):
| expid | member | version |
|-------|----------|---------|
| a1st | r7i1p1f1 |v20190302|
| a1sx |r10i1p1f1 |v20190308|
In this case, the variables member and version take different values depending on the expid (the member r10i1p1f1 does not exist for expid a1st). The path will include these variables:
`path <- '/esarchive/exp/ecearth/$expid$/diags/CMIP/EC-Earth-Consortium/EC-Earth3/historical/$member$/Omon/$var$/gn/$version$/$var$_Omon_EC-Earth3_historical_$member$_gn_$year$.nc'`
However, the following parameters are mandatory to make Start() aware that they are not independent variables:
```
member_depends = 'expid',
version_depends = 'expid',
```
The final Start() call will look like:
```
yrh1 <- 1960
yrh2 <- 2014
years <- paste0(c(yrh1 : yrh2), '01-', c(yrh1 : yrh2), '12')

data <- Start(dat = path, # the path pattern defined above
              var = 'tosmean',
              expid = c('a1st', 'a1sx'),
              member = 'all',
              version = 'all',
              member_depends = 'expid',
              version_depends = 'expid',
              year = years,
              time = 'all',
              region = indices(1 : 4),
              return_vars = list(time = NULL, region = NULL),
              retrieve = TRUE)
```
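To see why the dependency matters, the path pattern can be expanded by hand in base R. The sketch below takes the valid combinations from the table above and fills them into the pattern; `fill_pattern` is a hypothetical helper written for this illustration and does not reflect how Start() resolves paths internally.

```r
path <- '/esarchive/exp/ecearth/$expid$/diags/CMIP/EC-Earth-Consortium/EC-Earth3/historical/$member$/Omon/$var$/gn/$version$/$var$_Omon_EC-Earth3_historical_$member$_gn_$year$.nc'

# Valid combinations, taken from the table above
deps <- data.frame(expid   = c('a1st', 'a1sx'),
                   member  = c('r7i1p1f1', 'r10i1p1f1'),
                   version = c('v20190302', 'v20190308'),
                   stringsAsFactors = FALSE)

# Hypothetical helper: replace every $tag$ in the pattern by its value
fill_pattern <- function(pattern, values) {
  for (tag in names(values)) {
    pattern <- gsub(paste0('$', tag, '$'), values[[tag]], pattern, fixed = TRUE)
  }
  pattern
}

# One path per expid; member and version follow the expid
paths <- vapply(seq_len(nrow(deps)),
                function(i) fill_pattern(path, as.list(deps[i, ])),
                character(1))
paths

# Treating the three variables as independent would give more combinations,
# most of which do not correspond to existing directories
nrow(expand.grid(expid = deps$expid, member = deps$member,
                 version = deps$version))
```

The remaining `$var$` and `$year$` tags are left untouched here, since they do not depend on the expid.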
## Something goes wrong...
### 1. No space left on device
One known issue in R is the accumulation of trash files, which occupy the machine memory and can therefore crash R. If the size of the data your R script deals with is reasonable but R crashes immediately after running and returns the error:
>
> No space left on device
>
Go to **/dev/shm/** and `rm <large_trash_file_name>`
Find more discussion in this [issue](https://earth.bsc.es/gitlab/es/s2dverification/issues/221)
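To find out which files take up the space, the directory can also be inspected from R. The helper below is a hypothetical sketch in base R (the name `largest_files` is made up). It only lists files and leaves the decision to delete anything to you.

```r
# Hypothetical helper: list the largest files in a directory, biggest first
largest_files <- function(dir, n = 10) {
  files <- list.files(dir, full.names = TRUE, recursive = TRUE)
  info <- file.info(files)
  # Keep regular files whose information could be read
  info <- info[!is.na(info$isdir) & !info$isdir, , drop = FALSE]
  info <- info[order(info$size, decreasing = TRUE), , drop = FALSE]
  out <- data.frame(file = rownames(info),
                    size_mb = round(info$size / 1024^2, 2),
                    stringsAsFactors = FALSE)
  head(out, n)
}

largest_files('/dev/shm')

# After checking that a file is really leftover trash:
# file.remove('<large_trash_file_name>')
```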
### 2. ecFlow UI remains blue and does not update status
This situation will occur if:
1. The Compute() parameter `wait` is set to `FALSE`, and
2. The jobs are launched on an HPC where the connection with its login node is unidirectional (e.g., Power 9)
Under this condition, the ecFlow UI will remain blue and will not update the status.
To solve this problem, use `Collect()` in the R terminal after running Compute():
```r
res <- Compute(wf,
               ...,
               wait = FALSE)
result <- Collect(res, wait = TRUE) #it will update ecflow_ui status continuously, but will block the R session
result <- Collect(res, wait = FALSE) #it will return the ecflow_ui status once only, but will not block the R session
```
### 3. Compute() successfully but then killed on R session
When you run Compute() on HPCs, the machines can process data much larger than what the local workstation can handle, so the computation works fine (i.e., on the ecFlow UI, the chunks show yellow in the end). However, after the computation, the output is sent back to the local workstation. **If the returned data is larger than the available local memory, your R session will be killed.** Therefore, always check beforehand whether the returned data will fit in your workstation's free memory. If not, subset the input data or reduce the output size through more computation.
Further explanation: although the complete output (i.e., all the chunks merged into one returned array) cannot be sent back to the workstation, the chunking results (.Rds files) are complete and saved in the directory '<ecflow_suite_dir>/STARTR_CHUNKING_<job_id>'. If you still want to use the chunking results, you can find them there.
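If you want to work with the saved chunk results directly, they can be read with base R. The sketch below is an assumption-laden illustration: `read_chunk_files` is a hypothetical helper, it only assumes that the directory contains .Rds files, and the demo uses mock files in a temporary directory because the actual file names and contents are not described here.

```r
# Hypothetical helper: read every .Rds file found in a chunking directory
read_chunk_files <- function(chunk_dir) {
  files <- list.files(chunk_dir, pattern = '\\.Rds$', full.names = TRUE)
  chunks <- lapply(files, readRDS)
  names(chunks) <- basename(files)
  chunks
}

# Demo with two mock chunk files in a temporary directory
chunk_dir <- file.path(tempdir(), 'STARTR_CHUNKING_demo')
dir.create(chunk_dir, showWarnings = FALSE)
saveRDS(array(1:6, dim = c(lat = 2, lon = 3)), file.path(chunk_dir, 'chunk_1.Rds'))
saveRDS(array(7:12, dim = c(lat = 2, lon = 3)), file.path(chunk_dir, 'chunk_2.Rds'))

chunks <- read_chunk_files(chunk_dir)
names(chunks)

# Size of each chunk in memory, to judge how many can be handled at once
sapply(chunks, function(x) format(object.size(x), units = 'Kb'))
```

Reading the chunks one by one and reducing each of them (e.g., averaging or subsetting) before combining keeps the memory use below that of the full merged array.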
### 4. My jobs work well in workstation and fatnodes but not on Power9 (or vice versa)
There are several possible reasons for this situation. We list some of them here; please let us know if you find any other reason that is not listed yet.
- **R module or package version difference.** Sometimes, the versions on these
machines are not consistent, which might cause the problem. Try loading a
different module to see if it fixes the problem.
- **The package is not known by the machine you use.** If the package you use
in the function is not included in the R module, you have to assign the
parameter `lib_dir` in the cluster list in Compute() (see more details in
[practical_guide.md](https://earth.bsc.es/gitlab/es/startR/blob/master/inst/doc/practical_guide.md#compute-on-cte-power-9).)
- **The function is not prefixed with its package name.** The package name needs
to be added in front of the function, connected with '::' (e.g., `s2dv::Clim`) or with
':::' if the function is internal (e.g., `CSTools:::.cal`).
- **The file you source or load is not on the machine you use.** If you use a self-defined
function or load data in the function, you need to put those files on the machine
you run the computation on, so that the machine can find them (e.g., when submitting jobs
to Power9, you should put the files on Power9 instead of the local workstation.)
- **Connection problem.** Test a script that used to run successfully (if you do not
have one, go to [usecase.md](https://earth.bsc.es/gitlab/es/startR/tree/develop-FAQcluster/inst/doc/usecase) to find one!).
If it fails, it means that your connection to the machine or the ecFlow setting has
some problem.