Further explanation: although the complete output (i.e., all the chunks merged into one returned array) cannot be sent back to the workstation, the chunking results (.Rds files) are complete and saved in the directory '<ecflow_suite_dir>/STARTR_CHUNKING_<job_id>'. If you still want to use the chunking results, you can find them there.
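
If you want to collect the saved chunks yourself, a minimal sketch in base R could look like the following. The helper name `read_chunk_files` is hypothetical (it is not part of startR), the directory in the usage comment is a placeholder you must replace, and nothing is assumed about how the files are named: they are simply read into a list.

```r
# Hypothetical helper (not part of startR): read every .Rds file found in
# a chunking directory into a named list, one element per file.
read_chunk_files <- function(chunk_dir) {
  files <- list.files(chunk_dir, pattern = "\\.rds$", ignore.case = TRUE,
                      full.names = TRUE)
  if (length(files) == 0) {
    stop("No .Rds files found in ", chunk_dir)
  }
  chunks <- lapply(files, readRDS)
  names(chunks) <- basename(files)
  chunks
}

# Usage (placeholder path, replace it with your own directory):
# chunks <- read_chunk_files("<ecflow_suite_dir>/STARTR_CHUNKING_<job_id>")
# str(chunks, max.level = 1)
```

How the chunks should be merged back into one array depends on the chunked dimensions of your workflow, so that step is left out here.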

### 4. My jobs work well in workstation and fatnodes but not on Power9 (or vice versa)

There are several possible reasons for this situation. Here we list some of them; please let us know if you find any other reason not listed here.
- **R module or package version difference.** Sometimes the versions on these 
machines are not consistent, which can cause problems. Try loading a 
different module to see if it fixes the problem.
- **The package is not known by the machine you use.** If the package you use 
in the function is not included in the R module, you have to assign the 
parameter `lib_dir` in the cluster list in Compute() (see more details in 
[practical_guide.md](https://earth.bsc.es/gitlab/es/startR/blob/master/inst/doc/practical_guide.md#compute-on-cte-power-9)). 
- **The function is not prefixed with its package name.** The package name needs 
to be added in front of the function, connected with '::' (e.g., `s2dv::Clim`), or with
 ':::' if the function is internal (e.g., `CSTools:::.cal`).
- **A sourced or loaded file is not on the machine you use.** If you use a self-defined 
function or load data in the function, you need to put those files on the machine 
where the computation runs, so the machine can find them (e.g., when submitting jobs 
to Power9, you should put the files on Power9 instead of the local workstation).
- **Connection problem.** Test a script that used to run successfully (if you do not 
have one, go to [usecase.md](https://earth.bsc.es/gitlab/es/startR/tree/develop-FAQcluster/inst/doc/usecase) to find one!). 
If it fails, your connection to the machine or the ecFlow setting has 
a problem.
- **Check the 'return_vars' parameter in Start().** If the variable (usually set in 'var') is requested in the 'return_vars' parameter of `Start()`, the execution on an HPC cluster may fail. 
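
Some of the checks in this list can be scripted. The sketch below uses base R only and makes no startR calls; `report_env` and `clim_like` are hypothetical names, and `stats::median` merely stands in for a package function such as `s2dv::Clim`. Running the same helper in an R session on the workstation and on the cluster lets you compare R versions and package availability.

```r
# Hypothetical helper: report the R version, the library paths, and whether
# each package in 'pkgs' can be found by the current R session.
report_env <- function(pkgs) {
  list(
    r_version = as.character(getRversion()),
    lib_paths = .libPaths(),
    available = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  )
}

# A function meant to run remotely should call package functions with an
# explicit 'package::' prefix ('stats::median' is only a stand-in here).
clim_like <- function(x) {
  stats::median(x, na.rm = TRUE)
}

# Usage: run on both machines and compare the results.
# str(report_env(c("s2dv", "CSTools")))
```

If a package shows up as unavailable on the cluster, that is a hint that `lib_dir` needs to point to a library that contains it.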

###  5. Errors related to wrong file formatting

Several errors could be returned when the files are not correctly formatted. If you see one of these errors, review the coordinates in your files:

```
Error in Rsx_nc4_put_vara_double: NetCDF: Numeric conversion not representable
Error in ncvar_put(ncdf_object, defined_vars[[var_counter]]$name, arrays[[i]], : 
 C function Rsx_nc4_put_vara_double returned error
```

```
Error in dim(x$x) <- dim_bk :
  dims [product 1280] do not match the length of object [1233]  <- this '1233' changes every time
```

```
Error in s2dverification::CDORemap(data_array, lons, lats, ...) : 
  Found invalid values in 'lons'.
```

```
ERROR: invalid cell
 
Aborting in file clipping.c, line 1295 ...
Error in s2dverification::CDORemap(data_array, lons, lats, ...) : 
  CDO remap failed.
```
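
A quick sanity check of the coordinate vectors can help locate the problem before regridding. The sketch below rests on an assumption about what "invalid" may mean (missing or non-finite values, latitudes outside [-90, 90], longitudes outside [-180, 360]); the exact conditions checked by `CDORemap` are not stated here, and `check_coords` is a hypothetical helper.

```r
# Hypothetical helper: return a character vector describing suspicious
# coordinate values (empty if nothing suspicious is found).
# The accepted ranges are an assumption, adjust them to your data.
check_coords <- function(lons, lats) {
  problems <- character(0)
  if (any(!is.finite(lons))) {
    problems <- c(problems, "missing or non-finite values in 'lons'")
  }
  if (any(!is.finite(lats))) {
    problems <- c(problems, "missing or non-finite values in 'lats'")
  }
  if (any(lons < -180 | lons > 360, na.rm = TRUE)) {
    problems <- c(problems, "'lons' outside [-180, 360]")
  }
  if (any(abs(lats) > 90, na.rm = TRUE)) {
    problems <- c(problems, "'lats' outside [-90, 90]")
  }
  problems
}

# Usage with a NetCDF file (file and variable names are placeholders):
# nc <- ncdf4::nc_open("my_file.nc")
# lons <- ncdf4::ncvar_get(nc, "longitude")
# lats <- ncdf4::ncvar_get(nc, "latitude")
# ncdf4::nc_close(nc)
# check_coords(lons, lats)
```

An empty result does not guarantee that the file is correctly formatted; it only rules out the most obvious coordinate problems.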
###  6. Errors using a new cluster (setting Nord3)

When using a new cluster, some errors may occur. Below are some behaviours detected in issue #64.

- when running Compute(), a password is requested:

```
Password:
```

Check that the host name for the cluster has been included in `.ssh/config`. 
Also check that passwordless access has been properly set up. You can verify that you can access the cluster without providing the password by using the host name, `ssh nord3` (see more information in the [**Practical guide**](inst/doc/practical_guide.md)).
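
The same check can be run from R. The following is a minimal sketch, assuming an OpenSSH client is installed on the workstation; the helper names are hypothetical, and `BatchMode=yes` is the OpenSSH option that makes `ssh` fail instead of prompting for a password.

```r
# Build the arguments for a non-interactive ssh test: with BatchMode=yes,
# ssh fails instead of asking for a password. 'true' is a no-op remote command.
ssh_test_args <- function(host) {
  c("-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true")
}

# Hypothetical helper: TRUE if 'host' can be reached without typing a password.
can_ssh_without_password <- function(host) {
  status <- suppressWarnings(
    system2("ssh", ssh_test_args(host), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

# Usage ('nord3' must be a host name defined in your .ssh/config):
# can_ssh_without_password("nord3")
```

If this returns `FALSE`, fix the ssh configuration before calling Compute() again.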

- an alias may not be available, such as 'esnas' for 'esarchive'

In this case, the error `No data files found for any of the specified datasets.` will be returned.
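
One way to avoid depending on the alias is to rewrite the path before passing it to Start(). The sketch below assumes the alias appears as the leading component of the path; `use_esarchive` is a hypothetical helper and the example path is only illustrative.

```r
# Hypothetical helper: replace a leading '/esnas/' alias with '/esarchive/'.
# Paths that do not start with the alias are returned unchanged.
use_esarchive <- function(path) {
  sub("^/esnas/", "/esarchive/", path)
}

# Illustrative path pattern (not a guaranteed location of real data):
path <- use_esarchive("/esnas/exp/my_experiment/$var$/$var$_$sdate$.nc")
print(path)

# On the cluster, you can also confirm that the root directory is visible:
# dir.exists("/esarchive")
```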

- repetitive prints of modules loading:

```
load UDUNITS/2.1.24 (PATH)
load NETCDF/4.1.3 (PATH, LD_LIBRARY_PATH, NETCDF)
load R/2.15.2 (PATH, LD_LIBRARY_PATH)
```

The .bashrc in your Nord 3 home must be edited with the information from [BSC ES wiki](https://earth.bsc.es/wiki/doku.php?id=computing:nord3) to load the correct modules. However, if you add a line before those, the result will be the one above.

Check your .bashrc to avoid loading modules before defining the department ones.


- R versions: Workstation version versus remote cluster version

Some functions depend on the R version used, and they should be compatible on the workstation and on the remote cluster. If you get the error:

```
cannot read workspace version 3 written by R 3.6.2; need R 3.5.0 or newer
```

change the R version used in your workstation to a newer one.
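
To see in advance whether an R session is affected, you can compare its version against the 3.5.0 threshold quoted in the error message. This is a minimal sketch in base R; `can_read_v3` is a hypothetical helper.

```r
# Hypothetical helper: TRUE if the given R version (by default the running
# one) is at least 3.5.0, the minimum named in the error message above.
can_read_v3 <- function(r_version = getRversion()) {
  as.numeric_version(r_version) >= "3.5.0"
}

# Usage: run this in the R session that raised the error.
# can_read_v3()
# R.version.string
```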


### 7. Start() fails retrieving data

If you get the following error message:
```
Exploring files... This will take a variable amount of time depending
*   on the issued request and the performance of the file server...
Error in R_nc4_open: No such file or directory
Error in file_var_reader(NULL, file_object, NULL, var_to_read, synonims) :
  Either 'file_path' or 'file_object' must be provided.
```

check whether your path contains the label `$var$`. If not, try adding it as part of the path or the file name, where `$var$` is the variable to retrieve from the files.
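
A simple way to test a path pattern before calling Start() is a fixed-string search for the label. This is a minimal sketch in base R; `has_var_tag` is a hypothetical helper and the paths are illustrative only.

```r
# Hypothetical helper: TRUE if the path pattern contains the literal
# label '$var$'. 'fixed = TRUE' avoids treating '$' as a regex anchor.
has_var_tag <- function(path) {
  grepl("$var$", path, fixed = TRUE)
}

# Illustrative path patterns (not real data locations):
good_path <- "/esarchive/exp/my_experiment/$var$/$var$_$sdate$.nc"
bad_path <- "/esarchive/exp/my_experiment/tas/tas_$sdate$.nc"

print(has_var_tag(good_path))
print(has_var_tag(bad_path))
```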