Further explanation: although the complete output (i.e., all the chunks merged into one returned array) cannot be sent back to the workstation, the chunking results (.Rds files) are completed and saved in the directory '<ecflow_suite_dir>/STARTR_CHUNKING_<job_id>'. If you still want to use the chunking results, you can find them there.
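As a minimal sketch of how the saved chunks could be inspected from R, the helper below lists every `.Rds` file in a directory and reads each one into a named list. The helper name `read_chunk_results` and the temporary demo directory are assumptions made for illustration; how the chunk files are named and how they should be merged depends on your workflow and is not covered here.

```r
# Hypothetical helper (not part of startR): read every .Rds file found in
# 'chunk_dir' into a list named after the files.
read_chunk_results <- function(chunk_dir) {
  files <- list.files(chunk_dir, pattern = "\\.Rds$", full.names = TRUE)
  if (length(files) == 0) {
    stop("No .Rds files found in ", chunk_dir)
  }
  chunks <- lapply(files, readRDS)
  names(chunks) <- basename(files)
  chunks
}

# Demo with a temporary directory standing in for
# '<ecflow_suite_dir>/STARTR_CHUNKING_<job_id>'.
demo_dir <- file.path(tempdir(), "chunk_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(array(1:4, dim = c(2, 2)), file.path(demo_dir, "chunk_1.Rds"))
saveRDS(array(5:8, dim = c(2, 2)), file.path(demo_dir, "chunk_2.Rds"))

chunks <- read_chunk_results(demo_dir)
str(chunks)
```

In real use, replace `demo_dir` with the actual chunking directory of your job.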
### 4. My jobs work well in workstation and fatnodes but not on Power9 (or vice versa)
There are several possible reasons for this situation. We list some of them here; please let us know if you find a reason that is not listed yet.
- **R module or package version difference.** Sometimes the versions on these
machines are not consistent, which may cause the problem. Try loading a
different module to see if it fixes the problem.
- **The package is not known by the machine you use.** If the package you use
in the function is not included in the R module, you have to assign the
parameter `lib_dir` in the cluster list in Compute() (see more details in
[practical_guide.md](https://earth.bsc.es/gitlab/es/startR/blob/master/inst/doc/practical_guide.md#compute-on-cte-power-9)).
- **The function is not specified with the package name.** The package name needs
to be added in front of the function, connected with '::' (e.g., `s2dv::Clim`), or with
':::' if the function is internal (e.g., `CSTools:::.cal`).
- **The sourced or loaded file is not on the machine you use.** If you use a self-defined
function or load data in the function, you need to put those files on the machine
you run the computation on, so the machine can find them (e.g., when submitting jobs
to Power9, you should put the files on Power9 instead of the local workstation).
- **Connection problem.** Test a script that used to run successfully (if you do not
have one, go to [usecase.md](https://earth.bsc.es/gitlab/es/startR/tree/develop-FAQcluster/inst/doc/usecase) to find one!).
If it fails, your connection to the machine or the ecFlow setting has
some problem.
- **Check the 'return_vars' parameter in Start().** If the variable (usually set in 'var') is requested in the 'return_vars' parameter of `Start()`, the execution on an HPC cluster may fail.
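The first three points can be checked with a few lines of base R run on both machines. This is only a sketch: `missing_packages`, `report_versions` and `fun` are hypothetical names introduced here, and `stats` stands in for whichever packages your own function uses.

```r
# Hypothetical helper: return the packages in 'pkgs' that cannot be loaded
# in the current R session.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: report the R version and the versions of the
# packages in 'pkgs' that are available, for comparison between machines.
report_versions <- function(pkgs) {
  available <- setdiff(pkgs, missing_packages(pkgs))
  c(R = as.character(getRversion()),
    vapply(available,
           function(p) as.character(utils::packageVersion(p)),
           character(1)))
}

# A function meant to run on the cluster should call every package
# function with the 'package::' prefix ('stats' is only an example).
fun <- function(x) {
  stats::median(x, na.rm = TRUE)
}

print(missing_packages(c("stats", "aPackageThatDoesNotExist")))
print(report_versions(c("stats", "utils")))
print(fun(c(1, 3, NA)))
```

Running the same script on the workstation and on the cluster, and comparing the printed output, shows quickly whether a package is missing or has a different version.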
### 5. Errors related to wrong file formatting
Several errors can be returned when the files are not correctly formatted. If you see one of these errors, review the coordinates in your files:
```
Error in Rsx_nc4_put_vara_double: NetCDF: Numeric conversion not representable
Error in ncvar_put(ncdf_object, defined_vars[[var_counter]]$name, arrays[[i]], :
C function Rsx_nc4_put_vara_double returned error
```
```
Error in dim(x$x) <- dim_bk :
dims [product 1280] do not match the length of object [1233] <- this '1233' changes every time
```
```
Error in s2dverification::CDORemap(data_array, lons, lats, ...) :
Found invalid values in 'lons'.
```
```
ERROR: invalid cell
Aborting in file clipping.c, line 1295 ...
Error in s2dverification::CDORemap(data_array, lons, lats, ...) :
CDO remap failed.
```
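A quick way to review the coordinates is to run some sanity checks on the longitude and latitude vectors once they are read into R. The sketch below is illustrative only: `check_coords` is a hypothetical helper, and the accepted ranges are assumptions about what a well-formed regular grid looks like, not the exact conditions tested by `CDORemap` or CDO.

```r
# Hypothetical helper: return a character vector describing the problems
# found in the coordinate vectors (empty if none are found).
# The vectors could be read, for instance, with ncdf4::ncvar_get(); the
# variable names inside the file depend on your dataset.
check_coords <- function(lons, lats) {
  problems <- character(0)
  if (any(!is.finite(lons))) {
    problems <- c(problems, "non-finite values in 'lons'")
  }
  if (any(!is.finite(lats))) {
    problems <- c(problems, "non-finite values in 'lats'")
  }
  # Assumed valid ranges: latitudes in [-90, 90], longitudes in [-180, 360].
  if (any(abs(lats) > 90, na.rm = TRUE)) {
    problems <- c(problems, "'lats' outside [-90, 90]")
  }
  if (any(lons < -180 | lons > 360, na.rm = TRUE)) {
    problems <- c(problems, "'lons' outside [-180, 360]")
  }
  problems
}

print(check_coords(lons = c(0, 90, 180), lats = c(-45, 0, 45)))
print(check_coords(lons = c(0, NA, 400), lats = c(-95, 0)))
```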
Some errors can happen when using a new cluster. Here are some behaviours detected in issue #64.
- When running Compute(), a password is requested:
```
Password:
```
Check that the host name for the cluster has been included in `.ssh/config`.
Check also that passwordless access has been properly set up. You can check that you can access the cluster without providing the password by using the host name, e.g. `ssh nord3` (see more information in the [**Practical guide**](inst/doc/practical_guide.md)).
In this case, the error `No data files found for any of the specified datasets.` will be returned.
- repetitive prints of modules loading:
```
load UDUNITS/2.1.24 (PATH)
load NETCDF/4.1.3 (PATH, LD_LIBRARY_PATH, NETCDF)
load R/2.15.2 (PATH, LD_LIBRARY_PATH)
```
The .bashrc in your Nord 3 home must be edited with the information from [BSC ES wiki](https://earth.bsc.es/wiki/doku.php?id=computing:nord3) to load the correct modules. However, if you add a line before those, the result will be the one above.
Check your .bashrc to avoid loading modules before defining the department ones.
- R versions: workstation version versus remote cluster version
Some functions depend on the R version used, and the versions should be compatible on the workstation and on the remote cluster. If you get the error:
```
cannot read workspace version 3 written by R 3.6.2; need R 3.5.0 or newer
```
change the R version used on your workstation to a newer one.
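To see which side needs the upgrade, the R version can be printed on both the workstation and the cluster. The sketch below relies only on the threshold stated in the error message (R 3.5.0 or newer); the helper name `can_read_rds_v3` is hypothetical.

```r
# Hypothetical helper: TRUE if the given R version can read workspace
# (serialization) format version 3, which needs R 3.5.0 or newer
# according to the error message above.
can_read_rds_v3 <- function(r_version = getRversion()) {
  r_version >= "3.5.0"
}

cat("This session runs R", as.character(getRversion()), "\n")
cat("Can read format version 3:", can_read_rds_v3(), "\n")
```

Run it in an R session on each machine; the one that reports `FALSE` is the one whose R version is too old.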
### 7. Start() fails retrieving data
If you get the following error message:
```
Exploring files... This will take a variable amount of time depending
* on the issued request and the performance of the file server...
Error in R_nc4_open: No such file or directory
Error in file_var_reader(NULL, file_object, NULL, var_to_read, synonims) :
Either 'file_path' or 'file_object' must be provided.
```
check whether your path contains the label $var$. If not, try adding it as part of the path or the file name, where $var$ is the variable to retrieve from the files.
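A small check like the following can be run on the path pattern before calling `Start()`. It is a sketch only: `has_var_tag` is a hypothetical helper, and both example paths (including the `$sdate$` label and the variable name `tas`) are made up for illustration.

```r
# Hypothetical helper: TRUE if the path pattern contains the '$var$' label.
# fixed = TRUE makes grepl() treat '$' literally instead of as a regular
# expression anchor.
has_var_tag <- function(path) {
  grepl("$var$", path, fixed = TRUE)
}

# Made-up path patterns, for illustration only.
path_ok  <- "/path/to/dataset/$var$/$var$_$sdate$.nc"
path_bad <- "/path/to/dataset/tas/tas_$sdate$.nc"

print(has_var_tag(path_ok))
print(has_var_tag(path_bad))
```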