% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Load.R
Load(
var,
exp = NULL,
obs = NULL,
sdates,
nmember = NULL,
nmemberobs = NULL,
nleadtime = NULL,
leadtimemin = 1,
leadtimemax = NULL,
storefreq = "monthly",
sampleperiod = 1,
lonmin = 0,
lonmax = 360,
latmin = -90,
latmax = 90,
output = "areave",
method = "conservative",
grid = NULL,
maskmod = vector("list", 15),
maskobs = vector("list", 15),
configfile = NULL,
varmin = NULL,
varmax = NULL,
silent = FALSE,
nprocs = NULL,
dimnames = NULL,
remapcells = 2,
path_glob_permissive = "partial"
)
}
\arguments{
\item{var}{Short name of the variable to load. It should coincide with the
variable name inside the data files.\cr
E.g.: \code{var = 'tos'}, \code{var = 'tas'}, \code{var = 'prlr'}.\cr
In some cases, though, the path to the files contains twice or more times
the short name of the variable but the actual name of the variable inside
the data files is different. In these cases it may be convenient to provide
\code{var} with the name that appears in the file paths (see details on
parameters \code{exp} and \code{obs}).}
\item{exp}{Parameter to specify which experimental datasets to load data
from.\cr
It can take two formats: a list of lists or a vector of character strings.
Each format will trigger a different mechanism of locating the requested
datasets.\cr
The first format is adequate when loading data you'll only load once or
occasionally. The second format is targeted to avoid providing repeatedly
the information on a certain dataset but is more complex to use.\cr\cr
IMPORTANT: Place first the experiment with the largest number of members
and, if possible, with the largest number of leadtimes. If not possible,
the arguments 'nmember' and/or 'nleadtime' should be filled to not miss
any member or leadtime.\cr
If 'exp' is not specified or set to NULL, observational data is loaded for
each start-date as far as 'leadtimemax'. If 'leadtimemax' is not provided,
\code{Load()} will retrieve data of a period of time as long as the time
period between the first specified start date and the current date.\cr\cr
List of lists:\cr
A list of lists where each sub-list contains information on the location
and format of the data files of the dataset to load.\cr
Each sub-list can have the following components:
\itemize{
\item{'name': A character string to identify the dataset. Optional.}
\item{'path': A character string with the pattern of the path to the
files of the dataset. This pattern can be built up making use of some
special tags that \code{Load()} will replace with the appropriate
values to find the dataset files. The allowed tags are $START_DATE$,
$YEAR$, $MONTH$, $DAY$, $MEMBER_NUMBER$, $STORE_FREQ$, $VAR_NAME$,
$EXP_NAME$ (only for experimental datasets), $OBS_NAME$ (only for
observational datasets) and $SUFFIX$\cr
Example: /path/to/$EXP_NAME$/postprocessed/$VAR_NAME$/\cr
$VAR_NAME$_$START_DATE$.nc\cr
If 'path' is not specified and 'name' is specified, the dataset
information will be fetched with the same mechanism as when using
the vector of character strings (read below).
}
\item{'nc_var_name': Character string with the actual variable name
to look for inside the dataset files. Optional. Takes, by default,
the same value as the parameter 'var'.
}
\item{'suffix': Wildcard character string that can be used to build
the 'path' of the dataset. It can be accessed with the tag $SUFFIX$.
Optional. Takes '' by default.
}
\item{'var_min': Important: Character string. Minimum value below
which read values will be set to NA. Optional. No deactivation
is performed by default.
}
\item{'var_max': Important: Character string. Maximum value above
which read values will be set to NA. Optional. No deactivation
is performed by default.
}
}
The tag $START_DATE$ will be replaced with each of the starting dates
specified in 'sdates'. $YEAR$, $MONTH$ and $DAY$ take a value for each
iteration over 'sdates'; they are the same as $START_DATE$ but
split into parts.\cr
$MEMBER_NUMBER$ will be replaced by a character string with each member
number, from 1 to the value specified in the parameter 'nmember' (in
experimental datasets) or in 'nmemberobs' (in observational datasets). It
will range from '01' to 'N' or '0N' if N < 10.\cr
$STORE_FREQ$ will take the value specified in the parameter 'storefreq'
('monthly' or 'daily').\cr
$VAR_NAME$ will take the value specified in the parameter 'var'.\cr
$EXP_NAME$ will take the value specified in each component of the parameter
'exp' in the sub-component 'name'.\cr
$OBS_NAME$ will take the value specified in each component of the parameter
'obs' in the sub-component 'name'.\cr
$SUFFIX$ will take the value specified in each component of the parameters
'exp' and 'obs' in the sub-component 'suffix'.\cr
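As an illustration of the date and member tags described above, the following base R sketch derives the values that would stand in for $YEAR$, $MONTH$, $DAY$ and $MEMBER_NUMBER$. It is not the code used inside \code{Load()}; the variable names and the choice of 12 members are assumptions made for the example.

```r
# Split one start date (pattern 'YYYYMMDD') into the parts used by the tags.
sdate <- "19901101"
year  <- substr(sdate, 1, 4)   # value that would replace $YEAR$
month <- substr(sdate, 5, 6)   # value that would replace $MONTH$
day   <- substr(sdate, 7, 8)   # value that would replace $DAY$

# Zero-padded member numbers for $MEMBER_NUMBER$, assuming nmember = 12.
nmember <- 12
member_numbers <- formatC(seq_len(nmember), width = 2, flag = "0")

print(c(year, month, day))
print(member_numbers)
```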
\preformatted{
list(
list(
name = 'experimentA',
path = file.path('/path/to/$EXP_NAME$/$STORE_FREQ$_mean',
'$VAR_NAME$$SUFFIX$',
'$VAR_NAME$_$START_DATE$.nc'),
nc_var_name = '$VAR_NAME$',
suffix = '_3hourly',
var_min = '-1e19',
var_max = '1e19'
)
)
}
With 'var' set to 'tas', 'storefreq' set to 'monthly' and 'sdates' set to
c('19901101', '19951101', '20001101'), \code{Load()} will look for the
following paths:\cr
/path/to/experimentA/monthly_mean/tas_3hourly/tas_19901101.nc\cr
/path/to/experimentA/monthly_mean/tas_3hourly/tas_19951101.nc\cr
/path/to/experimentA/monthly_mean/tas_3hourly/tas_20001101.nc\cr\cr
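The tag replacement can be mimicked with a few lines of base R. This is only a sketch of the idea, not the mechanism implemented in \code{Load()}: the helper 'expand_tags' is hypothetical, and the pattern assumes that $STORE_FREQ$ expands to 'monthly' and is followed by a literal '_mean' in the directory name.

```r
# Hypothetical helper (not part of s2dverification): replaces every $TAG$
# found in a path pattern with the value supplied for that tag.
expand_tags <- function(pattern, values) {
  for (tag in names(values)) {
    pattern <- gsub(paste0("$", tag, "$"), values[[tag]], pattern, fixed = TRUE)
  }
  pattern
}

pattern <- file.path("/path/to/$EXP_NAME$/$STORE_FREQ$_mean",
                     "$VAR_NAME$$SUFFIX$",
                     "$VAR_NAME$_$START_DATE$.nc")
sdates <- c("19901101", "19951101", "20001101")

# One path per start date; all other tags keep the same value.
paths <- vapply(sdates, function(sdate) {
  expand_tags(pattern, list(EXP_NAME = "experimentA", STORE_FREQ = "monthly",
                            VAR_NAME = "tas", SUFFIX = "_3hourly",
                            START_DATE = sdate))
}, character(1), USE.NAMES = FALSE)
print(paths)
```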
Vector of character strings:\cr
To avoid repeatedly specifying the same information to load the same
datasets, a vector with only the names of the datasets to load can be
specified.\cr
\code{Load()} will then look for the information in a configuration file
whose path is given in the parameter 'configfile'.\cr
See \code{?ConfigFileCreate}, \code{?ConfigFileOpen},
\code{?ConfigEditEntry} and related functions to learn how to create a new
configuration file and how to add the information there.\cr
Example: c('experimentA', 'experimentB')}
\item{obs}{Argument with the same format as parameter 'exp'. See details on
parameter 'exp'.\cr
If 'obs' is not specified or set to NULL, no observational data is loaded.\cr}
\item{sdates}{Vector of starting dates of the experimental runs to be loaded
following the pattern 'YYYYMMDD'.\cr
This argument is mandatory.\cr
E.g. c('19601101', '19651101', '19701101')}
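A vector of regularly spaced start dates does not need to be typed by hand. The sketch below builds the example vector above with base R; the 5-year spacing and the November 1st initialization are simply the values of that example.

```r
# Start years every 5 years, all initialized on the 1st of November.
start_years <- seq(1960, 1970, by = 5)
sdates <- paste0(start_years, "1101")
print(sdates)
```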
\item{nmember}{Vector with the numbers of members to load from the specified
experimental datasets in 'exp'.\cr
If not specified, the number of members of the first experimental dataset
is detected automatically and replicated to all the experimental
datasets.\cr
If a single value is specified, it is replicated to all the experimental
datasets.\cr
Data for each member is fetched from the file system. If not found, it is
filled with NA values.\cr
An NA value in the 'nmember' list is interpreted as "fetch as many members
of each experimental dataset as the number of members of the first
experimental dataset".\cr
Note: It is recommended to specify the number of members of the first
experimental dataset if it is stored in file per member format because
there are known issues in the automatic detection of members if the path
to the dataset in the configuration file contains Shell Globbing wildcards
such as '*'.\cr
E.g., c(4, 9)}
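The replication rules for 'nmember' can be summarized with a small sketch. The helper 'resolve_nmember' and its arguments are hypothetical and only restate the rules listed above; the actual resolution inside \code{Load()} may differ in its details.

```r
# Hypothetical helper: 'detected_first' stands for the number of members
# detected automatically in the first experimental dataset.
resolve_nmember <- function(nmember, n_datasets, detected_first) {
  # Not specified: use the detected value for every dataset.
  if (is.null(nmember)) nmember <- detected_first
  # Single value: replicate it to all datasets.
  if (length(nmember) == 1) nmember <- rep(nmember, n_datasets)
  # NA entries fall back to the member count of the first dataset.
  nmember[is.na(nmember)] <- detected_first
  nmember
}

print(resolve_nmember(NULL, 3, 5))
print(resolve_nmember(4, 2, 5))
print(resolve_nmember(c(NA, 9), 2, 5))
```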
\item{nmemberobs}{Vector with the numbers of members to load from the
specified observational datasets in 'obs'.\cr
If not specified, the number of members of the first observational dataset
is detected automatically and replicated to all the
observational datasets.\cr
If a single value is specified, it is replicated to all the observational
datasets.\cr
Data for each member is fetched from the file system. If not found, it is
filled with NA values.\cr
An NA value in the 'nmemberobs' list is interpreted as "fetch as many
members of each observational dataset as the number of members of the
first observational dataset".\cr
Note: It is recommended to specify the number of members of the first
observational dataset if it is stored in file per member format because
there are known issues in the automatic detection of members if the path
to the dataset in the configuration file contains Shell Globbing wildcards
such as '*'.\cr
E.g., c(1, 5)}
\item{nleadtime}{Deprecated. See parameter 'leadtimemax'.}
\item{leadtimemin}{Only lead-times greater than or equal to 'leadtimemin' are
loaded. Takes the value 1 by default.}
\item{leadtimemax}{Only lead-times lower than or equal to 'leadtimemax' are loaded.
Takes by default the number of lead-times of the first experimental
dataset in 'exp'.\cr
If 'exp' is NULL this argument won't have any effect
(see \code{?Load} description).}
\item{storefreq}{Frequency at which the data to be loaded is stored in the
file system. Can take values 'monthly' or 'daily'.\cr
By default it takes 'monthly'.\cr
Note: Data stored in other frequencies with a period which is divisible by
a month can be loaded with a proper use of 'storefreq' and 'sampleperiod'
parameters. It can also be loaded if the period is divisible by a day and
the observational datasets are stored in a file per dataset format or
'obs' is empty.}
\item{sampleperiod}{To load only a subset between 'leadtimemin' and
'leadtimemax' with the period of subsampling 'sampleperiod'.\cr
Takes by default value 1 (all lead-times are loaded).\cr
See 'storefreq' for more information.}
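The combined effect of 'leadtimemin', 'leadtimemax' and 'sampleperiod' can be pictured as a regular sequence of lead-time indices. The values below are assumptions chosen for the example, and the snippet only illustrates the selection; it is not taken from the \code{Load()} source.

```r
# Keep one lead-time out of every three between lead-times 1 and 12.
leadtimemin <- 1
leadtimemax <- 12
sampleperiod <- 3
selected_leadtimes <- seq(leadtimemin, leadtimemax, by = sampleperiod)
print(selected_leadtimes)
```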
\item{lonmin}{If a 2-dimensional variable is loaded, values at longitudes
lower than 'lonmin' aren't loaded.\cr
Must take a value in the range [-360, 360] (if negative longitudes are
found in the data files these are translated to this range).\cr
It is set to 0 if not specified.\cr
If 'lonmin' > 'lonmax', data across Greenwich is loaded.}
\item{lonmax}{If a 2-dimensional variable is loaded, values at longitudes
higher than 'lonmax' aren't loaded.\cr
Must take a value in the range [-360, 360] (if negative longitudes are
found in the data files these are translated to this range).\cr
It is set to 360 if not specified.\cr
If 'lonmin' > 'lonmax', data across Greenwich is loaded.}
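The following sketch shows one way to read the 'lonmin' > 'lonmax' rule: when the lower bound is larger than the upper bound, the region wraps around the Greenwich meridian and both ends of the longitude axis are kept. The helper 'select_lons' is hypothetical and assumes longitudes stored from 0 to 360; it is not the selection code of \code{Load()}.

```r
# Hypothetical helper: returns the longitudes inside the requested region,
# in the same order as they appear in 'lons'.
select_lons <- function(lons, lonmin, lonmax) {
  # Translate negative bounds to the 0-360 convention.
  if (lonmin < 0) lonmin <- lonmin + 360
  if (lonmax < 0) lonmax <- lonmax + 360
  if (lonmin <= lonmax) {
    keep <- lons >= lonmin & lons <= lonmax
  } else {
    # Region crosses the Greenwich meridian: keep both ends of the axis.
    keep <- lons >= lonmin | lons <= lonmax
  }
  lons[keep]
}

lons <- seq(0, 350, by = 10)
print(select_lons(lons, -80, 40))
```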
\item{latmin}{If a 2-dimensional variable is loaded, values at latitudes
lower than 'latmin' aren't loaded.\cr
Must take a value in the range [-90, 90].\cr
It is set to -90 if not specified.}
\item{latmax}{If a 2-dimensional variable is loaded, values at latitudes
higher than 'latmax' aren't loaded.\cr
Must take a value in the range [-90, 90].\cr
It is set to 90 if not specified.}
\item{output}{This parameter determines the format in which the data is
arranged in the output arrays.\cr
Can take values 'areave', 'lon', 'lat', 'lonlat'.\cr
\itemize{
\item{'areave': Time series of area-averaged variables over the specified domain.}
\item{'lon': Time series of meridional averages as a function of longitudes.}
\item{'lat': Time series of zonal averages as a function of latitudes.}
\item{'lonlat': Time series of 2d fields.}
}
Takes by default the value 'areave'. If the variable specified in 'var' is
a global mean, this parameter is forced to 'areave'.\cr
All the loaded data is interpolated into the grid of the first experimental
dataset except if 'areave' is selected. In that case the area averages are
computed on each dataset original grid. A common grid different than the
first experiment's can be specified through the parameter 'grid'. If 'grid'
is specified when selecting 'areave' output type, all the loaded data is
interpolated into the specified grid before calculating the area averages.}
\item{method}{This parameter determines the interpolation method to be used
when regridding data (see 'output'). Can take values 'bilinear', 'bicubic',
'conservative', 'distance-weighted'.\cr
See \code{remapcells} for advanced adjustments.\cr
Takes by default the value 'conservative'.}
\item{grid}{A common grid can be specified through the parameter 'grid' when
loading 2-dimensional data. Data is then interpolated onto this grid
whichever 'output' type is specified. If the selected output type is
'areave' and a 'grid' is specified, the area averages are calculated after
interpolating to the specified grid.\cr
If not specified and the selected output type is 'lon', 'lat' or 'lonlat',
this parameter takes as default value the grid of the first experimental
dataset, which is read automatically from the source files.\cr
The grid must be supported by 'cdo' tools. Currently only the patterns
rNXxNY and tRESgrid are supported.\cr
Both rNXxNY and tRESgrid yield rectangular regular grids. rNXxNY yields
grids that are evenly spaced in longitudes and latitudes (in degrees).
tRESgrid refers to a grid generated with series of spherical harmonics
truncated at the RESth harmonic. However these spectral grids are usually
associated to a gaussian grid, the latitudes of which are spaced with a
Gaussian quadrature (not evenly spaced in degrees). The pattern tRESgrid
will yield a gaussian grid.\cr
E.g., 'r96x72'\cr
Advanced: If the output type is 'lon', 'lat' or 'lonlat' and no common
grid is specified, the grid of the first experimental or observational
dataset is detected and all data is then interpolated onto this grid.
If the first experimental or observational dataset's data is found shifted
along the longitudes (i.e., there's no value at the longitude 0 but at a
longitude close to it), the data is re-interpolated to suppress the shift.
This has to be done in order to make sure all the data from all the
datasets is properly aligned along longitudes, as there's no option so far
in \code{Load} to specify grids starting at longitudes other than 0.
This issue does not arise when loading in 'areave' mode without a common
grid, since the data is not re-interpolated in that case.}
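To make the rNXxNY naming concrete, the sketch below splits a grid name such as 'r96x72' into its number of longitudes and latitudes and derives the spacing in degrees. It is a plain string manipulation written for this example and does not call 'cdo'.

```r
# Parse a regular grid name of the form rNXxNY.
grid <- "r96x72"
parts <- strsplit(substring(grid, 2), "x", fixed = TRUE)[[1]]
nx <- as.integer(parts[1])   # number of longitudes
ny <- as.integer(parts[2])   # number of latitudes

# Evenly spaced grid: spacing in degrees along each axis.
lon_step <- 360 / nx
lat_step <- 180 / ny
print(c(nx = nx, ny = ny, lon_step = lon_step, lat_step = lat_step))
```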
\item{maskmod}{List of masks to be applied to the data of each experimental
dataset respectively, if a 2-dimensional variable is specified in 'var'.\cr
Each mask can be defined in 2 formats:\cr
a) a matrix with dimensions c(longitudes, latitudes).\cr
b) a list with the components 'path' and, optionally, 'nc_var_name'.\cr
In the format a), the matrix must have the same size as the common grid,
or the same size as the grid of the corresponding experimental dataset
if 'areave' output type is specified and no common 'grid' is specified.\cr
In the format b), the component 'path' must be a character string with the
path to a NetCDF mask file, also in the common grid or in the grid of the
corresponding dataset if 'areave' output type is specified and no common
'grid' is specified. If the mask file contains only a single variable,
there's no need to specify the component 'nc_var_name'. Otherwise it must
be a character string with the name of the variable inside the mask file
that contains the mask values. This variable must be defined only over 2
dimensions with length greater than or equal to 1.\cr
Whatever the mask format, a value of 1 at a point of the mask keeps the
original value at that point whereas a value of 0 disables it (replaces
it with an NA value).\cr
By default all values are kept (all ones).\cr
The longitudes and latitudes in the matrix must be in the same order as in
the common grid or as in the original grid of the corresponding dataset
when loading in 'areave' mode. You can find out the order of the longitudes
and latitudes of a file with 'cdo griddes'.\cr
Note that in a common CDO grid defined with the patterns 't<RES>grid' or
'r<NX>x<NY>' the latitudes and longitudes are ordered, by definition, from
-90 to 90 and from 0 to 360, respectively.\cr
If you are loading maps ('lonlat', 'lon' or 'lat' output types) all the
data will be interpolated onto the common 'grid'. If you want to specify
a mask, you will have to provide it already interpolated onto the common
grid (you may use 'cdo' libraries for this purpose). It is not usual to
apply different masks on experimental datasets on the same grid, so all
the experiment masks are expected to be the same.\cr
Warning: When loading maps, any masks defined for the observational data
will be ignored to make sure the same mask is applied to the experimental
and observational data.\cr
Warning: wrapping the masks in list() is compulsory even when loading a
single experimental dataset.\cr
E.g., list(array(1, dim = c(num_lons, num_lats)))}
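A minimal sketch of the two mask formats follows. The grid size, the masked region and the file path are assumptions made for the example, and the last lines only imitate the effect of a mask on a field; they are not the masking code of \code{Load()}.

```r
# Format a): a matrix with dimensions c(longitudes, latitudes).
num_lons <- 8
num_lats <- 4
mask <- array(1, dim = c(num_lons, num_lats))
mask[1:2, ] <- 0               # disable the first two longitudes
maskmod <- list(mask)          # list() is needed even for one dataset

# Format b): a list pointing to a NetCDF mask file (hypothetical path).
maskmod_from_file <- list(list(path = "/path/to/mask.nc", nc_var_name = "mask"))

# Effect of the mask on a field defined on the same grid:
# points where the mask is 0 become NA, the rest are kept.
field <- array(seq_len(num_lons * num_lats), dim = c(num_lons, num_lats))
masked <- field
masked[mask == 0] <- NA
print(sum(is.na(masked)))
```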
\item{maskobs}{See help on parameter 'maskmod'.}
\item{configfile}{Path to the s2dverification configuration file from which
to retrieve information on location in file system (and other) of datasets.\cr
If not specified, the configuration file used at BSC-ES will be used
(it is included in the package).\cr
Check the BSC's configuration file or a template of configuration file in
the folder 'inst/config' in the package.\cr
Check further information on the configuration file mechanism in
\code{ConfigFileOpen()}.}
\item{varmin}{Loaded experimental and observational data values smaller
than 'varmin' will be disabled (replaced by NA values).\cr
By default no deactivation is performed.}
\item{varmax}{Loaded experimental and observational data values greater
than 'varmax' will be disabled (replaced by NA values).\cr
By default no deactivation is performed.}
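The effect of 'varmin' and 'varmax' amounts to replacing out-of-range values with NA. The thresholds and sample values below are assumptions chosen for the example.

```r
# Values below varmin or above varmax are disabled (set to NA).
values <- c(-5, 0.5, 12, 1e20)
varmin <- 0
varmax <- 100
values[values < varmin | values > varmax] <- NA
print(values)
```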
\item{silent}{Parameter to show (FALSE) or hide (TRUE) information messages.\cr
Warnings will be displayed even if 'silent' is set to TRUE.\cr
Takes by default the value 'FALSE'.}
\item{nprocs}{Number of parallel processes created to perform the fetch
and computation of data.\cr
These processes will use shared memory in the processor in which Load()
is launched.\cr
By default the number of logical cores in the machine will be detected
and as many processes as logical cores there are will be created.\cr
A value of 1 won't create parallel processes.\cr
When running in multiple processes, if an error occurs in any of the
processes, a crash message appears in the R session of the original
process but no detail is given about the error. A value of 1 will display
all error messages in the original and only R session.\cr
Note: the parallel processes create other blocking processes each time they
need to compute an interpolation via 'cdo'.}
\item{dimnames}{Named list where the name of each element is a generic
name of the expected dimensions inside the NetCDF files. These generic
names are 'lon', 'lat' and 'member'. 'time' is not needed because it is
detected automatically by elimination.\cr
The value associated to each name is the actual dimension name in the
NetCDF file.\cr
The variables in the file that contain the longitudes and latitudes of
the data (if the data is a 2-dimensional variable) must have the same
name as the longitude and latitude dimensions.\cr
By default, these names are 'longitude', 'latitude' and 'ensemble'. If any
of those is defined in the 'dimnames' parameter, it takes priority and
overrides the default value.\cr
E.g., list(lon = 'x', lat = 'y')\cr
In that example, the dimension 'member' will take the default value 'ensemble'.}
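The way user-supplied names take priority over the defaults can be reproduced with \code{utils::modifyList()}. This is only an illustration of the merging rule described above; the variable names are assumptions and the snippet is not taken from the \code{Load()} source.

```r
# Default dimension names expected inside the NetCDF files.
defaults <- list(lon = "longitude", lat = "latitude", member = "ensemble")

# Names supplied by the user through 'dimnames'.
dimnames_arg <- list(lon = "x", lat = "y")

# User entries override the defaults; missing entries keep the default.
resolved <- utils::modifyList(defaults, dimnames_arg)
print(unlist(resolved))
```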
\item{remapcells}{When loading a 2-dimensional variable, spatial subsets can
be requested via \code{lonmin}, \code{lonmax}, \code{latmin} and
\code{latmax}. When \code{Load()} obtains the subset it is then
interpolated if needed with the method specified in \code{method}.\cr
The result of this interpolation can vary if the values surrounding the
spatial subset are not present. To better control this process, the width
in number of grid cells of the surrounding area to be taken into account
can be specified with \code{remapcells}. A value of 0 will take into
account no additional cells but will generate less traffic between the
storage and the R processes that load data.\cr
A value beyond the limits in the data files will be automatically truncated
to the actual limit.\cr
The default value is 2.}
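The widening of the requested subset by 'remapcells' cells, and its truncation at the limits of the data, can be sketched on grid indices. The helper 'extend_window' is hypothetical; it ignores the wrap-around of longitudes and only illustrates the truncation rule.

```r
# Hypothetical helper: widens the index range [first, last] by 'remapcells'
# cells on each side and truncates it to the available range [1, n_cells].
extend_window <- function(first, last, remapcells, n_cells) {
  c(first = max(1, first - remapcells),
    last = min(n_cells, last + remapcells))
}

print(extend_window(10, 20, 2, 96))   # widened on both sides
print(extend_window(1, 20, 2, 96))    # truncated at the lower limit
```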
\item{path_glob_permissive}{In some cases, when specifying a path pattern
(either in the parameters 'exp'/'obs' or in a configuration file), the
pattern may contain shell globbing expressions. Too much
freedom in placing globbing expressions in the path patterns can be
dangerous: \code{Load()} may find a file in the file system for a
start date of a dataset that does not actually belong to that dataset.
For example, suppose the file system contains two directories for two
different experiments that share a part of their path:\cr
/experiments/model1/expA/monthly_mean/tos/tos_19901101.nc\cr
/experiments/model2/expA/monthly_mean/tos/tos_19951101.nc\cr
If the path pattern in the example right below is used to load data of
only the experiment 'expA' of the model 'model1' for the starting dates
'19901101' and '19951101', \code{Load()} will undesirably yield data for
both starting dates, even though 'model1' has data only for the
first one:\cr
\code{
expA <- list(path = file.path('/experiments/*/expA/monthly_mean/$VAR_NAME$',
'$VAR_NAME$_$START_DATE$.nc'))
data <- Load('tos', list(expA), NULL, c('19901101', '19951101'))
}
To avoid these situations, the parameter \code{path_glob_permissive} is
set by default to \code{'partial'}, which forces \code{Load()} to replace
all the globbing expressions of a path pattern of a data set by fixed
values taken from the path of the first found file for each data set, up
to the folder right before the final files (globbing expressions in the
file name will not be replaced, only those in the path to the file).
Replacement of globbing expressions in the file name can additionally be
triggered by setting \code{path_glob_permissive} to \code{FALSE} or
\code{'no'}. To keep all globbing expressions, \code{path_glob_permissive}
can be set to \code{TRUE} or \code{'yes'}.}
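The difference between a fully permissive search and the 'partial' behaviour can be imitated without touching the file system. In the sketch below the vector 'files' plays the role of the file system and 'find_file' is a hypothetical helper built on \code{utils::glob2rx()}; it illustrates the idea only and is not the search code of \code{Load()}.

```r
# Pretend file system with the two files of the example above.
files <- c("/experiments/model1/expA/monthly_mean/tos/tos_19901101.nc",
           "/experiments/model2/expA/monthly_mean/tos/tos_19951101.nc")
sdates <- c("19901101", "19951101")

# Hypothetical helper: first file matching the directory pattern for a start
# date, or NA if none matches. glob2rx() turns the wildcard into a regex.
find_file <- function(sdate, dir_pattern) {
  pattern <- paste0(dir_pattern, "/tos_", sdate, ".nc")
  hits <- files[grepl(utils::glob2rx(pattern), files)]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Fully permissive: the wildcard is re-evaluated for every start date,
# so the second start date is served by the other model.
permissive <- vapply(sdates, find_file, character(1),
                     dir_pattern = "/experiments/*/expA/monthly_mean/tos")

# 'partial'-like behaviour: the wildcard in the directories is fixed to the
# value found in the first file, so the second start date is not found.
fixed_dir <- dirname(permissive[[1]])
partial <- vapply(sdates, find_file, character(1), dir_pattern = fixed_dir)

print(permissive)
print(partial)
```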
\code{Load()} returns a named list following a structure similar to the
one used in the package 'downscaleR'.\cr
The components are the following:
\itemize{
\item{
'mod' is the array that contains the experimental data. It has the
attribute 'dimensions' associated to a vector of strings with the
labels of each dimension of the array, in order. The order of the
latitudes is always forced to be from 90 to -90 whereas the order of
the longitudes is kept as in the original files (if possible). The
longitude values provided in \code{lon} that are lower than 0 have 360 added
to them (but are still kept in the original order). In some cases, however, if
multiple data sets are loaded in longitude-latitude mode, the
longitudes (and also the data arrays in \code{mod} and \code{obs}) are
re-ordered afterwards by \code{Load()} to range from 0 to 360; a
warning is given in such cases. The longitude and latitude of the
center of the grid cell that corresponds to the value [j, i] in 'mod'
(along the dimensions latitude and longitude, respectively) can be
found in the outputs \code{lon}[i] and \code{lat}[j].
}
\item{'obs' is the array that contains the observational data. The
documentation of the component 'mod' also applies to this component.}
\item{'lat' and 'lon' are the latitudes and longitudes of the centers of
the cells of the grid the data is interpolated into (0 if the loaded
variable is a global mean or the output is an area average).\cr
Both have the attribute 'cdo_grid_des' associated with a character
string with the name of the common grid of the data, following the CDO
naming conventions for grids.\cr
'lon' has the attributes 'first_lon' and 'last_lon', with the first
and last longitude values found in the region defined by 'lonmin' and
'lonmax'. 'lat' has also the equivalent attributes 'first_lat' and
'last_lat'.\cr
'lon' has also the attribute 'data_across_gw' which tells whether the
requested region via 'lonmin', 'lonmax', 'latmin', 'latmax' goes across
the Greenwich meridian. As explained in the documentation of the
parameter 'mod', the loaded data array is kept in the same order as in
the original files when possible: this means that, in some cases, even
if the data goes across the Greenwich, the data array may not go
across the Greenwich. The attribute 'array_across_gw' tells whether
the array actually goes across the Greenwich. E.g: The longitudes in
the data files are defined to be from 0 to 360. The requested
longitudes are from -80 to 40. The original order is kept, hence the
longitudes in the array will be ordered as follows:
0, ..., 40, 280, ..., 360. In that case, 'data_across_gw' will be TRUE
and 'array_across_gw' will be FALSE.\cr
The attribute 'projection' is kept for compatibility with 'downscaleR'.
}
\item{'Variable' has the following components:
\itemize{
\item{'varName', with the short name of the loaded variable as
specified in the parameter 'var'.
}
\item{'level', with information on the pressure level of the
variable. It is kept as NULL for now.
}
}
And the following attributes:
\itemize{
\item{'is_standard', kept for compatibility with 'downscaleR',
tells if a dataset has been homogenized to standards with
'downscaleR' catalogs.
}
\item{'units', a character string with the units of measure of the
variable, as found in the source files.
}
\item{'longname', a character string with the long name of the
variable, as found in the source files.
}
\item{'daily_agg_cellfun', 'monthly_agg_cellfun',
'verification_time', kept for compatibility with 'downscaleR'.
}
}
}
\item{'Datasets' has the following components:
\itemize{
\item{'exp', a named list where the names are the identifying
character strings of each experiment in 'exp', each associated to
a list with the following components:
\itemize{
\item{'members', a list with the names of the members of the dataset.}
\item{'source', a path or URL to the source of the dataset.}
}
}
\item{'obs', similar to 'exp' but for observational datasets.}
}
}
\item{'Dates', with the following components:
\itemize{
\item{'start', an array of dimensions (sdate, time) with the POSIX
initial date of each forecast time of each starting date.
}
\item{'end', an array of dimensions (sdate, time) with the POSIX
final date of each forecast time of each starting date.
}
}
}
\item{'InitializationDates', a vector of starting dates as specified in
'sdates', in POSIX format.
}
\item{'when', a time stamp of the date the \code{Load()} call to obtain
the data was issued.
}
\item{'source_files', a vector of character strings with complete paths
to all the found files involved in the \code{Load()} call.
}
\item{'not_found_files', a vector of character strings with complete
paths to not found files involved in the \code{Load()} call.
}
}
\description{
This function loads monthly or daily data from a set of specified
experimental datasets together with the date-corresponding data from a set
of specified observational datasets. See parameters 'storefreq',
'sampleperiod', 'exp' and 'obs'.\cr\cr
A set of starting dates is specified through the parameter 'sdates'. Data of
each starting date is loaded for each model.
\code{Load()} arranges the data in two arrays with a similar format both
with the following dimensions:
\enumerate{
\item{The number of experimental datasets determined by the user through
the argument 'exp' (for the experimental data array) or the number of
observational datasets available for validation (for the observational
array) determined as well by the user through the argument 'obs'.}
\item{The greatest number of members across all experiments (in the
experimental data array) or across all observational datasets (in the
observational data array).}
\item{The number of starting dates determined by the user through the
'sdates' argument.}
\item{The greatest number of lead-times.}
\item{The number of latitudes of the selected zone.}
\item{The number of longitudes of the selected zone.}
}
Dimensions 5 and 6 are optional and their presence depends on the type of
the specified variable (global mean or 2-dimensional) and on the selected
output type (area averaged time series, latitude averaged time series,
longitude averaged time series or 2-dimensional time series).\cr
In the case of loading an area average the dimensions of the arrays will be
only the first 4.\cr\cr
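The layout of the two arrays can be pictured with empty arrays of the corresponding shape. All sizes below are assumptions chosen for the example; the sketch also shows how a dataset with fewer members than the largest one leaves NA values in the member dimension.

```r
# Assumed sizes: 2 experiments, up to 5 members, 3 start dates,
# 4 lead-times, 6 latitudes and 8 longitudes.
n_exp <- 2; n_members <- 5; n_sdates <- 3; n_ltimes <- 4
n_lats <- 6; n_lons <- 8

# 2-dimensional time series ('lonlat' output): six dimensions.
mod_lonlat <- array(NA_real_, dim = c(n_exp, n_members, n_sdates, n_ltimes,
                                      n_lats, n_lons))

# Area averages ('areave' output): only the first four dimensions.
mod_areave <- array(NA_real_, dim = c(n_exp, n_members, n_sdates, n_ltimes))

# First experiment has 5 members, second one only 3:
# members 4 and 5 of the second experiment stay NA.
mod_areave[1, , , ] <- 1
mod_areave[2, 1:3, , ] <- 2

print(dim(mod_lonlat))
print(dim(mod_areave))
```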
Only a specified variable is loaded from each experiment at each starting
date. See parameter 'var'.\cr
Afterwards, observational data that matches every starting date and lead-time
of every experimental dataset is fetched in the file system (so, if two
predictions at two different start dates overlap, some observational values
will be loaded and kept in memory more than once).\cr
If no data is found in the file system for an experimental or observational
array point it is filled with an NA value.\cr\cr
If the specified output is 2-dimensional or latitude- or longitude-averaged
time series all the data is interpolated into a common grid. If the
specified output type is area averaged time series the data is averaged on
the individual grid of each dataset but can also be averaged after
interpolating into a common grid. See parameters 'grid' and 'method'.\cr
Once the two arrays are filled by calling this function, other functions in
the s2dverification package that receive as inputs data formatted in this
data structure can be executed (e.g: \code{Clim()} to compute climatologies,
\code{Ano()} to compute anomalies, ...).\cr\cr
\code{Load()} has many additional parameters to disable values and trim
dimensions of the selected variable; masks can even be applied to
2-dimensional variables.
See parameters 'nmember', 'nmemberobs', 'nleadtime', 'leadtimemin',
'leadtimemax', 'sampleperiod', 'lonmin', 'lonmax', 'latmin', 'latmax',
'maskmod', 'maskobs', 'varmin', 'varmax'.\cr\cr
The parameters 'exp' and 'obs' can take various forms. The most direct form
is a list of lists, where each sub-list has the component 'path' associated
to a character string with a pattern of the path to the files of a dataset
to be loaded. These patterns can contain wildcards and tags that will be
replaced automatically by \code{Load()} with the specified starting dates,
member numbers, variable name, etc.\cr
See parameter 'exp' or 'obs' for details.\cr\cr
Only NetCDF files are supported. OPeNDAP URLs to NetCDF files are also
supported.\cr
\code{Load()} can load 2-dimensional or global mean variables in any of the
following formats:
\itemize{
\item{experiments:
\itemize{
\item{file per ensemble per starting date
(YYYY, MM and DD somewhere in the path)}
\item{file per member per starting date
(YYYY, MM, DD and MemberNumber somewhere in the path. Ensemble
experiments with different numbers of members can be loaded in
a single \code{Load()} call.)}
}
(YYYY, MM and DD specify the starting dates of the predictions)
}
\item{observations:
\itemize{
\item{file per ensemble per month
(YYYY and MM somewhere in the path)}
\item{file per member per month
(YYYY, MM and MemberNumber somewhere in the path, obs with different
numbers of members supported)}
\item{file per dataset (No constraints in the path but the time axes
in the file have to be properly defined)}
}
(YYYY and MM correspond to the actual month data in the file)
}
}
In all the formats the data can be stored in a daily or monthly frequency,
or a multiple of these (see parameters 'storefreq' and 'sampleperiod').\cr
All the data files must contain the target variable defined over time and
potentially over members, latitude and longitude dimensions in any order,
time being the record dimension.\cr
In the case of a two-dimensional variable, the variables longitude and
latitude must be defined inside the data file too and must have the same
names as the dimension for longitudes and latitudes respectively.\cr
The names of these dimensions (and longitude and latitude variables) and the
name for the members dimension are expected to be 'longitude', 'latitude'
and 'ensemble' respectively. However, these names can be adjusted with the
parameter 'dimnames' or can be configured in the configuration file (read
below in parameters 'exp' and 'obs', or see \code{?ConfigFileOpen}
for more information).\cr
All the data files are expected to have numeric values representable with
32 bits. Be aware when choosing the fill values or infinite values in the
datasets to load.\cr\cr
\code{Load()} returns a named list following a structure similar to
the one used in the package 'downscaleR'.\cr
The components are the following:
\itemize{
\item{'mod' is the array that contains the experimental data. It has the
attribute 'dimensions' associated to a vector of strings with the labels
of each dimension of the array, in order.}
\item{'obs' is the array that contains the observational data. It has
the attribute 'dimensions' associated to a vector of strings with the
labels of each dimension of the array, in order.}
\item{'lat' and 'lon' are the latitudes and longitudes of the grid into
which the data is interpolated (0 if the loaded variable is a global
mean or the output is an area average).\cr
Both have the attribute 'cdo_grid_des' associated with a character
string with the name of the common grid of the data, following the CDO
naming conventions for grids.\cr
The attribute 'projection' is kept for compatibility with 'downscaleR'.
}
\item{'Variable' has the following components:
\itemize{
\item{'varName', with the short name of the loaded variable as
specified in the parameter 'var'.}
\item{'level', with information on the pressure level of the variable.
It is kept as NULL for now.}
}
And the following attributes:
\itemize{
\item{'is_standard', kept for compatibility with 'downscaleR',
tells if a dataset has been homogenized to standards with
'downscaleR' catalogs.}
\item{'units', a character string with the units of measure of the
variable, as found in the source files.}
\item{'longname', a character string with the long name of the
variable, as found in the source files.}
\item{'daily_agg_cellfun', 'monthly_agg_cellfun', 'verification_time',
kept for compatibility with 'downscaleR'.}
}
}
\item{'Datasets' has the following components:
\itemize{
\item{'exp', a named list where the names are the identifying
character strings of each experiment in 'exp', each associated to a
list with the following components:
\itemize{
\item{'members', a list with the names of the members of the
dataset.}
\item{'source', a path or URL to the source of the dataset.}
}
}
\item{'obs', similar to 'exp' but for observational datasets.}
}
}
\item{'Dates', with the following components:
\itemize{
\item{'start', an array of dimensions (sdate, time) with the POSIX
initial date of each forecast time of each starting date.}
\item{'end', an array of dimensions (sdate, time) with the POSIX
final date of each forecast time of each starting date.}
}
}
\item{'InitializationDates', a vector of starting dates as specified in
'sdates', in POSIX format.}
\item{'when', a time stamp of the date the \code{Load()} call to obtain
the data was issued.}
\item{'source_files', a vector of character strings with complete paths
to all the found files involved in the \code{Load()} call.}
\item{'not_found_files', a vector of character strings with complete
paths to not found files involved in the \code{Load()} call.}
}
\details{
The two output matrices have between 2 and 6 dimensions:\cr
\enumerate{
\item{Number of experimental/observational datasets.}
\item{Number of members.}
\item{Number of startdates.}
\item{Number of leadtimes.}
\item{Number of latitudes (optional).}
\item{Number of longitudes (optional).}
}
but the two matrices have the same number of dimensions and only the first
two dimensions can have different lengths depending on the input arguments.
For a detailed explanation of the process, read the documentation attached
to the package or check the comments in the code.
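
As a minimal sketch of the layout described above, and not actual output of
\code{Load()}, the following base R code builds two empty arrays with
hypothetical sizes for an area-averaged output and checks that only the first
two dimensions differ in length:
\preformatted{
# Hypothetical sizes, for illustration only: 1 dataset, 5 model members,
# 1 observational member, 5 start dates and 60 lead times. An area-averaged
# output has no latitude or longitude dimensions.
mod <- array(NA_real_, dim = c(1, 5, 5, 60))
obs <- array(NA_real_, dim = c(1, 1, 5, 60))
# Both arrays have the same number of dimensions.
stopifnot(length(dim(mod)) == length(dim(obs)))
# The start date and lead time lengths match in both arrays.
stopifnot(identical(dim(mod)[3:4], dim(obs)[3:4]))
}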
}
# Let's assume we want to perform verification with data of a variable
# called 'tos' from a model called 'model' and observed data coming from
# an observational dataset called 'observation'.
#
# The model was run in the context of an experiment named 'experiment'.
# It simulated from 1st November in 1985, 1990, 1995, 2000 and 2005 for a
# period of 5 years from each starting date. 5 different sets of
# initial conditions were used so an ensemble of 5 members was generated
# for each starting date.
# The model generated values for the variables 'tos' and 'tas' in a
# 3-hourly frequency but, after some initial post-processing, it was
# averaged over every month.
# The resulting monthly average series were stored in a file for each
# starting date for each variable with the data of the 5 ensemble members.
# The resulting directory tree was the following:
# model
# |--> experiment
# |--> monthly_mean
# |--> tos_3hourly
# | |--> tos_19851101.nc
# | |--> tos_19901101.nc
# | .
# | .
# | |--> tos_20051101.nc
# |--> tas_3hourly
# |--> tas_19851101.nc
# |--> tas_19901101.nc
# .
# .
# |--> tas_20051101.nc
#
# The observation recorded values of 'tos' and 'tas' at each day of the
# month over that period but was also averaged over months and stored in
# a file per month. The directory tree was the following:
# observation
# |--> monthly_mean
# |--> tos
# | |--> tos_198511.nc
# | |--> tos_198512.nc
# | |--> tos_198601.nc
# | .
# | .
# | |--> tos_201010.nc
# |--> tas
# |--> tas_198511.nc
# |--> tas_198512.nc
# |--> tas_198601.nc
# .
# .
# |--> tas_201010.nc
#
# The model data is stored in a file-per-startdate fashion and the
# observational data is stored in a file-per-month, and both are stored in
# a monthly frequency. The file format is NetCDF.
# Hence all the data is supported by Load() (see details and other supported
# conventions in ?Load) but first we need to configure it properly.
#
# These data files are included in the package (in the 'sample_data' folder),
# only for the variable 'tos'. They have been interpolated to a very low
# resolution grid so that they are small enough to be distributed on CRAN.
# The original grid names (following CDO conventions) for experimental and
# observational data were 't106grid' and 'r180x89' respectively. The final
# resolutions are 'r20x10' and 'r16x8' respectively.
# The experimental data comes from the decadal climate prediction experiment
# run at IC3 in the context of the CMIP5 project. Its name within IC3 local
# database is 'i00k'.
# The observational dataset used for verification is the 'ERSST'
# observational dataset.
#
# The next two examples are equivalent and show how to load the variable
# 'tos' from these sample datasets, the first providing lists of lists to
# the parameters 'exp' and 'obs' (see documentation on these parameters) and
# the second providing vectors of character strings, hence using a
# configuration file.
#
# The code is not run because it dispatches system calls to 'cdo' which is
# not allowed in the examples as per CRAN policies. You can run it on your
# system though.
# Instead, the code in 'dontshow' is run, which loads the equivalent
# already processed data in R.
#
# Example 1: Providing lists of lists to 'exp' and 'obs':
#
data_path <- system.file('sample_data', package = 's2dverification')
exp <- list(
  name = 'experiment',
  path = file.path(data_path, 'model/$EXP_NAME$/monthly_mean',
                   '$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATE$.nc')
)
obs <- list(
  name = 'observation',
  path = file.path(data_path, '$OBS_NAME$/monthly_mean',
                   '$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(exp), list(obs), startDates,
output = 'areave', latmin = 27, latmax = 48,
lonmin = -12, lonmax = 40)
}
#
# Variant of Example 1: the same lists of lists, here using the
# '$STORE_FREQ$' tag in the path patterns.
#
data_path <- system.file('sample_data', package = 's2dverification')
expA <- list(name = 'experiment', path = file.path(data_path,
'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
'$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
'$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
'$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
output = 'areave', latmin = 27, latmax = 48,
lonmin = -12, lonmax = 40)
#
# Example 2: providing character strings in 'exp' and 'obs', and providing
# a configuration file.
# The configuration file 'sample.conf' that we will create in the example
# has the proper entries to load these (see ?LoadConfigFile for details on
# writing a configuration file).
#
configfile <- paste0(tempdir(), '/sample.conf')
ConfigFileCreate(configfile, confirm = FALSE)
c <- ConfigFileOpen(configfile)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MIN', '-1e19', confirm = FALSE)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MAX', '1e19', confirm = FALSE)
data_path <- system.file('sample_data', package = 's2dverification')
exp_data_path <- paste0(data_path, '/model/$EXP_NAME$/')
obs_data_path <- paste0(data_path, '/$OBS_NAME$/')
c <- ConfigAddEntry(c, 'experiments', dataset_name = 'experiment',
var_name = 'tos', main_path = exp_data_path,
file_path = '$STORE_FREQ$_mean/$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATE$.nc')
c <- ConfigAddEntry(c, 'observations', dataset_name = 'observation',
var_name = 'tos', main_path = obs_data_path,
file_path = '$STORE_FREQ$_mean/$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
ConfigFileSave(c, configfile, confirm = FALSE)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', c('experiment'), c('observation'), startDates,
output = 'areave', latmin = 27, latmax = 48,
lonmin = -12, lonmax = 40, configfile = configfile)
}
\dontshow{
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- s2dverification:::.LoadSampleData('tos', c('experiment'),
c('observation'), startDates,
output = 'areave',
latmin = 27, latmax = 48,
lonmin = -12, lonmax = 40)
}
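#
# A minimal sketch of how the returned list can be inspected, assuming the
# 'sampleData' object created above. With output = 'areave' both arrays are
# expected to have 4 dimensions (dataset, member, start date, lead time).
#
# Overview of the top-level components of the returned list.
str(sampleData, max.level = 1)
# Dimension lengths of the experimental and observational arrays.
dim(sampleData$mod)
dim(sampleData$obs)
# Both arrays should have the same number of dimensions.
stopifnot(length(dim(sampleData$mod)) == length(dim(sampleData$obs)))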
}
\author{
History:\cr
0.1 - 2011-03 (V. Guemas) - Original code\cr
1.0 - 2013-09 (N. Manubens) - Formatting to CRAN\cr
1.2 - 2015-02 (N. Manubens) - Generalisation + parallelisation\cr
1.3 - 2015-07 (N. Manubens) - Improvements related to configuration file mechanism\cr
1.4 - 2016-01 (N. Manubens) - Added subsetting capabilities