# C3S-512 CDS Data Checker
The main purpose of this GitLab project is to bring together all the work done on evaluating the data of the **C**limate **D**ata **S**tore (**CDS**).<br></br>
To install the requirements and run the checker:
```bash
git clone https://earth.bsc.es/gitlab/external/c3s512-wp1-datachecker.git
cd c3s512-wp1-datachecker
pip install -r requirements.txt
cd dqc_checker
python checker.py <config_file>
```
**Note**: The following section explains how to write your own **config_file**.
## Configure
- To run the checker, you must write a simple config file (INI format, as read by Python's ConfigParser).
- A general section specifies the dataset and path options shared by all checks.
- Each other config section represents a check/test (e.g. file_format or temporal_completeness).
- Each check section may take parameters specific to that check (see the example below).

**Note 1**: Config examples for **ALL** available checks can be found in the **dqc_wrapper/conf** folder.<br></br>
**Note 2**: The following config checks for temporal completeness. Multiple checks can be stacked in one file.
```ini
[general]
input = /shared/cds_downloads/seasonal/seasonal-original-single-levels/2m_temperature
fpattern = ecmwf-5-*.grib
log_dir = /tmp/dqc_logs
[temporal_completeness]
forms_dir = /data/cds-forms-c3s
dataset = seasonal-original-single-levels
variable = 2m_temperature
origin = ecmwf
system = 5
```
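To illustrate how such a file is structured, here is a minimal sketch using Python's standard `configparser` module. It assumes the options of the example config are grouped under `[general]` and `[temporal_completeness]` headers and that every section other than `general` is a check; `split_config` is a hypothetical helper, not part of the checker's code.

```python
import configparser

# Example config: a [general] section plus one check section.
EXAMPLE = """
[general]
input = /shared/cds_downloads/seasonal/seasonal-original-single-levels/2m_temperature
fpattern = ecmwf-5-*.grib
log_dir = /tmp/dqc_logs

[temporal_completeness]
forms_dir = /data/cds-forms-c3s
dataset = seasonal-original-single-levels
variable = 2m_temperature
origin = ecmwf
system = 5
"""


def split_config(text):
    """Hypothetical helper: return (general options, {check name: options})."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Shared dataset/path options (empty dict if the section is missing).
    general = dict(parser["general"]) if parser.has_section("general") else {}
    # Every remaining section is treated as one check with its own parameters.
    checks = {
        name: dict(parser[name])
        for name in parser.sections()
        if name != "general"
    }
    return general, checks


if __name__ == "__main__":
    general, checks = split_config(EXAMPLE)
    print("input:", general["input"])
    for name, options in checks.items():
        print(name, options)
```

Stacking more checks only means adding more sections; each one shows up as a separate entry in `checks`.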
## Config options (detailed)
The **config** uses the INI format understood by Python's configparser module.<br></br>
Apart from **general**, each section represents an independent data **check**. The following example covers **ALL** available checks:<br></br>
```ini
[general]
# Directory or file to be checked
input = /path/to/files
# If input is a directory, a pattern to filter the files. If empty, every file is taken
pattern = ecmwf-5*.grib
# Directory where DQC logs are stored
log_dir =
# Directory where DQC test results are stored (created if it does not exist)
res_dir = /tmp/dqc_res
# Type of files
type = grib or CF
# No parameters required
[temporal_completeness]
# Directory with the constraints.json of each product (a.k.a. c3sforms)
forms_dir = /data/cds-forms-c3s
# Dataset (as named in the C3S catalogue form)
dataset = reanalysis-era5-single-levels
# Variable (as named in the form)
variable = sea_surface_temperature
# Origin (for seasonal products, otherwise optional)
origin =
# System (for seasonal products, otherwise optional)
system =
[spatial_consistency]
# Resolution of the grid (positive value), typically xinc
grid_interval = 0.5
# Type of grid (gaussian, lonlat, ...)
grid_type = lonlat
[temporal_consistency]
# Time step, positive integer value
time_step = 1
# Time unit (hour, day, month, year)
time_granularity = hour
[valid_ranges]
# Variable to analyse (typically the GRIB shortName, see grib_ls)
variable = sst
# Type of data (used for GRIB filtering), can be empty
datatype = em
# Valid minimum of the data, if known (otherwise thresholds are set statistically)
valid_min =
# Valid maximum of the data, if known (otherwise thresholds are set statistically)
valid_max =
```
## Result
Each test run produces a result file inside the **res_dir** specified in the **general** section.<br></br>
The result file contains the configuration of the test, so that runs can be tracked and reproduced.<br></br>
The `<check>_result` section contains `res` (ok/err, indicating success), a short message (`msg`) and the log location (`log`).<br></br>
```ini
[spatial_consistency]
grid_interval = 0.25
grid_type = lonlat
[spatial_consistency_result]
res = ok
msg = Files are spatially consistent
log = /tmp/dqc_logs/LOG_conf_test04_1.ini_20191113_.txt
```
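
Because the result file is itself in INI format, it can be inspected with `configparser` as well. The following is a minimal sketch, assuming result sections are always named `<check>_result` and carry `res`, `msg` and `log` keys as in the example result; `summarize_results` is a hypothetical helper, not the checker's own API.

```python
import configparser

# Result file content: the check's configuration plus its _result section.
RESULT = """
[spatial_consistency]
grid_interval = 0.25
grid_type = lonlat

[spatial_consistency_result]
res = ok
msg = Files are spatially consistent
log = /tmp/dqc_logs/LOG_conf_test04_1.ini_20191113_.txt
"""


def summarize_results(text):
    """Hypothetical helper: map each check to (passed, message, log path)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    summary = {}
    for section in parser.sections():
        # Assumption: result sections are named "<check>_result".
        if not section.endswith("_result"):
            continue
        check = section[: -len("_result")]
        result = parser[section]
        summary[check] = (
            result.get("res") == "ok",  # "ok" means success, "err" failure
            result.get("msg", ""),
            result.get("log", ""),
        )
    return summary


if __name__ == "__main__":
    for check, (passed, msg, log) in summarize_results(RESULT).items():
        status = "PASS" if passed else "FAIL"
        print(f"{check}: {status} - {msg} (log: {log})")
```

To read an actual file from **res_dir**, call `parser.read(path)` instead of `parser.read_string(text)`.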
## Recent updates
An update log tracking major modifications, along with related documents, can be found here:<br>
* [UPDATE LOG](UPDATE_LOG.md)
* [Summary of Available Data Checkers](01_summary_data_checkers.md)
* [First dataset inventory of the CDS](02_cds_inventory.md)
* [First CF check LOG using existing cfchecker for NetCDF files](CF_checker_log/)
<br><br>