# C3S-512 CDS Data Checker

The main goal of this GitLab project is to bring together all the efforts made in the data evaluation of the **C**limate **D**ata **S**tore (**CDS**).<br></br>

## Install & Run

```bash
conda create -y -n dqc python=3
conda activate dqc
git clone https://earth.bsc.es/gitlab/external/c3s512-wp1-datachecker.git
cd c3s512-wp1-datachecker
pip install -r requirements.txt
cd dqc_checker

python checker.py <config_file>

```
**Note**: In the following section you will find information on how to write your own **config_file**.

## Configure

- To run the checker you must write a simple config file (ConfigParser INI format).
- A general section specifies the dataset and path options shared by all checks.
- Each remaining section represents one check/test (e.g. `file_format` or `temporal_completeness`).
- Each section may have parameters specific to that check (see the example below).

**Note 1**: Config examples for **ALL** available checks can be found in the **dqc_wrapper/conf** folder.<br></br>
**Note 2**: The following config checks for temporal completeness. Multiple checks can be stacked in one file.

````
[general]
input = /shared/cds_downloads/seasonal/seasonal-original-single-levels/2m_temperature
fpattern = ecmwf-5-*.grib
log_dir = /tmp/dqc_logs
res_dir = /tmp/dqc_res
type = grib

[temporal_completeness]
forms_dir = /data/cds-forms-c3s
dataset = seasonal-original-single-levels
variable = 2m_temperature
origin = ecmwf
system = 5
````
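As a rough illustration of the layout described above, the sketch below reads a config with Python's standard `configparser` module, treating `[general]` as shared options and every other section as one check to run. The section and option names are taken from this README's examples; the `list_checks` helper is hypothetical and is not part of `checker.py`.

```python
import configparser

# Two checks stacked in one file, reusing options from the example above.
EXAMPLE = """
[general]
input = /shared/cds_downloads/seasonal/seasonal-original-single-levels/2m_temperature
fpattern = ecmwf-5-*.grib
log_dir = /tmp/dqc_logs
res_dir = /tmp/dqc_res
type = grib

[file_format]

[temporal_completeness]
forms_dir = /data/cds-forms-c3s
dataset = seasonal-original-single-levels
variable = 2m_temperature
origin = ecmwf
system = 5
"""


def list_checks(text):
    """Hypothetical helper (not part of checker.py): return (general options, {check: options})."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = dict(parser["general"])
    # Every section other than [general] is treated as one check/test.
    checks = {name: dict(parser[name]) for name in parser.sections() if name != "general"}
    return general, checks


general, checks = list_checks(EXAMPLE)
print(general["res_dir"])  # prints /tmp/dqc_res
print(list(checks))        # prints ['file_format', 'temporal_completeness']
```

Sections without parameters, such as `[file_format]`, simply come back as empty dictionaries, so stacking several checks needs no special handling.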

## Config options (detailed)

The **config** is written in the INI format understood by Python's `configparser` module.<br></br>
Each section represents an independent data **check**. The following example covers **ALL** available tests:<br></br>

````
[general]
# Directory or file to be checked
input = /path/to/files
# If a directory is given, only files matching this pattern are checked (if empty, every file is taken)
pattern = ecmwf-5*.grib
# Directory where DQC logs are stored
log_dir =
# Directory where DQC test results are stored (created if it does not exist)
res_dir = /tmp/dqc_res
# Type of files: grib or CF
type = grib

[file_format]
# No parameters required

[standard_compliance]
# No parameters required

[spatial_completeness]
# No parameters required

[temporal_completeness]
# Directory with constraints.json per product (a.k.a. c3sforms)
forms_dir = /data/cds-forms-c3s
# Dataset (as available in c3s catalogue form)
dataset = reanalysis-era5-single-levels
# Variable (form variable)
variable = sea_surface_temperature
# Origin (for seasonal products, otherwise optional)
origin =
# System (for seasonal products, otherwise optional)
system =

[spatial_consistency]
# Resolution of the grid (positive value), typically xinc
grid_interval = 0.5
# Type of Grid (gaussian, lonlat, ...)
grid_type = lonlat

[temporal_consistency]
# Time step, positive integer value
time_step = 1
# Time unit (Hour, Day, Month, Year)
time_granularity = hour

[valid_ranges]
# Variable to focus the analysis (typically shortname, see grib_ls)
variable = sst
# Type of data (used for GRIB filtering), can be left empty
datatype = em
# Valid minimum of the data, if known (otherwise thresholds are set statistically)
valid_min =
# Valid maximum of the data, if known (otherwise thresholds are set statistically)
valid_max =

````
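The comments in the example above impose a few constraints: a positive `grid_interval`, a positive integer `time_step`, and a `time_granularity` from Hour/Day/Month/Year. The sketch below checks them before a run. `validate` and `_positive` are hypothetical helpers, and comparing `time_granularity` case-insensitively is our assumption rather than documented checker behaviour.

```python
import configparser

# Units listed in the [temporal_consistency] comment; lower-case comparison is an assumption.
TIME_UNITS = {"hour", "day", "month", "year"}


def _positive(section, option, cast):
    """Return True if `option` parses with `cast` (int or float) and is > 0."""
    try:
        return cast(section.get(option, "")) > 0
    except ValueError:
        return False


def validate(parser):
    """Hypothetical pre-flight check: return a list of human-readable problems."""
    problems = []
    if not parser.has_section("general") or not parser["general"].get("input"):
        problems.append("[general] needs a non-empty 'input'")
    if parser.has_section("spatial_consistency"):
        if not _positive(parser["spatial_consistency"], "grid_interval", float):
            problems.append("grid_interval must be a positive number")
    if parser.has_section("temporal_consistency"):
        sec = parser["temporal_consistency"]
        if not _positive(sec, "time_step", int):
            problems.append("time_step must be a positive integer")
        if sec.get("time_granularity", "").lower() not in TIME_UNITS:
            problems.append("time_granularity must be hour, day, month or year")
    return problems


parser = configparser.ConfigParser()
parser.read_string("""
[general]
input = /path/to/files

[temporal_consistency]
time_step = 0
time_granularity = hour
""")
print(validate(parser))  # prints ['time_step must be a positive integer']
```

Collecting every problem instead of stopping at the first one lets a user fix a config in a single pass.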
## Result 

Each test run produces a result file inside the **res_dir** specified in the **general** section.<br></br>
The result file also stores the configuration of the test, to keep track of runs and make the tests reproducible.<br></br>
The `<check>_result` section (e.g. `spatial_consistency_result`) contains `res` (`ok`/`err`) indicating success, a short message (`msg`) and the log location (`log`).<br></br>

````
[spatial_consistency]
grid_interval = 0.25
grid_type = lonlat

[spatial_consistency_result]
res = ok
msg = Files are spatially consistent
log = /tmp/dqc_logs/LOG_conf_test04_1.ini_20191113_.txt
````
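Because result files use the same INI layout, they can be inspected with `configparser` as well. This sketch assumes only what the example above shows: each check writes a `<check>_result` section with `res`, `msg` and `log` keys. `failed_checks` is a hypothetical helper, not part of the checker.

```python
import configparser

# Result file contents copied from the example above.
RESULT = """
[spatial_consistency]
grid_interval = 0.25
grid_type = lonlat

[spatial_consistency_result]
res = ok
msg = Files are spatially consistent
log = /tmp/dqc_logs/LOG_conf_test04_1.ini_20191113_.txt
"""


def failed_checks(text):
    """Hypothetical helper: map each check whose res is not 'ok' to its message."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    failures = {}
    for name in parser.sections():
        # Only the *_result sections carry outcomes; the others echo the test config.
        if name.endswith("_result") and parser[name].get("res") != "ok":
            check = name[: -len("_result")]
            failures[check] = parser[name].get("msg", "")
    return failures


print(failed_checks(RESULT))  # prints {}
```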

## Recent updates

A log tracking major modifications is available here:<br>
* [UPDATE LOG](UPDATE_LOG.md) 

## Other information

* [Summary of Available Data Checkers](01_summary_data_checkers.md)
* [First dataset inventory of the CDS](02_cds_inventory.md)
* [First CF check LOG using existing cfchecker for NetCDF files](CF_checker_log/)