# C3S-512 CDS Data Checker
The main function of this Gitlab Project is to join all the efforts done in the data evaluation of the **C**limate **D**ata **S**tore (**CDS**).<br/>
The following software is designed to work both with **GRIB** and **NetCDF** files and perform the following data checks:<br/>
* Standard compliance and file format
* Spatial/Temporal completeness and consistency
* Observed/Plausible data ranges
* GRIB to NetCDF experimental C3S conversion (checks for CF compliance)
## Dependencies and libraries
To run, the data checker requires the following binaries and libraries:
```bash
eccodes - version 2.19.0
https://confluence.ecmwf.int/display/ECC/ecCodes+installation
cdo - version 1.9.8
https://code.mpimet.mpg.de/projects/cdo/files
ecflow - version 5.4.0
https://confluence.ecmwf.int/display/ECFLOW
```
**Note**: These packages may exist in your apt/yum package repository, but using a conda environment is preferred.<br/>
**Please use the versions specified above.**
## Install & Run
When running on the C3S_512 VM, install environments under /data/miniconda3-data to avoid filling /home, which has limited space. TMPDIR should also point to a location with more space than the default /tmp (e.g. export TMPDIR='/data/tmp/envs/').
To create your own environment, execute:
```bash
conda create -y python=3 --prefix /data/miniconda3-data/envs/<NAME>
```
Then edit your bashrc and set:
```bash
export TMPDIR='/data/tmp/envs/'
```
If you create a conda environment, you can easily install these dependencies:
```bash
# Create conda virtual environment
conda create -y -n dqc python=3
conda activate dqc
conda install -c conda-forge eccodes=2.19.0
conda install -c conda-forge cdo=1.9.8
conda install -c conda-forge ecflow=5.4.0
# Get code
git clone https://earth.bsc.es/gitlab/ces/c3s512-wp1-datachecker.git
cd c3s512-wp1-datachecker
pip install .
# Install requirements
pip install -r requirements.txt
# Run
cd dqc_checker
python checker.py <config_file>
```
**Note**: The following section explains how to write your own **config_file**.
## Configure
* To run the checker, you must write a simple config file (RawConfigParser `.ini` format).
* The **general** section specifies general path options.
* The **dataset** section specifies dataset-dependent information.
* Each remaining config section represents a check/test (e.g. `file_format` or `temporal_completeness`).
* Each check section may take parameters specific to that check (see the example below).
**Note 1**: Config examples for **ALL** available checks can be found in the **dqc_wrapper/conf** folder.<br/>
**Note 2**: The following config checks for temporal consistency. Multiple checks can be stacked in one file.
````
[general]
input = /data/dqc_test_data/seasonal/seasonal-monthly-single-levels/2m_temperature
fpattern = ecmwf-5_fcmean*.grib
log_dir = /my/log/directory
res_dir = /my/output/directory
forms_dir = /data/cds-forms-c3s
[dataset]
variable = t2m
datatype = fcmean
cds_dataset = seasonal-monthly-single-levels
cds_variable = 2m_temperature
````
## Config options (detailed)
The **config** is defined in the `.ini` format compatible with the Python RawConfigParser package.<br/>
Each section represents an independent data **check**. The following example covers **ALL** available tests:<br/>
````
[general]
# Directory or file to be checked.
input = /data/dqc_test_data/seasonal/seasonal-monthly-single-levels/2m_temperature
# If a directory is provided, this pattern filters the files. May be left empty, in which case every file is taken
fpattern = ecmwf-5*.grib
# Directory where DQC logs are stored
log_dir = /tmp/dqc_logs
# Directory where DQC test results are stored (will be created if it does not exist)
res_dir = /tmp/dqc_res
# Directory with constraints.json per every cds dataset (a.k.a c3s-cds-forms)
forms_dir = /data/cds-forms-c3s
[dataset]
# Variable to analyze (if grib, see grib_dump command, look for cfVarName) **OPTIONAL**
variable = t2m
# Data type to analyze (if grib, see grib_ls command) **OPTIONAL**
datatype = fcmean
# Dataset (as available in c3s catalogue form)
cds_dataset = seasonal-monthly-single-levels
# Variable (form variable)
cds_variable = 2m_temperature
# Split dates or use grib_filter in order to reduce memory consumption **OPTIONAL**
split_dates = no
[file_format]
# No parameters required
[standard_compliance]
# No parameters required
[spatial_completeness]:
# Land/Sea mask if available
mask_file =
# Variable name within the mask grib file (default is lsm)
mask_var =
[temporal_completeness]
# Origin (for seasonal products, otherwise optional)
origin = ecmwf
# System (for seasonal products, otherwise optional)
system = 5
# Flag indicating if dataset is seasonal (monthly, daily)
is_seasonal =
[spatial_consistency]
# Resolution of the grid (positive value), typically xinc
grid_interval = 1
# Type of Grid (gaussian, lonlat, ...)
grid_type = lonlat
[temporal_consistency]
# Time step, positive integer value
time_step = 1
# Time unit (Hour, Day, Month, Year) or (h,d,m,y)
time_granularity = month
[valid_ranges]
# In case the valid minimum for the data is known (Otherwise, thresholds are set statistically)
valid_min =
# In case the valid maximum for the data is known (Otherwise, thresholds are set statistically)
valid_max =
[netcdf_converter]
````
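Since the config is declared to be RawConfigParser-compatible, it can be read with Python's standard library. The sketch below is purely illustrative (it is not the checker's own code); the section and option names simply mirror the example above:

```python
# Sketch: reading a DQC-style config with Python's stdlib parser.
# Illustrative only -- not part of the checker's implementation.
from configparser import RawConfigParser

CONFIG_TEXT = """
[general]
input = /data/dqc_test_data/seasonal/seasonal-monthly-single-levels/2m_temperature
fpattern = ecmwf-5*.grib
log_dir = /tmp/dqc_logs
res_dir = /tmp/dqc_res
forms_dir = /data/cds-forms-c3s

[dataset]
variable = t2m
datatype = fcmean
cds_dataset = seasonal-monthly-single-levels
cds_variable = 2m_temperature

[temporal_consistency]
time_step = 1
time_granularity = month
"""

parser = RawConfigParser()
parser.read_string(CONFIG_TEXT)

# Every section other than [general] and [dataset] names a check to run.
checks = [s for s in parser.sections() if s not in ("general", "dataset")]
print(checks)                                            # ['temporal_consistency']
print(parser.get("temporal_consistency", "time_granularity"))  # month
```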
## Result
Each test run produces a result inside the **res_dir** specified in the **general** section.<br/>
The result zip file contains a PDF report for each of the tests launched.<br/>
Each `<check>_result` section contains a result flag (ok/err) indicating success, a short message, and the log location.<br/>
````
[spatial_consistency]
grid_interval = 0.25
grid_type = lonlat
[spatial_consistency_result]
res = ok
msg = Files are spatially consistent
````
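Because result files follow the same `.ini` format, the `_result` sections can be scanned programmatically. A minimal sketch (the parsing code is illustrative, not part of the checker; the file contents are the example above):

```python
# Sketch: scanning a DQC result file for failed checks.
# Result sections follow the "<check>_result" naming shown above;
# this scanner is an illustration, not the checker's own code.
from configparser import RawConfigParser

RESULT_TEXT = """
[spatial_consistency]
grid_interval = 0.25
grid_type = lonlat

[spatial_consistency_result]
res = ok
msg = Files are spatially consistent
"""

parser = RawConfigParser()
parser.read_string(RESULT_TEXT)

# Collect every *_result section whose res flag is not "ok".
failed = [
    s for s in parser.sections()
    if s.endswith("_result") and parser.get(s, "res") != "ok"
]
print("all checks passed" if not failed else f"failed: {failed}")  # all checks passed
```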
## Recent updates
An up-to-date log tracking major modifications is available here:<br>
* [UPDATE LOG](UPDATE_LOG.md)