This shows you the differences between two versions of the page.
tools:eionet-utdretriever [2016/01/08 15:36] jcuadrad [Usage] |
tools:eionet-utdretriever [2016/05/05 14:18] jcuadrad [Description] |
||
---|---|---|---|
Line 5: | Line 5: | ||
This tool provides automated data collection from the [[http:// | This tool provides automated data collection from the [[http:// | ||
- | The aim is to further improve the gathering of air quality observations across Europe by using the [[http:// | + | The aim is to further improve the gathering of air quality observations across Europe by using the [[http:// |
==== Description ==== | ==== Description ==== | ||
Line 11: | Line 11: | ||
The tool retrieves (near real time) air quality observations from stations belonging to the EIONET network. | The tool retrieves (near real time) air quality observations from stations belonging to the EIONET network. | ||
- | The tool is able to connect to EIONET servers, download the required data, parse it, check for validity of observations and store them in the Air Quality Forecast Evaluation (eval_new) database.\\ This tool also inserts new stations in the " | + | The tool is able to connect to EIONET servers, download the required data, parse it, check for validity of observations and store them in the Air Quality Forecast Evaluation (eval_new) database. |
The validity checks performed on the retrieved data are the ones described in this PDF: | The validity checks performed on the retrieved data are the ones described in this PDF: | ||
Line 20: | Line 20: | ||
* Flag DS is applied if observation is constant for the past five hours (-5h..-1h range) and above a value of 10 for its pollutant. | * Flag DS is applied if observation is constant for the past five hours (-5h..-1h range) and above a value of 10 for its pollutant. | ||
* Flag MP is applied if previous observation (past hour) is an outlier (has maximum slide change) for its pollutant. | * Flag MP is applied if previous observation (past hour) is an outlier (has maximum slide change) for its pollutant. | ||
- | Any observation flagged is not considered by the different evaluations and post processes in the CALIOPE Forecast. | + | Any observation flagged is not considered by the different evaluations and post processes in the CALIOPE Forecast |
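The DS rule above can be illustrated with a minimal sketch. It is not the tool's actual code: the function name ''ds_flag'', its signature and the handling of missing values are assumptions made for illustration, and it assumes that "constant" means the current value equals every value in the -5h..-1h range.

```python
from typing import Optional, Sequence

# Threshold taken from the description above ("above a value of 10");
# the unit depends on the pollutant and is not stated on this page.
DS_THRESHOLD = 10.0


def ds_flag(
    current: Optional[float],
    previous_five: Sequence[Optional[float]],
    threshold: float = DS_THRESHOLD,
) -> bool:
    """Hypothetical DS check: True if the value is stuck above the threshold.

    previous_five holds the hourly values for the -5h..-1h range.
    Assumption: a missing value (None) in the window prevents flagging.
    """
    if current is None or len(previous_five) != 5:
        return False
    if any(value is None for value in previous_five):
        return False
    is_constant = all(value == current for value in previous_five)
    return is_constant and current > threshold


if __name__ == "__main__":
    print(ds_flag(12.0, [12.0] * 5))  # constant and above 10
    print(ds_flag(8.0, [8.0] * 5))    # constant but not above 10
```

Treating a gap in the window as "not flagged" is a design choice of this sketch only; the PDF referenced above is the authoritative description of the checks.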
The output of the process is stored into the table " | The output of the process is stored into the table " | ||
Line 43: | Line 43: | ||
The EIONET (up-to-date) air quality data is available for the following geographic areas/ | The EIONET (up-to-date) air quality data is available for the following geographic areas/ | ||
^ Code ^ Name ^ Code ^ Name ^ | ^ Code ^ Name ^ Code ^ Name ^ | ||
+ | | AD| Andorra | | | | ||
| AT| Austria | IE| Ireland | | | AT| Austria | IE| Ireland | | ||
| BE| Belgium | LT| Lithuania | | | BE| Belgium | LT| Lithuania | | ||
Line 53: | Line 54: | ||
| GI| Gibraltar | PT| Portugal | | | GI| Gibraltar | PT| Portugal | | ||
| HR| Croatia | SE| Sweden | | | HR| Croatia | SE| Sweden | | ||
- | | HU| Hungary | | + | | HU| Hungary |
- | To see the **current status on data delivery**, which countries are delivering data and what they deliver, please see this [[http:// | + | To see the **current status on data delivery**, which countries are delivering data and what they deliver, please see this **[[https:// |
- | Please refer to the " | + | For further information about stations definitions used by EIONET please refer to: [[tools: |
+ | |||
+ | Please note that EIONET reports, for each station, the network_timezone in which its observations are expressed. This tool automatically converts all observations to UTC, so all observations stored in the eval_new database are in UTC. \\ | ||
+ | ** Since 20/apr/2016 EIONET has been reporting the time zone metadata for observations in Spain, Lithuania, Macedonia and Slovenia (previously missing). Therefore, all observations since 20/apr/2016 should be as correct as the information provided by the Member States to EIONET. As of 05/may/2016 we are still troubleshooting (in contact with Generalitat de Catalunya and EIONET) a 1h discrepancy in observations for Catalunya. ** \\ | ||
+ | Until 20/apr/2016, observations were assumed to be in UTC when no time zone metadata was provided. | ||
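The conversion to UTC described above can be sketched as follows. This is an illustration only, not the tool's implementation: the function name ''to_utc'' is hypothetical, and it assumes the station's network_timezone has already been parsed into a fixed offset in hours (the raw format reported by EIONET is not described on this page). A missing offset falls back to UTC, mirroring the assumption applied until 20/apr/2016.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc(local_time: datetime, utc_offset_hours: Optional[float]) -> datetime:
    """Hypothetical helper: convert a naive station timestamp to UTC.

    utc_offset_hours is the station offset from UTC (e.g. 1 for UTC+01:00).
    Assumption: None means no time zone metadata, so the value is taken as UTC.
    """
    if utc_offset_hours is None:
        return local_time.replace(tzinfo=timezone.utc)
    station_tz = timezone(timedelta(hours=utc_offset_hours))
    # Attach the station offset, then express the same instant in UTC.
    return local_time.replace(tzinfo=station_tz).astimezone(timezone.utc)


if __name__ == "__main__":
    observed = datetime(2016, 5, 5, 14, 0)
    print(to_utc(observed, 1))     # station reporting in UTC+01:00
    print(to_utc(observed, None))  # no metadata: taken as UTC
```

A fixed offset does not account for daylight saving time; whether the reported network_timezone already covers that is not stated here, so the sketch makes no attempt to handle it.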
+ | |||
+ | Please refer to the " | ||
==== Requirements ==== | ==== Requirements ==== | ||
Line 67: | Line 74: | ||
==== Usage ==== | ==== Usage ==== | ||
- | This tool is designed to be run by a cron (a time-based job scheduler) job.\\ | + | This tool is designed to be run as a cron job (cron is a time-based job scheduler). However, it can also be used as a one-off command-line application. Please see below for command-line usage. \\ | ||
- | It is recommended to run the process every 4 hours in order to avoid saturating EIONET. The process can be called at any time, but running it at intervals shorter than 1 hour is discouraged because the service will not have any new observations to serve. Running it at intervals longer than 6 hours is also discouraged if near real time evaluation of the air quality forecast is wanted. | + | It is recommended to run the process every 8 hours in order to avoid saturating EIONET. The process can be called at any time, but running it at intervals shorter than 2 hours is discouraged because the service will not have any new observations to serve. Running it at intervals longer than 24 hours is also discouraged if near real time evaluation of the air quality forecast is wanted. | ||
+ | |||
+ | == Command-line usage/ | ||
+ | |||
+ | The calls to this tool are as follows, using the common format EIONETretriever.py [command] configs [options]: | ||
+ | < | ||
+ | Is the standard usage. The tool will look for the " | ||
+ | Conversely, if we want to download observations for France and Italy the following syntax is needed (assuming the corresponding FRIT.conf file is under config directory): | ||
+ | < | ||
+ | |||
+ | The available commands are: | ||
+ | * download: Normal operation. The automated data download uses the UpdatedSince and CreatedSince filters. In this mode, the retriever keeps track of the date of the last successful download for each pollutant and country, storing it in the DOWNLOAD_DATE table in the database. | ||
+ | * download_no_filters: | ||
+ | * download_sliding_no_filters: | ||
+ | |||
+ | In the " | ||
+ | < | ||
+ | In this example, all observations of the countries (de,at,pl) and pollutants defined in the " | ||
+ | |||
+ | Example for " | ||
+ | If today is 2016-01-27 and we want to download the observations of days 2016-01-12 and 2016-01-13 we can get them in two ways: | ||
+ | < | ||
+ | python3 EIONETretriever.py download_sliding_no_filters ES --fromDaysAgo 15 --toDaysAgo 13 > logs/ | ||
+ | Please note that in this mode, if the --toDaysAgo option is not provided, the download will run up to the present (now). \\ | ||
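The date arithmetic behind --fromDaysAgo and --toDaysAgo can be sketched as below. This is not taken from the tool's source: the function name ''sliding_window'' is hypothetical, and treating the end date as an upper bound that is not itself a requested day is an assumption inferred from the example above (15 and 13 days before 2016-01-27 covering days 2016-01-12 and 2016-01-13).

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def sliding_window(
    today: date,
    from_days_ago: int,
    to_days_ago: Optional[int] = None,
) -> Tuple[date, date]:
    """Hypothetical helper: turn day offsets into a (start, end) date pair.

    If to_days_ago is None the window runs up to today, matching the
    documented behaviour when --toDaysAgo is omitted.
    """
    if from_days_ago < 0 or (to_days_ago is not None and to_days_ago < 0):
        raise ValueError("day offsets must not be negative")
    start = today - timedelta(days=from_days_ago)
    end = today if to_days_ago is None else today - timedelta(days=to_days_ago)
    if end < start:
        raise ValueError("fromDaysAgo must not be smaller than toDaysAgo")
    return start, end


if __name__ == "__main__":
    start, end = sliding_window(date(2016, 1, 27), 15, 13)
    print(start, end)  # 2016-01-12 2016-01-14
```

With the values from the example, the window starts on 2016-01-12 and ends on 2016-01-14, which spans the two requested days if the end bound is exclusive.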
+ | |||
+ | As usual, the --help (or -h) option will also display the command-line manual/ | ||
+ | Please note that this article always refers to the ' | ||
== Configuration file == | == Configuration file == | ||
- | The amount of data to be downloaded can be easily configured in this tool by modifying the "EIONETretriever.conf" file in the source directory. The available options are: | + | The amount of data to be downloaded can be easily configured in this tool by modifying the corresponding |
* countrycodes: | * countrycodes: | ||
Line 85: | Line 119: | ||
== Time span of observations retrieved == | == Time span of observations retrieved == | ||
The tool requests observations in time span of ' | The tool requests observations in time span of ' | ||
- | Please note that when FromDate is used, then ToDate is mandatory to obtain a response from the service. Due to the different upload and update patterns of the data providers it is needed to request at least a couple of days of observations for the FromDate field, and at the same time, use the UpdatedSinceDate and InsertedSinceDate filters (see " | + | Please note that when FromDate is used, then ToDate is mandatory to obtain a response from the service. |
- | The recommended value for ' | + | Due to the different upload and update patterns of the data providers it is needed to request at least a couple of days of observations for the FromDate field, and at the same time, use the UpdatedSinceDate and InsertedSinceDate filters (see " |
+ | The recommended value for ' | ||
== Filters on data download == | == Filters on data download == | ||
Line 99: | Line 134: | ||
== Further documentation == | == Further documentation == | ||
- | Internal request fields (to the EIONET service) used in this tool can be found here: http:// | + | * Slides presenting the tool, including information about the number of stations, the rationale behind the different config files and for the chosen update patterns used in the cron, etc. {{tools: |
+ | * **Internal request fields (to the EIONET service) used in this tool** can be found here: http:// | ||
+ | * More technical documentation (Sphinx-generated) can be found under the directory docs in the repository. | ||