EIONET UTD (up-to-date) Air Quality data retrieval
This tool provides automated data collection from the EEA's Air Quality Portal.
The aim is to further improve the gathering of air quality observations across Europe by using the LIVE Air Quality Data service.
The tool also aims to provide a single, common way to gather air quality observations from stations, phasing out the several scripts (one for each Comunidad Autónoma or Ayuntamiento) used in the CALIOPE forecast evaluation as of December 2015.
It retrieves near-real-time air quality observations from stations belonging to the EIONET network.
The tool is able to connect to EIONET servers, download the required data, parse it, check for validity of observations and store them in the Air Quality Forecast Evaluation (eval_new) database.
This tool also inserts a new station into the “STATIONS” table of the DB when one is found. The fields inserted automatically are station code, station name, latitude, longitude, and height above sea level.
The validity checks performed on the retrieved data are the ones described in this PDF:
Filtrado de observaciones CALIOPE
This is the summary of the quality control performed over the data downloaded:
Any observation flagged is not considered by the different evaluations and post processes in the CALIOPE Forecast.
The output of the process is stored in the “OBS_AQ” table of the MySQL database (eval_new) with the identifier DOMAINS_id=6, which is the domain for values coming from station observations.
As of 22 December 2015, this tool is configured to provide observations of stations in Spain for the following pollutants:
CALIOPE acronym | EIONET notation |
---|---|
O3 | O3 |
NO2 | NO2 |
SO2 | SO2 |
PM10T | PM10 |
PM2.5T | PM2.5 |
Please refer to the “Usage” section for details about pollutant acronyms/notations and the time spans of observations.
This tool can provide observations for any pollutant and country available.
The full list of pollutants available from the EIONET download service can be found here: Pollutants.csv (original source)
The EIONET (up-to-date) air quality data is available for the following geographic areas/countries:
Code | Name | Code | Name |
---|---|---|---|
AT | Austria | IE | Ireland |
BE | Belgium | LT | Lithuania |
DE | Germany | LU | Luxembourg |
DK | Denmark | MK | Macedonia |
ES | Spain | MT | Malta |
FI | Finland | NL | Netherlands |
FR | France | NO | Norway |
GB | United Kingdom | PL | Poland |
GI | Gibraltar | PT | Portugal |
HR | Croatia | SE | Sweden |
HU | Hungary | | |
To see the current status on data delivery, which countries are delivering data and what they deliver, please see this live report.
This tool requires the Python 3.4 interpreter.
The source code requires the packages “requests” (v2.8.1 or later) and “pymysql” (v0.6.7 or later). They can be installed with: pip3 install --user requests pymysql
This tool is designed to be run as a cron job (cron is a time-based job scheduler).
It is recommended to run the process every 4 hours to avoid saturating the EIONET service. The process can be run at any time, but running it more often than once per hour is discouraged because the service will not have new observations to serve. Intervals longer than 6 hours are also discouraged if near-real-time evaluation of the air quality forecast is wanted.
All configurations are automatically handled and the purpose of the following descriptions is just to document the functionality.
The pollutants whose observations are retrieved are defined in the 'pollutants' dictionary. Please note that both the CALIOPE and the EIONET notations must be provided. Please refer to the CSV in the “Data coverage” section to see all EIONET notations.
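As an illustration, the dictionary could look like the sketch below. The text only states that both notations must be provided, so the layout (CALIOPE acronym as key, EIONET notation as value) and the helper to_caliope are assumptions; the entries are taken from the pollutant table in this page.

```python
# Hypothetical layout: keys are CALIOPE acronyms, values are EIONET notations.
# The real dictionary in the tool may be organised differently.
pollutants = {
    "O3": "O3",
    "NO2": "NO2",
    "SO2": "SO2",
    "PM10T": "PM10",
    "PM2.5T": "PM2.5",
}


def to_caliope(eionet_notation):
    """Return the CALIOPE acronym for an EIONET notation, or None if not configured."""
    for caliope, eionet in pollutants.items():
        if eionet == eionet_notation:
            return caliope
    return None
```

With this layout, adding a pollutant would only require one new entry that pairs the CALIOPE acronym with the notation listed in the CSV.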
The tool requests observations for the time span from 'days_of_obs' days ago up to the current date and time (now).
Please note that when FromDate is used, ToDate is mandatory to obtain a response from the service. Because data providers follow different upload and update patterns, FromDate needs to reach back at least a couple of days. At the same time, the UpdatedSinceDate and InsertedSinceDate filters (see “Filters on data download” below) should be used to avoid downloading too much duplicate data and to avoid hitting the EIONET service limit of 50k records per request.
The recommended value for 'days_of_obs' is 4, which gives data providers time to recover and upload observations after malfunctions at stations. This value can be safely increased when the filters on data download are used.
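The requested time span can be sketched as follows. This is a minimal illustration, not the tool's code: the function name observation_window is hypothetical, and the date format the service expects for FromDate and ToDate is not shown here (see the download guide).

```python
from datetime import datetime, timedelta

days_of_obs = 4  # recommended value


def observation_window(now, days):
    """Return (from_date, to_date) covering the last `days` days up to `now`."""
    return now - timedelta(days=days), now


# A fixed "now" is used so the example is reproducible;
# the tool itself would use the current date and time.
from_date, to_date = observation_window(datetime(2015, 12, 22, 12, 0), days_of_obs)
print(from_date.isoformat(), to_date.isoformat())
# prints: 2015-12-18T12:00:00 2015-12-22T12:00:00
```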
The tool has two filters ('UPDATE_FILTER', 'INSERTED_FILTER') that can be used to request only the data that is new since the last download. Because some data providers have unreliable update patterns, the tool first sends the request with the UpdatedSinceDate filter and then sends a second request to the EIONET service with the InsertedSinceDate filter. Data curation and testing showed that some data providers first allocate space for observations (caught by the InsertedSinceDate filter) and update the value a few hours or days later (caught by the UpdatedSinceDate filter).
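The two-pass approach can be sketched as below. Only the field names FromDate, ToDate, UpdatedSinceDate and InsertedSinceDate come from this page; the function build_requests, the last_download argument and the use of plain dictionaries are assumptions made for illustration, and no actual call to the service is shown.

```python
from datetime import datetime, timedelta


def build_requests(now, days_of_obs, last_download):
    """Return the parameter sets for the two passes, in the order they are sent."""
    base = {
        "FromDate": now - timedelta(days=days_of_obs),
        "ToDate": now,  # mandatory whenever FromDate is given
    }
    # First pass: values updated since the last download.
    updated_pass = dict(base, UpdatedSinceDate=last_download)
    # Second pass: records inserted since the last download.
    inserted_pass = dict(base, InsertedSinceDate=last_download)
    return [updated_pass, inserted_pass]


passes = build_requests(
    now=datetime(2015, 12, 22, 12, 0),
    days_of_obs=4,
    last_download=datetime(2015, 12, 22, 8, 0),
)
```

Both passes share the same observation window; only the "since" filter differs, so a record missed by one filter can still be picked up by the other.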
In summary, the time spans of observations that this tool requests are:
If overlapping time spans cause the same observation to be downloaded more than once, the tool prevents duplicates from being inserted (the combination of date, Station_id and Pollutant_id is unique).
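The effect of such a uniqueness rule can be sketched with Python's built-in sqlite3 module, used here only as a stand-in for the MySQL database so the example runs without a server. The table layout, the "value" column and the sample rows are made up for illustration; in MySQL the analogous statement would be INSERT IGNORE, and the tool may enforce uniqueness differently.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL eval_new database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE OBS_AQ (
        date TEXT NOT NULL,
        Station_id INTEGER NOT NULL,
        Pollutant_id INTEGER NOT NULL,
        value REAL,
        UNIQUE (date, Station_id, Pollutant_id)
    )
    """
)

# Illustrative rows; the second one repeats the key of the first.
rows = [
    ("2015-12-22 10:00:00", 1, 7, 41.0),
    ("2015-12-22 10:00:00", 1, 7, 41.0),
    ("2015-12-22 11:00:00", 1, 7, 39.5),
]

# Rows that violate the UNIQUE constraint are skipped instead of raising an error.
conn.executemany("INSERT OR IGNORE INTO OBS_AQ VALUES (?, ?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT COUNT(*) FROM OBS_AQ").fetchone()[0]
print(stored)  # prints: 2
```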
Internal request fields (to the EIONET service) used in this tool can be found here: http://discomap.eea.europa.eu/map/fme/doc/UTDAirQualityDownloadGuide.pdf
The link to the GIT repository is:
https://earth.bsc.es/gitlab/jcuadrad/EIONET.git
The developer of this tool is Jordi Cuadrado Borbonés (jordi.cuadrado@bsc.es), under the guidance of Kim Serradell (kim.serradell@bsc.es).
This tool is written in Python 3.
You can check the general style guide for Python development here