tools:eionet-utdretriever

Differences

This shows you the differences between two versions of the page.

tools:eionet-utdretriever [2016/01/27 11:35]
jcuadrad Added new mode of operation (download_no_filters)
tools:eionet-utdretriever [2016/12/29 09:09] (current)
kserrade [Repository]
Line 5: Line 5:
 This tool provides automated data collection from the [[http://www.eionet.europa.eu/aqportal/datamonitor|EEA's Air Quality Portal]].
  
-The aim is to further improve the gathering of air quality observations across Europe by using the [[http://discomap.eea.europa.eu/map/fme/AirQualityUTDExport.htm|LIVE Air Quality Data service]].\\ The purpose of this tool is also to provide a single, common tool to gather air quality observations from stations, thus phasing out the several scripts used to date -as of December 2015- (one for each Comunidad Autónoma, Ayuntamiento) in the CALIOPE forecast evaluation.
+The aim is to further improve the gathering of air quality observations across Europe by using the [[http://discomap.eea.europa.eu/map/fme/AirQualityUTDExport.htm|LIVE Air Quality Data service]].\\ The purpose of this tool is also to provide a single, common tool to gather air quality observations from stations, thus phasing out the several scripts used to date -as of December 2015- (one for each Comunidad Autónoma (regional government), Ayuntamiento (city council)) in the CALIOPE forecast evaluation.
  
 ==== Description ====
Line 11: Line 11:
 The functionality provided is a retriever of near-real-time air quality observations from stations that belong to the EIONET network.
  
-The tool is able to connect to EIONET servers, download the required data, parse it, check for validity of observations and store them in the Air Quality Forecast Evaluation (eval_new) database.\\ This tool also inserts new stations in the "STATIONS" table of the DB when a new one is found. The fields automatically inserted are Station code, station name, lat, lon, and heigth above sea level.
+The tool is able to connect to EIONET servers, download the required data, parse it, check the validity of the observations and store them in the Air Quality Forecast Evaluation (eval_new) database. **Please note** that no CSV or other output format is provided. All observations are centralized in the database for exploitation through the different systems (such as the [[http://www.bsc.es/projects/earthscience/visor/bases_datos/aq/|CALIOPE Visor]]). \\ This tool also inserts new stations in the "STATIONS" table of the DB when a new one is found. The fields automatically inserted are station code, station name, lat, lon, and height above sea level.
  
 The validity checks performed on the retrieved data are the ones described in this PDF:
Line 43: Line 43:
 The EIONET (up-to-date) air quality data is available for the following geographic areas/countries:
 ^ Code ^ Name ^ Code ^ Name ^
-| AT| Austria | IE| Ireland |
+| AD| Andorra | IE| Ireland |
-| BE| Belgium | LT| Lithuania |
+| AT| Austria | LT| Lithuania |
-| DE| Germany | LU| Luxembourg |
+| BE| Belgium | LU| Luxembourg |
 +| DE| Germany | LV| Latvia |
 | DK| Denmark | MK| Macedonia |
 | ES| Spain | MT| Malta |
Line 54: Line 55:
 | HR| Croatia | SE| Sweden |
 | HU| Hungary | SI| Slovenia |
 +| |  | SK| Slovakia |
  
-To see the **current status on data delivery**, which countries are delivering data and what they deliver, please see this [[https://tableau.discomap.eea.europa.eu/t/Aironline/views/Airquality_E2a_monitoring/DashboardE2a?:embed=y&:showShareOptions=true|E2a/UTD Air quality - primary pollutants delivery LIVE report]]. The detailed report (in use until Dec/2015) can be found here: [[http://discomap.eea.europa.eu/report/eMonitoring/CurrentStatusE2a|live report]].
+To see the **current status on data delivery**, which countries are delivering data and what they deliver, please see this **[[https://tableau.discomap.eea.europa.eu/t/Aironline/views/Airquality_E2a_monitoring/DashboardE2a?:embed=y&:showShareOptions=true|E2a/UTD Air quality - primary pollutants delivery LIVE report]]**. The former, more detailed report can be found here: [[http://discomap.eea.europa.eu/report/eMonitoring/CurrentStatusE2a|live report]] (in use until Dec/2015)\\
  
 +For further information about stations definitions used by EIONET please refer to: [[tools:EIONET-UTDretriever:Stations|EIONET definitions for AQ Stations]] 
  
-Please refer to the "Usage" section for **configuration options** and to see details about pollutant aconyms/notations and time spans of observations.
+Please note that EIONET reports, for each station, the network_timezone in which its observations are expressed. This tool automatically converts all observations to UTC, so all observations stored in the eval_new database are in UTC. \\
 +** Since 20/apr/2016 EIONET has been reporting the time zone metadata for observations in Spain, Lithuania, Macedonia and Slovenia (previously missing). Therefore, all observations since 20/apr/2016 should be as correct as the information provided by the Member States to EIONET. As of 05/may/2016 we are still troubleshooting (in contact with Generalitat de Catalunya and EIONET) a 1 h discrepancy in observations for Catalunya. ** \\
 +Until 20/apr/2016, observations were assumed to be in UTC when no time zone metadata was provided.
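
The conversion to UTC described above can be sketched as follows. This is only a minimal illustration, not the tool's actual code: the function name ''to_utc'' and the idea of receiving network_timezone as an already-parsed offset in hours are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc(naive_local: datetime, utc_offset_hours: Optional[float]) -> datetime:
    """Return the observation time as a timezone-aware UTC datetime.

    utc_offset_hours is a hypothetical, already-parsed form of the station's
    network_timezone (e.g. 1.0 for UTC+01:00). None means that no time zone
    metadata was provided, in which case the time is assumed to be UTC.
    """
    if utc_offset_hours is None:
        return naive_local.replace(tzinfo=timezone.utc)
    station_tz = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=station_tz).astimezone(timezone.utc)


obs_time = datetime(2016, 5, 5, 10, 0)
print(to_utc(obs_time, 1.0))   # 2016-05-05 09:00:00+00:00
print(to_utc(obs_time, None))  # 2016-05-05 10:00:00+00:00
```

A fixed offset is used here for simplicity; whether EIONET reports fixed offsets or named zones is not stated on this page.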
 + 
 +Please refer to the "Usage" section for **configuration options** and to see details about pollutant acronyms/notations and time spans of observations.
  
 ==== Requirements ====
Line 81: Line 88:
 The two available commands are:
   * download: Normal operation. The automated data download uses the UpdatedSince and CreatedSince filters. In this mode, the retriever keeps track of the date of the last successful download for each pollutant and country, storing it in the DOWNLOAD_DATE table in the database.
-  * download_no_filters: Manual operation, intended to troubleshoot missing data in the DB. Since the filters are not used in this mode all observations from the time window specified will be downloaded (be aware of reaching the 50k observations limit has set in place). The retriever will not keep track of the last succesful downloads in this mode.
+  * download_no_filters: Manual operation, intended to troubleshoot missing data in the DB. Since the filters are not used in this mode, all observations from the specified time window will be downloaded (beware of reaching the 50k-observation limit that the EIONET service has in place). The retriever will not keep track of the last successful downloads in this mode.
 +  * download_sliding_no_filters: Same as "download_no_filters" mode but it can be used with relative dates (to today). Please see example below.
  
 In the "download_no_filters" mode the usage is as follows:
 <code>python3 EIONETretriever.py download_no_filters GER --fromDate 2016-01-01 --toDate 2016-01-03 > logs/GER/GER-20160101-20160102.log</code>
 In this example, all observations of the countries (de,at,pl) and pollutants defined in the "GER.conf" file from 01/jan/2016 to 03/jan/2016 (not included) will be downloaded and stored in the database. \\
 +
 +Example for "download_sliding_no_filters": \\
 +If today is 2016-01-27 and we want to download the observations of days 2016-01-12 and 2016-01-13 we can get them in two ways:
 +<code>python3 EIONETretriever.py download_no_filters ES --fromDate 2016-01-12 --toDate 2016-01-14 > logs/ES/ES-20160112-20160113.log
 +python3 EIONETretriever.py download_sliding_no_filters ES --fromDaysAgo 15 --toDaysAgo 13 > logs/ES/ES-20160112-20160113.log</code>
 +Please note that in this mode, if the option --toDaysAgo is not provided, the download will run up to the present (now). \\
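
The equivalence between the two commands above can be checked with a small sketch of the relative-to-absolute date arithmetic. The function ''sliding_window'' is hypothetical and only mirrors the --fromDaysAgo / --toDaysAgo options; it is not taken from the tool's source, and it approximates "now" by today's date.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def sliding_window(today: date, from_days_ago: int,
                   to_days_ago: Optional[int] = None) -> Tuple[date, date]:
    """Turn relative day counts into an absolute (from_date, to_date) pair.

    As with --toDate in the examples, to_date is not included in the window.
    If to_days_ago is omitted, the window ends today (assumption: a date-level
    stand-in for "now").
    """
    from_date = today - timedelta(days=from_days_ago)
    to_date = today if to_days_ago is None else today - timedelta(days=to_days_ago)
    if from_date > to_date:
        raise ValueError("from_days_ago must not be smaller than to_days_ago")
    return from_date, to_date


start, end = sliding_window(date(2016, 1, 27), from_days_ago=15, to_days_ago=13)
print(start.isoformat(), end.isoformat())  # 2016-01-12 2016-01-14
```

With today set to 2016-01-27, 15 and 13 days ago give 2016-01-12 and 2016-01-14, matching the --fromDate and --toDate values of the first command.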
 +
 As usual, the --help (or -h) option will also display the command-line manual/help.
 Please note that this article always refers to the 'download' command if not otherwise explicitly stated.
Line 107: Line 122:
 Please note that when FromDate is used, then ToDate is mandatory to obtain a response from the service. (When using the command-line options, if ToDate is not defined it will be 'now' by default).\\
 Due to the different upload and update patterns of the data providers, it is necessary to request at least a couple of days of observations for the FromDate field and, at the same time, to use the UpdatedSinceDate and InsertedSinceDate filters (see "Filters on data download" below) to avoid downloading too much duplicate data and to avoid hitting the maximum records per request of the EIONET service (set at 50k records).\\
-The recommended value for 'days_of_obs' is 8 to allow some data providers to recover and upload observations from malfucntions at stations. This value can be safely increased when the filters on data download are used.
+The recommended value for 'days_of_obs' is 8 to allow some data providers to recover and upload observations from malfunctions at stations. This value can be safely increased when the filters on data download are used.
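
How 'days_of_obs' and the filters combine into one request can be sketched as below. This is an illustration under assumptions: the function ''build_request_fields'' is hypothetical, the ISO date format is a guess (the format the service expects is described in the download guide, not here), and only the field names FromDate, ToDate, UpdatedSinceDate and InsertedSinceDate come from this page.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def build_request_fields(now: datetime, days_of_obs: int = 8,
                         last_download: Optional[datetime] = None) -> Dict[str, str]:
    """Build the date fields of one request (hypothetical helper).

    The observation window always spans the last days_of_obs days. When the
    date of the last successful download is known, the two filters restrict
    the response to records updated or inserted since then.
    """
    fields = {
        "FromDate": (now - timedelta(days=days_of_obs)).isoformat(),
        "ToDate": now.isoformat(),  # mandatory whenever FromDate is used
    }
    if last_download is not None:
        # Assumed date format; the real service may expect a different one.
        fields["UpdatedSinceDate"] = last_download.isoformat()
        fields["InsertedSinceDate"] = last_download.isoformat()
    return fields


fields = build_request_fields(datetime(2016, 1, 27, 12, 0),
                              last_download=datetime(2016, 1, 26, 12, 0))
print(fields["FromDate"])  # 2016-01-19T12:00:00
```

The window stays wide (8 days) so that late uploads are caught, while the filters keep the number of returned records low.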
  
 == Filters on data download ==
Line 120: Line 135:
  
 == Further documentation ==
-**Internal request fields (to the EIONET service) used in this tool can be found here: http://discomap.eea.europa.eu/map/fme/doc/UTDAirQualityDownloadGuide.pdf** \\
+  * Slides presenting the tool, including information about the number of stations, the rationale behind the different config files and for the chosen update patterns used in the cron, etc. {{tools:eionet-utdretriever:20160209_EIONET-UTDRetriever-Updated.pdf|EIONET-UTDRetriever-Updated_ToolSummary.pdf}}
-More technical documentation (Sphinx-generated) can be found under the directory docs in the repository.
+  * **Internal request fields (to the EIONET service) used in this tool** can be found here: http://discomap.eea.europa.eu/map/fme/doc/UTDAirQualityDownloadGuide.pdf \\
 +  More technical documentation (Sphinx-generated) can be found under the directory docs in the repository. 
  
  
 ==== Repository ====
  
-The link to the GIT repository is:\\ <code>https://earth.bsc.es/gitlab/jcuadrad/EIONET.git</code>
+The link to the GIT repository is:\\ <code>https://earth.bsc.es/gitlab/es/EIONET.git</code>
  
  
tools/eionet-utdretriever.1453894500.txt.gz · Last modified: 2016/01/27 11:35 by jcuadrad