Earth Sciences Wiki
Universal Kriging is a common geostatistical technique for spatial interpolation. It combines a (multi)linear regression on auxiliary variables, called covariates, with a spatial interpolation that takes into account the spatially autocorrelated structure of the data. In our case, we have applied this methodology as a post-process of the CALIOPE-Urban dispersion model, developed by the Earth Sciences Department of the Barcelona Supercomputing Center (BSC). To implement it, we have used hourly observational NO2 data from 12 monitoring stations as the principal variable and the hourly NO2 output of CALIOPE-Urban as the covariate. In addition, we have studied the added value of incorporating, as a second covariate, our time-invariant microscale-Land Use Regression (LUR) model. This model was developed through a machine learning approach using two different NO2 passive dosimeter campaigns and 8 predictors (urban geometric variables, simulated vehicular traffic densities, annually averaged data bilinearly interpolated from the regional CALIOPE system, and the annually averaged NO2 output of CALIOPE-Urban). Our implementation is a data-fusion procedure used as a spatial NO2 bias correction in an urban area, the city of Barcelona. Moreover, this correction can be applied directly to the daily maximum NO2 concentrations instead of the hourly levels.

For more information, suggestions or clarifications, please do not hesitate to email the authors. Note that a new paper about implementing this methodology is under review.
Álvaro Criado Romero, firstname.lastname@example.org
Jan Mateu Armengol, email@example.com
Meriem Hajji, firstname.lastname@example.org
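As background for the method summarized in the introduction, the textbook form of universal kriging can be written as below. The notation is chosen for this sketch and is not taken from the repository: Z(s) is the NO2 concentration at location s, x_k(s) are the covariates (p = 1 with only the CALIOPE-Urban output, p = 2 when the microscale-LUR basemap is added), and the residual is assumed to be zero-mean with a stationary covariance C. In the predictor, z is the vector of station observations, X the design matrix (a column of ones plus the covariates at the stations), C the covariance matrix between stations and c_0 the covariance vector between the stations and the target point s_0.

```latex
Z(s) = \beta_0 + \sum_{k=1}^{p} \beta_k \, x_k(s) + \varepsilon(s),
\qquad \mathbb{E}[\varepsilon(s)] = 0,
\qquad \operatorname{Cov}[\varepsilon(s), \varepsilon(s')] = C(s - s')

\hat{\boldsymbol{\beta}} = \left( \mathbf{X}^{\top} \mathbf{C}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{\top} \mathbf{C}^{-1} \mathbf{z}

\hat{Z}(s_0) = \mathbf{x}(s_0)^{\top} \hat{\boldsymbol{\beta}}
             + \mathbf{c}_0^{\top} \mathbf{C}^{-1} \left( \mathbf{z} - \mathbf{X} \hat{\boldsymbol{\beta}} \right)
```

The first term of the predictor is the regression on the covariates; the second term interpolates the regression residuals at the stations, which is what makes the result act as a spatial bias correction.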
We recommend following the tutorial with RStudio or the terminal open, in order to view the different scripts.
To configure RStudio and open it from your workstation, run the following:
module load RStudio/1.1.463-foss-2015a
module load R/3.6.1-foss-2015a-bare
rstudio &
To be able to launch it on the workstation, add the following to your bashrc:
module load RStudio/1.1.463-foss-2015a
module load R/3.6.1-foss-2015a-bare
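One possible way to add the module lines to the bashrc without duplicating them on repeated runs is sketched below. The helper add_line_once and the variable BASHRC_FILE are hypothetical names introduced for this illustration; by default the sketch writes to a scratch file so it can be tried safely.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact line is missing.
# BASHRC_FILE defaults to a scratch file; set BASHRC_FILE="$HOME/.bashrc" to apply it for real.
BASHRC_FILE="${BASHRC_FILE:-$(mktemp)}"

add_line_once() {
  # $1: exact line to add, $2: target file
  grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

add_line_once "module load RStudio/1.1.463-foss-2015a" "$BASHRC_FILE"
add_line_once "module load R/3.6.1-foss-2015a-bare" "$BASHRC_FILE"

echo "Module lines present in $BASHRC_FILE"
```

Running the script a second time leaves the file unchanged, because each line is only appended when grep does not find it.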
Follow the steps below for a basic tutorial on this methodology. The procedure is implemented in R.
git clone https://earth.bsc.es/gitlab/es/universalkriging.git
After that, a folder named universalkriging (by default) will appear, containing all the files copied from the repository https://earth.bsc.es/gitlab/es/universalkriging.git. This is the main folder: it contains the sub-folder general (the one with all the scripts and data), and the results will also appear there. This information is repeated in the following steps.
The folder general contains several files and folders, classified as follows: R scripts (the main script is kriging_repository.R, while the remaining R scripts are secondary scripts called by the main one at different points of the workflow), folders (containing the different types of information required by some of those scripts), a configuration file (config_file.yml, used to initialize the methodology and to select which procedure to run, as shown in the following steps) and basic git files (README.md and LICENSE).
The configuration file works as a setup structure: changing the variables it contains produces a different output. Because it is a separate file, it can be modified without touching the rest of the scripts. The first step is to fill it in. Note that this is the only file that has to be modified to suit your goal. Before any modification, it looks like this:
Next, we describe each of the items that have to be filled in the configuration file and what it implies:
In summary, the user only has to do the following with the configuration file:
Before applying the methodology and obtaining the results, it is essential to understand the folder structure that will be created, in order to interpret the different outputs correctly. The folder layout created by the code is always the same, and it is built at the start of any of the possible applications, which means that some folders may remain empty. Some details about the folders are addressed in the following steps, following the workflow of the code.
This is an example of the structure using the 2019 dataset:
Remember that the parallelization is carried out by day. The methodology is applied on a mesh of approximately 49000 points for each hour of the chosen period. The output is daily, meaning that there is one output file per day. Thus, each file contains the correction on the 49000 points 24 times, once for each hour of the day. Please see the examples to visualize the outputs of this methodology.
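The sizes implied by the paragraph above can be worked out with a few lines of shell arithmetic. The mesh size and the 24 hourly fields come from the text; the 365-day year, the 50 cores and the even split of days among cores are assumptions made only for this illustration.

```shell
#!/bin/bash
# Figures from the text: ~49000 mesh points, 24 hourly fields per daily file.
MESH_POINTS=49000
HOURS_PER_DAY=24
# Assumptions for this sketch: a non-leap year (such as 2019) and 50 cores.
DAYS=365
CORES=50

values_per_file=$(( MESH_POINTS * HOURS_PER_DAY ))
values_per_year=$(( values_per_file * DAYS ))
# Ceiling division: upper bound on days per core if days are spread evenly (assumption).
days_per_core=$(( (DAYS + CORES - 1) / CORES ))

echo "corrected values per daily file: $values_per_file"
echo "corrected values per year:       $values_per_year"
echo "days handled per core (at most): $days_per_core"
```

With these numbers, each daily file holds 1176000 corrected values, a full year amounts to 429240000 values, and each core would process at most 8 days.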
The kriging_repository.R script, located in the folder general, is the main script of the methodology and the one that will be submitted. Please open it (through RStudio or the terminal, as with the configuration file) to follow the code alongside the explanation below. Note that the user does not have to change this script; the only part meant to be changed is the configuration file. The script is split into different sections:
Note that most of the secondary operations (for example, preparing the observations or extracting the CALIOPE-Urban annual mean) are performed in the user's first application. Suppose the user then applies another configuration file, keeping the same period and the same path configuration. In that case, the corresponding files already exist from the previous application and the secondary operations are not computed again. This is why a stable folder structure is created and documented. For example, calculating the daily mean concentrations of CALIOPE-Urban for a whole year takes 6 h of computational time using 50 cores. Those operations, together with the daily maximum ones, therefore require approximately 12 hours of computation in the first submission, but are not repeated in a second application.
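The "compute once, reuse afterwards" behaviour described above can be illustrated with a small shell sketch. This is not how the R scripts implement it; run_once, WORK_DIR and the file name daily_mean_2019.txt are hypothetical, and echo stands in for the expensive secondary operation.

```shell
#!/bin/bash
# Hypothetical helper: run a command only if its output file is missing or empty.
# run_once <output_file> <command...>
run_once() {
  local out="$1"; shift
  if [ -s "$out" ]; then
    echo "reusing $out" >&2
    return 0
  fi
  echo "computing $out" >&2
  # Write to a temporary file first so a failed run never leaves a partial result.
  "$@" > "$out.tmp" && mv "$out.tmp" "$out"
}

# Stand-in example: the first call creates the file, the second call is skipped.
WORK_DIR="$(mktemp -d)"
run_once "$WORK_DIR/daily_mean_2019.txt" echo "placeholder daily means"
run_once "$WORK_DIR/daily_mean_2019.txt" echo "this second call is skipped"
```

The same idea explains why keeping the period and the path configuration unchanged matters: the check is based on where the files are expected to be.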
Note: it is recommended to first read the machines' user guides (specifically NORD3v2 or MareNostrum4, MN4) to familiarize yourself with this environment (URL: https://www.bsc.es/user-support/nord3v2.php ).
To apply the methodology, a job has to be submitted using the supercomputing resources of the BSC. To do so, a file [name].sh has to be prepared following the guidelines in the user guides of the BSC machines. The examples use the NORD3v2 machine. We recommend the following computational times for the job directives:
The user has to load the R module in the bashrc (of the corresponding machine):
module load R
To submit the job, the user only has to launch their own [name].sh file, which runs Rscript on kriging_repository.R followed by the configuration file. As an example:
#!/bin/bash
#SBATCH --job-name="kriging"
#SBATCH --output=K_%j.out
#SBATCH --error=K_%j.err
#SBATCH --time=01:00:00
#SBATCH --qos=bsc_es
#SBATCH -n 30

module load R
Rscript /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/kriging_repository.R /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/config_file.yml
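Since the job files used in this tutorial differ mainly in the job name, wall time, queue and number of cores, they could also be generated from a few parameters, as sketched below. The function make_job, the variable JOB_FILE and the placeholder REPO_DIR are hypothetical names introduced for this illustration; the directives themselves are the ones shown in the example above.

```shell
#!/bin/bash
# Hypothetical helper: write a Slurm job file from a few parameters.
# make_job <file> <job_name> <hh:mm:ss> <qos> <cores> <repo_dir>
make_job() {
  local file="$1" name="$2" walltime="$3" qos="$4" cores="$5" repo="$6"
  cat > "$file" <<EOF
#!/bin/bash
#SBATCH --job-name="$name"
#SBATCH --output=K_%j.out
#SBATCH --error=K_%j.err
#SBATCH --time=$walltime
#SBATCH --qos=$qos
#SBATCH -n $cores

module load R
Rscript $repo/general/kriging_repository.R $repo/general/config_file.yml
EOF
}

# REPO_DIR is a placeholder: replace it with the location of your own clone.
REPO_DIR="/path/to/universalkriging"
JOB_FILE="${JOB_FILE:-kriging_repository.sh}"
make_job "$JOB_FILE" kriging 01:00:00 bsc_es 30 "$REPO_DIR"
echo "Wrote $JOB_FILE"
```

Extra directives, such as a memory constraint, would have to be added to the template by hand; the sketch only covers the parameters that change between the examples.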
In the following examples, the UniversalKriging_path is /esarchive/scratch/acriado/Rstudio/UniversalKriging/. This is the layout of the directories UniversalKriging/ and general/ before applying the methodology:
ssh bscXXXXX@nord4.bsc.es

(login banner)
Welcome to Nord3v2!! This machine has the same architecture as Nord3 but with updated OS and Slurm!
OS Version: Red Hat Enterprise Linux 8.4 (Ootpa)
Slurm version: slurm 21.08.8-2
Please contact email@example.com for further questions
3. Preparing the job: the sh file is called kriging_repository.sh. The user has to write the job directives needed to submit it, as specified in the guidelines. The job directives required in this case would be:
As this is the first submitted job, we use the maximum computational time (48 h) and, in this case, choose 50 cores. The queue has to be bsc_es in this case.
#!/bin/bash
#SBATCH --job-name="kriging_first"
#SBATCH --output=K_f_%j.out
#SBATCH --error=K_f_%j.err
#SBATCH --time=48:00:00
#SBATCH --qos=bsc_es
#SBATCH -n 50
#SBATCH --constraint=highmem

module load R
Rscript /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/kriging_repository.R /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/config_file.yml
4. Submitting the job. With the configuration file and the job prepared, the user just has to submit the job:
5. If the user types:
it is possible to see the status of the job. The fields that appear are:
This is an example; in this case, the directory from which the job is launched is /esarchive/scratch/acriado/nord3/:
6. Waiting until the job is finished. Note that when a job is submitted, two files are created: the output file and the error file (the user has defined their names in the job directives). The output file shows the messages printed while the job runs. The error file shows minor errors that do not stop the job, as well as the error that caused the job to stop, if it did.
7. The job is completed. A folder named 2019/ will be created in the UniversalKriging_path, with the folder structure presented above. In this case, the folder for the bias correction will be UK1/, and the information about the microscale-LUR model will not be used. The applications are run in the order in which they appear in the main script.
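Steps 4 and 5 above (submitting the job and checking its status) can be sketched as follows, assuming a Slurm scheduler as suggested by the #SBATCH directives in the job file. The function submit_and_check is a hypothetical name for this illustration, and the chosen squeue columns are only one possible selection; when sbatch is not available, the sketch just prints the command it would run.

```shell
#!/bin/bash
# Sketch only: assumes a Slurm cluster where sbatch and squeue are on the PATH.
JOB_SCRIPT="kriging_repository.sh"   # job file name used in this example

submit_and_check() {
  local script="$1"
  if ! command -v sbatch >/dev/null 2>&1; then
    echo "sbatch not found; would run: sbatch $script"
    return 0
  fi
  # --parsable makes sbatch print only the job ID (optionally followed by ;cluster).
  local job_id
  job_id="$(sbatch --parsable "$script")" || return 1
  job_id="${job_id%%;*}"
  echo "Submitted job $job_id"
  # List this user's jobs: job ID, name, state, elapsed time, node count, reason or node list.
  squeue -u "$USER" -o "%.10i %.20j %.8T %.10M %.6D %R"
}

submit_and_check "$JOB_SCRIPT"
```

The output and error files named in the job directives are written to the directory from which the job was submitted, so that is where to look while waiting for the job to finish.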
Some examples of what the directories would look like:
Some plots after applying the methodology:
1. Preparing the configuration file, which in this case is the following:
2. Logging in to the machine.
3. Preparing the job. In this case, we reduce the number of cores and the computational time, change the queue accordingly, and do not require high memory.
#!/bin/bash
#SBATCH --job-name="kriging_second"
#SBATCH --output=K_s_%j.out
#SBATCH --error=K_s_%j.err
#SBATCH --time=01:00:00
#SBATCH --qos=debug
#SBATCH -n 30

module load R
Rscript /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/kriging_repository.R /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/config_file.yml
4. Submitting the job.
5. Checking that everything is OK.
6. Waiting until the job is finished.
7. The job is completed. A folder named UK2/ will be created inside the folder 2019/, with the folder structure presented above. Note that, as we have selected only 2 applications (UK and cross), the folders for the remaining ones will be empty. In this case, new plots appear related to the use of the microscale-LUR basemap.
Some plots after applying the methodology: