Universal Kriging is a common geostatistical technique for spatial interpolation. It combines a (multi)linear regression on auxiliary variables, called covariates, with a spatial interpolation that takes into account the autocorrelated spatial structure of the data. We have applied this methodology as a post-processing step of the CALIOPE-Urban dispersion model, developed by the Earth Science Department of the Barcelona Supercomputing Center (BSC).

To implement it, we have used hourly observed NO2 data from 12 monitoring stations as the principal variable and the hourly CALIOPE-Urban NO2 output as the covariate. In addition, we have studied the added value of incorporating, as a second covariate, our time-invariant microscale Land Use Regression (LUR) model. This model was developed through a machine learning approach using two different NO2 passive dosimeter campaigns and 8 predictors: urban geometric variables, simulated vehicular traffic densities, annually averaged data bilinearly interpolated from the regional CALIOPE system, and the annually averaged NO2 output of CALIOPE-Urban. Our implementation is a data-fusion procedure used as a spatial NO2 bias correction in an urban area, the city of Barcelona. This correction can also be applied directly to the daily maximum NO2 concentrations instead of the hourly levels.

For more information, suggestions or clarifications, please do not hesitate to email the authors. Notice that a new paper on the implementation of this methodology is under review.
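As a reference for the description above, the standard Universal Kriging model can be sketched as follows. The notation is our own assumption and is not taken from the authors' implementation: Z is the observed hourly NO2 at location s, X_1 the CALIOPE-Urban NO2 output, X_2 the optional microscale-LUR covariate, and ε a zero-mean, spatially autocorrelated residual whose structure is described by a variogram. The prediction at an unobserved point s_0 is a weighted sum of the n station observations (n = 12 here), with weights λ_i chosen to minimize the prediction variance subject to the unbiasedness constraints shown.

<code latex>
Z(\mathbf{s}) = \beta_0 + \beta_1 X_1(\mathbf{s}) + \beta_2 X_2(\mathbf{s}) + \varepsilon(\mathbf{s}),
\qquad \mathbb{E}[\varepsilon(\mathbf{s})] = 0

\hat{Z}(\mathbf{s}_0) = \sum_{i=1}^{n} \lambda_i \, Z(\mathbf{s}_i),
\qquad \text{subject to} \quad \sum_{i=1}^{n} \lambda_i = 1, \quad
\sum_{i=1}^{n} \lambda_i X_k(\mathbf{s}_i) = X_k(\mathbf{s}_0), \; k = 1, 2
</code>

When CALIOPE-Urban is the only covariate, the β_2 term and the k = 2 constraint are simply dropped.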
Álvaro Criado Romero, alvaro.criado@bsc.es
Jan Mateu Armengol, jan.mateu@bsc.es
Meriem Hajji, meriem.hajji@bsc.es
To follow a basic tutorial of this methodology, follow the steps below. The procedure is implemented in R. First, clone the repository:
<code bash>
git clone https://earth.bsc.es/gitlab/es/universalkriging.git
</code>
This creates a folder, named universalkriging by default, containing all the files copied from the repository https://earth.bsc.es/gitlab/es/universalkriging.git. This is the main folder: it contains the sub-folder general, which holds all the scripts and data, and it is where the results will be written. This information is repeated in the following steps.
The cloned repository contains several types of files:
* R scripts: the principal script is kriging_repository.R, while the remaining R scripts are secondary scripts called by the principal one at different points of the workflow.
* Folders: they contain the different types of information required by some of these scripts.
* A configuration file, config_file.yml, used to initialize the methodology and launch it with the desired procedure, as shown in the next steps.
* Basic git files, such as README.md.
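A quick way to confirm that the clone is complete is to check that the main script and the configuration file are where the job script later expects them (in the general sub-folder). The sketch below assumes that layout; the function name check_layout is hypothetical and not part of the repository.

<code bash>
#!/bin/bash
# Hypothetical helper (not part of the repository): check that a cloned copy
# contains the main R script and the configuration file in "general/".
# Returns 0 if both exist, 1 otherwise (and reports what is missing).
check_layout() {
  local root="${1:-universalkriging}"
  local missing=0
  for f in general/kriging_repository.R general/config_file.yml; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_layout universalkriging && echo "repository looks complete"
</code>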
At this point, I recommend following the tutorial in RStudio to view the different scripts.
The configuration file holds the settings of a run: changing the variables it contains produces a different output. Because it is a separate file, it can be modified without changing any of the scripts. The first step is to fill it in. Notice that this is the only file that has to be modified to suit your goal. To inspect its default contents before modifying it, open it with a text editor:
<code bash>
vi config_file.yml
</code>
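To get an overview of the items that have to be filled in, the top-level keys of the YAML file can be listed with standard text tools. This is only a rough filter under the assumption that top-level keys start at the beginning of a line; it is not a YAML parser, and list_config_keys is a hypothetical name.

<code bash>
#!/bin/bash
# Hypothetical helper: print the top-level keys of a YAML file, i.e. lines of
# the form "key:" starting at column 0. Indented (nested) keys and comment
# lines are skipped. This is a text filter, not a full YAML parser.
list_config_keys() {
  local file="${1:-config_file.yml}"
  grep -E '^[A-Za-z0-9_.-]+:' "$file" | cut -d: -f1
}

# Example: list_config_keys universalkriging/general/config_file.yml
</code>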
Next, we describe the meaning and implications of each item that has to be filled in the configuration file:
2. Entering the machine. After logging in (ssh bscXXXXX@nord4.bsc.es), Nord3v2 displays a welcome message:
<code>
Welcome to Nord3v2!! This machine has the same architecture as Nord3
but with updated OS and Slurm!

OS Version: Red Hat Enterprise Linux 8.4 (Ootpa)
Slurm version: slurm 21.08.8-2

Please contact support@bsc.es for further questions
</code>
3. Preparing the job: the job script is named kriging_repository.sh. The user has to write the job directives needed to submit it, as described in the machine's guidelines. In this case, the required directives are:
* #SBATCH --job-name : the job's name
* #SBATCH --output : the name of the job's output file
* #SBATCH --error : the name of the job's error file
* #SBATCH --qos : the queue (quality of service) used to submit the job; it sets the limits on cores and computational time
* #SBATCH --time : the maximum computational (wall-clock) time, as HH:MM:SS
* #SBATCH -n : the number of tasks (cores) requested
* #SBATCH --constraint : the memory configuration, if required: highmem for high-memory resources, medmem for medium-capacity resources. If no memory option is required, omit this directive.
As this is the first job submitted, we request the maximum computational time (48 h) and, in this case, 50 cores. The queue must be bsc_es in this case.
<code bash>
#!/bin/bash
#SBATCH --job-name="kriging_first"
#SBATCH --output=K_f_%j.out
#SBATCH --error=K_f_%j.err
#SBATCH --time=48:00:00
#SBATCH --qos=bsc_es
#SBATCH -n 50
#SBATCH --constraint=highmem
# Load R and run the main script, passing the configuration file as argument
module load R
Rscript /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/kriging_repository.R /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/config_file.yml
</code>
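Since the job scripts in this tutorial differ only in a few directives, they can also be generated instead of edited by hand, which avoids typos such as an en-dash in place of "--". The sketch below is hypothetical: write_job_script and the GENERAL_DIR variable are our own names, and the directives simply mirror the ones listed above, omitting --constraint when no memory option is given.

<code bash>
#!/bin/bash
# Hypothetical generator: print a Slurm job script for kriging_repository.R.
# Usage: write_job_script NAME PREFIX TIME QOS NTASKS [CONSTRAINT]
# GENERAL_DIR (hypothetical) should point to the repository's "general"
# folder; it defaults to ./universalkriging/general.
write_job_script() {
  local name="$1" prefix="$2" time="$3" qos="$4" ntasks="$5" constraint="${6:-}"
  local dir="${GENERAL_DIR:-$PWD/universalkriging/general}"
  echo '#!/bin/bash'
  echo "#SBATCH --job-name=${name}"
  echo "#SBATCH --output=${prefix}_%j.out"
  echo "#SBATCH --error=${prefix}_%j.err"
  echo "#SBATCH --time=${time}"
  echo "#SBATCH --qos=${qos}"
  echo "#SBATCH -n ${ntasks}"
  # Only request a memory constraint when one is given.
  if [ -n "$constraint" ]; then
    echo "#SBATCH --constraint=${constraint}"
  fi
  echo 'module load R'
  echo "Rscript ${dir}/kriging_repository.R ${dir}/config_file.yml"
}

# Example (first job):
#   write_job_script kriging_first K_f 48:00:00 bsc_es 50 highmem > kriging_repository.sh
</code>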
4. Submitting the job. With the configuration file and the job prepared, the user just has to submit the job:
<code bash>
sbatch kriging_repository.sh
</code>
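When sbatch accepts a job it prints a line of the form "Submitted batch job <ID>". Keeping that ID is convenient for the following steps (checking the queue, locating the output files). The helper below is a hypothetical sketch that extracts the ID from that message; alternatively, sbatch --parsable prints only the ID (followed by ";cluster" on multi-cluster setups).

<code bash>
#!/bin/bash
# Hypothetical helper: read sbatch's standard message on stdin and print the
# job ID it contains. Prints nothing if the message is not a submission line.
parse_job_id() {
  awk '/^Submitted batch job/ {print $4}'
}

# Example:
#   jobid=$(sbatch kriging_repository.sh | parse_job_id)
#   echo "job ID: $jobid"
</code>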
5. If the user types:
<code bash>
squeue
</code>
it is possible to see the status of the job. The fields shown are:
* JOBID: the job's identifier.
* PARTITION: the partition (group of nodes) to which the job is assigned.
* NAME: the name of the job.
* USER: the user who submitted the job (e.g. bscXXXXX).
* ST: the status of the job, typically pending (PD) or running (R). Other possible states are completed (CD), completing (CG), failed (F), preempted (PR), suspended (S) or stopped (ST). All of them are described in the machine's guidelines.
* TIME: the time the job has been running.
* NODES: the number of nodes allocated to the job, which depends on the number of cores requested.
* NODELIST(REASON): the nodes on which the job is running or, for a pending job, the reason why it is waiting.
In this example, the job is launched from the directory /esarchive/scratch/acriado/nord3/.
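The state codes listed above for the ST column can be decoded with a small lookup, which is handy when filtering the output of squeue in a script. This is a hypothetical helper covering only the codes named in this tutorial; the usage comment relies on the standard squeue options -h (no header), -u (user) and -o with the %i (job ID) and %t (compact state) format specifiers.

<code bash>
#!/bin/bash
# Hypothetical helper: translate the short job-state codes shown in the ST
# column of squeue into the descriptions used in this tutorial.
describe_state() {
  case "$1" in
    PD) echo "pending" ;;
    R)  echo "running" ;;
    CD) echo "completed" ;;
    CG) echo "completing" ;;
    F)  echo "failed" ;;
    PR) echo "preempted" ;;
    S)  echo "suspended" ;;
    ST) echo "stopped" ;;
    *)  echo "unknown ($1)" ;;
  esac
}

# Example:
#   squeue -h -u "$USER" -o "%i %t" | while read -r id st; do
#     echo "$id: $(describe_state "$st")"
#   done
</code>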
6. Waiting until the job is finished. Notice that when a job is submitted, two files are created: the output file and the error file (whose names the user defined in the job directives). The output file shows the messages printed while the job is running. The error file shows secondary errors that were not serious enough to stop the job, as well as the error that caused the job to stop, if it failed.
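In the --output and --error directives, Slurm replaces "%j" with the job ID, so the actual file names of a job (e.g. K_f_%j.out) can be reconstructed from the pattern and the ID. The helper below is a hypothetical sketch of that substitution, useful for following the output file while the job runs; relative names refer to the directory from which the job was submitted.

<code bash>
#!/bin/bash
# Hypothetical helper: replace every "%j" in a Slurm file-name pattern with
# the given job ID, as Slurm does for --output/--error.
expand_job_pattern() {
  local pattern="$1" jobid="$2"
  printf '%s\n' "$pattern" | sed "s/%j/${jobid}/g"
}

# Example: follow the output file of job 123456 while it runs
#   tail -f "$(expand_job_pattern 'K_f_%j.out' 123456)"
</code>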
7. The job is completed. A folder named 2019/ will be created in the Universalkriging_path, with the same folder structure presented above. In this case, the bias-correction folder will be UK1/, and none of the microscale-LUR information will be used. The applications are run in the same order in which they appear in the main script.
Some examples of the resulting directory structure:
* The folder caliope20m, inside the sub-folder out, containing the raw CALIOPE-Urban output for each day.
* The folder UK1/UK/UK_raw/, containing the corrections for each day.
* The folder UK1/images/, containing the plots.
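To inspect the resulting directory structure from the command line, the sub-folders of a results folder can be listed with find. This is a hypothetical helper; the depth argument and the name list_result_dirs are our own choices, and sorting uses the C locale so the order does not depend on the user's language settings.

<code bash>
#!/bin/bash
# Hypothetical helper: list the sub-directories of a results folder
# (e.g. 2019/UK1) up to a given depth (default 2), relative to that folder
# and sorted in C-locale order.
list_result_dirs() {
  local root="$1" depth="${2:-2}"
  ( cd "$root" && find . -mindepth 1 -maxdepth "$depth" -type d ) \
    | sed 's|^\./||' | LC_ALL=C sort
}

# Example: list_result_dirs 2019/UK1 3
</code>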
Some plots after applying the methodology:
* Annual mean NO2 concentration from CALIOPE-Urban:
ex6.pdf
* Annual mean NO2 concentration from the Universal Kriging correction:
ex7.pdf
* NO2 difference between CALIOPE-Urban and the Universal Kriging correction:
ex8.pdf
==== Universal Kriging using both covariates (adding the microscale-LUR basemap), the full 2019 dataset, and the UK and cross applications ====
1. Preparing the configuration file. In this case, it is the following:
2. Entering the machine:
<code bash>
ssh bscXXXXX@nord4.bsc.es
</code>
3. Preparing the job. In this case, we reduce the number of cores and the computational time, so we also change the queue (debug), and we do not request high memory.
<code bash>
#!/bin/bash
#SBATCH --job-name="kriging_second"
#SBATCH --output=K_s_%j.out
#SBATCH --error=K_s_%j.err
#SBATCH --time=01:00:00
#SBATCH --qos=debug
#SBATCH -n 30
# No --constraint line: high memory is not required for this run
module load R
Rscript /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/kriging_repository.R /esarchive/scratch/acriado/Rstudio/UniversalKriging/general/config_file.yml
</code>
4. Submitting the job.
<code bash>
sbatch kriging_repository.sh
</code>
5. Checking that everything is OK.
<code bash>
squeue
</code>
6. Waiting until the job is finished.
7. The job is completed. A folder named UK2/ will be created inside the folder 2019/, with the same folder structure presented above. Notice that, as we have selected only 2 applications (UK and cross), the folders for the remaining applications will be empty. In this case, new plots related to the microscale-LUR basemap also appear.
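Because only the selected applications produce output, the empty folders of the others can be spotted directly. The sketch below is a hypothetical helper that assumes each application writes to an immediate sub-folder of the results folder (e.g. 2019/UK2); it relies on find's -empty test.

<code bash>
#!/bin/bash
# Hypothetical helper: print the immediate sub-folders of a results folder
# that are empty, e.g. those of applications that were not selected.
list_empty_dirs() {
  local root="$1"
  ( cd "$root" && find . -mindepth 1 -maxdepth 1 -type d -empty ) \
    | sed 's|^\./||' | LC_ALL=C sort
}

# Example: list_empty_dirs 2019/UK2
</code>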
Some plots after applying the methodology:
* microscale-LUR basemap:
exc10.pdf
* Annual mean NO2 concentration from the Universal Kriging correction:
exc11.pdf