Earth Sciences Wiki

working_groups:ukurbangroup

Differences

This shows you the differences between two versions of the page.

working_groups:ukurbangroup [2022/08/05 14:25]
mhajji old revision restored (2022/07/19 12:19)
working_groups:ukurbangroup [2022/08/05 14:26] (current)
mhajji old revision restored (2022/08/05 13:57)
Line 24: Line 24:
   - Copy all the functions and files needed to run the procedure into your own directory. To do that, open a terminal and run the following command:
 <code bash> git clone https://earth.bsc.es/gitlab/es/universalkriging.git </code>
-The copied files from the repository: https://earth.bsc.es/gitlab/es/universalkriging.git have to be kept in the same folder (your own folder). This folder, called for example //general// , has to be placed in a previous folder where the results will appear, called for example //UniversalKriging_path//. All of this information will be mentioned again in the following steps.
+After doing that, a folder called //universalkriging// by default will appear, containing all the files copied from the repository: https://earth.bsc.es/gitlab/es/universalkriging.git. This will be the main folder: it contains the sub-folder //general// (the one with all scripts and data), and it is where the results will appear. All of this information will be mentioned again in the following steps.
  
  
  
-After copying that repository, a list of different archives will appear. They are classified into **R scripts** (the principal script is named by //kriging_repository.R// , while the remaining R scripts are secondary scripts called by the principal one in different parts of the workflow), **folders** (they contain different types of information, required for some of the mentioned scripts), a **configuration file** (named by //config_file.yml// , used by initializing the methodology and launching it in terms of what procedure we would want, as it can be seen in the next steps) and **git-basic files** (as, the README.md).
+In the folder //general//, a list of different files will appear. They are classified into **R scripts** (the principal script is named //kriging_repository.R//, while the remaining R scripts are secondary scripts called by the principal one in different parts of the workflow), **folders** (they contain different types of information required by some of those scripts), a **configuration file** (named //config_file.yml//, used to initialize the methodology and launch the desired procedure, as shown in the next steps) and **basic git files** (README.md and LICENSE).
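The layout described above can be checked from the terminal before going further. The following is only a minimal sketch: the function name `check_layout` is hypothetical and is not part of the repository, and it only looks for the two file names mentioned in the text (//kriging_repository.R// and //config_file.yml//) inside the sub-folder //general//.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): verify that a cloned copy
# contains the sub-folder "general" with the two files named in the text.
# Usage: check_layout /path/to/universalkriging
check_layout() {
    local main_dir="$1"
    local general_dir="$main_dir/general"
    local missing=0
    local f
    for f in kriging_repository.R config_file.yml; do
        if [ ! -f "$general_dir/$f" ]; then
            # Report every missing file on stderr, then fail at the end
            echo "missing: $general_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_layout ./universalkriging && echo "layout looks fine"` could be run from the directory where the `git clone` command was executed.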
  
  
  
-At this point, I recommend following the tutorial using the Rstudio program to visualize the different scripts.
+At this point, it is recommended to follow the tutorial using RStudio to visualize the different scripts.
 
 === The configuration file ===
Line 38: Line 38:
 The configuration file is a setup file: the variables in it can be changed, and doing so produces a different output. Because it is a separate file, it can be modified without touching the rest of the scripts. The first step is to fill it in. Notice that this is the __only file__ that has to be modified according to your goal. Before any modification, it looks like this:
   * Through the RStudio visualization:
-{{ :working_groups:config_uk.png?nolink&600 |}}
+{{ :working_groups:configR.png?nolink&600 |}}
   * Through the terminal. To do that, go to the directory where all the files copied from the repository are kept, and type the following (in this case, the visualization is done through the program MobaXterm):
     <code bash>
     vi config_file.yml </code>
-{{ :working_groups:config_file_terminal.png?nolink&600 |}}
+{{ :working_groups:configterm.png?nolink&600 |}}
  
 Now we will go through each item that has to be filled in the configuration file, and what it implies:
  
-  * **//UniversalKriging_path//** : (a folder path) \\ this will be the main folder. In the beginning, it must contain the folder with all the archives copied from the above repository, which would be **//general//**:
+  * **//UniversalKriging_path//** : (a folder path) \\ this will be the main folder, the one that is created after copying the repository. It must contain the sub-folder with all the required files, which would be **//general//**:
     * **//general//** : (a folder path) \\ this path must be the one where all the files copied from the repository are kept, and it is located inside the **//UniversalKriging_path//**.
  
  
   * **//year//** : (a number) \\ this refers to the chosen year. The available years are 2017, 2019, 2020 and 2021.
-    *  **//exp_caliope_urban_path//** : (a character, one of the following characters depending on the chosen year:  2017: old/a3xa ; 2019: a4er ; 2020: a4eu ; 2021: a4hm, a4hl ) \\ this parameter is needed in order to use the CALIOPE-Urban output linked to the year of application. In the first application, the output is copied into the user's directory from one of the experiments mentioned. __The user only should fill the character in terms of the chosen year, but the experiments should be the mentioned ones if the correct version of CALIOPE-Urban will be used__. This has to be written between "".
+    *  **//exp_caliope_urban_path//** : (a character string). Each string refers to an experiment, depending on the chosen year (2017: old/a3xa ; 2019: a4er ; 2020: a4eu ; 2021: a4hm, a4hl). \\ This parameter is needed in order to use the CALIOPE-Urban output linked to the year of application. In the first application, the output is copied into the user's directory from one of the experiments mentioned. __The user should only fill in the string that matches the chosen year; the experiments should be the ones listed above if the correct version of CALIOPE-Urban is to be used__. This has to be written between double quotes ("").
     * **//GHOST_no2_path//** : (a folder path: "/gpfs/projects/bsc32/AC_cache/obs/ghost/EEA_AQ_eReporting/1.4/hourly/sconcno2/") \\ this is the path needed to use the observations from the monitoring stations. In the first application, the observations will be copied into the user's directory from that path. __The user should not change this path if the GHOST observations are to be used.__
  
-  * **//full_year//** : (//TRUE// or //FALSE//) \\ if the user wants to bias-correct the whole year, this parameter would be //TRUE//. Otherwise, //FALSE//.
-    * **//date_begin//** : (R-vector format: c(year, month, day) ) \\ if **//full_year//** is //FALSE//, the user has to fill this parameter as a vector of R that contains the year, month and day to begin the methodology. If **//full_year//** is //TRUE//, this parameter has to be //FALSE//. If **//full_year//** is //FALSE//, the period chosen cannot involve two different years.
-    * **//date_end//** : (R-vector format: c(year, month, day) ) \\ if **//full_year//** is //FALSE//, the user has to fill this parameter as a vector of R that contains the year, month and day to finish the methodology. If **//full_year//** is //TRUE//, this parameter has to be //FALSE//. If **//full_year//** is //FALSE//, the period chosen cannot involve two different years.
+  * **//full_year//** : (//TRUE// or //FALSE//) \\ if the user wants to bias-correct the whole year, this parameter must be //TRUE//, and //date_begin// and //date_end// must then be //FALSE//. If **//full_year//** is //FALSE//, the chosen period cannot span two different years, and the following has to be filled:
+    * **//date_begin & date_end//** : (R-vector format: c(year, month, day) ) \\ if **//full_year//** is //FALSE//, the user has to fill these parameters as R vectors containing the year, month and day on which to begin and to finish the methodology, respectively.
+
  
-   * **//UK_mode//** : (//UK1// or //UK2//) \\ this parameter is referred to the usage or not of the microscale-LUR model as the second covariate. If //UK_mode// is //UK1//, only CALIOPE-Urban is used as the covariate. If //UK_mode// is //UK2//, both CALIOPE-Urban and the microscale-LUR model will be the covariates. This has to be written between [ ]. Only if the //UK_mode// is //UK2// the microscale-LUR model will be used, but the following paths have to be filled in any case:
+   * **//UK_mode//** : (//UK1// or //UK2//) \\ this parameter refers to whether the microscale-LUR model is used as a second covariate. If //UK_mode// is //UK1//, only CALIOPE-Urban is used as the covariate. If //UK_mode// is //UK2//, both CALIOPE-Urban and the microscale-LUR model are the covariates; the microscale-LUR model is used only in this case. This has to be written between [ ].
-    *     **//sim_LUR//** : (a folder path) \\ this is the path that contains all the information about our microscale-LUR model. This path must be referred to the folder //LUR// copied from the repository, kept in the **//general//** folder. For example, if the **//general//** path is // /esarchive/scratch/acriado/UniversalKriging/general/ //, the **//sim_LUR//** path would be // /esarchive/scratch/acriado/UniversalKriging/general/LUR/ //.
-      *  **//sim_LUR_final//** : (a folder path) \\ this is the path that contains the information about the final microscale-LUR basemap. The folder //LUR// already includes this folder, so the user has to refer to it from that path. This folder is called //final_results//
-      *  **//sim_LUR_performance//** : (a folder path) \\ this is the path that contains the information about the expected performance of the microscale-LUR basemap. The folder //LUR// already includes this folder, so the user has to refer to it from that path. This folder is called //performance//
-   * **//application//** : (all possible combination using the following items; //UK//, //cross// , //UK_max//, //cross_max//, //UK_mean//, //uncertainty// ) \\ this is referred to the application of the methodology that we want. If we want to apply the Universal Kriging hourly correction (the default), the application is //UK//. If we want to apply the Leave-One-Out Cross-Validation, and get the results only at the monitoring stations, the application is //cross//. //UK_max// and //cross_max// are the same but instead of the hourly application, the daily maximum one. //UK_mean// is for computing the daily mean and annual mean corrected concentrations. If the //UK// application is not performed before applying this, nothing will occur. //uncertainty// is for calculating the daily mean variances, the annual mean variance and the relative uncertainty associated with the application. This/They has/have to be written between [ ].
+   * **//application//** : (any combination of the following items: //UK//, //cross//, //UK_max//, //cross_max//, //UK_mean//, //uncertainty//) \\ this refers to the application(s) of the methodology that we want. The chosen item(s) has/have to be written between [ ]:
+     * //UK//: the Universal Kriging hourly correction (the default).
+     * //cross//: the Leave-One-Out Cross-Validation, which gives the results only at the monitoring stations.
+     * //UK_max// and //cross_max//: the same as //UK// and //cross//, but for the daily maximum instead of the hourly application.
+     * //UK_mean//: computes the daily mean and annual mean corrected concentrations. If the //UK// application has not been performed beforehand, nothing will happen.
+     * //uncertainty//: calculates the daily mean variances, the annual mean variance and the relative uncertainty associated with the application.
  
    
   * **//plot_images//** : (//TRUE// or //FALSE//) \\ if //TRUE//, plots are generated.
   * **//evaluation//** : (//TRUE// or //FALSE//) \\ if //TRUE//, a statistical evaluation is performed.
-  * **//n_cores//** : (a number) \\ it is referred to the cores of the machine requested to submit the job. __The applications are parallelized in terms of the day.__ It means that, for instance, the spatial bias correction of the 29/03/2019 is done at the same time than the 30/03/2019 one.
+  * **//n_cores//** : (a number) \\ this refers to the number of cores of the machine requested when submitting the job. __The applications are parallelized by day.__ This means that, for instance, the spatial bias correction of 29/03/2019 is done at the same time as that of 30/03/2019. The sections **//The main script and its explanation//** and **//Submitting jobs//** give some guidance for choosing this number.
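To make the items above concrete, here is a sketch of what a filled configuration could look like. It is only an illustration under several assumptions: the flat key layout, the `/path/to/` placeholders and the `n_cores` value are hypothetical, and the real //config_file.yml// shown in the screenshots may be organised differently. The snippet writes to a separate file, //config_example.yml//, so that the real configuration file is not overwritten.

```bash
#!/usr/bin/env bash
# Sketch only: writes an EXAMPLE file so the real config_file.yml is untouched.
# The key layout, the /path/to/ placeholders and n_cores are assumptions;
# the parameter names and allowed values are the ones described in the text.
cat > config_example.yml <<'EOF'
UniversalKriging_path: "/path/to/universalkriging/"
general: "/path/to/universalkriging/general/"
year: 2019
exp_caliope_urban_path: "a4er"
GHOST_no2_path: "/gpfs/projects/bsc32/AC_cache/obs/ghost/EEA_AQ_eReporting/1.4/hourly/sconcno2/"
full_year: TRUE
date_begin: FALSE
date_end: FALSE
UK_mode: [UK1]
application: [UK, cross]
plot_images: TRUE
evaluation: TRUE
n_cores: 24
EOF
```

In this sketch the whole year 2019 is corrected, so //date_begin// and //date_end// are set to //FALSE//, and the experiment string is the one listed for 2019.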
  
 In summary, the user only has to do the following in the configuration file:
-  * Choosing its own **//UniversalKriging_path//** and referring to it the paths **//general//**, **//sim_LUR//**, **//sim_LUR_final//** and **//sim_LUR_performance//**.
+  * Choose your own **//UniversalKriging_path//** and make the path **//general//** point inside it.
   * Choose one of the available **//year//**s, with the appropriate **//exp_caliope_urban_path//**, and choose between a whole-year correction or not.
   * Choose the Universal Kriging mode in terms of the covariates, the application, whether to plot and to perform the evaluation, and the number of cores used to submit the job.
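Two of the rules above are easy to get wrong: the experiment string must match the chosen year, and a partial period must stay within a single year. The following sketch encodes both rules. The function names `experiment_for_year` and `same_year` are hypothetical and do not exist in the repository; the year-to-experiment pairs are the ones listed in the description of //exp_caliope_urban_path//.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).

# Print the experiment(s) listed in the text for a given year.
experiment_for_year() {
    case "$1" in
        2017) echo "old/a3xa" ;;
        2019) echo "a4er" ;;
        2020) echo "a4eu" ;;
        2021) echo "a4hm a4hl" ;;   # two experiments are listed for 2021
        *) echo "unsupported year: $1" >&2; return 1 ;;
    esac
}

# Succeed only if date_begin and date_end fall in the same year.
# Arguments: begin_year begin_month begin_day end_year end_month end_day
same_year() {
    local begin_year="$1" end_year="$4"
    [ "$begin_year" -eq "$end_year" ]
}
```

For instance, `same_year 2019 3 29 2019 3 30` succeeds, while `same_year 2019 12 31 2020 1 1` fails because the period spans two years.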
Line 78: Line 80:
  
 Before applying the methodology and obtaining the results, it is important to understand the structure of the folders that will appear, in order to interpret the different outputs correctly. The structure of the folders created by the code is always the same, and it is built the first time any of the possible applications is run, which means some folders may be empty. Some details of the folders will be addressed in the next steps, following the workflow of the code.
- 
-Remember that the parallelization is carried out in terms of the day. We are applying the methodology on a mesh composed approximately of 49000 points, each hour of the period chosen. The output of this methodology is __daily__, which means that the output files are referred to each day. Thus, each file will contain the correction on the 49000 points, 24 times regarding the 24h of the day. Please, see the examples to visualize the outputs of this methodology. 
  
 This is an example of the structure using the 2019 dataset:
 {{ :working_groups:folders_uk_structure.png?nolink&600 |}}
+
+Remember that the parallelization is carried out by day. The methodology is applied on a mesh of approximately 49000 points, for each hour of the chosen period. The output of this methodology is __daily__, which means that there is one output file per day. Thus, each file contains the correction at the 49000 points for each of the 24 hours of the day. Please see the examples to visualize the outputs of this methodology.
+
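The figures in the paragraph above can be turned into a quick size estimate. This is only a back-of-the-envelope sketch: the mesh size is the approximate value quoted in the text, the `n_cores` value is a placeholder, and the assumption that each core handles one day at a time is an interpretation of the day-based parallelization, not something confirmed by the code.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope estimate; all inputs are approximate or assumed.
points=49000      # approximate mesh size quoted in the text
hours=24          # hourly corrections stored in each daily file
days=365          # a full non-leap year (2017, 2019 or 2021)
n_cores=50        # placeholder value

# Corrected values stored in one daily output file
values_per_file=$((points * hours))

# Number of rounds of days if each core processes one day at a time
# (assumption), rounded up
rounds=$(( (days + n_cores - 1) / n_cores ))

echo "values per daily file: $values_per_file"
echo "daily files for a full year: $days"
echo "rounds of days with $n_cores cores: $rounds"
```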
  
 === The main script and its explanation  ===
Line 100: Line 103:
   * In the section **//caliope evaluation, mean and max//**, different scripts are used to prepare the files with the model (CALIOPE-Urban) output at the monitoring stations (caliope evaluation), and to compute the mean and maximum daily and annual values.
   * In the section **//variogram total//**, the variogram and its model are created.
-  * The section //**covariates**// is only used if the **//UK_mode//** is //UK2//, since refers to the microscale-LUR model. Two different scripts are employed to address its expected performance and the basemap, respectively.
+  * The section //**covariates**// is only used if the **//UK_mode//** is //UK2//, since it refers to the microscale-LUR model. Two different scripts (sim_LUR_performance & sim_LUR_final) are employed to address its expected performance and the basemap, respectively.
   * In the section //**application**//, all the applications chosen by the user are run.
-  * In the section **//plots//**, the images are generated if the configuration's parameter //**plot_images**// is //TRUE//. The same for the section **//evaluation//**, where the specific operations about the paper are made if the parameter //paper// of the **//initial setting section//** of the current script is set to //TRUE//.
+  * In the section **//plots//**, the images are generated if the configuration parameter //**plot_images**// is //TRUE//.
+  * In the section **//evaluation//**, some statistical operations are carried out. If the year is 2019, the results for the paper regarding this methodology will be generated.
  
 Notice that most of the __//secondary operations//__ (for example, preparing the observations or extracting the CALIOPE-Urban annual mean) are carried out in the user's first application. This means that if the user later applies another configuration file, keeping the same period of time and the same paths, the corresponding files already exist from the previous application and the secondary operations are not computed again. For this reason, a stable folder structure is created and documented. For example, 6 h of computational time using 50 cores are needed to calculate the daily mean concentrations of CALIOPE-Urban for a whole year. Those operations, along with the daily maximum ones, therefore require approximately 12 hours of computation in the first submission, but not in a second application.
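The "compute once, reuse afterwards" behaviour described above can be illustrated with a small shell pattern. This is not how the repository implements it (the actual logic lives in the R scripts); it is only a sketch of the idea, and the function name `run_once` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the reuse idea (not the repository's code):
# run a command only if the file it is expected to produce does not exist yet.
# Usage: run_once OUTPUT_FILE COMMAND [ARGS...]
run_once() {
    local output="$1"
    shift
    if [ -e "$output" ]; then
        # A previous application already produced this file: skip the work
        echo "skip: $output already exists"
        return 0
    fi
    "$@"
}
```

With this pattern, an expensive step is only paid for in the first submission; a second application that uses the same period and the same paths finds the files in place and skips the step.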
Line 108: Line 112:
 === Submitting jobs  ===
 
-*It is recommended to first take a look at the guidelines of the machines (NORD3v2 or Marenostrum4 -MN4- specifically) to familiarize yourself with this environment.
+*It is recommended to first take a look at the guidelines of the machines (specifically NORD3v2 or MareNostrum4, MN4) to familiarize yourself with this environment (URL: https://www.bsc.es/user-support/nord3v2.php).
 
 To apply the methodology, a job has to be submitted using the supercomputing resources of the BSC. To do so, a file //[name].sh// has to be prepared, following the guidelines in the user guides of the BSC machines. In the examples, the machine NORD3v2 is used. We recommend the following computational times regarding job directives:
working_groups/ukurbangroup.txt · Last modified: 2022/08/05 14:26 by mhajji