From shapefile to mask
Hi @nperez @allabres @mlotto @erifarov
This issue is for the development of the function that helps create the mask array from a shapefile (.shp) and a dataset, and save to netCDF file (optionally). The idea is to create masks that are compatible with our tool environment, so the mask should be an array and saved as a netCDF file, including necessary metadata as possible.
The shapefile is the file type saving the information of polygonal regions. Some examples are:
- NUTS: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts
- LAU: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/lau
We have some files under/esarchive/shapefiles/
already.
-
Read .shp file and transfer it to an array The options to load shapefile include
raster
orrgeos/rgdal/sf
. I'll take a look at the existing scripts provided by Núria and Martín and find a more suitable/efficient method. To load the dataset for latitude and longitude, we can use startR or ncdf4/easyNCDF. I assume that the latter option can fulfill our needs and it should be more efficient. Then, compare the shape and the data lat/lon to find the needed points in the data. The points are either 0 or 1, and the dimensions are [lat, lon, (region)]. -
Save the array as netCDF file If the mask is only for temporary use, it's fine to apply the array directly to other data arrays and do the following analysis like area mean. If the mask has the potential to be used constantly (e.g., for operationals), it's worth saving it. The reasons to save as netCDF file are: (1) easy to load with our tools (2) can contain metadata.
-
Store the mask netCDF file Since the shapefile choice is quite individual, users can put the mask files under their own repository. But if the files are used for projects or tools, ask PA to right place to save them.
This is the general idea of the function. Please leave a comment below if I missed any details or if there is anything incorrect. I hope the function is able to cover most of your needs so if you see the direction is not so right for your case, please let me know. We'll ask you to review the function when it's ready, thanks!
Best,
An-Chi
Summary of the function (last update: 12th April 2023)
- The function: https://earth.bsc.es/gitlab/aho/aho-testtest/-/blob/master/shapefile/shp_mask.R
- Example: https://earth.bsc.es/gitlab/aho/aho-testtest/-/blob/master/shapefile/example.R
Current functionality
- Take one .shp file (
shp.file
) and one .nc file or a list of lon/lat (ref.grid
) as grid reference, specify the desired shape regions, find the corresponding grid points for each region and generate a mask array with the value 0 (no shp), 1 (1st region), 2 (2nd region), etc. - The function currently consider shapefile systems "NUTS" and "ADM". You need to specify which system by
shp.system
. - There are two ways to specify the shape regions: (1) Use
reg.ids
, which is the unique ID of each region (2) Usereg.names
, which is a list that contains the country names as the list name and the region vector under each country. This method is less secure because the region name may not be unique; - The parameter
level
is for NUTS system since it has 3 levels. If you usereg.names
as input, this param can make the shape search more secure. Can only choose one level (1, 2, or 3) - One problem of using
reg.names
is that some names have alphabets that cannot be recognized by the function, so copy&paste the names may not work well. -
reg.ids
is prior toreg.names
if both are provided. - Each region is one number in the mask array. Combining multiple regions into one number is not available now.
- for the reference grid,
lat_dim
andlon_dim
can be provided; but if not, the function will try to look for the correct longitude/latitude dimension names and get the values. - "lat", "lon", "index" and the details in shape file of the selected regions are saved as attributes.
Future development
- Save the mask as a netCDF file.
- Explore other shapefiles.
- A better function name?
- Anything you suggest.
-
shapefile ID as an argument -
Take loaded shapefile as input (so step 1 can be skipped) -
sf::st_intersection
options of intersection method
Notes
- If the reference grid is too coarse and the shape region is not found in the grid, return a warning. The index will be preserved still but that number won't be in the mask.
Please run it on Nord3v2. On workstation, it seems that the modules don't work well so the function rgeos::gIntersection returns weird values. I haven't found a solution for now, but if you really need to work on workstation, let me know.- Check R tips to see the necessary modules for
sf
. - Please save the function somewhere in your dir, or source from my dir (see example.R, link above)
- The NUTS shapefile information (e.g., IDs, region names) can be found here: https://ec.europa.eu/eurostat/documents/345175/629341/NUTS2021.xlsx