# Python AMIP reader Conversion from Fortran to Python of the EC Earth 4 project AMIP reader using the OASIS API from pyOASIS. This repository contains tests for running both implementations (Fortran code is taken from [EC Earth v3.3.3.1](https://earth.bsc.es/gitlab/es/auto-ecearth3/-/tree/3.3.3.1)) using the current setup for IFS and AMIP from EC Earth v3.3.3.1. The AMIP reader provides realistic sea surface temperature and sea ice data from 1979 to near present, thus enabling scientists to focus on the atmospheric model without the added complexity of ocean-atmosphere coupled model feedbacks. It is not meant to be used for climate change prediction, an endeavor that requires a coupled atmosphere-ocean model (e.g., see AMIP's sister project CMIP). The new Python AMIP reader is a flexible, highly-customizable and ready out-of-the-box component with the following features: * **Read and send any field** by including it in the [new namelist file](#new-namelist-file) and making sure the input netCDF files follow a [common structure](#add-a-new-field). The following fields have been tested: * SST (Sea Surface Temperature) and SIC (Sea Ice Concentration) constructed **mid-month fields** (exactly as the AMIP reader from [EC Earth v3.3.3.1](https://earth.bsc.es/gitlab/es/auto-ecearth3/-/tree/3.3.3.1)). * SST (Sea Surface Temperature) and SIC (Sea Ice Concentration) constructed **daily fields** (exactly as the AMIP reader for PRIMAVERA from [EC Earth v3.2.2 (PRIMAVERA production)](https://earth.bsc.es/gitlab/es/auto-ecearth3/-/tree/3.2.2_Primavera_production)). * Gridded emissions data as in these [TM5 routines](https://earth.bsc.es/gitlab/svn/ecearth-mirror/-/blob/ceb1f5b95ced1374f190d2a9f9b833a74627c5d8/sources/tm5mp/proj/co2/emission_read__co2.F90) from EC Earth in `m5mp/proj/co2emission_read__co2.F90`. * **Backwards compatibility** with all the namelist files used by the old Fortran AMIP reader. * **Dynamic `grids.nc` and `masks.nc` definitions** for OASIS based on the input netCDF files. It will not override any existing grid with the same name (as stated in the [OASIS 4.0 user guide](http://www.cerfacs.fr/oa4web/oasis3-mct_4.0/oasis3mct_UserGuide.pdf)), providing backwards compatibility with all current implementations. All the source code for the new implementation can be found under the folder [`sources/amip-forcing/src/python/`](sources/amip-forcing/src/python/) of this repository. A report on the implementation can be found in the BSC Earth Wiki at https://earth.bsc.es/wiki/lib/exe/fetch.php?media=degree_students:degree_thesis_rmartin.pdf . ## Table of contents - [Python dependencies](#python-dependencies) - [Usage](#usage) - [New namelist file](#new-namelist-file) - [Example of the new namelist file](#example-of-the-new-namelist-file) - [Backwards compatibility](#backwards-compatibility) - [Add a new field](#add-a-new-field) - [Input netCDF files](#input-netcdf-files) - [Hardcode an operation (inner workings)](#hardcode-an-operation-inner-workings) - [Run an example](#run-an-example) - [Download the datasets](#download-the-datasets) - [Compile](#compile) - [Run](#run) - [Run the tests](#run-the-tests) - [Visualize netCDF output files](#visualize-netcdf-output-files) - [Generate GIF files](#generate-gif-files) ## Python dependencies The new Python implementation relies on the Python modules `f90nml`, `netCDF4` and `numpy`. In order to install these dependencies in your local machine you may execute the following command. ```bash pip3 install f90nml netCDF4 numpy ``` Or if running in MN4 (assuming the modules `impi/2017.4` and `mkl/2017.4` are already loaded): ```bash module load python/3.6.1 module load netcdf/4.2 ``` ## Usage The new Python AMIP reader gets all its configuration from a new `namelist.amip` file. However, it can also be used as **drop-in replacement for the Fortran implementation without any additional configuration** for reading and sending the SST and SIC fields for both EC Earth v3.3.3.1 and EC Earth v3.2.2 (PRIMAVERA production). See the [backwards compatibility section](#backwards-compatibility) for more info. In order to take advantage of all the new functionalities and have more control over the coupling fields and input data, it is highly recommended to use the [new `namelist.amip` configuration structure](#new-namelist-file). Although not necessary, it is also recommended to avoid setting the AMIP grid in the `grids.nc`, `masks.nc` and `areas.nc` files in advance, as the new AMIP reader is able to write them at runtime. ### New namelist file Although the new AMIP reader works with the old `namelist.amip` configuration files, the new namelist configuration file provides greater flexibility to the user when reading and operating with the input data. Here is a scheme for the new namelist generator bash script taken from [`namelist_python.amip.sh`](./tests/data/namelist_python.amip.sh): ```bash cat << EOF !----------------------------------------------------------------------- &NAMAMIP !----------------------------------------------------------------------- RunLengthSec = ${leg_length_sec} TimeStepSec = ${cpl_freq_amip_sec} StartYear = ${leg_start_date_yyyymmdd:0:4} StartMonth = ${leg_start_date_yyyymmdd:4:2} StartDay = ${leg_start_date_yyyymmdd:6:2} FixYear = ${ifs_cmip_fixyear} # Vars(1,:) = # ... # Vars(n,:) = ... LDebug = false !----------------------------------------------------------------------- / EOF ``` The main difference between the old and the new namelist files is the substitution of the fields exclusively defined for SST and SIC grids (`FileListSST`, `FileListSIC` and `LInterpolate`) for an agnostic array of fields `Vars(:,:)`. `Vars(:,:)` shape is `(n,13)`, being `n` the number of fields to exchange and each coupling field declaration must follow the structure below: | Index | Name | Type | Description | | ------ | ------ | ----------------------- | ------------------------- | | 0 | id | `string` | Unique identifier for the field. | | 1 | grid_name | `string` | Grid name (must match one of the grids declared in the `namcouple` file). | | 2 | oasis_name | `string` | Variable name (must match one of variables under the `grid_name` declared in the `namcouple` file). | | 3 | file_pattern | `string` | File pattern of the input netCDF files. I.e: `'HadISST2_prelim_0to360_alldays_sst_[year].nc'`. All patterns must be between brackets `'[]'` and currently the only accepted pattern is: `year`. | | 4 | netcdf_variable | `string` | Variable to read in the netCDF files. | | 5 | yref_min | `int32` | Reference year for the `time` variable in the netCDF files. | | 6 | yref_max | `int32` | The last year of the netCDF files. | | 7 | timestep | `'monthly'` \| `'daily'` | The step in number of days of the `time` variable in the netCDF files. | | 8 | interpolate | `true` \| `false` | Whether to interpolate or not (disabled if the timestep is set to `daily`). | | 9 | scale_factor | `float64` | Scale factor applied after reading the input data. | | 10 | offset | `float64` | Offset applied after reading the input data. | | 11 | min | `float64` \| `None`* | Minimum value used to clip the input data before submitting the field. | | 12 | max | `float64` \| `None`* | Maximum value used to clip the input data before submitting the field. | \* In order to set a section as `None` you must leave blank the content between its comas. I.e: for setting the last section (`max`) as `None` in AMIP_sst, the field `Vars(1,:)` ends with `..., 273.15, 271.38, ,`. See the [example of the new namelist file](#example-of-the-new-namelist-file) for the complete configuration. Below is an example of the old `namelist.amip` file used for AMIP forcing in EC Earth v3.3.3.1 next to the new equivalent one created by the script above: #### Example of the old namelist file ``` !----------------------------------------------------------------------- &NAMAMIP !----------------------------------------------------------------------- RunLengthSec = 5097600 TimeStepSec = 86400 StartYear = 1991 StartMonth = 01 StartDay = 01 FixYear = 0 FileListSST = 'tosbcs_input4MIPs_SSTsAndSeaIce_CMIP_PCMDI-AMIP-1-1-3_gn_187001-201706.nc' FileListSIC = 'siconcbcs_input4MIPs_SSTsAndSeaIce_CMIP_PCMDI-AMIP-1-1-3_gn_187001-201706.nc' LDebug = false LInterpolate = true !----------------------------------------------------------------------- / ``` #### Example of the new namelist file ``` !----------------------------------------------------------------------- &NAMAMIP !----------------------------------------------------------------------- RunLengthSec = 5097600 TimeStepSec = 86400 StartYear = 1991 StartMonth = 01 StartDay = 01 FixYear = 0 Vars(1,:) = 'AMIP_sst_monthly', 'AMIP', 'AMIP_sst', 'tosbcs_input4MIPs_SSTsAndSeaIce_CMIP_PCMDI-AMIP-1-1-3_gn_187001-201706.nc', 'tosbcs', 1870, 2016, 'monthly', true, 1, 273.15, 271.38, , Vars(2,:) = 'AMIP_sic_monthly', 'AMIP', 'AMIP_sic', 'siconcbcs_input4MIPs_SSTsAndSeaIce_CMIP_PCMDI-AMIP-1-1-3_gn_187001-201706.nc', 'siconcbcs', 1870, 2016, 'monthly', true, 0.01, 0, 0, 1, LDebug = false !----------------------------------------------------------------------- / ``` #### Backwards compatibility The new AMIP reader is backwards compatible with both EC Earth v3.3.3.1 and EC Earth v3.2.2 (PRIMAVERA production) namelist files. It achieves this by internally converting the configuration from the old namelist files to the new representation when reading the namelist file (source code in function `read_namelist()` in [`amip_utils.py`](./sources/amip-forcing/src/python/amip_utils.py)). Here is the internal conversion of the SST coupling field when using each old namelist file: ##### EC Earth v3.3.3.1 (AMIP forcing) | Property | Value | | ---------- | ----------------------- | | id | `'AMIP_sst_monthly'` | | grid_name | `'AMIP'` | | oasis_name | `'AMIP_sst'` | | file_pattern | `FileListSST` content from the old namelist file. | | netcdf_variable | `'tosbcs'` | | yref_min | `1870` | | yref_max | `2016` | The last year of the netCDF files. | | timestep | `'monthly'` | | interpolate | `LInterpolate` content from the old namelist file. | | scale_factor | `1` | | offset | `273.15` | | min | `271.38` | | max | `None` | ##### EC Earth v3.2.2 (PRIMAVERA production) | Property | Value | | ---------- | ----------------------- | | id | `'AMIP_sst_daily'` | | grid_name | `'PSST'` | | oasis_name | `'AMIP_sst'` | | file_pattern | `AmipFileRoot` content from the old namelist file appending `'_sst_[year].nc'`. | | netcdf_variable | `'sst'` | | yref_min | `1850` | | yref_max | `2016` | The last year of the netCDF files. | | timestep | `'daily'` | | interpolate | `False` | | scale_factor | `1` | | offset | `0` | | min | `None` | | max | `None` | ### Add a new field Adding a new field solely consists on adding the input netCDF files to the runtime folder of the program and adding a new entry to the `Vars(:,:)` field in the namelist following the structure mentioned above (see the [new namelist file](#new-namelist-file) section). #### Input netCDF files All input netCDF files must contain the following variables (assuming the dimensions `time`, `lat` and `lon` are defined): | Variable | Type | Shape | Description | | --------- | ------ | ------------------------ | ------------------------- | | `time` | `double` | `(time)` | Days since the reference year defined as `yref_min` in its namelist entry. | | `lat` or `latitude` | `double` | `(lat)` | Latitude. | | `lon` or `longitude` | `double` | `(lon)` | Longitude. | | Input variable name | `double` or `float` | `(time, ..., lat, lon)` | The variable name must match the `netcdf_variable` value in its namelist entry. | ### Hardcode an operation (inner workings) If it is necessary to hardcode an operation with respect to one of the coupling fields, it is highly recommended to edit the code within the [`AMIPVar` class](sources/amip-forcing/src/python/amip_var.py). Right now, there are already hardcoded operations used to match the old Fortran implementation or read an specific index of the input netCDF variable. Here are some examples: ```python logging.debug('{}: no time interpolation t,t2 {} {}'.format(self.id, t_local, self.t2)) # Hardcoded to match Fortran's AMIP clipping avoidance if self.oasis_name == 'AMIP_sst': var_min = np.NINF ``` ```python # This section can be "hardcoded" by the user # Hardcoded CO2 emissions data. Read only sector id 1: Energy if self.var_name == 'CO2_em_anthro': raw_field = raw_field[1] ``` While the Python implementation was originally inspired by the Fortran implementation, it has now been refactored to take advantage of Python's OOP and simpler modules. For a deeper understanding, here are the simplified sequence and class diagrams of the current implementation. ![Sequence_Diagram](docs/Sequence_Diagram.png) ![Class_Diagram](docs/Class_Diagram.png) ## Run an example Running an example consists on executing one of the implementations (Fortran or Python) along with an IFS tyo model. This toy model simulates IFS reception behavior as designed in EC Earth v3.3.3.1. The detailed specification can be found below, with the coupling interval and run length taken directly from the `namelist.amip` file. ```python # Constants NAMELIST_FILE_NAME = 'namelist.amip' L080_NX = 35718 # number of LAND grid cells at T159 resolution L128_NX = 88838 # number of LAND grid cells at T255 resolution L256_NX = 348528 # number of LAND grid cells at T511 resolution ``` **IMPORTANT NOTICE: This documentation assumes you are running an UNIX-based OS with Python +3.6 in the PATH, understand the basics of OASIS and are able to compile and run pyOASIS.** The bash scripts for running the examples rely on the command `cdo` for checking the results. In order to install this dependency, you may execute the following code. ```bash apt install -y cdo ``` Or if running in MN4: ```bash module load cdo/1.9.3 ``` ### Download the datasets The easiest way to download the datasets is to copy the datasets to your copy from 2 folders `tests/data/forcing/ and tests/data/co2/ from @etourign located here: on BSC marenostrum4 : /gpfs/scratch/bsc32/bsc32051/git/python-amip-reader on ECMWF HPC2020 : /hpcperm/c3et/work/python-amip-reader ### Download the datasets (deprecated) Due to the large size of some of the required files and datasets, they are not provided in this repository. It is necessary to download them from MN4 before executing any code. Use the following script to download the required files (note you must change `bsc32074` to your username in MN4): ```bash scp -r bsc32074@mn1.bsc.es:/gpfs/projects/bsc32/models/ecearth/v3.3.3.1/inidata/oasis/AMIP/* ./tests/data/forcing scp -r bsc32074@mn1.bsc.es:/gpfs/projects/bsc32/models/ecearth/v3.3.3.1/inidata/amip-forcing/* ./tests/data/forcing # PRIMAVERA dataset scp -r bsc32074@mn1.bsc.es:/gpfs/projects/bsc32/models/ecearth/v3.2.1/inidata/oasis/AMIP-reader/OLD_AMIP_NL/* ./tests/data/primavera scp -r bsc32074@mn1.bsc.es:/gpfs/projects/bsc32/models/ecearth/v3.2.2/inidata/amip-primavera/* ./tests/data/primavera # Emissions dataset scp -r bsc32074@mn1.bsc.es:/gpfs/projects/bsc32/models/ecearth/trunk/inidata/tm5/CMIP6/CO2-em-anthro_input4MIPs_emissions_CMIP_CEDS-2017-05-18_gn_200001-201412.nc ./tests/data/co2 ``` If you are downloading the data from PRIMAVERA (+74 GB), it is highly recommended to do it in the background: ```bash nohup scp -r bsc32074@mn1.bsc.es:/gpfs/projects/bsc32/models/ecearth/v3.2.2/inidata/amip-primavera/* ./tests/data/primavera > nohup.out 2>&1 # ctrl + z bg ``` Or if running in MN4: ```bash cp /gpfs/projects/bsc32/models/ecearth/v3.3.3.1/inidata/oasis/AMIP/* ./tests/data/forcing cp /gpfs/projects/bsc32/models/ecearth/v3.3.3.1/inidata/amip-forcing/* ./tests/data/forcing cp /gpfs/projects/bsc32/models/ecearth/v3.2.1/inidata/oasis/AMIP-reader/OLD_AMIP_NL/* ./tests/data/primavera cp /gpfs/projects/bsc32/models/ecearth/v3.2.2/inidata/amip-primavera/* ./tests/data/primavera cp /gpfs/projects/bsc32/models/ecearth/trunk/inidata/tm5/CMIP6/CO2-em-anthro_input4MIPs_emissions_CMIP_CEDS-2017-05-18_gn_200001-201412.nc ./tests/data/co2 ``` ### Compile Each of the example codes is located in its respect folder inside the [`./sources/amip-forcing/`](./sources/amip-forcing/) folder ([`src/fortran/`](./sources/amip-forcing/src/fortran/) and [`src/python/`](./sources/amip-forcing/src/python/)). In the source folder for the Fortran code there is a [Makefile](./sources/amip-forcing/src/fortran/Makefile). Before compiling OASIS, you must create a `make.inc` file which redirects to a make file with your configuration. You may issue the following command to create your own `make.inc` file: ```bash cp sources/oasis3-mct/util/make_dir/make.templ.inc sources/oasis3-mct/util/make_dir/make.inc ``` This would be the contents in `make.inc` if running on your own laptop: ``` curr_path = $(dir $(realpath $(lastword $(MAKEFILE_LIST)))) include $(curr_path)/make.linux_gnu_openmpi4 ``` For running in BSC Marenostrum4 (mn4), you may change the `include` path to `make.mn4`, so the file contents of `make.inc` would look like this: ``` curr_path = $(dir $(realpath $(lastword $(MAKEFILE_LIST)))) include $(curr_path)/make.mn4 ``` For running in ECMWF HPC2020 (hpc2020), you may change the `include` path to `make.hpc2020`, so the file contents of `make.inc` would look like this: ``` curr_path = $(dir $(realpath $(lastword $(MAKEFILE_LIST)))) include $(curr_path)/make.hpc2020 ``` **WARNING: `make.mn4` and `make.hpc2020` assume the proper modules are loaded.** For make.mn4 this is done with the following command: `module load python/3.6.1 netcdf/4.2` For make.hpc2020 this is done with `module reset ; module load python3/3.10.10-01 prgenv/intel intel/2021.4.0 hpcx-openmpi/2.9.0 hdf5-parallel/1.12.2 netcdf4-parallel/4.9.1` For other platforms you may need to modify or add a new makefile for your specific MPI setup. By executing the following command, you will compile OASIS, pyOASIS and the Fortran implementation of this model. ```bash (cd ./sources/amip-forcing/src/fortran && make clean) (cd ./sources/amip-forcing/src/fortran && make oasis) (cd ./sources/amip-forcing/src/fortran && make) ``` If this is the first time you are compiling pyOASIS in the current session, you must update `LD_LIBRARY_PATH` and `PYTHONPATH` environment variables. Luckily, pyOASIS provides a automatic script: ```bash source sources/oasis3-mct/generated/python/init.sh ``` ```bash source sources/oasis3-mct/generated/python/init.csh ``` Alternatively, these lines could be added to your `~/.bashrc` or `~/.cshrc` file. ### Run In order to run the example code with your desired programming language you must execute the script `run_example.sh` inside the [`tests/`](./tests/) folder. Here is an example on how to run the example with the AMIP reader running in Python and the IFS toy model with 4 MPI processes from 1990-01-01 to 1991-01-01 without a fixed year. ```bash bash ./run_example.sh python forcing 4 1990-01-01 1991-01-01 0 L128 true ``` Here is the complete specification for the script: ```bash ./run_example.sh ``` Running this script will create a new folder in the root folder of this repository called `work________` with all the output and debug information from the run. ## Run the tests Execute the following command in the root folder of this repository to run all tests **except the large test** (make sure to meet all the [requirements](#requirements) stated above). If you are on MN4, it is highly recommended to start first an interactive session: ``` salloc -p interactive ``` ```bash bash run_tests.sh ``` If you also want to run the large test along with the other ones, use the following: ```bash bash run_tests.sh 1 ``` This command will run both implementations (Fortran and Python) of the AMIP reader and check if all of them result in the same output. Here is an example of a successful test run: ``` Running tests WITHOUT the large test. Use ./run_tests.sh 1 to run all tests including the large test. ***************************************************************** Running tests... ***************************************************************** Checking results... Checking ./work_python_forcing_2_1892-01-01_1894-01-01_0_L128_true vs ./work_fortran_forcing_2_1892-01-01_1894-01-01_0_L128_true cdo diff: Processed 2 variables over 1462 timesteps [0.23s 52MB]. cdo diff: Processed 2 variables over 1462 timesteps [0.22s 52MB]. cdo diff: Processed 2 variables over 1462 timesteps [0.30s 52MB]. cdo diff: Processed 2 variables over 1462 timesteps [0.31s 52MB]. ... Tests PASSED ``` ## Visualize netCDF output files In order to visualize the results from an experiment, you may first fix the grid and time axis of the `EXPOUT` generated files. First, move into the work directory created when running the experiment. For example: ```bash cd ./tests/work_python_forcing_4_1990-01-01_1991-01-01_0_L128_true/ ``` Then, from inside that directory, run the following script with the start date as the argument: ```bash bash ./convert-ece4pyreader-grids.sh 19900101 ``` As a result, you will now have 8 new files in the working directory for the experiment: `A_Ice_frac.nc`, `A_Ice_frac.grb`, `A_SST.nc`, `A_SST.grb`, `AMIP_sst.nc`, `AMIP_sic.nc`, `areas_AMIP.nc`, `areas_L128.grb`, `areas_L128.nc`, `masks_AMIP.nc`, `masks_L128.grb` and `masks_L128.nc`. ### Generate GIF files Optionally, you may want to save visualization as a GIF file. In order to achieve this, you may first run `./convert-ece4pyreader-grids.sh` as before and then the script `nc_to_gif.py` located in the [`tests/`](./tests/) folder. If you are in the working folder and want to visualize the `A_Ice_frac.nc` file, you may run the following command: ```bash ./nc_to_gif.py work_python_forcing_4_1990-01-01_1991-01-01_0_L128_true A_Ice_frac ``` Here is the complete specification for the script: ```bash ./nc_to_gif.py ``` Running the script will generate two files: a raw GIF file and an optimized one. In this case `A_Ice_frac.gif` and `A_Ice_frac_optimized.gif` are generated.