# AI4PISCES


1. ML NN approach:
Application of TensorFlow Keras Long Short-Term Memory (LSTM) cells in a `Sequential()` model architecture for time series analysis.
2. Statistical approach:
Autoregressive Integrated Moving Average (ARIMA) statistical model from Box and Jenkins.

The project compares the classical statistical approach (ARIMA) with LSTM and RNN (Recurrent Neural Network) models.


3. Testing multivariate LSTM and multivariate ARIMA time series, using the time series of other sub-species as external factors, on a marine biogeochemistry data product.
4. The test shows better performance for the LSTM, with a 10k parameter size, at a single geo-location grid point: RMSE < 3% for LSTM and RMSE < 9% for ARIMA.
5. The analysis has been performed for 6 variables and one predictor value at lag -1 (autoregression lag 1).
6. The analysis will be extended to 5 additional variables, including Zooplankton and Solar Surface Radiation.
7. It will then be applied to certain selected areas.
8. Then to sampled selected points.
9. Global sampling will cover a total of 360 lon x 290 lat x 12 months x 60 years, which will lead to a pre-trained model with 3 billion parameters that can only be run on MN5.
10. Sampling with 10 selected areas will reduce this raw model to 10 lon x 10 lat x 12 months x 60 years, with only 3 million parameters.
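
The sizes quoted above can be put in context with a little arithmetic. The sketch below is illustrative only: it uses the standard parameter-count formula for an LSTM layer (four gates, each with input weights, recurrent weights and a bias) and counts space-time samples for the two grids. The layer layout (6 input variables, 48 units, one dense output) is a hypothetical choice that happens to land near 10k parameters; the README does not state the actual architecture, nor how the 3 billion and 3 million totals are derived.

```python
def lstm_param_count(n_features: int, units: int) -> int:
    """Trainable parameters of one LSTM layer with biases.

    Each of the 4 gates has an input kernel (n_features x units),
    a recurrent kernel (units x units) and a bias vector (units).
    """
    return 4 * (units * (n_features + units) + units)


def dense_param_count(n_inputs: int, n_outputs: int) -> int:
    """Trainable parameters of a fully connected layer with biases."""
    return n_inputs * n_outputs + n_outputs


def grid_samples(n_lon: int, n_lat: int, n_months: int = 12, n_years: int = 60) -> int:
    """Number of space-time samples on a lon x lat x months x years grid."""
    return n_lon * n_lat * n_months * n_years


# Hypothetical layout (an assumption, not taken from the project code).
n_features, units = 6, 48
total_params = lstm_param_count(n_features, units) + dense_param_count(units, 1)

global_samples = grid_samples(360, 290)
reduced_samples = grid_samples(10, 10)
reduction = global_samples / reduced_samples

print(f"parameters for one small LSTM model: {total_params}")
print(f"global samples:  {global_samples}")
print(f"reduced samples: {reduced_samples}")
print(f"reduction factor: {reduction:.0f}")
```

The reduction factor between the two grids is about 1000, which matches the ratio between the 3 billion and 3 million parameter figures quoted in the list.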

## Roadmap
**Preliminary**
- [x] Test RNN, LSTM, ARIMA, Linear models on synthetic data
- [x] Test multivariate LSTM and ARIMA on synthetic data
- [x] Test the approach on CMIP monthly PISCES data set of different species
- [ ] Test for different norms and geo-locations 
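
A minimal sketch of what a synthetic-data test could look like, using only the Python standard library. It is not the project's notebook code: it builds a hypothetical monthly series (seasonal cycle plus noise), fits the simplest lag-1 linear model `y[t] = a + b * y[t-1]` by least squares, and scores one-step-ahead predictions on a held-out tail. The RMSE is expressed in percent of the mean observed value, which is an assumption, since the normalization behind the quoted RMSE percentages is not specified here.

```python
import math
import random


def make_synthetic_series(n_months=240, seed=0):
    """Hypothetical monthly series: annual cycle plus Gaussian noise."""
    rng = random.Random(seed)
    return [
        10.0 + 3.0 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 0.3)
        for t in range(n_months)
    ]


def fit_lag1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse_percent(actual, predicted):
    """RMSE as a percentage of the mean observed value (assumed convention)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * math.sqrt(mse) / (sum(actual) / len(actual))


series = make_synthetic_series()
split = int(0.8 * len(series))          # first 80% for fitting
train, test = series[:split], series[split:]

a, b = fit_lag1(train)
# One-step-ahead predictions: each test value is predicted from the
# true previous value of the series.
predictions = [a + b * prev for prev in series[split - 1:-1]]

score = rmse_percent(test, predictions)
print(f"lag-1 linear baseline: a={a:.3f}, b={b:.3f}, RMSE={score:.2f}%")
```

The same split and scoring function can be reused for the RNN, LSTM and ARIMA models so that all of them are compared on identical test points.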

**Optimisation of external factors**
- [x] It is still unclear how the weights for the linear lag correlation model should be applied.
- [x] Using just multivariate ARIMA and LSTM?
- [x] Lagged multivariate, where the maximum cross-correlation is the constraint on the time-lagged external factors?
- [x] Check the cross-correlation for different time lags between the main series and the external factors
- [x] At the time lag with maximum correlation, use the time-lagged external factors as time series for the multivariate analysis with ARIMA and LSTM
- [x] ARIMA and LSTM already have built-in support for multivariate computation, so the maximum cross-correlation method is not required.
- [x] Normalization of the variables is required to mitigate the huge difference in value range between parameters. Without this step, both models fail.
- [x] Normalization can be done with a MinMax scaler, a normal scaler, a log scaler, etc. Since different variables behave differently, each variable requires its own normalization procedure.
- [x] First, each variable is multiplied by a factor that brings it into a similar value range of 4-30. Then a log scaler is applied to INTPP, since it shows power-law variation, and a constant value of 12 is added to avoid 0 values, which would cause the models to fail.
- [x] A similar approach is required for new variables.
- [ ] Test for different norms and geo-locations with a loop and output in a spreadsheet and plots.
- [ ] Test for k-fold variable exclusion from 1 var - 9 var to test for RMSE minimum optimal value (optional).
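
The normalization and lag-selection steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code. In particular, the log scaler is assumed to have the form `log10(v) + 12`; the list above does not say whether the constant is added before or after taking the logarithm. The scaling factor and all sample values are hypothetical.

```python
import math


def scale_by_factor(values, factor):
    """Multiply a variable by a constant so it falls in a common range."""
    return [v * factor for v in values]


def log_scale(values, offset=12.0):
    """Assumed form of the log scaler: log10(v) + offset (v must be > 0)."""
    return [math.log10(v) + offset for v in values]


def inverse_log_scale(values, offset=12.0):
    """Undo log_scale so predictions can be mapped back to physical units."""
    return [10 ** (v - offset) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def best_lag(main, external, max_lag):
    """Lag (0..max_lag) at which |corr(main[t], external[t - lag])| is largest."""
    best = (0, pearson(main, external))
    for lag in range(1, max_lag + 1):
        corr = pearson(main[lag:], external[:-lag])
        if abs(corr) > abs(best[1]):
            best = (lag, corr)
    return best


# Hypothetical values, chosen only to show the effect of each step.
scaled = scale_by_factor([0.2, 0.8, 1.5], factor=20.0)   # roughly 4 to 30
intpp_like = [1e-8, 1e-7, 1e-6]                          # spans orders of magnitude
intpp_scaled = log_scale(intpp_like)
intpp_back = inverse_log_scale(intpp_scaled)

# Synthetic check of the lag search: main follows external by 3 months.
external = [math.sin(2 * math.pi * t / 12) for t in range(120)]
main = [math.sin(2 * math.pi * (t - 3) / 12) for t in range(120)]
lag, corr = best_lag(main, external, max_lag=6)
print(scaled, intpp_scaled, lag, round(corr, 6))
```

Keeping an explicit inverse for every scaler makes it possible to report RMSE in the original units of each variable.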

**Functional Coding**

- [ ] Replace the Jupyter Notebook with Python functional code and configuration files
- [ ] Replace all hard-coded functions with classes, attributes and methods, and modularize the code, as for the sampling
- [ ] Run it on MN5 to create a pre-trained model emulator with SBATCH
- [ ] Create SBATCH shell script to run the code on MN5
- [ ] Use PyTorch in parallel
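
One possible shape for the move from notebook cells to configuration-driven code is sketched below. Every name in it (`ExperimentConfig`, its fields, the variable names in the sample JSON) is hypothetical and chosen only for illustration; the project may settle on a different layout or file format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExperimentConfig:
    """Hypothetical run configuration replacing hard-coded notebook values."""

    target: str                                   # variable to predict
    external_variables: list = field(default_factory=list)
    lag: int = 1                                  # autoregression lag
    model: str = "lstm"                           # "lstm" or "arima"

    def __post_init__(self):
        # Fail early on values the rest of the pipeline cannot handle.
        if self.model not in ("lstm", "arima"):
            raise ValueError(f"unknown model: {self.model!r}")
        if self.lag < 1:
            raise ValueError("lag must be at least 1")

    @classmethod
    def from_json(cls, text: str) -> "ExperimentConfig":
        """Build a configuration from a JSON document."""
        return cls(**json.loads(text))


# Example configuration text, as it might be stored in a file.
config_text = """
{
    "target": "intpp",
    "external_variables": ["zooplankton"],
    "lag": 1,
    "model": "arima"
}
"""

config = ExperimentConfig.from_json(config_text)
print(config)
```

A batch job can then take the path of such a file as its only argument, which keeps the job script itself free of experiment-specific values.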



**Fine Tuning**
- [x] Test on stationarity and normal distribution of the residuals to evaluate ARIMA(p,d,q)
- [ ] Fine tune the optimal architecture for LSTM
- [ ] Replace the Jupyter Notebook with Python functional code and configuration files
- [ ] Run it on MN5 to create a pre-trained model emulator


**Required Libraries**
- [ ] Keras, TensorFlow, statsmodels....


**Results**

## Getting started

To make it easy for you to get started with GitLab, here's a list of recommended next steps.

Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!

## Add your files

- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:

```
cd existing_repo
git remote add origin https://earth.bsc.es/gitlab/es/ai4pisces.git
git branch -M main
git push -uf origin main
```

## Integrate with your tools


## Authors and acknowledgment
Show your appreciation to those who have contributed to the project.

## License
For open source projects, say how it is licensed.

## Project status
I