# AI4PISCES

1. ML NN approach: application of TensorFlow Keras Long Short-Term Memory (LSTM) cells in a `Sequential()` model architecture for time series analysis.
2. Statistical approach: the Autoregressive Integrated Moving Average (ARIMA) statistical model from Box and Jenkins. The classical statistical approach (ARIMA) is compared with LSTM and RNN (Recurrent Neural Network) models.
3. Testing multivariate LSTM and multivariate ARIMA time series, using the time series of other sub-species as external factors, on a marine biogeochemistry data product.
4. The test shows better performance for the LSTM, with a parameter size of 10k, at a single geo-location grid point: RMSE < 3% for the LSTM and RMSE < 9% for the ARIMA model.
5. The analysis has been performed for 6 variables and one predictor value of lag -1 (autoregression lag 1).
6. The analysis will be extended to 5 additional variables, including Zooplankton and Solar Surface Radiation.
7. Then to certain selected areas.
8. Sampled selected points.
9. Global sampling, with a total of 360 lon × 290 lat × 12 months × 60 years, which will lead to a pre-trained model with 3 billion parameters that is only applicable on MN5.
10. Sampling with 10 selected areas will reduce this raw model to 10 lon × 10 lat × 12 months × 60 years, with only 3 million parameters.

## Roadmap

**Preliminary**

- [x] Test RNN, LSTM, ARIMA and linear models on synthetic data
- [x] Test multivariate LSTM and ARIMA on synthetic data
- [x] Test the approach on the CMIP monthly PISCES data set of different species
- [ ] Test for different norms and geo-locations

**Optimisation of external factors**

- [x] It is still unclear how the weights for the linear lag correlation model should be applied.
- [x] Use just multivariate ARIMA and LSTM?
- [x] Lagged multivariate, where the maximum cross-correlation is the constraint on the time-lagged external factors?
- [x] Check the cross-correlation for different time lags between the main series and the external factors.
- [x] From the maximum correlation for a given time lag, use these time-lagged external factors as time series for the multivariate analysis with ARIMA and LSTM.
- [x] ARIMA and LSTM already have built-in functions for multivariate computation, so the maximum cross-correlation method is not required.
- [x] Normalization of the variables is required to mitigate the huge difference in value range between parameters. Without this step, both models fail.
- [x] Normalization could apply the same scaler (MinMax Scaler, Normal Scaler, Log Scaler, etc.) to all variables, but since different variables behave differently, each variable requires its own normalization procedure.
- [x] First, each variable is multiplied by a factor that brings it into a similar value range of 4-30. Then a log scaler is applied to INTPP, as it shows power-law variation, and a constant value of 12 is added to avoid 0 values, which would make the models fail.
- [x] A similar approach is required for new variables.
- [ ] Test for different norms and geo-locations with a loop, and output the results to a spreadsheet and plots.
- [ ] Test k-fold variable exclusion, from 1 to 9 variables, to find the minimum (optimal) RMSE (optional).
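
The cross-correlation check listed above can be illustrated with a minimal sketch. It is not the project's code: the function names `pearson` and `best_lag`, the synthetic sine series and the 3-month delay are all assumptions made for illustration, and it uses only the Python standard library. It scans a range of lags and keeps the one at which the external factor correlates most strongly with the main series.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def best_lag(target, external, max_lag):
    """Return (lag, r) maximising |corr(target[t], external[t - lag])|."""
    if len(target) != len(external):
        raise ValueError("series must have the same length")
    if max_lag >= len(target) - 1:
        raise ValueError("max_lag must leave at least two overlapping samples")
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        # Pair target[t] with external[t - lag] for every valid t.
        x = target[lag:]
        y = external[:len(external) - lag]
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Synthetic monthly example (hypothetical data): the main series follows
# the external factor with a delay of 3 time steps.
external = [math.sin(2 * math.pi * t / 12) for t in range(120)]
target = [0.0] * 3 + external[:-3]

lag, corr = best_lag(target, external, max_lag=6)
print(lag, round(corr, 3))
```

The external series shifted by the returned lag could then be passed as an additional input to the multivariate models. As noted in the list, the built-in multivariate support of ARIMA and LSTM made this step unnecessary here, so the sketch only documents the idea that was tested.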
**Functional Coding**

- [ ] Replace the Jupyter Notebook with Python functional code and configuration files
- [ ] Replace all hard-coded functions with classes, attributes and methods, and modularize the code, as for the sampling
- [ ] Run it on MN5 to create a pre-trained model emulator with SBATCH
- [ ] Create an SBATCH shell script to run the code on MN5
- [ ] Use PyTorch in parallel

**Fine Tuning**

- [x] Test the stationarity and the normal distribution of the residuals to evaluate ARIMA(p,d,q)
- [ ] Fine-tune the optimal architecture for the LSTM
- [ ] Replace the Jupyter Notebook with Python functional code and configuration files
- [ ] Run it on MN5 to create a pre-trained model emulator

**Required Libraries**

- [ ] Keras, TensorFlow, statsmodels, ...

**Results**

## Getting started

To make it easy for you to get started with GitLab, here's a list of recommended next steps.

Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!

## Add your files

- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:

```
cd existing_repo
git remote add origin https://earth.bsc.es/gitlab/es/ai4pisces.git
git branch -M main
git push -uf origin main
```

## Integrate with your tools

## Authors and acknowledgment

Show your appreciation to those who have contributed to the project.

## License

For open source projects, say how it is licensed.

## Project status

I