Earth Diagnostics, issue #20
Issue created Jan 05, 2017 by François Massonnet (@fmassonnet)

Parallelization of EarthDiags jobs

Hi @jvegas!

As you may remember, I need to CMORize and compute diagnostics for an experiment (a0dc) that comprises many members (25) and many chunks (504). Since CMORization currently cannot be started from a given chunk (issues #17 (closed) and #14 (closed)), I'm looking for other workarounds to compute diagnostics on this experiment. I have tried CMORizing chunks up to 260, and after three days this is still not finished...

So my idea (actually it was @pabretonniere's idea, so I think it's a good one :-) ) was to launch 25 instances of earthdiags.py simultaneously, each with its own namelist whose MEMBERS variable is set to 1, 2, 3, ..., 25 respectively.
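Generating the 25 per-member namelists can be scripted. Below is a minimal Python sketch, assuming the namelist sets the members on a single line of the form `MEMBERS = ...` (the real EarthDiags key syntax may differ). `namelist_for_member`, `write_member_namelists` and the `namelist_member_N.conf` file names are hypothetical; launching each instance with its namelist is left out, since that depends on how earthdiags.py is invoked.

```python
import re
from pathlib import Path

# Assumption: members are set on one line such as "MEMBERS = 0 1 2".
# The exact key syntax of the real EarthDiags namelist may differ.
MEMBERS_LINE = re.compile(r"^(\s*MEMBERS\s*=).*$", re.MULTILINE)


def namelist_for_member(template: str, member: int) -> str:
    """Return a copy of the namelist text with MEMBERS set to a single member."""
    new_text, count = MEMBERS_LINE.subn(lambda m: f"{m.group(1)} {member}", template)
    if count != 1:
        raise ValueError(f"expected exactly one MEMBERS line, found {count}")
    return new_text


def write_member_namelists(template_path: Path, out_dir: Path, n_members: int) -> list[Path]:
    """Write one namelist per member (1..n_members) and return their paths."""
    template = template_path.read_text()
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for member in range(1, n_members + 1):
        # Hypothetical naming scheme: one file per member.
        path = out_dir / f"namelist_member_{member}.conf"
        path.write_text(namelist_for_member(template, member))
        paths.append(path)
    return paths
```

For this experiment that would be `write_member_namelists(Path("template.conf"), Path("namelists"), 25)`. Failing loudly when no single MEMBERS line is found avoids silently launching 25 identical jobs.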

However, when I do this, it looks like EarthDiags can only CMORize one member at a time: one member (the last one submitted) is CMORized correctly, but the others fail with the following error:

a0dc_1d_19740101_19740131_grid_T.nc.gz
a0dc_1d_19740101_19740131_icemod.nc.gz
a0dc_1m_19740101_19740131_grid_T.nc.gz
a0dc_1m_19740101_19740131_grid_U.nc.gz
a0dc_1m_19740101_19740131_grid_V.nc.gz
a0dc_1m_19740101_19740131_icemod.nc.gz

Unzipping /scratch/Earth/fmassonn/diags/a0dc/CMOR/a0dc_1m_19740101_19740131_grid_V.nc.gz
gzip: /scratch/Earth/fmassonn/diags/a0dc/CMOR/a0dc_1m_19740101_19740131_grid_V.nc.gz: No such file or directory
Traceback (most recent call last):
  File "./earthdiags.py", line 340, in <module>
    main()
  File "./earthdiags.py", line 336, in main
    EarthDiags.parse_args()
  File "./earthdiags.py", line 113, in parse_args
    diags.run()
  File "./earthdiags.py", line 147, in run
    self.data_manager.prepare()
  File "/home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/cmormanager.py", line 300, in prepare
    self._cmorize_member(startdate, member)
  File "/home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/cmormanager.py", line 321, in _cmorize_member
    cmorizer.cmorize_ocean()
  File "/home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/cmorizer.py", line 62, in cmorize_ocean
    self._cmorize_ocean_files('MMO')
  File "/home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/cmorizer.py", line 73, in _cmorize_ocean_files
    self._unpack_tar_file(tarfile)
  File "/home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/cmorizer.py", line 98, in _unpack_tar_file
    Utils.unzip(glob.glob(os.path.join(self.cmor_scratch, '*.gz')))
  File "/home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/utils.py", line 583, in unzip
    raise Utils.UnzipException('Can not unzip {0}: {1}'.format(filepath, ex))
earthdiagnostics.utils.UnzipException: Can not unzip /scratch/Earth/fmassonn/diags/a0dc/CMOR/a0dc_1m_19740101_19740131_grid_V.nc.gz: ('Error executing {0}\n Return code: {1}', 'gunzip /scratch/Earth/fmassonn/diags/a0dc/CMOR/a0dc_1m_19740101_19740131_grid_V.nc.gz', 2)

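I think this is related to the fact that the file names don't contain any information about the member (see the end of the error message above). Judging from the traceback, all instances unpack into the same scratch directory (/scratch/Earth/fmassonn/diags/a0dc/CMOR/), and _unpack_tar_file gunzips every *.gz file it finds there, so files are probably overwritten or removed by one instance while another one is still reading them.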
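A minimal Python sketch of one way around the suspected collision, assuming each instance can be given its own scratch folder: every member unpacks into a private subdirectory, so globbing `*.gz` there can only pick up that member's files. `member_scratch_dir` and `unzip_all` are hypothetical helpers, not part of EarthDiags, and `unzip_all` uses the standard-library gzip module instead of calling gunzip as `Utils.unzip` appears to do.

```python
import glob
import gzip
import os
import shutil


def member_scratch_dir(cmor_scratch: str, startdate: str, member: int) -> str:
    """Create and return a scratch subdirectory private to one start date and member.

    Hypothetical helper: isolating each member keeps one process from
    unzipping or deleting files that another member's process has listed.
    """
    path = os.path.join(cmor_scratch, f"{startdate}_member{member}")
    os.makedirs(path, exist_ok=True)
    return path


def unzip_all(directory: str) -> list[str]:
    """Decompress every *.gz file in `directory` in place and delete the .gz (as gunzip does)."""
    unzipped = []
    for gz_path in sorted(glob.glob(os.path.join(directory, "*.gz"))):
        out_path = gz_path[:-3]  # strip the ".gz" suffix
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(gz_path)
        unzipped.append(out_path)
    return unzipped
```

With this layout, two members producing the same file name (for example a0dc_1m_19740101_19740131_grid_V.nc.gz) no longer touch each other's copies. Adding the member to the file names themselves would be an alternative, but separate directories leave the existing naming untouched.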

Do you have any advice on how I could parallelize my jobs?

The script I'm using is here: /home/Earth/fmassonn/git-stuff/earthdiagnostics/process_a0dc.sh and the logs (per member) are here: /home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/log_1 (for member 1, crashed), /home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/log_2 (for member 2, crashed), /home/Earth/fmassonn/git-stuff/earthdiagnostics/earthdiagnostics/log_3 (for member 3, success).

Many thanks for your help,

François
