Closed
Issue created Jun 16, 2022 by Eric Ferrer (@eferre1, Developer)

Autosubmit doesn't update the status and keeps running

Hi @dbeltran, I have an experiment where Autosubmit didn't update the status of a job and kept it as QUEUING even though it had already run (and failed):

4 of 7 jobs remaining (12:07)
Job a3n3_19500101_fc0_1_SIM is QUEUING


4 of 7 jobs remaining (12:08)
Job a3n3_19500101_fc0_1_SIM is QUEUING

(···)

4 of 7 jobs remaining (12:18)
Job a3n3_19500101_fc0_1_SIM is QUEUING


4 of 7 jobs remaining (12:19)
Command squeue -j 23552014, -o %A,%R in mn1.bsc.es warning: slurm_load_jobs error: Invalid job id specified

Job a3n3_19500101_fc0_1_SIM is QUEUING

But when I looked at LOG_a3n3 on the remote, the logs of the run were there (I had to move them manually so they would not be overwritten), and the last update was from 11 minutes before the squeue command error:

-rw-r--r-- 1 bsc32627 bsc32 307063 Jun 15 10:11 a3n3_19500101_fc0_1_SIM.20220615100956.err
-rw-r--r-- 1 bsc32627 bsc32 408195 Jun 15 10:11 a3n3_19500101_fc0_1_SIM.20220615100956.out
-rw-r--r-- 1 bsc32627 bsc32 693938 Jun 16 12:07 a3n3_19500101_fc0_1_SIM.cmd.out_bckp
-rw-r--r-- 1 bsc32627 bsc32 295969 Jun 16 12:07 a3n3_19500101_fc0_1_SIM.cmd.err_bckp
-rwxr-xr-x 1 bsc32627 bsc32  27648 Jun 16 12:33 a3n3_19500101_fc0_1_SIM.cmd

From the looks of it, AS wasn't able to get the RUNNING status from the remote, so it kept polling the same job ID while the job was running. Once the job failed (or finished), squeue could no longer find the ID, but the autosubmit run kept going because it thought the job was still QUEUING, until I stopped it manually. After that, using the autosubmit setstatus and autosubmit run commands created the job again with a new ID (I didn't test using the run command without first setting the job back to waiting).
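
To make the suspected behaviour concrete, here is a minimal sketch of how a status poller could treat the "Invalid job id specified" error as "the job has left the queue" instead of keeping the previous QUEUING status. This is not Autosubmit's actual code: the `Status` enum, the `classify_squeue_result` function, and the assumed `jobid,state` output layout (as a format like `%A,%T` would produce) are all hypothetical and only illustrate the idea.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical job statuses used only in this sketch."""
    QUEUING = "QUEUING"
    RUNNING = "RUNNING"
    UNKNOWN = "UNKNOWN"


def classify_squeue_result(stdout: str, stderr: str, previous: Status) -> Status:
    """Derive a job status from the text output of an squeue call.

    Assumes stdout lines look like "<jobid>,<state>"; that layout is an
    assumption of this sketch, not what the command in the log requests.
    """
    # The scheduler no longer knows the job: it has finished or failed.
    # Report UNKNOWN so the caller checks completion some other way,
    # rather than silently keeping the old QUEUING status.
    if "Invalid job id specified" in stderr:
        return Status.UNKNOWN

    for line in stdout.splitlines():
        parts = line.strip().split(",")
        # Skip headers and malformed lines.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        state = parts[1].strip().upper()
        if state == "RUNNING":
            return Status.RUNNING
        if state == "PENDING":
            return Status.QUEUING
        return Status.UNKNOWN

    # Nothing parseable (for example a transient empty reply): keep the old status.
    return previous
```

With this kind of handling, the 12:19 poll in the log above would move the job out of QUEUING, and the run could then inspect the remote logs to decide between COMPLETED and FAILED.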

@mcastril
