1. 10 Oct, 2014 5 commits
    • Job step memory allocation logic fix · 0dd12469
      Dorian Krause authored
      This commit fixes a bug we observed when combining select/linear with
      gres. If an allocation was requested with a --gres argument, an srun
      invocation within that allocation would stall indefinitely:
      
      -bash-4.1$ salloc -N 1 --gres=gpfs:100
      salloc: Granted job allocation 384049
      bash-4.1$ srun -w j3c017 -n 1 hostname
      srun: Job step creation temporarily disabled, retrying
      
      The slurmctld log showed:
      
      debug3: StepDesc: user_id=10034 job_id=384049 node_count=1-1 cpu_count=1
      debug3:    cpu_freq=4294967294 num_tasks=1 relative=65534 task_dist=1 node_list=j3c017
      debug3:    host=j3l02 port=33608 name=hostname network=(null) exclusive=0
      debug3:    checkpoint-dir=/home/user checkpoint_int=0
      debug3:    mem_per_node=62720 resv_port_cnt=65534 immediate=0 no_kill=0
      debug3:    overcommit=0 time_limit=0 gres=(null) constraints=(null)
      debug:  Configuration for job 384049 complete
      _pick_step_nodes: some requested nodes j3c017 still have memory used by other steps
      _slurm_rpc_job_step_create for job 384049: Requested nodes are busy
      
      Had srun --exclusive been used instead, everything would have worked fine.
      The reason is that in exclusive mode the _pick_step_nodes() function
      properly checks whether memory is a reserved resource.
      This commit modifies the non-exclusive code path to do the same.
    • SLURMDBD - Only set the archive flag if purging the object · 686cd117
      Danny Auble authored
      (i.e. ArchiveJobs PurgeJobs). This is only a cosmetic change.
    • Nicolas Joly
      13a91611
    • Danny Auble
    • Danny Auble
      27338987
  2. 09 Oct, 2014 2 commits
  3. 08 Oct, 2014 3 commits
  4. 07 Oct, 2014 4 commits
  5. 04 Oct, 2014 1 commit
  6. 03 Oct, 2014 4 commits
  7. 02 Oct, 2014 3 commits
  8. 30 Sep, 2014 1 commit
  9. 29 Sep, 2014 5 commits
  10. 26 Sep, 2014 1 commit
  11. 25 Sep, 2014 2 commits
  12. 24 Sep, 2014 3 commits
  13. 23 Sep, 2014 1 commit
  14. 22 Sep, 2014 2 commits
  15. 19 Sep, 2014 2 commits
  16. 18 Sep, 2014 1 commit