1. 27 Feb, 2015 2 commits
    • Ensure prolog runs on job requeue · 42dc54ea
      Morris Jette authored
      Set the delay time for job requeue to the job credential lifetime (1200
      seconds by default). This ensures that the prolog runs on every node when
      a job is requeued, at the cost of slower launch for requeued jobs.
      Without this change, if a job is restarted within 1200 seconds, the nodes
      previously used would not run the prolog again, since the job ID is
      still seen as active (from the previous execution). It is also advisable
      to set the value of DEFAULT_EXPIRATION_WINDOW in src/common/slurm_cred.c
      to the lowest reasonable value. We need to add a new configuration
      parameter so this is easily changed in the future.
    • Display job's estimated NodeCount based off of partition's configured resources · ce32018a
      Brian Christiansen authored
      Display the job's estimated NodeCount based on the partition's configured
      resources rather than the whole system's.
      
      Bug 1478
  2. 26 Feb, 2015 2 commits
  3. 25 Feb, 2015 1 commit
  4. 24 Feb, 2015 4 commits
  5. 20 Feb, 2015 2 commits
    • Morris Jette authored · e7c61bdd
    • Fix to GRES NoConsume logic · 33c48ac5
      Dorian Krause authored
      We came across the following error message in the slurmctld logs when
      using non-consumable resources:
      
      error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count
      is 0
      
      The error comes from _job_dealloc(). The backtrace (leading function
      names were lost in the log) is:
      
        ... node_gres_data=0x7f8a18000b70, node_offset=0,
            gres_name=0x1999e00 "potion", job_id=46,
            node_name=0x1987ab0 "node1") at gres.c:3980
        ... (job_gres_list=0x199b7c0, node_gres_list=0x199bc38,
            node_offset=0, job_id=46,
            node_name=0x1987ab0 "node1") at gres.c:4190
        ... job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test",
            remove_all=true) at select_linear.c:2091
        ... bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1,
            req_nodes=1, preemptee_candidates=0x0,
            preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
        ... bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1,
            mode=2, preemptee_candidates=0x0,
            preemptee_job_list=0x7f8a2f910c40,
            exc_core_bitmap=0x0) at select_linear.c:3390
        ... bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1,
            mode=2, preemptee_candidates=0x0,
            preemptee_job_list=0x7f8a2f910c40,
            exc_core_bitmap=0x0) at node_select.c:588
        ... avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1,
            req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367
      
      The cause of this problem is that _node_state_dup() in gres.c does not
      duplicate the no_consume flag.
      The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr()
      which calls _node_state_dup().
      
      Below is a simple patch to fix the problem. A "future-proof" alternative
      might be to memcpy() from gres_ptr to new_gres and
      only handle pointers separately.
  6. 19 Feb, 2015 2 commits
  7. 18 Feb, 2015 5 commits
  8. 17 Feb, 2015 2 commits
  9. 14 Feb, 2015 1 commit
  10. 13 Feb, 2015 1 commit
  11. 12 Feb, 2015 4 commits
  12. 11 Feb, 2015 2 commits
  13. 10 Feb, 2015 2 commits
  14. 09 Feb, 2015 4 commits
  15. 06 Feb, 2015 2 commits
  16. 05 Feb, 2015 1 commit
  17. 04 Feb, 2015 3 commits