20 Feb, 2015 (1 commit)
    • Fix to GRES NoConsume logic · 33c48ac5
      Dorian Krause authored
      We came across the following error message in the slurmctld logs when
      using non-consumable resources:
      
      error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0
      
      The error comes from _job_dealloc(). The backtrace, innermost frame
      first (function names reconstructed from context; arguments lost in
      the log are shown as "..."):

      #0  _job_dealloc (..., node_gres_data=0x7f8a18000b70, node_offset=0,
              gres_name=0x1999e00 "potion", job_id=46,
              node_name=0x1987ab0 "node1") at gres.c:3980
      #1  gres_plugin_job_dealloc (job_gres_list=0x199b7c0,
              node_gres_list=0x199bc38, node_offset=0, job_id=46,
              node_name=0x1987ab0 "node1") at gres.c:4190
      #2  _rm_job_from_nodes (..., job_ptr=0x19e9d50,
              pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true)
              at select_linear.c:2091
      #3  _will_run_test (..., bitmap=0x7f8a18001ad0, min_nodes=1,
              max_nodes=1, max_share=1, req_nodes=1,
              preemptee_candidates=0x0,
              preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
      #4  select_p_job_test (..., bitmap=0x7f8a18001ad0, min_nodes=1,
              max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0,
              preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0)
              at select_linear.c:3390
      #5  select_g_job_test (..., bitmap=0x7f8a18001ad0, min_nodes=1,
              max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0,
              preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0)
              at node_select.c:588
      #6  _try_sched (..., avail_bitmap=0x7f8a2f910d38, min_nodes=1,
              max_nodes=1, req_nodes=1, exc_core_bitmap=0x0)
              at backfill.c:367
      
      The cause of this problem is that _node_state_dup() in gres.c does not
      duplicate the no_consume flag. The cr_ptr passed to
      _rm_job_from_nodes() is created with _dup_cr(), which in turn calls
      _node_state_dup().
      
      Below is a simple patch to fix the problem. A more "future-proof"
      alternative might be to memcpy() the whole structure from gres_ptr to
      new_gres and then handle only the pointer members separately.
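
      A minimal sketch of the fix being described, assuming the
      gres_node_state_t layout used by gres.c in this era (the counter
      fields shown are illustrative; only the no_consume line is the
      substance of the change):

          static void *_node_state_dup(void *gres_data)
          {
              gres_node_state_t *gres_ptr = (gres_node_state_t *) gres_data;
              gres_node_state_t *new_gres;

              if (gres_ptr == NULL)
                  return NULL;

              new_gres = xmalloc(sizeof(gres_node_state_t));
              /* Scalar counters were already being copied ... */
              new_gres->gres_cnt_avail = gres_ptr->gres_cnt_avail;
              new_gres->gres_cnt_alloc = gres_ptr->gres_cnt_alloc;
              /* ... but no_consume was not; the duplicated state then
               * treated the GRES as consumable, and _job_dealloc() failed
               * with the "bad node_offset ... count is 0" error above. */
              new_gres->no_consume = gres_ptr->no_consume;
              /* Pointer members (bitmaps, allocation arrays) still need
               * deep copies and are handled separately, as before. */
              return new_gres;
          }

      The memcpy() alternative would copy every scalar member in one step,
      so a newly added flag could never be forgotten here again.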
04 Feb, 2015 (3 commits)
    • Report correct job "shared" field value · 3de14946
      Morris Jette authored
      Previously it was not possible to distinguish a job that explicitly
      requested exclusive nodes from one that simply inherited the default
      job/partition configuration.
    • job array slurmctld abort fix · 0ff342b5
      Morris Jette authored
      Fix job array logic that can cause slurmctld to abort.
      bug 1426
    • Fix for CUDA v7.0+ · da2fba48
      Morris Jette authored
      Enable CUDA v7.0+ use with a Slurm configuration of
      TaskPlugin=task/cgroup and ConstrainDevices=yes (in cgroup.conf).
      With that configuration, CUDA_VISIBLE_DEVICES will start at 0 rather
      than at the global device number.
      bug 1421
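
      For context, the configuration the message refers to would look
      something like this (the node name and GPU count are placeholders):

          # cgroup.conf
          ConstrainDevices=yes

          # slurm.conf (relevant excerpts)
          TaskPlugin=task/cgroup
          GresTypes=gpu
          NodeName=tux01 Gres=gpu:2

      With devices constrained by the cgroup, a job step sees only its
      allocated GPUs, so CUDA_VISIBLE_DEVICES is set relative to the step
      (starting at 0) rather than to the node's global device numbering.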