1. 24 Feb, 2015 1 commit
    • power/cray development · acdec1f5
      Morris Jette authored
      Update power management web page: Add notes about powering nodes down/up
      Prevent underflow in power distribution logic
      Add logic to identify nodes in "ready" state. Only ready nodes can have
        their power caps modified
      Don't change power cap if node not in ready state
      Various improvements to logging
      Refactor code to eliminate duplicate/repeated building of full NID list
      Plug some memory leaks
  2. 23 Feb, 2015 1 commit
    • Fix test for scontrol change · 9cb22140
      Morris Jette authored
      Modify test 12.7 so that we specify a reason when setting a node DOWN
      A recent change to the Slurm code now requires a reason
  3. 21 Feb, 2015 1 commit
  4. 20 Feb, 2015 5 commits
    • e7c61bdd
    • power/cray work · 82de9635
      Morris Jette authored
      Correct capmc arguments to set power cap.
      Convert "capmc get_node_energy_counter" to use a hostlist expression rather
         than listing every node in a comma-separated list.
      Log commands and args run by the plugin via the power_run_script()
         function in src/plugins/power/common/power_common.c.
      Use hostlist to build condensed nid list for power cap set/clear functions.
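
The move from a comma-separated node list to a condensed range expression can be illustrated with a minimal sketch. It assumes a sorted, duplicate-free array of numeric node IDs. `nid_ranges` is a hypothetical helper written for illustration only; it is not Slurm's hostlist API or the plugin's actual code, and the exact syntax that capmc accepts is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Compress a sorted, duplicate-free array of node IDs into a range
 * expression such as "1-3,7,9-10".
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the buffer is too small.
 * Hypothetical helper for illustration; not the Slurm hostlist API.
 */
int nid_ranges(const int *nids, size_t n, char *buf, size_t buflen)
{
    size_t used = 0;
    size_t i = 0;

    assert(buf != NULL);
    assert(nids != NULL || n == 0);

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    while (i < n) {
        size_t j = i;
        int w;

        /* Extend j to the last ID of the current consecutive run. */
        while (j + 1 < n && nids[j + 1] == nids[j] + 1)
            j++;

        if (j > i)
            w = snprintf(buf + used, buflen - used, "%s%d-%d",
                         used ? "," : "", nids[i], nids[j]);
        else
            w = snprintf(buf + used, buflen - used, "%s%d",
                         used ? "," : "", nids[i]);

        /* snprintf reports the length it wanted; detect truncation. */
        if (w < 0 || (size_t)w >= buflen - used)
            return -1;
        used += (size_t)w;
        i = j + 1;
    }
    return (int)used;
}
```

For the IDs 1, 2, 3, 7, 9, 10 this yields the string `1-3,7,9-10`, which stays short even when the node list grows to thousands of consecutive entries.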
    • Merge branch 'slurm-14.11' · b8fbbf2b
      Morris Jette authored
    • Fix to GRES NoConsume logic · 33c48ac5
      Dorian Krause authored
      We came across the following error message in the slurmctld logs when
      using non-consumable resources:
      
      error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0
      
      The error comes from _job_dealloc():
      
      node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00
      "potion", job_id=46,
          node_name=0x1987ab0 "node1") at gres.c:3980
      (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0,
      job_id=46,
          node_name=0x1987ab0 "node1") at gres.c:4190
      job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true)
          at select_linear.c:2091
      bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1,
          preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at
      select_linear.c:3176
      bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
          preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
      exc_core_bitmap=0x0) at select_linear.c:3390
      bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
          preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
      exc_core_bitmap=0x0) at node_select.c:588
      avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1,
      exc_core_bitmap=0x0)
          at backfill.c:367
      
      The cause of this problem is that _node_state_dup() in gres.c does not
      duplicate the no_consume flag.
      The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr()
      which calls _node_state_dup().
      
      Below is a simple patch to fix the problem. A "future-proof" alternative
      might be to memcpy() from gres_ptr to new_gres and
      only handle pointers separately.
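
The memcpy()-based alternative mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the patch from this commit: `struct node_state`, its fields other than `no_consume`, and the functions `node_state_dup` and `node_state_free` are hypothetical, much-reduced stand-ins for the real structures in gres.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a per-node GRES state record. */
struct node_state {
    uint32_t  gres_cnt_avail;
    uint32_t  gres_cnt_alloc;
    bool      no_consume;     /* the flag the duplicate must preserve */
    uint32_t  topo_cnt;
    uint32_t *topo_gres_cnt;  /* owned array of topo_cnt entries */
};

/*
 * Duplicate a record by copying the whole struct first, so that every
 * scalar field (including any added later) is carried over, and then
 * deep-copying only the owned pointer. Returns NULL on allocation failure.
 */
struct node_state *node_state_dup(const struct node_state *src)
{
    struct node_state *dst;

    assert(src != NULL);
    dst = malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, sizeof(*dst));

    /* The copied pointer still aliases src; replace it with a private copy. */
    dst->topo_gres_cnt = NULL;
    if (src->topo_cnt > 0 && src->topo_gres_cnt != NULL) {
        size_t bytes = (size_t)src->topo_cnt * sizeof(*src->topo_gres_cnt);

        dst->topo_gres_cnt = malloc(bytes);
        if (dst->topo_gres_cnt == NULL) {
            free(dst);
            return NULL;
        }
        memcpy(dst->topo_gres_cnt, src->topo_gres_cnt, bytes);
    }
    return dst;
}

/* Release a record returned by node_state_dup(). */
void node_state_free(struct node_state *ns)
{
    if (ns != NULL) {
        free(ns->topo_gres_cnt);
        free(ns);
    }
}
```

The trade-off is that a field-by-field copy silently drops any scalar that nobody remembered to add, which is the failure described here, whereas the memcpy() approach only requires care for fields that own memory.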
    • d500de54
  5. 19 Feb, 2015 4 commits
  6. 18 Feb, 2015 11 commits
  7. 17 Feb, 2015 11 commits
  8. 14 Feb, 2015 2 commits
  9. 13 Feb, 2015 3 commits
  10. 12 Feb, 2015 1 commit