1. 27 Feb, 2015 6 commits
    • Insure prolog runs on job requeue · 42dc54ea
      Morris Jette authored
      Set the delay time for job requeue to the job credential lifetime (1200
      seconds by default). This insures that the prolog runs on every node when a
      job is requeued. (This change will slow down the launch of requeued jobs.)
      Without this change, if a job is restarted within 1200 seconds, the nodes
      previously used would not run the prolog again, since the job ID is
      still seen as active (from the previous execution). It is also advisable
      to set the value of DEFAULT_EXPIRATION_WINDOW in src/common/slurm_cred.c
      to the lowest reasonable value. We need to add a new configuration parameter
      so this can easily be changed in the future.
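The timing rule described in this commit message can be illustrated with a minimal sketch. This is not the code from the commit: the macro `CRED_LIFETIME_SECS` and both helper functions are hypothetical names, and the value 1200 is simply the default lifetime quoted in the message.

```c
#include <stdbool.h>
#include <time.h>

/* Assumption: the default credential lifetime of 1200 seconds quoted in
 * the commit message. The macro name is illustrative only. */
#define CRED_LIFETIME_SECS 1200

/* Hypothetical helper: earliest time at which a requeued job may start
 * again, so that the credential from the previous run has expired. */
static time_t requeue_begin_time(time_t requeue_time)
{
	return requeue_time + CRED_LIFETIME_SECS;
}

/* Hypothetical helper: true once the delay has elapsed, i.e. the nodes
 * no longer treat the job ID as active and will run the prolog again. */
static bool requeue_may_start(time_t requeue_time, time_t now)
{
	return now >= requeue_begin_time(requeue_time);
}
```

Under these assumptions, a job requeued at time 1000 would become eligible to start again at time 2200.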
    • Merge branch 'slurm-14.11' · cd663c21
      Morris Jette authored
      Conflicts:
      	src/slurmctld/job_mgr.c
    • Display job's estimated NodeCount based off of partition's configured... · ce32018a
      Brian Christiansen authored
      Display job's estimated NodeCount based off of partition's configured resources rather than the whole system's.
      
      Bug 1478
    • Cosmetic mods, no change in logic · b99fee15
      Morris Jette authored
    • power/cray - Log cluster-wide power totals · 9e567ecc
      Morris Jette authored
      This provides a better global view of what the limits and caps are.
    • power/cray developments · 3ff19460
      Morris Jette authored
      Remove time from "capmc get_node_energy_counter" call. If no recent
        data is available, no data is being returned, so just get latest
        information.
      Initialize a variable to avoid xfree of uninitialized variable.
      Correct joule to watt calculation (">" changed to "<")
      Read configuration once when slurmctld starts rather than twice
      Compute a node's power consumption with more precision based upon
        time to the microsecond
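As an illustration of computing power from two energy-counter readings with microsecond timestamps, here is a minimal sketch. It is not the plugin's code: the function names and the choice of `struct timeval` are assumptions made for the example.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: elapsed seconds between two timestamps, keeping
 * the microsecond part instead of truncating to whole seconds. */
static double elapsed_secs(struct timeval start, struct timeval end)
{
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Hypothetical helper: average power in watts over the interval,
 * derived from two cumulative energy readings in joules.
 * Returns 0.0 if the interval is empty or the counter went backwards. */
static double avg_watts(uint64_t joules_start, uint64_t joules_end,
			struct timeval start, struct timeval end)
{
	double secs = elapsed_secs(start, end);

	if (secs <= 0.0 || joules_end < joules_start)
		return 0.0;
	return (double)(joules_end - joules_start) / secs;
}
```

With whole-second timestamps, a 1.5 second interval would be rounded to 1 or 2 seconds, which skews the computed wattage considerably for short sampling periods.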
  2. 26 Feb, 2015 9 commits
  3. 25 Feb, 2015 7 commits
  4. 24 Feb, 2015 10 commits
  5. 23 Feb, 2015 1 commit
    • Fix test for scontrol change · 9cb22140
      Morris Jette authored
      Modify test 12.7 so that we specify a reason when setting a node DOWN.
      A recent change to the Slurm code now requires a reason.
  6. 21 Feb, 2015 1 commit
  7. 20 Feb, 2015 5 commits
    • e7c61bdd
    • power/cray work · 82de9635
      Morris Jette authored
      Correct capmc arguments to set power cap.
      Convert "capmc get_node_energy_counter" to use a hostlist expression rather
         than listing every node in a comma-separated list.
      Log commands and args run by the plugin via the power_run_script()
         function in src/plugins/power/common/power_common.c.
      Use hostlist to build condensed nid list for power cap set/clear functions.
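To show what a condensed nid list looks like compared with a comma-separated list of every node, here is a standalone sketch. The commit itself uses Slurm's hostlist code. The function below is a hypothetical stand-in written only for illustration, and it assumes the node IDs are already sorted and unique.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write sorted, unique node IDs as a condensed
 * range list (for example "1-3,7,9-10") into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small. */
static int nids_to_ranges(const int *nids, size_t cnt, char *buf, size_t len)
{
	size_t i = 0, off = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	while (i < cnt) {
		size_t j = i;
		int n;

		/* Extend the run while the next ID is consecutive. */
		while (j + 1 < cnt && nids[j + 1] == nids[j] + 1)
			j++;
		if (j > i)
			n = snprintf(buf + off, len - off, "%s%d-%d",
				     off ? "," : "", nids[i], nids[j]);
		else
			n = snprintf(buf + off, len - off, "%s%d",
				     off ? "," : "", nids[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
		i = j + 1;
	}
	return (int)off;
}
```

For the IDs 1, 2, 3, 7, 9, 10 this produces `1-3,7,9-10`, which keeps command lines short on large systems.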
    • Merge branch 'slurm-14.11' · b8fbbf2b
      Morris Jette authored
    • Fix to GRES NoConsume logic · 33c48ac5
      Dorian Krause authored
      We came across the following error message in the slurmctld logs when
      using non-consumable resources:
      
      error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0
      
      The error comes from _job_dealloc():
      
      node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00
      "potion", job_id=46,
          node_name=0x1987ab0 "node1") at gres.c:3980
      (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0,
      job_id=46,
          node_name=0x1987ab0 "node1") at gres.c:4190
      job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true)
          at select_linear.c:2091
      bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1,
          preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at
      select_linear.c:3176
      bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
          preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
      exc_core_bitmap=0x0) at select_linear.c:3390
      bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
          preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
      exc_core_bitmap=0x0) at node_select.c:588
      avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1,
      exc_core_bitmap=0x0)
          at backfill.c:367
      
      The cause of this problem is that _node_state_dup() in gres.c does not
      duplicate the no_consume flag.
      The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr()
      which calls _node_state_dup().
      
      Below is a simple patch to fix the problem. A "future-proof" alternative
      might be to memcpy() from gres_ptr to new_gres and
      only handle pointers separately.
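The "future-proof" alternative mentioned at the end of this message (memcpy() the whole record, then handle only the pointers separately) might look like the sketch below. The struct is a simplified, hypothetical stand-in for the real per-node GRES state, not the definition in gres.c, and the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a per-node GRES state record. */
struct node_state {
	bool no_consume;          /* the flag the original dup missed */
	uint32_t gres_cnt_avail;
	uint32_t topo_cnt;        /* number of entries in topo_gres_cnt */
	uint32_t *topo_gres_cnt;  /* owned array, needs a deep copy */
};

/* Duplicate a record: memcpy() copies every scalar field, including any
 * added later, then each owned pointer is replaced by its own copy. */
static struct node_state *node_state_dup(const struct node_state *src)
{
	struct node_state *dst;

	if (!src)
		return NULL;
	dst = malloc(sizeof(*dst));
	if (!dst)
		return NULL;
	memcpy(dst, src, sizeof(*dst));
	dst->topo_gres_cnt = NULL;
	if (src->topo_gres_cnt && src->topo_cnt) {
		size_t bytes = src->topo_cnt * sizeof(*dst->topo_gres_cnt);

		dst->topo_gres_cnt = malloc(bytes);
		if (!dst->topo_gres_cnt) {
			free(dst);
			return NULL;
		}
		memcpy(dst->topo_gres_cnt, src->topo_gres_cnt, bytes);
	}
	return dst;
}

/* Release a record created by node_state_dup(). */
static void node_state_free(struct node_state *ns)
{
	if (ns) {
		free(ns->topo_gres_cnt);
		free(ns);
	}
}
```

The trade-off is that a newly added scalar field is copied automatically, while a newly added pointer field that is not handled explicitly would be shared between the original and the copy.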
    • d500de54
  8. 19 Feb, 2015 1 commit