1. 06 Oct, 2016 3 commits
  2. 05 Oct, 2016 3 commits
      node_features/knl_cray plugin: streamline node update · db23662d
      Morris Jette authored
      node_features/knl_cray plugin: Substantially streamline and speed up logic
          to load current node state on reconfigure failure or unexpected node boot.
          Completely eliminate capmc calls and just use cnselect to load current
          node mode information.
      node_features/knl_cray: Validate nodes at start and reconfig · 59b118bf
      Morris Jette authored
      node_features/knl_cray plugin: drain any node not reported by
          "capmc node_status" on startup or reconfig. Also re-test nodes
          when a node restart for a job fails.
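The validation described in 59b118bf amounts to comparing the configured node list against what capmc reports. The following is a minimal sketch of that comparison only, not the plugin's code: the function name `nodes_to_drain` and the node names are hypothetical, and the reported list is assumed to have been parsed already from "capmc node_status" output.

```python
def nodes_to_drain(configured, reported):
    """Return configured nodes that are missing from the reported set.

    `configured` and `reported` are iterables of node names. Both are
    assumptions of this sketch; how the plugin obtains them is not shown.
    """
    reported_set = set(reported)
    # Preserve the configured order so the result is deterministic.
    return [name for name in configured if name not in reported_set]


# Hypothetical node names, for illustration only.
configured_nodes = ["nid00001", "nid00002", "nid00003"]
reported_nodes = ["nid00001", "nid00003"]
missing = nodes_to_drain(configured_nodes, reported_nodes)
```

Here `missing` holds the one node that was configured but not reported, which is the node the commit message says would be drained.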
      Remove KNL features from non-KNL node · 38f072ed
      Morris Jette authored
      node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from
          node's configuration if capmc does NOT report the node as being KNL.
          For example, we don't want a non-KNL node with features="quad,cache".
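The feature cleanup in 38f072ed can be pictured as filtering a comma-separated feature string. This is a rough sketch under stated assumptions, not the plugin's implementation: `strip_knl_features` is a hypothetical name, and the two mode sets are assumed here for illustration (the commit message itself only names "quad" and "cache").

```python
# Mode names assumed for illustration; the plugin's own lists may differ.
KNL_MCDRAM_MODES = {"cache", "equal", "flat", "split", "auto"}
KNL_NUMA_MODES = {"a2a", "hemi", "quad", "snc2", "snc4"}


def strip_knl_features(features, is_knl):
    """Drop KNL MCDRAM/NUMA feature names from a non-KNL node's features.

    `features` is a comma-separated string such as "quad,cache".
    `is_knl` stands in for whether capmc reports the node as KNL.
    """
    if is_knl:
        return features  # KNL nodes keep their features untouched
    knl_names = KNL_MCDRAM_MODES | KNL_NUMA_MODES
    kept = [f for f in features.split(",") if f and f not in knl_names]
    return ",".join(kept)


# "bigmem" is a hypothetical site-defined feature.
cleaned = strip_knl_features("bigmem,quad,cache", is_knl=False)
```

With these inputs only the non-KNL feature survives, so a non-KNL node would no longer carry features="quad,cache".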
  3. 04 Oct, 2016 1 commit
      add knl.conf parameter CapmcRetries · 5cb90497
      Morris Jette authored
      Add new knl.conf configuration parameter CapmcRetries
      Modify capmc_suspend and capmc_resume to retry operations when
        Cray State Manager is down.
      Add retry logic to node_features/knl_cray to handle Cray State
        Manager being down.
      bug 3100
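The retry behaviour added in 5cb90497 follows a common pattern: repeat an operation a bounded number of times while a dependency is unavailable. The sketch below shows that pattern only. `run_with_retries` and its parameters are hypothetical, `RuntimeError` stands in for "Cray State Manager is down", and it assumes the retry count means additional attempts after the first, which the commit message does not specify.

```python
import time


def run_with_retries(operation, retries, delay_seconds=0.0, sleep=time.sleep):
    """Call operation() until it succeeds or the retries are used up.

    `retries` is the number of extra attempts after the first one
    (an assumption of this sketch). The last error is re-raised if
    every attempt fails.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    last_error = None
    for attempt in range(retries + 1):
        try:
            return operation()
        except RuntimeError as err:  # stand-in for a transient failure
            last_error = err
            if attempt < retries:
                sleep(delay_seconds)  # wait before the next attempt
    raise last_error


# Hypothetical operation that fails twice before succeeding.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("state manager down")
    return "ok"


result = run_with_retries(flaky_operation, retries=2)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller would normally leave it at the default.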
  4. 03 Oct, 2016 1 commit
  5. 30 Sep, 2016 4 commits
  6. 29 Sep, 2016 6 commits
  7. 28 Sep, 2016 1 commit
  8. 27 Sep, 2016 2 commits
  9. 26 Sep, 2016 1 commit
      Add salloc/sbatch/srun --priority=top option · 62b9884f
      Morris Jette authored
      Add salloc/sbatch/srun --priority option of "TOP" to set job priority to
          the highest possible value. This option is only available to Slurm operators
          and administrators.
      bug 3115
  10. 24 Sep, 2016 1 commit
  11. 23 Sep, 2016 1 commit
  12. 22 Sep, 2016 5 commits
  13. 21 Sep, 2016 4 commits
  14. 20 Sep, 2016 1 commit
  15. 17 Sep, 2016 1 commit
      Restore ability to manually power down nodes · da722a89
      Morris Jette authored
      Restore ability to manually power down nodes, broken in 15.08.12
      in commit b4904661
      
      The patch introduced in commit b4904661 (not powering down dead node) has a bad side effect. Adding the "(node_ptr->last_idle != 0)" condition prevents nodes from being powered down with the following command:
      
      scontrol update nodename=nX state=power_down
      
      because the state update function relies on zeroing the "last_idle" variable when a power_down is requested (see src/slurmctld/node_mgr.c, line 1589).
      
      Reverting this commit should solve the problem, but I'll let you decide.
      
      Didier GAZEN
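The interaction reported above can be modelled in a few lines. This is an illustrative model, not the Slurm source: `should_power_down` is a hypothetical function, and the idle-time comparison is an assumption about the surrounding check. Only the "last_idle != 0" condition and the zeroing of "last_idle" on a power_down request come from the report.

```python
def should_power_down(last_idle, now, idle_time, require_nonzero_last_idle):
    """Decide whether a node is eligible to be powered down.

    `last_idle` is the time the node last became idle; a manual
    power_down request sets it to 0. `require_nonzero_last_idle`
    toggles the extra condition described in the report.
    """
    # Assumed baseline check: the node has been idle longer than idle_time.
    idle_long_enough = last_idle < now - idle_time
    if require_nonzero_last_idle:
        return last_idle != 0 and idle_long_enough
    return idle_long_enough


NOW = 1000
IDLE_TIME = 300

# A manual power_down request zeroes last_idle.
with_extra_check = should_power_down(0, NOW, IDLE_TIME, True)
without_extra_check = should_power_down(0, NOW, IDLE_TIME, False)
```

In this model a zeroed "last_idle" passes the idle-time check but is rejected by the extra condition, which matches the reported symptom that the manual request has no effect.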
  16. 16 Sep, 2016 1 commit
      Update KNL modes for out-of-band reboot · 3a465f80
      Morris Jette authored
      node_features/knl_cray: If a node is rebooted outside of Slurm's direction,
          update its active features with current MCDRAM and NUMA mode information.
      bug 3071
  17. 15 Sep, 2016 2 commits
  18. 14 Sep, 2016 2 commits