- 23 Sep, 2016 2 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
Make sure no attempt is made to schedule a requeued job until all steps are cleaned (Node Health Check completes for all steps on a Cray). bug 3082
-
- 22 Sep, 2016 10 commits
-
-
Dominik Bartkiewicz authored
Otherwise the limit check compares the node count against the midplane count. Bug 3049.
-
Alejandro Sanchez authored
Check if node names are contiguous with respect to the node list assigned to the partition, rather than just monotonically increasing. Bug 3006.
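The difference between the two checks can be illustrated with a minimal sketch. This is not Slurm's code: the function names and node names below are hypothetical, and node lists are modeled as plain Python lists in partition order.

```python
def is_monotonic(job_nodes):
    """Older-style check (as described): names only need to be increasing."""
    return all(a < b for a, b in zip(job_nodes, job_nodes[1:]))


def is_contiguous_in_partition(job_nodes, partition_nodes):
    """Newer-style check (as described): the nodes must occupy consecutive
    positions in the partition's own node list."""
    index = {name: i for i, name in enumerate(partition_nodes)}
    if any(name not in index for name in job_nodes):
        return False  # a node outside the partition cannot be contiguous
    positions = sorted(index[name] for name in job_nodes)
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


# Hypothetical partition whose node list skips nid00003.
partition = ["nid00001", "nid00002", "nid00004", "nid00005"]

# Adjacent in the partition even though the names have a numeric gap.
adjacent = ["nid00002", "nid00004"]
# Increasing names, but not adjacent in the partition.
spread = ["nid00001", "nid00004"]
```

In this toy model, `spread` passes the monotonic check but fails the partition-relative check, while `adjacent` passes both.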
-
Brian Christiansen authored
xcgroup_delete will print a log message if there is a problem.
-
Janne Blomqvist authored
Bugs 2681 and 2703. Conflicts: NEWS
-
Morris Jette authored
-
Adam Moody authored
-
Adam Moody authored
-
Gennaro Oliva authored
-
Morris Jette authored
-
Alejandro Sanchez authored
license of a certain type.
-
- 21 Sep, 2016 5 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
node_features/knl_cray plugin: Increase default CapmcTimeout parameter from 10 to 60 seconds. bug 3100
-
Morris Jette authored
capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set of nodes, or to reboot a set of nodes, fails, then just requeue the job and abort the entire operation rather than trying to operate on individual nodes. bug 3100
-
Morris Jette authored
Allow a node's PowerUp state flag to be cleared using update_node RPC. bug 3100
-
Morris Jette authored
When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode), pass the job ID assigned to the nodes to the ResumeProgram in the SLURM_JOB_ID environment variable. bug 3100
-
- 20 Sep, 2016 1 commit
-
-
Morris Jette authored
Don't log error for job end_time being zero if node health check is still running. bug 3053
-
- 19 Sep, 2016 1 commit
-
-
Damien François authored
-
- 17 Sep, 2016 1 commit
-
-
Morris Jette authored
Restore ability to manually power down nodes, broken in 15.08.12 in commit b4904661. The patch introduced in commit b4904661 (not powering down dead node) has a bad side effect. Adding the "(node_ptr->last_idle != 0)" condition prevents nodes from being powered down with the following command: scontrol update nodename=nX state=power_down because the state update function relies on zeroing the "last_idle" variable when a power_down is requested (see src/slurmctld/node_mgr.c, line 1589). Reverting this commit should solve the problem...but I let you decide... Didier GAZEN
-
- 16 Sep, 2016 2 commits
-
-
Gennaro Oliva authored
bug 3093
-
Morris Jette authored
node_features/knl_cray: If a node is rebooted outside of Slurm's direction, update its active features with current MCDRAM and NUMA mode information. bug 3071
-
- 15 Sep, 2016 3 commits
-
-
Morris Jette authored
Fix race condition that could result in MCDRAM state information coming from capmc rather than cnselect (used state for next boot rather than latest boot). bug 3080
-
Tim Wickberg authored
-
Nicolas Joly authored
-
- 14 Sep, 2016 3 commits
-
-
Alejandro Sanchez authored
No functional change, just silencing the warning message in this instance. Bug 3079.
-
Tim Wickberg authored
Fix some whitespace on line endings while here.
-
Alejandro Sanchez authored
Bug 3073.
-
- 12 Sep, 2016 7 commits
-
-
-
Morris Jette authored
-
Morris Jette authored
bug 3065
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Add Slurm overview by Alex on first day, move KNL to second morning, shift roadmap to right after lunch.
-
Morris Jette authored
bug 3065
-
- 09 Sep, 2016 5 commits
-
-
Morris Jette authored
Previous cap was 2 sec (default TCP timeout) times the node count and divided by 1000. A 9000 node job would have the messages spread out over 18 seconds. This change caps the spread at 5 seconds and assumes the normal TCP logic can handle the rest. bug 3044
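The arithmetic in this message can be checked with a short sketch. The function and parameter names are hypothetical and only restate the formula given above; they are not taken from the Slurm source.

```python
def uncapped_spread_seconds(node_count, tcp_timeout_sec=2.0):
    """Previous behavior as described: 2 sec times node count, divided by 1000."""
    return tcp_timeout_sec * node_count / 1000.0


def capped_spread_seconds(node_count, tcp_timeout_sec=2.0, cap_sec=5.0):
    """Behavior after the change as described: same formula, capped at 5 seconds."""
    return min(uncapped_spread_seconds(node_count, tcp_timeout_sec), cap_sec)


# The 9000-node job from the commit message.
before = uncapped_spread_seconds(9000)
after = capped_spread_seconds(9000)
```

For the 9000-node case, `before` is 18.0 seconds and `after` is 5.0 seconds; jobs of 2500 nodes or fewer are unaffected by the cap under this formula.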
-
Morris Jette authored
If the overhead of determining the hostlist for a given task list is too high, then report a hostlist of "Unknown" instead. If the overhead is too high, then srun will become unresponsive and communications will time out or fail. bug 3044
-
Morris Jette authored
-
Morris Jette authored
Modify srun task completion handling to only build the task/node string for logging purposes if it is needed. Modified for performance purposes. bug 3044
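The general pattern, building an expensive log string only when the message would actually be emitted, can be sketched as follows. This uses Python's standard `logging` module as a stand-in; the function names are hypothetical and this is not how srun itself is written.

```python
import logging


def describe_tasks(task_ids, hosts):
    """Stand-in for the (potentially expensive) task/node string construction."""
    return "tasks " + ",".join(map(str, task_ids)) + " on " + ",".join(hosts)


def log_task_completion(logger, task_ids, hosts, build=describe_tasks):
    """Build the task/node string only if debug logging is enabled.

    Returns True if the string was built and logged, False if it was skipped.
    """
    if not logger.isEnabledFor(logging.DEBUG):
        return False  # skip the expensive string construction entirely
    logger.debug("completed: %s", build(task_ids, hosts))
    return True
```

With the logger set to a level above debug, the builder is never called, so the cost is avoided on the common path.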
-
Morris Jette authored
Add get_log_level() function to return the highest LOG_LEVEL_* used for any logging mechanism.
-