- 26 Oct, 2016 4 commits
-
-
Morris Jette authored
Fix bug that was clearing MAINT mode on nodes scheduled for reboot (bug introduced in version 16.05.5 to address bug in overlapping reservations, commit 5eee1d28). Note that a node's MAINT flag is used for both a requested reboot and a maintenance reservation. What I'd like to do is add a new node state flag to differentiate between these two cases, but that involves some significant changes that could introduce instability, so it will be deferred to version 17.02. bug 3210
-
Danny Auble authored
-
Danny Auble authored
requested with -n tasks < hosts from -w hostlist.
-
Morris Jette authored
bug 2149
-
- 25 Oct, 2016 3 commits
-
-
Dominik Bartkiewicz authored
Bug 3194
-
Morris Jette authored
Document that node Weight takes precedence over load with LLN scheduling. bug 3204
-
Tim Wickberg authored
task/cray's _get_numa_nodes() function needs to run before task/cgroup cleans up the cgroup hierarchies, otherwise ALPS memory compaction will never run. Also move task_p_add_pid() outside the #ifdef HAVE_NATIVE_CRAY block so that the plugin will load (albeit without any functionality) on non-Cray systems for testing purposes. Revise documentation and provided slurm.conf templates as well. Bug 3154.
-
- 20 Oct, 2016 2 commits
-
-
Tim Wickberg authored
_select_nodes_parts() was resetting state_reason to an admin hold without regard to admin vs user hold state. state_reason is the only place that user vs. admin is distinguished, so this prevented users from releasing these jobs. Bug introduced by commit fb46c84b in 16.05.5. Bug 3197.
-
Danny Auble authored
This is an addition to commit cb7ed937
-
- 19 Oct, 2016 1 commit
-
-
Ole H Nielsen authored
bug 3191
-
- 18 Oct, 2016 2 commits
-
-
Dominik Bartkiewicz authored
Improve reported estimates of start and end times for pending jobs. bug 3184
-
Morris Jette authored
Cray: Prevent abort in backfill scheduling logic for requeued job that has been cancelled while NHC is running. bug 3185
-
- 17 Oct, 2016 1 commit
-
-
Danny Auble authored
new glibc 2.24+ that deprecates readdir_r.
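For context, glibc 2.24 marks readdir_r() as deprecated, and the usual replacement is plain readdir() on a directory stream that is not shared between threads. The following is a minimal sketch of that pattern, not the change made in this commit; the helper name count_dir_entries() is hypothetical.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Slurm source): count the entries in
 * a directory using readdir() rather than the deprecated readdir_r().
 * Returns the number of entries excluding "." and "..", or -1 on error.
 */
int count_dir_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	for (;;) {
		/* readdir() returns NULL both at end-of-directory and on
		 * error; clearing errno first lets us tell them apart. */
		errno = 0;
		ent = readdir(dir);
		if (!ent)
			break;
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		count++;
	}
	if (errno != 0)
		count = -1;	/* readdir() failed part way through */

	closedir(dir);
	return count;
}
```

Each call here opens its own DIR stream, so no stream is shared across threads.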
-
- 13 Oct, 2016 3 commits
-
-
Morris Jette authored
Added node_features/knl_generic plugin for KNL support on non-Cray systems. NOTE: This plugin is still under development.
-
Morris Jette authored
Do not propagate SLURM_UMASK environment variable to batch script. bug 2609
-
Bjørn-Helge Mevik authored
Correct a bitmap test function (used only by the select/bluegene plugin). The effect of this bug is probably very limited as it will in almost all cases revert prematurely to a bit-by-bit test rather than using a full-word test. bug 3145
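To illustrate the distinction between a full-word test and a bit-by-bit test, here is a minimal sketch of a range test over a bitmap stored as 64-bit words. It is not the Slurm bitstring code; the function name any_bit_set(), the WORD_BITS constant and the word layout are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 64	/* assumed word size for this sketch */

/* Test a single bit of the bitmap. */
static bool bit_is_set(const uint64_t *words, int bit)
{
	return (words[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1;
}

/*
 * Hypothetical helper: return true if any bit in the inclusive range
 * [start, end] is set.
 */
bool any_bit_set(const uint64_t *words, int start, int end)
{
	int bit = start;

	/* Leading bits: test one at a time up to a word boundary. */
	while (bit <= end && (bit % WORD_BITS) != 0) {
		if (bit_is_set(words, bit))
			return true;
		bit++;
	}

	/* Whole words that lie entirely inside the range: one compare
	 * per word instead of WORD_BITS single-bit tests. */
	while (bit + WORD_BITS - 1 <= end) {
		if (words[bit / WORD_BITS] != 0)
			return true;
		bit += WORD_BITS;
	}

	/* Trailing bits: test one at a time. */
	while (bit <= end) {
		if (bit_is_set(words, bit))
			return true;
		bit++;
	}
	return false;
}
```

If the condition guarding the middle loop were wrong, the function would still return correct results but would fall through to the slower single-bit loop, which matches the limited impact described in the commit message.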
-
- 12 Oct, 2016 6 commits
-
-
Tim Wickberg authored
Cannot use ClusterName without reading a config file that may not exist. Bug 3026.
-
Tim Wickberg authored
This introduced an inadvertent dependency on the config file, which does not exist when setting up a new cluster. Bug 3026. This reverts commit c39f9ac9.
-
Morris Jette authored
task/affinity plugin: Honor a job's --ntasks-per-socket and --ntasks-per-core options in task binding. bug 3118
-
Morris Jette authored
Preserve non-KNL node features when updating the KNL node features for a multi-node job in which the non-KNL node features vary by node.
-
Morris Jette authored
node_features/knl_cray plugin: If the reconfiguration of nodes for an interactive job fails, kill the job (it can't be requeued like a batch job).
-
Morris Jette authored
node_features/knl_cray plugin: Add separate thread to interact with capmc in response to unexpected node reboots. bug 3153
-
- 11 Oct, 2016 6 commits
-
-
Alejandro Sanchez authored
bug 3091
-
Morris Jette authored
Prevent possible divide by zero in select/cons_res if a node's board count is higher than its socket count. bug 3155
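One way such a divide by zero can arise: with integer division, sockets / boards evaluates to 0 whenever boards exceeds sockets, and any later division by that result faults. The sketch below shows a defensive clamp under that assumption. It is not the code from select/cons_res, and the helper name sockets_per_board() is hypothetical.

```c
/*
 * Hypothetical helper (not from select/cons_res): compute sockets per
 * board so that the result is always at least 1 and is therefore safe
 * to use as a divisor.
 */
int sockets_per_board(int sockets, int boards)
{
	if (sockets < 1)
		sockets = 1;
	/* An inconsistent board count (zero, negative, or more boards
	 * than sockets) is treated as a single board. */
	if (boards < 1 || boards > sockets)
		boards = 1;
	return sockets / boards;
}
```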
-
Morris Jette authored
If a node's socket or core count is changed at registration time (e.g. a KNL node's NUMA mode is changed), change its board count to match. bug 3155
-
Morris Jette authored
Cray: The slurmd can manipulate the socket/core/thread values reported based upon the configuration. The logic failed to consider select/cray with SelectTypeParameters=other_cons_res as equivalent to select/cons_res. bug 3155
-
Tim Wickberg authored
abs() should not be used on long long variables as it would truncate if strictly conforming to C99. Use llabs() instead. Fix to commit 2aefc66b.
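The difference matters because abs() is declared as taking an int, so a long long argument is converted to int before the call, while llabs() takes and returns long long. A minimal sketch follows; the helper name abs_diff_ll() is hypothetical and is not the code touched by this commit.

```c
#include <stdlib.h>	/* llabs() */

/*
 * Hypothetical helper: magnitude of the difference between two
 * long long values. The caller must ensure that a - b does not
 * overflow.
 */
long long abs_diff_ll(long long a, long long b)
{
	/* llabs() operates on long long, so nothing is narrowed.
	 * abs(a - b) would first convert the argument to int, and the
	 * result of converting an out-of-range value is
	 * implementation-defined. */
	return llabs(a - b);
}
```

For example, abs_diff_ll(0, 5000000000LL) yields 5000000000, a value that does not fit in a 32-bit int.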
-
Tim Wickberg authored
-
- 07 Oct, 2016 1 commit
-
-
Morris Jette authored
-
- 06 Oct, 2016 5 commits
-
-
Danny Auble authored
always just AccountingPolicy.
-
Danny Auble authored
-
Danny Auble authored
configure.
-
Alejandro Sanchez authored
Bug 3124.
-
Morris Jette authored
node_features plugin - Add "mode" argument to node_features_p_node_xlate() function to fix some bugs updating a node's features using the node update RPC. Without this change it is impossible to clear the active features of a node or reset non-KNL node features.
-
- 05 Oct, 2016 3 commits
-
-
Morris Jette authored
node_features/knl_cray plugin: Substantially streamline and speed up logic to load current node state on reconfigure failure or unexpected node boot. Completely eliminate capmc calls and just use cnselect to load current node mode information.
-
Morris Jette authored
node_features/knl_cray plugin: drain any node not reported by "capmc node_status" on startup or reconfig. Also re-tests on failed node restart for job.
-
Morris Jette authored
node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from node's configuration if capmc does NOT report the node as being KNL. For example, we don't want a non-KNL node with features="quad,cache".
-
- 04 Oct, 2016 1 commit
-
-
Morris Jette authored
Add new knl.conf configuration parameter CapmcRetries. Modify capmc_suspend and capmc_resume to retry operations when the Cray State Manager is down. Add retry logic to node_features/knl_cray to handle the Cray State Manager being down. bug 3100
-
- 03 Oct, 2016 1 commit
-
-
Dominik Bartkiewicz authored
-
- 30 Sep, 2016 1 commit
-
-
Alejandro Sanchez authored
Otherwise they'll truncate when packed into the RPC and end up as some bizarre value at the controller. Bug 3098.
-