- 29 Sep, 2016 2 commits
-
-
Tim Wickberg authored
Switch to list_for_each, and check if access list actually changed after each update before updating last_part_update. This prevents the backfill scheduler from resetting mid-cycle unnecessarily. Bug 3123.
-
Morris Jette authored
Add protocol version to slurmd startup communications for slurmstepd to permit changes in the protocol.
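
For the access-list commit in this group (Bug 3123), the idea of bumping a "last update" timestamp only when something really changed might be sketched as follows. This is an illustrative Python model, not Slurm's C code: the `Partition` class, `refresh_access_lists`, and the `resolve_uids` callback are all hypothetical names introduced for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical stand-in for a partition record with a cached UID list."""
    name: str
    allow_groups: frozenset
    allowed_uids: tuple = ()


def refresh_access_lists(partitions, resolve_uids, last_update, now=None):
    """Rebuild each partition's UID list and return the update timestamp.

    The timestamp is advanced only if at least one list actually changed,
    so consumers that watch it (a scheduler loop, in this sketch) are not
    told to restart when the refresh was a no-op.
    """
    changed = False
    for part in partitions:
        new_uids = tuple(sorted(resolve_uids(part.allow_groups)))
        if new_uids != part.allowed_uids:
            part.allowed_uids = new_uids
            changed = True
    if changed:
        return time.time() if now is None else now
    return last_update
```

A caller would compare the returned value with the previous timestamp: an unchanged value means no partition access list differs from the last refresh.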
-
- 28 Sep, 2016 3 commits
-
-
Morris Jette authored
Add "flag" field to launch_tasks_request_msg. Remove the following fields (moved into flags): multi_prog, task_flags, user_managed_io, pty, buffered_stdio, and labelio. More flags to be added later.
-
Tim Wickberg authored
Remove from build system, and delete L/P specific files. Run autogen.sh as well.
-
Morris Jette authored
Add "sbatch_wait_nodes" to SchedulerParameters to control default sbatch behaviour with respect to waiting for all allocated nodes to be ready for use. Job can override the configuration option using the --wait-all-nodes=# option. bug 3120
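
The consolidation of separate boolean fields into one "flag" field, described in the launch_tasks_request_msg commit in this group, follows the usual bitmask pattern. A minimal Python sketch is shown below; the flag names and bit positions are assumptions made for illustration and are not taken from the Slurm source.

```python
from enum import IntFlag


class LaunchFlag(IntFlag):
    """Hypothetical bit assignments; the real values live in the Slurm headers."""
    MULTI_PROG = 1 << 0
    USER_MANAGED_IO = 1 << 1
    PTY = 1 << 2
    BUFFERED_STDIO = 1 << 3
    LABEL_IO = 1 << 4


def pack_flags(*, multi_prog=False, user_managed_io=False, pty=False,
               buffered_stdio=False, labelio=False) -> int:
    """Fold the former per-option booleans into a single integer field."""
    flags = LaunchFlag(0)
    if multi_prog:
        flags |= LaunchFlag.MULTI_PROG
    if user_managed_io:
        flags |= LaunchFlag.USER_MANAGED_IO
    if pty:
        flags |= LaunchFlag.PTY
    if buffered_stdio:
        flags |= LaunchFlag.BUFFERED_STDIO
    if labelio:
        flags |= LaunchFlag.LABEL_IO
    return int(flags)


def has_flag(flags: int, flag: LaunchFlag) -> bool:
    """Test one bit in the packed field."""
    return bool(flags & flag)
```

With this layout a new option needs only a new bit, not a new message field, which matches the commit's note that more flags are to be added later.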
-
- 27 Sep, 2016 2 commits
-
-
Morris Jette authored
Prior logic would treat an execute line like this: $ sbatch --wait-all-nodes -N3 tmp with "-N3" as the argument to the "--wait-all-nodes" option. See bug 3120
-
Morris Jette authored
Add salloc/sbatch/srun option --use-min-nodes to prefer smaller node counts when a range of node counts is specified (e.g. "-N 2-4"). bug 2996
-
- 26 Sep, 2016 1 commit
-
-
Morris Jette authored
Add salloc/sbatch/srun --priority option of "TOP" to set job priority to the highest possible value. This option is only available to Slurm operators and administrators. bug 3115
-
- 24 Sep, 2016 2 commits
-
-
Morris Jette authored
bug 3090
-
Morris Jette authored
Make sure no attempt is made to schedule a requeued job until all steps are cleaned (Node Health Check completes for all steps on a Cray). bug 3082
-
- 23 Sep, 2016 1 commit
-
-
Morris Jette authored
Make sure no attempt is made to schedule a requeued job until all steps are cleaned (Node Health Check completes for all steps on a Cray). bug 3082
-
- 22 Sep, 2016 6 commits
-
-
Dominik Bartkiewicz authored
Otherwise limit is checking the node count against the midplane count. Bug 3049.
-
Alejandro Sanchez authored
Check if node names are contiguous with respect to the node list assigned to the partition, rather than just monotonically increasing. Bug 3006.
-
Tim Wickberg authored
-
Janne Blomqvist authored
Bugs 2681 and 2703. Conflicts: NEWS
-
Adam Moody authored
-
Alejandro Sanchez authored
license of a certain type.
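
The distinction drawn in the Bug 3006 commit in this group, contiguity relative to the partition's node list as opposed to merely increasing names, can be illustrated with a small Python sketch. The function name and the sample node names are hypothetical, and the code is a model of the idea, not the Slurm implementation.

```python
def is_contiguous_in_partition(partition_nodes, selected):
    """Return True if `selected` occupies consecutive slots of `partition_nodes`.

    Node names that merely sort in increasing order are not enough: the
    selected nodes must be neighbours in the partition's own ordering.
    """
    position = {name: i for i, name in enumerate(partition_nodes)}
    try:
        slots = sorted({position[name] for name in selected})
    except KeyError:
        # A node outside the partition can never be contiguous within it.
        return False
    if not slots:
        return True
    # Distinct, sorted slots are consecutive iff their span equals their count.
    return slots[-1] - slots[0] + 1 == len(slots)
```

In a partition listed as nid00001, nid00002, nid00005, nid00006, the pair nid00002 and nid00005 is contiguous under this test even though the numbers skip, while nid00001 and nid00005 is not, although the names increase.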
-
- 21 Sep, 2016 8 commits
-
-
Morris Jette authored
node_features/knl_cray plugin: Increase default CapmcTimeout parameter from 10 to 60 seconds. bug 3100
-
Morris Jette authored
capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set of nodes, or to reboot a set of nodes, fails, then just requeue the job and abort the entire operation rather than trying to operate on individual nodes. bug 3100
-
Morris Jette authored
Allow a node's PowerUp state flag to be cleared using update_node RPC. bug 3100
-
Morris Jette authored
When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode), pass to the ResumeProgram the job ID assigned to the nodes in the SLURM_JOB_ID environment variable. bug 3100
-
Morris Jette authored
Don't log error for job end_time being zero if node health check is still running. bug 3053
-
Morris Jette authored
capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set of nodes, or to reboot a set of nodes, fails, then just requeue the job and abort the entire operation rather than trying to operate on individual nodes. bug 3100
-
Morris Jette authored
Allow a node's PowerUp state flag to be cleared using update_node RPC. bug 3100
-
Morris Jette authored
When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode), pass to the ResumeProgram the job ID assigned to the nodes in the SLURM_JOB_ID environment variable. bug 3100
-
- 20 Sep, 2016 1 commit
-
-
Morris Jette authored
Don't log error for job end_time being zero if node health check is still running. bug 3053
-
- 17 Sep, 2016 2 commits
-
-
Danny Auble authored
the same logic that was found in the slurmdbd. Now both functionalities share the same code. This was done with the merge right before this commit.
-
Morris Jette authored
Restore ability to manually power down nodes, broken in 15.08.12 in commit b4904661. The patch introduced in commit b4904661 (not powering down dead node) has a bad side effect. Adding the "(node_ptr->last_idle != 0)" condition prevents powering down nodes with the following command: scontrol update nodename=nX state=power_down because the state update function relies on zeroing the "last_idle" variable when a power_down is requested (see src/slurmctld/node_mgr.c, line 1589). Reverting this commit should solve the problem...but I let you decide... Didier GAZEN
-
- 16 Sep, 2016 1 commit
-
-
Morris Jette authored
node_features/knl_cray: If a node is rebooted outside of Slurm's direction, update its active features with current MCDRAM and NUMA mode information. bug 3071
-
- 15 Sep, 2016 3 commits
-
-
Tim Wickberg authored
Will be appended to usernames if --mail-user is not explicitly set for the job and email notifications are requested. Bug 3089.
-
Morris Jette authored
Fix race condition that could result in MCDRAM state information coming from capmc rather than cnselect (used state for next boot rather than latest boot). bug 3080
-
Nicolas Joly authored
-
- 14 Sep, 2016 2 commits
-
-
Alejandro Sanchez authored
No functional change, just silencing the warning message in this instance. Bug 3079.
-
Alejandro Sanchez authored
Bug 3073.
-
- 12 Sep, 2016 1 commit
-
-
Tim Wickberg authored
-
- 09 Sep, 2016 3 commits
-
-
Morris Jette authored
Modify srun task completion handling to only build the task/node string for logging purposes if it is needed. Modified for performance purposes. bug 3044
-
Tim Wickberg authored
This reverts commit 1ec2a4ae.
-
Alejandro Sanchez authored
Bug 3063.
-
- 08 Sep, 2016 2 commits
-
-
Brian Christiansen authored
In scontrol show nodes.
-
Morris Jette authored
Restructure srun command locking for task_exit processing logic for improved parallelism. This change decreases the amount of time consumed by serial logic by 2 orders of magnitude. bug 3044
-