- 07 Oct, 2016 2 commits
-
-
Morris Jette authored
Previous logic would result in sub-optimal job allocations and under some conditions invalid memory references resulting in the slurmctld daemon crashing or aborting. bug 2732
-
Morris Jette authored
-
- 06 Oct, 2016 5 commits
-
-
Danny Auble authored
always just AccountingPolicy.
-
Danny Auble authored
-
Danny Auble authored
configure.
-
Alejandro Sanchez authored
Bug 3124.
-
Morris Jette authored
node_features plugin - Add "mode" argument to node_features_p_node_xlate() function to fix some bugs updating a node's features using the node update RPC. Without this change it is impossible to clear the active features of a node or reset non-KNL node features.
-
- 05 Oct, 2016 6 commits
-
-
Morris Jette authored
node_features/knl_cray plugin: drain any node not reported by "capmc node_status" on startup or reconfig. Also re-tests on failed node restart for job.
-
Morris Jette authored
node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from node's configuration if capmc does NOT report the node as being KNL. For example, we don't want a non-KNL node with features="quad,cache".
-
Morris Jette authored
Add new knl.conf configuration parameter CapmcRetries Modify capmc_suspend and capmc_resume to retry operations when Cray State Manager is down. Add retry logic to node_features/knl_cray to handle Cray State manager being down. bug 3100
-
Morris Jette authored
node_features/knl_cray plugin: Substantially streamline and speed up logic to load current node state on reconfigure failure or unexpected node boot. Completely eliminate capmc calls and just use cnselect to load current node mode information.
-
Morris Jette authored
node_features/knl_cray plugin: drain any node not reported by "capmc node_status" on startup or reconfig. Also re-tests on failed node restart for job.
-
Morris Jette authored
node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from node's configuration if capmc does NOT report the node as being KNL. For example, we don't want a non-KNL node with features="quad,cache".
-
- 04 Oct, 2016 1 commit
-
-
Morris Jette authored
Add new knl.conf configuration parameter CapmcRetries Modify capmc_suspend and capmc_resume to retry operations when Cray State Manager is down. Add retry logic to node_features/knl_cray to handle Cray State manager being down. bug 3100
-
- 03 Oct, 2016 1 commit
-
-
Dominik Bartkiewicz authored
-
- 30 Sep, 2016 8 commits
-
-
Alejandro Sanchez authored
Otherwise they'll truncate when packed into the RPC and end up as some bizarre value at the controller. Bug 3098.
-
Dominik Bartkiewicz authored
Set completed time for pending/running runaway jobs to the max of (start, eligible, submit) times. Bug 3075
-
Morris Jette authored
Added new SchedulerParameters options step_retry_count and step_retry_time to control scheduling behaviour of job steps waiting for resources. bug 3121
-
Morris Jette authored
This change can reduce RPCs sent by srun up to 50%.
-
Artem Polyakov authored
Avoid using slurm_forward_data because it causes thread spawn that introduces unwanted delays. Bug 3102.
-
Tim Wickberg authored
-
Morris Jette authored
THis can happen due to signal/retry logic and the result is vestigial srun ports that are no longer valid and slurmctld sending messages to invalid ports to notify them about available resources as other steps complete
-
Morris Jette authored
-
- 29 Sep, 2016 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Alejandro Sanchez authored
Also correct the value of NICE_OFFSET used within the perl API. Bug 3098.
-
Artem Polyakov authored
Bug 3051.
-
Tim Wickberg authored
Otherwise updates would be rejected for running jobs even if there would be no impact. Most common when the job_submit plugin is overriding QOS/GRES values on everything; without this change an update to "comment" or other fields would fail with ESLURM_JOB_NOT_PENDING. Bug 3117.
-
Tim Wickberg authored
Never ever run NHC, even on an edge case that NHC_NO would still launch NHC after. Bug 3105.
-
Tim Wickberg authored
Switch to list_for_each, and check if access list actually changed after each update before updating last_prat_update. This prevents the backfill scheduler from resetting mid-cycle unnecessarily. Bug 3123.
-
Tim Wickberg authored
Switch to list_for_each, and check if access list actually changed after each update before updating last_prat_update. This prevents the backfill scheduler from resetting mid-cycle unnecessarily. Bug 3123.
-
Morris Jette authored
Add protocol version to slurmd startup communications for slurmstepd to permit changes in the protocol.
-
- 28 Sep, 2016 3 commits
-
-
Morris Jette authored
Add "flag" field to launch_tasks_request_msg. Remove the following fields (moved into flags): multi_prog, task_flags, user_managed_io, pty, buffered_stdio, and labelio. More flags to be added later.
-
Tim Wickberg authored
Remove from build system, and delete L/P specific files. Run autogen.sh as well.
-
Morris Jette authored
Add "sbatch_wait_nodes" to SchedulerParameters to control default sbatch behaviour with respect to waiting for all allocated nodes to be ready for use. Job can override the configuration option using the --wait-all-nodes=# option. bug 3120
-
- 27 Sep, 2016 2 commits
-
-
Morris Jette authored
Prior logic would treat execute line like this: $ sbatch --wait-all-nodes -N3 tmp with "-N3" as being the argument to the "--wait-all-nodes" option. See bug 3120
-
Morris Jette authored
Add salloc/sbatch/srun option --use-min-nodes to prefer smaller node counts when a range of node counts is specified (e.g. "-N 2-4"). bug 2996
-
- 26 Sep, 2016 1 commit
-
-
Morris Jette authored
Add salloc/sbatch/srun --priority option of "TOP" to set job priority to the highest possible value. This option is only available to Slurm operators and administrators. bug 3115
-
- 24 Sep, 2016 2 commits
-
-
Morris Jette authored
bug 3090
-
Morris Jette authored
Make sure no attempt is made to schedule a requeued job until all steps are cleaned (Node Health Check completes for all steps on a Cray). bug 3082
-