- 12 Oct, 2016 6 commits
-
-
Tim Wickberg authored
Cannot use ClusterName without reading a config file that may not exist. Bug 3026.
-
Tim Wickberg authored
This introduced an inadvertent dependency on the config file, which does not exist when setting up a new cluster. Bug 3026. This reverts commit c39f9ac9.
-
Morris Jette authored
task/affinity plugin: Honor a job's --ntasks-per-socket and --ntasks-per-core options in task binding. bug 3118
-
Morris Jette authored
Preserve non-KNL node features when updating the KNL node features for a multi-node job in which the non-KNL node features vary by node.
-
Morris Jette authored
node_features/knl_cray plugin: If the reconfiguration of nodes for an interactive job fails, kill the job (it can't be requeued like a batch job).
-
Morris Jette authored
node_features/knl_cray plugin: Add separate thread to interact with capmc in response to unexpected node reboots. bug 3153
-
- 11 Oct, 2016 6 commits
-
-
Alejandro Sanchez authored
bug 3091
-
Morris Jette authored
Prevent possible divide by zero in select/cons_res if a node's board count is higher than its socket count. bug 3155
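The arithmetic behind this kind of failure can be sketched as follows. This is only an illustration, assuming the divisor is a "sockets per board" value obtained by integer division; the function name `safe_sockets_per_board` and the clamping policy are hypothetical and are not taken from the select/cons_res source.

```c
#include <stdint.h>

/* Hypothetical helper, NOT the actual select/cons_res code.
 * Returns a sockets-per-board value that is always >= 1, so it is
 * safe to use as a divisor later on. */
uint16_t safe_sockets_per_board(uint16_t sockets, uint16_t boards)
{
	uint16_t per_board;

	if (boards == 0)
		boards = 1;	/* assumption: treat an unset board count as one board */

	/* Integer division yields 0 whenever boards > sockets. */
	per_board = sockets / boards;

	if (per_board == 0)
		per_board = 1;	/* assumption: clamp instead of dividing by zero later */

	return per_board;
}
```

With 2 sockets and 4 boards the unguarded quotient is 0, which is the value that would later trigger the divide by zero.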
-
Morris Jette authored
If a node's socket or core count is changed at registration time (e.g. a KNL node's NUMA mode is changed), change its board count to match. bug 3155
-
Morris Jette authored
Cray: The slurmd can manipulate the socket/core/thread values reported based upon the configuration. The logic failed to consider select/cray with SelectTypeParameters=other_cons_res as equivalent to select/cons_res. bug 3155
-
Tim Wickberg authored
abs() should not be used on long long variables as it would truncate if strictly conforming to C99. Use llabs() instead. Fix to commit 2aefc66b.
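A minimal sketch of the difference, assuming a hypothetical helper `abs_diff_ll` (not a Slurm function): `abs()` is declared as taking an `int`, so a `long long` argument is converted to `int` first, whereas `llabs()` takes and returns `long long`.

```c
#include <stdlib.h>

/* Hypothetical helper for illustration only.
 * Returns |a - b| for 64-bit inputs without narrowing to int. */
long long abs_diff_ll(long long a, long long b)
{
	/* llabs() operates on long long; abs() would first convert
	 * the argument to int and could lose the high bits. */
	return llabs(a - b);
}
```

Values above INT_MAX, such as 5000000000, survive the call unchanged in magnitude.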
-
Tim Wickberg authored
-
- 07 Oct, 2016 1 commit
-
-
Morris Jette authored
-
- 06 Oct, 2016 5 commits
-
-
Danny Auble authored
always just AccountingPolicy.
-
Danny Auble authored
-
Danny Auble authored
configure.
-
Alejandro Sanchez authored
Bug 3124.
-
Morris Jette authored
node_features plugin - Add "mode" argument to node_features_p_node_xlate() function to fix some bugs updating a node's features using the node update RPC. Without this change it is impossible to clear the active features of a node or reset non-KNL node features.
-
- 05 Oct, 2016 3 commits
-
-
Morris Jette authored
node_features/knl_cray plugin: Substantially streamline and speed up logic to load current node state on reconfigure failure or unexpected node boot. Completely eliminate capmc calls and just use cnselect to load current node mode information.
-
Morris Jette authored
node_features/knl_cray plugin: Drain any node not reported by "capmc node_status" on startup or reconfig. Also re-test on failed node restart for a job.
-
Morris Jette authored
node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from node's configuration if capmc does NOT report the node as being KNL. For example, we don't want a non-KNL node with features="quad,cache".
-
- 04 Oct, 2016 1 commit
-
-
Morris Jette authored
Add new knl.conf configuration parameter CapmcRetries. Modify capmc_suspend and capmc_resume to retry operations when Cray State Manager is down. Add retry logic to node_features/knl_cray to handle Cray State Manager being down. bug 3100
-
- 03 Oct, 2016 1 commit
-
-
Dominik Bartkiewicz authored
-
- 30 Sep, 2016 4 commits
-
-
Alejandro Sanchez authored
Otherwise they'll truncate when packed into the RPC and end up as some bizarre value at the controller. Bug 3098.
-
Dominik Bartkiewicz authored
Set completed time for pending/running runaway jobs to the max of (start, eligible, submit) times. Bug 3075
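The rule above can be sketched as a small helper. The name `runaway_end_time` and the use of plain `time_t` arguments are assumptions for illustration; the accounting code itself is not shown in this log.

```c
#include <time.h>

/* Hypothetical helper: pick the latest of the three timestamps
 * to use as the completion time of a runaway job. */
time_t runaway_end_time(time_t start, time_t eligible, time_t submit)
{
	time_t end = start;

	if (eligible > end)
		end = eligible;
	if (submit > end)
		end = submit;
	return end;
}
```

For a job that never started (start time of 0), the result falls back to the later of the eligible and submit times.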
-
Artem Polyakov authored
Avoid using slurm_forward_data because it spawns a thread, which introduces unwanted delays. Bug 3102.
-
Tim Wickberg authored
-
- 29 Sep, 2016 6 commits
-
-
Morris Jette authored
-
Alejandro Sanchez authored
Also correct the value of NICE_OFFSET used within the perl API. Bug 3098.
-
Artem Polyakov authored
Bug 3051.
-
Tim Wickberg authored
Otherwise updates would be rejected for running jobs even if there would be no impact. Most common when the job_submit plugin is overriding QOS/GRES values on everything; without this change an update to "comment" or other fields would fail with ESLURM_JOB_NOT_PENDING. Bug 3117.
-
Tim Wickberg authored
Never run NHC, even in an edge case where NHC_NO would still launch NHC afterwards. Bug 3105.
-
Tim Wickberg authored
Switch to list_for_each, and check if access list actually changed after each update before updating last_prat_update. This prevents the backfill scheduler from resetting mid-cycle unnecessarily. Bug 3123.
-
- 28 Sep, 2016 1 commit
-
-
Morris Jette authored
Add "sbatch_wait_nodes" to SchedulerParameters to control default sbatch behaviour with respect to waiting for all allocated nodes to be ready for use. Job can override the configuration option using the --wait-all-nodes=# option. bug 3120
-
- 27 Sep, 2016 2 commits
-
-
Morris Jette authored
Prior logic would treat an execute line like "sbatch --wait-all-nodes -N3 tmp" with "-N3" as being the argument to the "--wait-all-nodes" option. See bug 3120
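The symptom described here is what `getopt_long()` produces when a long option is declared with `required_argument`: the next word on the command line is consumed as its value. The sketch below is a stand-alone demonstration and not the sbatch parser; the names `demo_opts` and `parse_demo_opts` are hypothetical, and whether the actual fix used `optional_argument` is an assumption.

```c
#include <getopt.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result structure for the demonstration. */
struct demo_opts {
	int wait_all_nodes;	/* -1 if the option was not given */
	int nodes;		/* 0 if -N/--nodes was not given */
};

/* Parse argv with --wait-all-nodes declared using the given has_arg
 * mode (required_argument or optional_argument). Returns 0 on
 * success, -1 on an unrecognized option. */
int parse_demo_opts(int argc, char **argv, int wait_has_arg,
		    struct demo_opts *o)
{
	struct option long_opts[] = {
		{ "wait-all-nodes", wait_has_arg, NULL, 'W' },
		{ "nodes", required_argument, NULL, 'N' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	o->wait_all_nodes = -1;
	o->nodes = 0;
	optind = 1;	/* restart scanning so the function can be called again */

	while ((c = getopt_long(argc, argv, "N:", long_opts, NULL)) != -1) {
		switch (c) {
		case 'W':
			/* With optional_argument, optarg is NULL unless the
			 * value is attached as --wait-all-nodes=VALUE. */
			o->wait_all_nodes = optarg ? atoi(optarg) : 1;
			break;
		case 'N':
			o->nodes = atoi(optarg);
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Called with `required_argument`, the command line above leaves `nodes` unset because "-N3" has been swallowed as the option's value; with `optional_argument`, "-N3" is parsed as a separate option.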
-
Morris Jette authored
Add salloc/sbatch/srun option --use-min-nodes to prefer smaller node counts when a range of node counts is specified (e.g. "-N 2-4"). bug 2996
-
- 26 Sep, 2016 1 commit
-
-
Morris Jette authored
Add salloc/sbatch/srun --priority option of "TOP" to set job priority to the highest possible value. This option is only available to Slurm operators and administrators. bug 3115
-
- 24 Sep, 2016 1 commit
-
-
Morris Jette authored
bug 3090
-
- 23 Sep, 2016 1 commit
-
-
Morris Jette authored
Make sure no attempt is made to schedule a requeued job until all steps are cleaned (Node Health Check completes for all steps on a Cray). bug 3082
-
- 22 Sep, 2016 1 commit
-
-
Dominik Bartkiewicz authored
Otherwise limit is checking the node count against the midplane count. Bug 3049.
-