- 02 Mar, 2017 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This is a partial reversion of commit 69684648. NOTE: sbatch does not support --cpu_bind (although the documentation does list the option) and the --mem_bind options set SBATCH_* environment variables that nothing ever looks at. In other words, it needs some work. Bugs 3519 and 3188
-
Felip Moll authored
bug 3525
-
Morris Jette authored
Copy NEWS item updates from v16.05 applied since v17.02.0 tag to NEWS for v17.02.1
-
Morris Jette authored
Convert a slurmctld power management data structure from array to list in order to eliminate the possibility of zombie child suspend/resume processes. bug 3516
-
Tim Wickberg authored
This now matches the behavior documented in sbatch. This resolves a problem where the maximum cpu frequency would be set to the minimum available on the node by the batch step. This is due to the batch step leaving cpu_freq_{min,max,gov} uninitialized to zero, which is then translated to a request to set the frequency to the lowest available in the node. This did not impact 16.05 or earlier, as a request for a zero frequency was ignored by a quirk of _cpu_freq_freqspec_num. This quirk was removed by commit f40e1c01 before 17.02.0-rc1. Bug 3510.
-
Morris Jette authored
bug 3516
-
Morris Jette authored
from 10 to 100. bug 3516
-
- 01 Mar, 2017 3 commits
-
-
Alejandro Sanchez authored
-
Danny Auble authored
-
Danny Auble authored
-
- 28 Feb, 2017 4 commits
-
-
Dominik Bartkiewicz authored
information about a job to scontrol.
-
Dominik Bartkiewicz authored
-
Danny Auble authored
cause potential deadlock when/if TRES changed in the database and the slurmctld wasn't made aware of the change. This would be very rare. The lock was originally there to keep new jobs from grabbing the assoc information. If the lock was done afterwards the worst case is we get the new information.
-
Danny Auble authored
It was determined we didn't need the write locks on the job and no locks were needed on the node either. Taking these different locks beforehand would create a window where you could get a config write lock
-
- 27 Feb, 2017 3 commits
-
-
Daniel Letai authored
-
Morris Jette authored
This will be triggered after either a burst buffer job_begin function or select plugin job_begin function fails. Without this change, the "squeue -i" and "scontrol show job" commands can report old job state information. bug 3504
-
Tim Wickberg authored
Burst_buffer/cray - Prevent slurmctld daemon abort if "paths" operation fails. Now job will be held. bug 3504
-
- 24 Feb, 2017 6 commits
-
-
Josko Plazonic authored
bug 3182
-
Tim Shaw authored
-
Tim Shaw authored
-
Don Lipari authored
bug 3473
-
Danny Auble authored
-
Danny Auble authored
-
- 23 Feb, 2017 6 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Danny Auble authored
reason to 32 chars.
-
Morris Jette authored
-
Morris Jette authored
For job resize, correct logic to build "resize" script with new values. Previously the scripts were based upon the original job size. bug 3498
-
Tim Wickberg authored
Do not enable init scripts if not present. Please note that, unlike the init scripts, service files are not automatically enabled at this time. Bug 3371.
-
- 22 Feb, 2017 3 commits
-
-
Morris Jette authored
If node boot in progress when slurmctld daemon is restarted, then allow sufficient time for reboot to complete and not prematurely DOWN the node as "Not responding". bug 3494
-
Morris Jette authored
Could result in squeue abort. Coverity error CID 44969
-
Morris Jette authored
Reduces possibility of old data if job_id or user_id option specified with iterate option. Coverity error CID 44783
-
- 21 Feb, 2017 1 commit
-
-
Morris Jette authored
Increased maximum file size supported by sbcast beyond 2 GB by widening the file size from a 32-bit integer to 64 bits. This required changing the file broadcast RPC and several internal variables. bug 3485
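
The 2 GB ceiling mentioned above is what a signed 32-bit size field allows. The sketch below only illustrates that arithmetic; it is not the sbcast RPC code. The helper names are hypothetical, and treating the widened field as an unsigned 64-bit value is an assumption.

```python
import struct

# Largest value a signed 32-bit integer can hold (2 GiB minus one byte).
INT32_MAX = 2**31 - 1


def fits_in_int32(size_bytes):
    """Return True if size_bytes can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", size_bytes)
        return True
    except struct.error:
        return False


def fits_in_uint64(size_bytes):
    """Return True if size_bytes can be packed as an unsigned 64-bit integer.

    Assumption: the widened field is unsigned; the commit only says "64 bits".
    """
    try:
        struct.pack("<Q", size_bytes)
        return True
    except struct.error:
        return False


three_gib = 3 * 1024**3
print(fits_in_int32(INT32_MAX))   # a file just under 2 GiB fits
print(fits_in_int32(three_gib))   # a 3 GiB file does not fit in 32 bits
print(fits_in_uint64(three_gib))  # but it does fit in 64 bits
```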
-
- 18 Feb, 2017 2 commits
-
-
Tim Shaw authored
by specifying "--uid=<uid>|-u <uid>".
-
Brian Christiansen authored
A 17.02 controller or sacctmgr couldn't talk to a "master/17.11" DBD because the 17.02 client was attempting to talk to the DBD with the 17.02's MIN_PROTOCOL_VERSION -- which was 15.08 and is more than 2 versions behind the master. The master's MIN_PROTOCOL_VERSION is 16.05, so it couldn't unpack the messages. The controller should always communicate at its current protocol to the DBD. For federations, it's possible that a higher version controller could talk to a lower version controller. So the cluster needs to talk to the remote cluster using the remote cluster's protocol version -- which is given back from the DBD.
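
The version mismatch described above can be sketched as a range check. This is an illustration under stated assumptions, not Slurm's actual packing code: versions are modeled as `(major, minor)` tuples, and `can_unpack` is a hypothetical helper that accepts a message only if its version lies between the receiver's minimum and current protocol versions.

```python
def can_unpack(msg_version, receiver_min, receiver_current):
    """Hypothetical check: a receiver understands a message only if the
    message's protocol version is within its supported range."""
    return receiver_min <= msg_version <= receiver_current


# Values taken from the commit message above.
CLIENT_CURRENT = (17, 2)   # 17.02 controller / sacctmgr
CLIENT_MIN = (15, 8)       # 17.02's MIN_PROTOCOL_VERSION
DBD_CURRENT = (17, 11)     # "master/17.11" DBD
DBD_MIN = (16, 5)          # master's MIN_PROTOCOL_VERSION

# Old behavior: the client talks at its own minimum version, which the DBD rejects.
print(can_unpack(CLIENT_MIN, DBD_MIN, DBD_CURRENT))

# Fixed behavior: the client talks at its current version, which the DBD accepts.
print(can_unpack(CLIENT_CURRENT, DBD_MIN, DBD_CURRENT))
```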
-
- 17 Feb, 2017 3 commits
-
-
Dominik Bartkiewicz authored
Enable through SchedulerParameters. Will sort by youngest jobs first, rather than based on priority. Use alongside 'preempt_strict_order' if you don't want the plugin to try to further optimize the preemption list. Bug 3457.
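
As a rough illustration of the ordering described above, the sketch below sorts preemption candidates by start time rather than by priority. The `Job` record, its field names, and the reading of "youngest" as "most recently started" are assumptions made for this example; the plugin's real data structures are not shown in this log.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record; field names are illustrative only.
    job_id: int
    priority: int
    start_time: int  # seconds since the epoch


def youngest_first(candidates):
    """Order candidates so the most recently started job comes first.
    Priority is deliberately ignored (assumed meaning of "youngest")."""
    return sorted(candidates, key=lambda job: job.start_time, reverse=True)


candidates = [
    Job(job_id=1, priority=10, start_time=1000),
    Job(job_id=2, priority=50, start_time=3000),
    Job(job_id=3, priority=30, start_time=2000),
]
print([job.job_id for job in youngest_first(candidates)])
```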
-
Dominik Bartkiewicz authored
Introduced by commit 059275f6 when the timer is triggered. Releasing the locks means that job_ptr may point to an element that was deleted by a different thread in the meantime. Restructuring the code to advance the iterator prevents this - the iterator itself does not have this issue as the List structure will manage the position during the sleep(). While here, move the reservation update handling outside of this loop to simplify operation. This does not need to piggy-back on the scan of the job_list - switching to using list_for_each should mitigate some of the performance loss by needing a second full pass. Bug 3414.
-
Tim Wickberg authored
These were mis-calculated previously, and are internal implementation details that weren't meant to be exposed.
-