- 26 Sep, 2019 2 commits
-
-
Marshall Garey authored
Bug 7499
-
Dominik Bartkiewicz authored
Regression introduced in fb26b706. Bug 7675
-
- 25 Sep, 2019 1 commit
-
-
Albert Gil authored
Signaling of the batch step and handling of the flags is now done entirely in _kill_all_active_steps() in slurmd and _handle_signal_container() in stepd, to ensure that: (1) with KILL_JOB_BATCH, only the batch container is signaled; (2) with KILL_FULL_JOB, the batch script and its children are also signaled; (3) with both, only the batch script and its children are signaled. We no longer rely on proctrack_g_signal() to handle batch step signaling, so it works the same for all proctrack plugins. This commit also includes minor related fixes in other code handling these signaling flags, and documentation improvements. Bug 7282
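The three flag cases in this message can be sketched as a small decision function. This is only an illustrative model, not the slurmd/stepd code: the `KillFlag` enum, its values, the target labels, and the behaviour when neither flag is set are all assumptions made for the example.

```python
from enum import Flag, auto


class KillFlag(Flag):
    """Stand-ins for the signaling flags; values are NOT Slurm's real bit values."""
    NONE = 0
    JOB_BATCH = auto()  # models KILL_JOB_BATCH
    FULL_JOB = auto()   # models KILL_FULL_JOB


def signal_targets(flags):
    """Return hypothetical labels for what gets signaled under each flag combination."""
    batch = bool(flags & KillFlag.JOB_BATCH)
    full = bool(flags & KillFlag.FULL_JOB)
    if batch and full:
        # Both flags: only the batch script and its children.
        return {"batch_script", "batch_children"}
    if batch:
        # KILL_JOB_BATCH alone: only the batch container.
        return {"batch_container"}
    if full:
        # KILL_FULL_JOB alone: batch script and children are signaled as well
        # (reading "also" as "in addition to the other steps").
        return {"other_steps", "batch_script", "batch_children"}
    # Assumption: the message does not describe this case.
    return {"other_steps"}
```

Because the decision is made in one place, the outcome does not depend on which proctrack plugin is loaded, which is the point the message makes.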
-
- 23 Sep, 2019 1 commit
-
-
Tim Wickberg authored
-
- 20 Sep, 2019 2 commits
-
-
Brian Christiansen authored
Signed-off-by: Tim Wickberg <tim@schedmd.com> Bug 7697
-
Michael Hinton authored
1cd43fce Bug 7630
-
- 16 Sep, 2019 1 commit
-
-
Robert Tweedy authored
Bug 7727 This was missed in commit 6ac4ce84.
-
- 12 Sep, 2019 3 commits
-
-
Marcin Stolarek authored
Incorrect logic in the variables holding available cores in the gres_plugin_job_core_filter3() function led to a potential infinite "while (avail_cores_tot > req_cores)" loop, leaving slurmctld unresponsive. Bug 7685.
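The failure class described here is a loop whose condition variable falls out of sync with the data being trimmed. Below is a minimal sketch of a loop of that shape that always terminates. It is not the actual fix: `trim_cores` and its per-socket list are hypothetical, and only the loop condition mirrors the one quoted in the message.

```python
def trim_cores(avail_per_socket, req_cores):
    """Drop cores one at a time until the total no longer exceeds req_cores."""
    avail = list(avail_per_socket)
    avail_cores_tot = sum(avail)
    while avail_cores_tot > req_cores:
        # Take a core from the socket that currently has the most.
        idx = max(range(len(avail)), key=lambda i: avail[i])
        if avail[idx] == 0:
            break  # nothing left to remove; bail out rather than spin
        avail[idx] -= 1
        # Updating the running total in the same step as the per-socket
        # count is what guarantees the loop condition eventually fails.
        avail_cores_tot -= 1
    return avail
```

For example, `trim_cores([4, 4], 5)` leaves five cores in total across the two sockets.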
-
Brian Christiansen authored
Bug 7719 Signed-off-by: Danny Auble <da@schedmd.com>
-
Dominik Bartkiewicz authored
Regression caused by 72736af2. Bug 7708.
-
- 10 Sep, 2019 1 commit
-
-
Danny Auble authored
FastSchedule will be removed in 20.02. FastSchedule=2 functionality has been moved to SlurmdParameters=config_overrides. Bug 7496. Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
- 06 Sep, 2019 2 commits
-
-
Brian Christiansen authored
Bug 7699
-
Danny Auble authored
Continuation of 64876087 Bug 7698
-
- 04 Sep, 2019 3 commits
-
-
Danny Auble authored
Bug 4781
-
Dominik Bartkiewicz authored
Otherwise, there could be time frames where printed schednodes information could be obsolete. Bug 7676.
-
Dominik Bartkiewicz authored
exclusively to that job. Bug 7510
-
- 03 Sep, 2019 4 commits
-
-
Dominik Bartkiewicz authored
Use correct start_time for TIME_FLOAT reservation in _job_overlap(). Bug 7458
-
Dominik Bartkiewicz authored
Bug 7458
-
Dominik Bartkiewicz authored
Bug 7458
-
Dominik Bartkiewicz authored
Move _validate_node_choice() before prolog/epilog check Bug 7458
-
- 29 Aug, 2019 4 commits
-
-
Michael Hinton authored
Free the gres_devices list to avoid a valgrind warning on exit. Bug 7644.
-
Brian Christiansen authored
Continuation of 30bbc11d Bug 7445 Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
-
Brian Christiansen authored
When --batch=<feature> is used, the batch_host isn't chosen until the job is being launched -- because the features could be different on boot (e.g. KNL nodes). Thus if the job is allocated nodes that need to be booted, it needs to wait till they are all booted so it can make a decision at launch time. Bug 7445 Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
-
Dominik Bartkiewicz authored
This is a continuation to 7da439b4 Bug 7445
-
- 28 Aug, 2019 1 commit
-
-
Alejandro Sanchez authored
Only do so once the task actually finishes. Otherwise, a requeued task could set an incorrect max_exit_code even if it completed with exit code 0 after re-running, causing problems for, e.g., other jobs with an afterok dependency on the array that rely on the incorrectly set max_exit_code. Bug 7552.
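A minimal sketch of the behaviour described above, assuming a simplified array record. `ArrayRecord` and `task_ended` are hypothetical names for illustration; only `max_exit_code` comes from the message.

```python
class ArrayRecord:
    """Toy model of the per-array bookkeeping for task exit codes."""

    def __init__(self):
        self.max_exit_code = 0

    def task_ended(self, exit_code, requeued):
        # A requeued task has not really finished, so its exit code
        # must not be folded into max_exit_code yet.
        if requeued:
            return
        self.max_exit_code = max(self.max_exit_code, exit_code)


array = ArrayRecord()
array.task_ended(exit_code=1, requeued=True)   # first attempt fails, task is requeued
array.task_ended(exit_code=0, requeued=False)  # rerun completes successfully
print(array.max_exit_code)  # prints 0
```

If the first call had updated `max_exit_code`, the array would report 1 and an afterok dependency on it would not be satisfied, even though every task ultimately succeeded.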
-
- 23 Aug, 2019 1 commit
-
-
Marcin Stolarek authored
For features like cpu&fastio&[knl|westmere], an additional bit_or resulted in returning something like (cpu&fastio)|knl|westmere, which is wrong. XOR/XAND features are handled properly in _get_req_features. Bug 7378
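The difference between the two expressions is easy to see with Python sets standing in for node bitmaps. The node names and feature assignments below are made up for illustration, and the "intended" expression only models which nodes carry the requested features, not the additional XOR constraint.

```python
# Hypothetical feature-to-node assignments.
nodes_with = {
    "cpu": {"n1", "n2", "n3", "n4"},
    "fastio": {"n1", "n2", "n3", "n5"},
    "knl": {"n1", "n6"},
    "westmere": {"n2", "n7"},
}


def intended(f):
    """cpu & fastio & (knl | westmere)"""
    return f["cpu"] & f["fastio"] & (f["knl"] | f["westmere"])


def with_extra_or(f):
    """(cpu & fastio) | knl | westmere, the result of the stray bit_or."""
    return (f["cpu"] & f["fastio"]) | f["knl"] | f["westmere"]


print(sorted(intended(nodes_with)))       # ['n1', 'n2']
print(sorted(with_extra_or(nodes_with)))  # ['n1', 'n2', 'n3', 'n6', 'n7']
```

The extra OR admits n3 (no knl or westmere) as well as n6 and n7 (no cpu or fastio), none of which satisfy the request.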
-
- 20 Aug, 2019 2 commits
-
-
Danny Auble authored
Handle the situation where a slurmctld tries to communicate with slurmdbd more than once at the same time. This can happen when the slurmdbd/slurmctld connection hangs. If the slurmctld is restarted, a new connection is made alongside the old one. When the old connection gets unwedged, it clears the registration of the slurmctld, so no updates are sent to that slurmctld. This change checks for old connections when a registration message comes in. If one is found, we print an error, set rem_port = 0, and remove it from the list. When the old connection gets unwedged, we then just close the socket instead of removing the registration. Bug 5213
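The stale-connection handling can be sketched roughly as follows. This is a toy model under stated assumptions, not slurmdbd code: `Connection`, `DbdModel`, and their methods are hypothetical, and only `rem_port` and the overall sequence come from the message.

```python
class Connection:
    """Hypothetical stand-in for one slurmctld connection."""

    def __init__(self, cluster, rem_port):
        self.cluster = cluster
        self.rem_port = rem_port
        self.is_open = True


class DbdModel:
    """Toy model of the registration bookkeeping."""

    def __init__(self):
        self.connections = []    # live connections
        self.registered = set()  # clusters that receive updates

    def register(self, conn):
        # On a registration message, look for older connections from the same cluster.
        stale = [c for c in self.connections
                 if c.cluster == conn.cluster and c is not conn]
        for old in stale:
            print(f"error: stale connection found for cluster {old.cluster}")
            old.rem_port = 0              # mark it as superseded
            self.connections.remove(old)  # and drop it from the list
        if conn not in self.connections:
            self.connections.append(conn)
        self.registered.add(conn.cluster)

    def close(self, conn):
        conn.is_open = False
        if conn.rem_port == 0:
            return  # superseded connection: only close the socket
        if conn in self.connections:
            self.connections.remove(conn)
        self.registered.discard(conn.cluster)
```

With this guard, closing the old connection after a restart leaves the cluster registered through the new one; without the `rem_port == 0` check, the late close would clear the registration and updates would stop.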
-
Alejandro Sanchez authored
Bug 7360.
-
- 19 Aug, 2019 2 commits
-
-
Danny Auble authored
in track scripts code. Bug 7360 Signed-off-by: Alejandro Sanchez <alex@schedmd.com>
-
Broderick Gardner authored
The implementation of priority_p_job_end in priority/multifactor expects the job state to be set to complete or completing in order to properly remove some job usage from the assoc and qos. This must be simulated by the pack job run check code, or the check-time usage is not removed. Bug 7284
-
- 16 Aug, 2019 1 commit
-
-
Chad Vizino authored
It wasn't properly set under certain conditions. Bug 7276
-
- 15 Aug, 2019 1 commit
-
-
Marcin Stolarek authored
Bug 7410.
-
- 14 Aug, 2019 8 commits
-
-
Morris Jette authored
Consider jobs in COMPLETING state as being available immediately for a job will-run evaluation. This assumes the completion will happen very soon after the test is run. Bug 6769
-
Morris Jette authored
All of the select plugins were performing a duplicate resource free for jobs in completing state when performing a will-run test for new jobs. This would frequently result in underflow messages. Bug 6769
-
Marshall Garey authored
Building off the previous commit, spank_option_getopt() is now valid in more functions than before, so we document and enforce from where spank_option_getopt() can safely be called and return ESPANK_NOT_AVAIL if it is called from any invalid SPANK context. Bug 7065.
-
Marshall Garey authored
When spank option callbacks are called, the options are added to a cache in memory so that spank_option_getopt() can retrieve the options when called. However, this was only happening when callbacks were called from the local context, so we make sure that the options are added to the cache when the callbacks are called from the remote context as well. Bug 7065.
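The caching behaviour described here can be modelled in a few lines. This is a simplified sketch, not the SPANK implementation: `OptionCache`, `invoke_callback`, and `getopt` are hypothetical names, and the cache is reduced to a dictionary.

```python
class OptionCache:
    """Toy model of the option cache that a getopt-style lookup reads from."""

    def __init__(self):
        self._cache = {}

    def invoke_callback(self, name, value, callback, context):
        """Run the option callback and record the option, whatever the context."""
        callback(value)
        # Per the message, caching previously happened only for the local
        # context; here it is done for "local" and "remote" alike.
        self._cache[name] = value

    def getopt(self, name):
        """Return the cached value, or None if the option was never seen."""
        return self._cache.get(name)


seen = []
cache = OptionCache()
cache.invoke_callback("my-opt", "42", seen.append, context="remote")
print(cache.getopt("my-opt"))  # prints 42
```

If the cache were only filled for the local context, the lookup after a remote-context callback would find nothing.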
-
Dominik Bartkiewicz authored
job_test_resv() - return ESLURM_RESERVATION_MAINT if all nodes from the job partition are in a maintenance reservation. Bug 7362.
-
Dominik Bartkiewicz authored
If a job requests feature(s) but the required nodes are in an advanced reservation. Bug 7362.
-
Dominik Bartkiewicz authored
Previously the UnavailableNodes list could include nodes that don't belong to the job partition. Bug 7362.
-
Marshall Garey authored
The old proctrack/cray plugin was changed to proctrack/cray_aries. Continuation of c6e6089f Bug 6824.
-