- 29 Aug, 2019 12 commits
-
-
Alejandro Sanchez authored
-
Albert Gil authored
Bug 7149.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Continuation of a2f2894f Bug 7445 Signed-off-by: Marshall Garey <marshall@schedmd.com>
-
Brian Christiansen authored
Continuation of 30bbc11d Bug 7445 Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
-
Brian Christiansen authored
Bug 7445 Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
-
Brian Christiansen authored
When --batch=<feature> is used, the batch_host isn't chosen until the job is launched, because the features could differ after boot (e.g. KNL nodes). Thus, if the job is allocated nodes that need to be booted, it must wait until they are all booted so that the decision can be made at launch time. Bug 7445 Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
-
Dominik Bartkiewicz authored
This is a continuation of 7da439b4. Bug 7445
-
Alejandro Sanchez authored
-
Albert Gil authored
get_default_acct was truncating long account names. This commit uses -P/--parsable2 to avoid that. Bug 7369.
-
Marcin Stolarek authored
Bug 7467.
-
- 28 Aug, 2019 1 commit
-
-
Alejandro Sanchez authored
Only do so once the task actually finishes. Otherwise, a requeued task could set an incorrect max_exit_code even if it completed with exit code 0 after re-running, leading to problems with, e.g., other jobs that have an afterok dependency on the array and rely on the incorrectly set max_exit_code. Bug 7552.
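The rule described above can be illustrated with a minimal Python sketch. It is not the slurmctld code: `ArrayState`, `record_task_attempt`, and `afterok_satisfied` are hypothetical names, and the sketch assumes that an attempt ending in a requeue simply does not count as finished.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical stand-in for the per-array bookkeeping."""
    max_exit_code: int = 0


def record_task_attempt(array: ArrayState, exit_code: int, requeued: bool) -> None:
    """Fold an attempt's exit code into the array maximum only if the task finished.

    An attempt that ends in a requeue is not final, so its exit code is ignored.
    """
    if requeued:
        return
    array.max_exit_code = max(array.max_exit_code, exit_code)


def afterok_satisfied(array: ArrayState) -> bool:
    """An afterok-style dependency is modeled here as 'max exit code is 0'."""
    return array.max_exit_code == 0


array = ArrayState()
record_task_attempt(array, exit_code=1, requeued=True)   # first run fails, task is requeued
record_task_attempt(array, exit_code=0, requeued=False)  # rerun completes cleanly
```

With this ordering, the failed first attempt never reaches `max_exit_code`, so a dependent afterok job modeled this way would still be allowed to run.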
-
- 26 Aug, 2019 3 commits
-
-
Danny Auble authored
We only remove from registered_clusters if conn->rem_port != 0, so only add to it under the same condition. Bug 5213 Signed-off-by: Alejandro Sanchez <alex@schedmd.com>
-
Tim Wickberg authored
-
Marshall Garey authored
The previous log message implied that you should never use the topology plugin where no switch could reach all nodes through its descendants. However, this is a valid configuration where sites may not want jobs spanning across certain switches, so we've softened the language in the log message. Bug 7466.
-
- 23 Aug, 2019 3 commits
-
-
Marcin Stolarek authored
For feature expressions like cpu&fastio&[knl|westmere], an additional bit_or resulted in returning something like (cpu&fastio)|knl|westmere, which is wrong. XOR/XAND features are handled properly in _get_req_features. Bug 7378
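A small Python sketch shows why the extra OR is wrong. It models node bitmaps as sets and uses made-up node names and feature assignments. It also treats the bracketed group as a plain union of candidate nodes, which is a simplification and not how Slurm evaluates bracketed features in full.

```python
# Hypothetical cluster: which nodes carry which feature.
nodes_by_feature = {
    "cpu": {"n1", "n2", "n3", "n4"},
    "fastio": {"n1", "n2", "n3"},
    "knl": {"n1", "n4"},
    "westmere": {"n2", "n5"},
}


def intended(f):
    """cpu & fastio & [knl|westmere]: the group narrows the AND result."""
    return f["cpu"] & f["fastio"] & (f["knl"] | f["westmere"])


def with_extra_or(f):
    """(cpu & fastio) | knl | westmere: the shape the commit message calls wrong."""
    return (f["cpu"] & f["fastio"]) | f["knl"] | f["westmere"]


good = intended(nodes_by_feature)
bad = with_extra_or(nodes_by_feature)
```

In this toy data the extra OR admits nodes such as `n5`, which has neither `cpu` nor `fastio`, while the intended expression keeps only nodes that satisfy every AND term.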
-
Marcin Stolarek authored
Display nodenames instead of bitmap ranges
-
Marcin Stolarek authored
In 4cea931c we changed the FAQ to replace the stop/start of slurmd with just a restart, but the example suggests using systemctl start, which does nothing if slurmd is already running.
-
- 20 Aug, 2019 2 commits
-
-
Danny Auble authored
Handle the situation where a slurmctld tries to communicate with slurmdbd more than once at the same time. What can happen is that the slurmdbd/slurmctld connection gets hung up somehow. If the slurmctld is restarted, a new connection is made alongside the old one. When the old connection gets unwedged, it clears out the registration of the slurmctld, so no updates are sent to that slurmctld. This change checks for old connections when a registration message comes in. If we find one, we print an error, set rem_port = 0, and remove it from the list. That way, when the old connection gets unwedged, we just close the socket instead of removing the registration. Bug 5213
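The sequence described above can be sketched in Python as follows. This is an illustrative model only: `Conn`, `register`, and `close` are hypothetical names, the registry is a plain list, and the only detail taken from the commit messages is that `rem_port == 0` marks a connection that must not touch the registration.

```python
import sys


class Conn:
    """Hypothetical stand-in for a slurmdbd-side connection record."""

    def __init__(self, cluster: str, rem_port: int):
        self.cluster = cluster
        self.rem_port = rem_port


def register(registered: list, conn: Conn) -> None:
    """Handle a registration message, evicting any older connection for the cluster."""
    for old in list(registered):
        if old is not conn and old.cluster == conn.cluster:
            print(f"error: stale connection for cluster {old.cluster}", file=sys.stderr)
            old.rem_port = 0          # mark it so a later close leaves the registry alone
            registered.remove(old)
    if conn.rem_port != 0:            # only add under the condition used for removal
        registered.append(conn)


def close(registered: list, conn: Conn) -> None:
    """Close a connection; only a live (rem_port != 0) one is unregistered."""
    if conn.rem_port != 0 and conn in registered:
        registered.remove(conn)


registry = []
old_conn = Conn("cluster1", 6817)
register(registry, old_conn)          # original connection, which later hangs
new_conn = Conn("cluster1", 6817)
register(registry, new_conn)          # slurmctld restarted and registered again
close(registry, old_conn)             # old connection finally unwedges
```

After the final `close`, the registry still holds the new connection, which is the outcome the commit is after.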
-
Alejandro Sanchez authored
Bug 7360.
-
- 19 Aug, 2019 6 commits
-
-
Danny Auble authored
in track scripts code. Bug 7360 Signed-off-by: Alejandro Sanchez <alex@schedmd.com>
-
Broderick Gardner authored
The implementation of priority_p_job_end in priority/multifactor expects the job state to be set to complete or completing in order to properly remove some job usage from the assoc and qos. This must be simulated by the pack job run check code, or the check-time usage is not removed. Bug 7284
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 7428
-
Marcin Stolarek authored
Use of scontrol wait_job in slurmctld will cause the prolog to hang, since the command completes only once PrologSlurmctld has completed. This is a deadlock. Bug 7428.
-
Marcin Stolarek authored
-
- 16 Aug, 2019 2 commits
-
-
Chad Vizino authored
It wasn't properly set under certain conditions. Bug 7276
-
Marcin Stolarek authored
"ANY" is the canonical and most accurate value identifier for PARTITION_ENFORCE_ANY, although "Yes", "Up", "True" and "1" continue to be parsed and accepted as equivalent values for backward compatibility with the initial commit edf3880c. Bug 7248.
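As a rough illustration of accepting the canonical value alongside its legacy aliases, here is a minimal Python sketch. The function name `is_enforce_any` is hypothetical, and case-insensitive matching with whitespace trimming is an assumption, not something the commit message states.

```python
# Canonical value first, then the legacy aliases named in the commit message.
_ANY_ALIASES = {"any", "yes", "up", "true", "1"}


def is_enforce_any(value: str) -> bool:
    """Return True if the configured string maps to the ANY setting.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return value.strip().lower() in _ANY_ALIASES
```

A caller would write the canonical "ANY" in new configuration and rely on the aliases only for existing files.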
-
- 15 Aug, 2019 2 commits
-
-
Marcin Stolarek authored
Bug 7410.
-
Dominik Bartkiewicz authored
Continuation of 884c0191. Bug 7362.
-
- 14 Aug, 2019 9 commits
-
-
Danny Auble authored
-
Morris Jette authored
Bug 6769
-
Morris Jette authored
Consider jobs in COMPLETING state as being available immediately for a job will-run evaluation. This assumes the completion will happen very soon after the test is run. Bug 6769
-
Morris Jette authored
All of the select plugins were performing a duplicate resource free for jobs in completing state when performing a will-run test for new jobs. This would frequently result in underflow messages. Bug 6769
-
Ben Roberts authored
-
Ben Roberts authored
-
Ben Roberts authored
-
Ben Roberts authored
-
Ben Roberts authored
-