- 20 Aug, 2018 26 commits
-
-
Morris Jette authored
Correction to logic for explicit hostname specification on job submit. Bug introduced in commit 0e4874e19490a24fb54961ef89176a3e8f55952b
-
Morris Jette authored
Also add a regression test for this scheduling logic. Bug 4584
-
Morris Jette authored
Add that desired GPU count is actually allocated to a job based upon --gpus, --gpus-per-node, --gpus-per-socket, and --gpus-per-task options
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This bug exists with all select plugins. If a job has been allocated GRES, the GRES have either topology or type information, and the slurmctld daemon restarts while the job is running, then GRES underflow errors will be generated when the job ends. The problem is that slurmctld does not have GRES topology or type information available at restart time, so it cannot update the counters. The overhead of updating those counters at node registration time is high, so we just avoid generating the errors in this case. NOTE: This bug is not specific to cons_tres and exists in earlier versions of Slurm.
-
Morris Jette authored
-
Morris Jette authored
If the step does not explicitly specify a gres-per-node value, then the step will be allocated GRES identical to that allocated to the job
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Also relocate tres_mc create/destroy location so the information can be accessed from additional locations and to reduce the overhead of creating it multiple times
-
Morris Jette authored
-
Morris Jette authored
it is not going to work in practice
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Modify gres_plugin_job_alloc() to allocate pre-selected GRES. Add fields to job GRES data structure to define selected GRES before job allocation time.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Michael Hinton authored
MySQL permits database names of up to 64 characters, but Slurm was truncating at 33 characters. If we exceed this limit, let the mysql_query fail and give the admin a chance to sort it out, rather than truncating and then failing to query against the un-truncated name later on. While here, correct the fatal() message. Bug 5586.
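The truncation problem described above can be illustrated with a minimal sketch. This is not the Slurm code: db_name_fits(), truncation_changes_name(), and both macro names are hypothetical and exist only for illustration. The only facts taken from the message are the 64-character MySQL limit and the old 33-character truncation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* MySQL's maximum length for a database name, in characters. */
#define MYSQL_DB_NAME_MAX 64
/* The old truncation point mentioned in the commit message. */
#define OLD_TRUNCATE_LEN 33

/* Hypothetical helper: true if the name can be passed to MySQL as-is. */
static bool db_name_fits(const char *name)
{
	assert(name != NULL);
	return strlen(name) <= MYSQL_DB_NAME_MAX;
}

/*
 * Hypothetical helper showing the old failure mode: cutting the name at
 * `limit` characters yields a different string from the un-truncated
 * name that later queries would use.
 */
static bool truncation_changes_name(const char *name, size_t limit)
{
	assert(name != NULL);
	return strlen(name) > limit;
}
```

A 40-character name is valid for MySQL, yet truncation_changes_name() reports that cutting it at OLD_TRUNCATE_LEN would alter it, which is the mismatch the commit removes.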
-
Tim Wickberg authored
-
- 18 Aug, 2018 6 commits
-
-
Tim Wickberg authored
Partial revert of 5b5228d0. Calling init() on certain plugins is not safe. Remove the option, documentation, and the test case for this. See 19fc9d94 and the reversion in 61f04d67 as well. Bug 3445.
-
Danny Auble authored
-
Brian Christiansen authored
Bug 5554
-
Morris Jette authored
-
Danny Auble authored
# Conflicts: # testsuite/expect/test1.76
-
Danny Auble authored
Bug 5584
-
- 17 Aug, 2018 8 commits
-
-
Morris Jette authored
1. The CPU frequency set by the user is not exact with current kernels, but close. This changes the logic accordingly.
2. The original logic would cause the test to hang indefinitely if the submitted job never ends. This adds timeout checks on the job wait, plus adds a 1 minute time limit on the job.
3. Improve/simplify the parsing logic.
Bug 5584
-
Brian Christiansen authored
Currently the only valid nextstate states are down and resume/idle, so the node shouldn't be in a drain state after transitioning into either of these states. Bug 5544
-
Tim Wickberg authored
-
Tim Wickberg authored
This also clears up a potential race around ping_thread_cnt as it was protected by ping_mutex in one location and shutdown_mutex in another.
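One common way to avoid this kind of race is to guard the counter with the same mutex at every access. The sketch below assumes that approach; it is not the actual change. ping_mutex and ping_thread_cnt are the names from the message, while the three helper functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* The counter and the single mutex that guards every access to it. */
static pthread_mutex_t ping_mutex = PTHREAD_MUTEX_INITIALIZER;
static int ping_thread_cnt = 0;

/* Hypothetical helper: record that a ping thread has started. */
static void ping_thread_begin(void)
{
	pthread_mutex_lock(&ping_mutex);
	ping_thread_cnt++;
	pthread_mutex_unlock(&ping_mutex);
}

/* Hypothetical helper: record that a ping thread has finished. */
static void ping_thread_end(void)
{
	pthread_mutex_lock(&ping_mutex);
	assert(ping_thread_cnt > 0);
	ping_thread_cnt--;
	pthread_mutex_unlock(&ping_mutex);
}

/* Hypothetical helper: read the counter under the same mutex. */
static int ping_thread_count(void)
{
	int cnt;

	pthread_mutex_lock(&ping_mutex);
	cnt = ping_thread_cnt;
	pthread_mutex_unlock(&ping_mutex);
	return cnt;
}
```

Because no code path touches ping_thread_cnt under a different lock, a reader can never observe the counter while another thread is halfway through updating it.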
-
Tim Wickberg authored
- Do not look it up again in the backup controller.
- In the backup controller, stop comparing it at all and instead use the backup_idx value to decide if we're ourselves.
-
Tim Wickberg authored
-
Tim Wickberg authored
This is not a valid response here - backup and primary must always be running the same version, so do not attempt to handle this here.
-
Tim Wickberg authored
There's no point in pinging controllers with a lower priority than yourself - they'll already be pinging you. As we did nothing with that data, don't bother to collect it, especially as lower priority controllers being unavailable will delay the next pass through this loop.
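The selection rule above can be sketched as follows. This assumes controllers are identified by an index where a lower index means a higher priority (index 0 being the primary); that ordering and the controllers_to_ping() function are assumptions made for illustration, not taken from the Slurm source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill `out` with the indices of the controllers
 * that controller `my_idx` should ping, i.e. only those with a higher
 * priority (a lower index) than itself. Returns the number of entries
 * written, never more than `out_len`.
 */
static int controllers_to_ping(int my_idx, int *out, int out_len)
{
	int n = 0;

	assert(my_idx >= 0 && out != NULL);
	for (int i = 0; i < my_idx && n < out_len; i++)
		out[n++] = i;
	return n;
}
```

Under these assumptions the primary (index 0) pings nobody, and the second backup (index 2) pings only controllers 0 and 1, so an unavailable lower-priority controller can no longer delay the loop.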
-