- 21 Aug, 2018 5 commits
-
-
Tim Wickberg authored
scan-build complains about this possibly being used uninitialized with pthread_join(), but on review the path it identifies as causing that is impossible. Initialize this anyway in the hope of suppressing that warning.
-
Tim Wickberg authored
-
Tim Wickberg authored
This also randomizes the backup controller port numbers which were not being handled here for some reason.
-
Dominik Bartkiewicz authored
Bug 5166
-
Tim Wickberg authored
Bug 5426
-
- 20 Aug, 2018 15 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Change _build_path to return the xmalloc()'d string, and make sure to free it where appropriate. Add comment about how the processing of the PATH environment variable here is backwards from convention.
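The ownership convention described above (the helper returns a heap-allocated string and the caller frees it) can be sketched as follows. This is only an illustration: build_path_sketch is a hypothetical name, it joins a directory and a file name rather than reproducing what _build_path actually does, and plain malloc()/free() stand in for Slurm's xmalloc()/xfree().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a helper that returns a heap-allocated path.
 * Plain malloc() is used here in place of Slurm's xmalloc().
 * The caller owns the returned string and must free() it.
 */
char *build_path_sketch(const char *dir, const char *name)
{
	assert(dir && name);

	/* dir + '/' + name + terminating NUL */
	size_t len = strlen(dir) + 1 + strlen(name) + 1;
	char *path = malloc(len);

	if (!path)
		return NULL;
	snprintf(path, len, "%s/%s", dir, name);
	return path;
}
```

A caller would use the result and then release it, for example `char *p = build_path_sketch("/usr/bin", "srun"); ... free(p);`.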
-
Tim Wickberg authored
-
Tim Wickberg authored
Fixes GCC 8.2.0 warnings.
-
Tim Wickberg authored
Fix some -Werror=restrict and -Werror=format= issues.
-
Tim Wickberg authored
-
Tim Wickberg authored
Silence GCC 8.2.0 warning.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Nothing modifies this, so don't bother making a copy of it. Related to GCC 8.2.0 compilation fixes.
-
Tim Wickberg authored
Fixes GCC 8.2.0 compilation warnings. Bug 5465.
-
Tim Wickberg authored
-
Michael Hinton authored
MySQL permits database names of up to 64 characters, but Slurm was truncating at 33 characters. If we exceed this limit, let the mysql_query fail and give the admin a chance to sort it out, rather than truncating and then failing to query against the un-truncated name later on. While here, correct the fatal() message. Bug 5586.
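A small sketch of why silent truncation is harmful: once a name is cut to a shorter limit, it no longer matches the full name used elsewhere. The function name, the MAX_DB_NAME constant, and the example database names are all hypothetical; this is not the Slurm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_DB_NAME 64	/* the MySQL limit quoted in the commit message */

/*
 * Copy at most `limit` characters of `name`, as a truncating copy would,
 * and report whether the copy still equals the original name.
 */
bool survives_truncation(const char *name, size_t limit)
{
	char buf[MAX_DB_NAME + 1];

	assert(name && limit <= MAX_DB_NAME);
	/* snprintf() writes at most `limit` characters plus the NUL */
	snprintf(buf, limit + 1, "%s", name);
	return strcmp(buf, name) == 0;
}
```

With a 40-character name, `survives_truncation(name, 33)` is false while `survives_truncation(name, 64)` is true, which is the mismatch the commit avoids by not truncating at all.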
-
Tim Wickberg authored
-
- 18 Aug, 2018 5 commits
-
-
Tim Wickberg authored
Partial revert of 5b5228d0. Calling init() on certain plugins is not safe. Remove the option, documentation, and the test case for this. See 19fc9d94 and the reversion in 61f04d67 as well. Bug 3445.
-
Danny Auble authored
-
Brian Christiansen authored
Bug 5554
-
Danny Auble authored
# Conflicts: # testsuite/expect/test1.76
-
Danny Auble authored
Bug 5584
-
- 17 Aug, 2018 15 commits
-
-
Morris Jette authored
1. The cpu frequency set by the user is not exact with current kernels, but close. This changes the logic accordingly.
2. The original logic would cause the test to hang indefinitely if the submitted job never ends. This adds timeout checks on the job wait, plus adds a 1 minute time limit on the job.
3. Improve/simplify the parsing logic.
Bug 5584
-
Brian Christiansen authored
Currently the only valid nextstate values are down and resume/idle, so the node should not be in a drain state after transitioning into either of them. Bug 5544
-
Tim Wickberg authored
This also clears up a potential race around ping_thread_cnt as it was protected by ping_mutex in one location and shutdown_mutex in another.
-
Tim Wickberg authored
- Do not look it up again in the backup controller. - In the backup controller, stop comparing it at all and instead use the backup_idx value to decide if we're ourselves.
-
Tim Wickberg authored
-
Tim Wickberg authored
This is not a valid response here - backup and primary must always be running the same version, so do not attempt to handle this here.
-
Tim Wickberg authored
There's no point in pinging controllers with a lower priority than yourself - they'll already be pinging you. As we did nothing with that data, don't bother to collect it, especially as lower priority controllers being unavailable will delay the next pass through this loop.
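The idea of pinging only higher-priority controllers can be sketched as a loop bounded by the controller's own index. This assumes controllers are indexed in priority order with 0 as the primary. The function name and signature are hypothetical, not the actual slurmctld code.

```c
#include <assert.h>

/*
 * Assumption: controllers are indexed by priority, 0 being the primary.
 * Fills `targets` with the indexes this controller should ping and
 * returns how many there are. Lower-priority peers (index > my_index)
 * are skipped, since they are expected to ping us instead.
 */
int ping_targets(int my_index, int control_cnt, int *targets)
{
	int n = 0;

	assert(targets && my_index >= 0 && my_index < control_cnt);
	for (int i = 0; i < my_index; i++)
		targets[n++] = i;
	return n;
}
```

Under that assumption the primary pings nobody, and the second backup (index 2) pings indexes 0 and 1 only.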
-
Tim Wickberg authored
Reference backup_inx directly after startup, and exit much earlier if this host is not a valid controller. Return a non-zero exit code in this situation as well.
-
Tim Wickberg authored
-
Tim Wickberg authored
Collapse into a single function so we can appropriately warn if a mix of options are in use. This also avoids a confusing-looking xmalloc with the count padded by two, which was being used to build out space for ControlMachine if SlurmctldHost was not defined. This would have also masked off a series of off-by-one errors, and has led to attempts to connect to 0.0.0.0 instead of a segfault. (Some code was intentionally using this over-provisioning as a way to treat this as a NULL-terminated list, but this was then technically incorrect in cases where the old-style BackupController was set since the NULL would happen at the third position in the array, which is an invalid memory access.)
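As an illustration of walking such a list by an explicit count rather than relying on a padded NULL sentinel, here is a minimal sketch. The function name and the host names in the usage note are hypothetical. The point is only that the loop never reads past the last allocated entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of `name` in hosts[0..cnt-1], or -1 if absent.
 * The loop is bounded by `cnt`, so hosts[cnt] is never touched and the
 * array needs no extra NULL-terminator slot.
 */
int find_controller(const char *const *hosts, int cnt, const char *name)
{
	assert(hosts && name && cnt >= 0);
	for (int i = 0; i < cnt; i++) {
		if (hosts[i] && strcmp(hosts[i], name) == 0)
			return i;
	}
	return -1;
}
```

For an array holding exactly `{"ctl0", "ctl1"}`, `find_controller(hosts, 2, "ctl1")` returns 1 and a lookup of an unknown name returns -1 without any out-of-bounds read.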
-
Tim Wickberg authored
And document why these are handled the way they are here.
-
Tim Wickberg authored
This results in an out-of-bounds access (if control_machine was not being intentionally over-allocated to avoid it), the wrong address, and other subtle problems. C's order of operations meant this was resolving as: i = (_backup_index() != -1); which is either 0 or 1. Through sheer luck, this still results in the correct answer for the primary (_backup_index() is -1, and then i = (-1 != -1) is still 0 which is correct), and first backup controller (_backup_index() is 1, and then i = (1 != -1) is still 1 which is also correct), but any further backup controllers will end up with the address of the first backup.
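The precedence problem can be reproduced in isolation. In the sketch below, lookup_index is a hypothetical stand-in for _backup_index() that simply returns the value it is given. The fallback to slot 0 for the primary is an assumption taken from the description above.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for _backup_index(): -1 for the primary,
 * otherwise the index of this backup controller.
 */
static int lookup_index(int configured)
{
	assert(configured >= -1);
	return configured;
}

int index_buggy(int configured)
{
	int i;

	/* != binds tighter than =, so this is i = (lookup != -1): 0 or 1 */
	i = lookup_index(configured) != -1;
	return i;
}

int index_fixed(int configured)
{
	int i;

	/* Parenthesize the assignment so the comparison sees the real index */
	if ((i = lookup_index(configured)) != -1)
		return i;
	return 0;	/* assumed: the primary occupies slot 0 */
}
```

Both versions agree for the primary (-1 gives 0) and the first backup (1 gives 1), but for a second backup `index_buggy(2)` yields 1 while `index_fixed(2)` yields 2.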
-
Tim Wickberg authored
Use the already-established slurmctld_primary bool instead.
-
Tim Wickberg authored
The second NULL check is a duplicate of the first. The first check is also unnecessary - this field existing is managed by the control_cnt variable. (At one point in development control_cnt did not exist, and control_machine was a 0-terminated array instead.)
-
Tim Wickberg authored
Otherwise cross-architecture failover will break in confusing ways.
-