- 18 Aug, 2018 2 commits
-
-
Danny Auble authored
# Conflicts: # testsuite/expect/test1.76
-
Danny Auble authored
Bug 5584
-
- 17 Aug, 2018 20 commits
-
-
Morris Jette authored
1. The CPU frequency set by the user is not exact with current kernels, but close. This changes the logic accordingly.
2. The original logic would cause the test to hang indefinitely if the submitted job never ends. This adds timeout checks on the job wait, and adds a 1-minute time limit on the job.
3. Improve/simplify the parsing logic.
Bug 5584
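To illustrate the "close but not exact" comparison from item 1, here is a minimal sketch. The helper name, the kHz units, and the percentage tolerance are assumptions made for illustration; the real test is an Expect script and its actual threshold is not stated in this log.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the Slurm test suite): return true when
 * the observed frequency is within tolerance_pct percent of the requested
 * frequency. Integer arithmetic only, so no rounding surprises.
 */
static bool freq_close_enough(long long requested_khz, long long actual_khz,
			      long long tolerance_pct)
{
	long long diff = llabs(actual_khz - requested_khz);

	/* diff / requested <= tolerance_pct / 100, without dividing */
	return diff * 100 <= requested_khz * tolerance_pct;
}
```

With a 1 percent tolerance, a request for 2400000 kHz that reads back as 2394000 kHz would pass, while a reading of 2000000 kHz would not.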
-
Brian Christiansen authored
Currently the only valid nextstate values are down and resume/idle, so the node shouldn't be in a drain state after transitioning into either of these states. Bug 5544
-
Tim Wickberg authored
This also clears up a potential race around ping_thread_cnt as it was protected by ping_mutex in one location and shutdown_mutex in another.
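The kind of race described here goes away once every access to the counter takes the same lock. The sketch below shows that pattern only; the accessor function names are hypothetical, and the real code in slurmctld may be organized differently.

```c
#include <pthread.h>

/* One mutex guards the counter everywhere it is read or written. */
static pthread_mutex_t ping_mutex = PTHREAD_MUTEX_INITIALIZER;
static int ping_thread_cnt = 0;

/* Hypothetical accessor: record that a ping thread has started. */
static void ping_begin(void)
{
	pthread_mutex_lock(&ping_mutex);
	ping_thread_cnt++;
	pthread_mutex_unlock(&ping_mutex);
}

/* Hypothetical accessor: record that a ping thread has finished. */
static void ping_end(void)
{
	pthread_mutex_lock(&ping_mutex);
	ping_thread_cnt--;
	pthread_mutex_unlock(&ping_mutex);
}

/* Hypothetical accessor: read the counter under the same lock. */
static int ping_active(void)
{
	int cnt;

	pthread_mutex_lock(&ping_mutex);
	cnt = ping_thread_cnt;
	pthread_mutex_unlock(&ping_mutex);
	return cnt;
}
```

Funnelling all access through a few small functions makes it hard to pair the counter with the wrong mutex by accident.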
-
Tim Wickberg authored
- Do not look it up again in the backup controller. - In the backup controller, stop comparing it at all and instead use the backup_idx value to decide if we're ourselves.
-
Tim Wickberg authored
-
Tim Wickberg authored
This is not a valid response here - backup and primary must always be running the same version, so do not attempt to handle this here.
-
Tim Wickberg authored
There's no point in pinging controllers with a lower priority than yourself - they'll already be pinging you. As we did nothing with that data, don't bother to collect it, especially as lower priority controllers being unavailable will delay the next pass through this loop.
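A rough sketch of the resulting selection rule follows. It assumes that a lower index in the controller list means a higher priority, and the function name is hypothetical rather than taken from the Slurm source.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: should the controller at index my_inx ping the
 * controller at index target_inx? Assumes index 0 is the primary and that
 * larger indexes are lower-priority backups.
 */
static bool should_ping(int my_inx, int target_inx)
{
	/* Skip ourselves and anything with lower priority than us. */
	return target_inx >= 0 && target_inx < my_inx;
}
```

Under this rule the primary (index 0) pings nobody, and the second backup (index 2) pings only indexes 0 and 1.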
-
Tim Wickberg authored
Reference backup_inx directly after startup, and exit much earlier if this host is not a valid controller. Return a non-zero exit code in this situation as well.
-
Tim Wickberg authored
-
Tim Wickberg authored
Collapse into a single function so we can appropriately warn if a mix of options is in use. This also avoids a confusing-looking xmalloc with the count padded by two, which was being used to build out space for ControlMachine if SlurmctldHost was not defined. This would have also masked off a series of off-by-one errors, and has led to attempts to connect to 0.0.0.0 instead of a segfault. (Some code was intentionally using this over-provisioning as a way to treat this as a NULL-terminated list, but this was then technically incorrect in cases where the old-style BackupController was set since the NULL would happen at the third position in the array, which is an invalid memory access.)
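The difference between relying on a NULL terminator and carrying an explicit count can be sketched as below. The function and parameter names are hypothetical; the point is only that the loop is bounded by the count, so it never reads past an exactly-sized array.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup: find name in a host list of exactly cnt entries.
 * The loop is bounded by cnt, so no padding or NULL terminator is needed.
 * Returns the index of the match, or -1 if the name is not present.
 */
static int find_host(const char *const *hosts, size_t cnt, const char *name)
{
	for (size_t i = 0; i < cnt; i++) {
		if (hosts[i] && !strcmp(hosts[i], name))
			return (int) i;
	}
	return -1;
}
```

Because the bound comes from the count, the array can be allocated at its true size, and an off-by-one error is more likely to surface as an obvious failure than as a silent read of a zeroed slot.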
-
Tim Wickberg authored
And document why these are handled the way they are here.
-
Tim Wickberg authored
This results in an out-of-bounds access (if control_machine was not being intentionally over-alloced to avoid it), the wrong address, and other subtle problems. C's order of operations meant this was resolving as: i = (_backup_index() != -1); which is either 0 or 1. Through sheer luck, this still results in the correct answer for the primary (_backup_index() is -1, and then i = (-1 != -1) is still 0 which is correct), and first backup controller (_backup_index() is 1, and then i = (1 != -1) is still 1 which is also correct), but any further backup controllers will end up with the address of the first backup.
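The precedence problem can be reproduced in isolation. In the sketch below the backup index is passed in as a plain parameter instead of calling _backup_index(), and both function names are hypothetical; the original statement is not shown in this log, so this is only a reconstruction of the pattern described.

```c
/*
 * Buggy form: != binds tighter than =, so i receives the result of the
 * comparison (0 or 1) and not the index itself.
 */
static int index_buggy(int backup_index)
{
	int i;

	i = backup_index != -1;	/* parsed as i = (backup_index != -1) */
	return i;
}

/*
 * Fixed form: parenthesize the assignment so i holds the index, then
 * compare. -1 (the primary) maps to slot 0.
 */
static int index_fixed(int backup_index)
{
	int i;

	if ((i = backup_index) != -1)
		return i;
	return 0;
}
```

For inputs -1 and 1 both versions agree (0 and 1), which is why the bug stayed hidden; for a second backup with index 2, the buggy form yields 1 while the fixed form yields 2.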
-
Tim Wickberg authored
Use the already-established slurmctld_primary bool instead.
-
Tim Wickberg authored
The second NULL check is a duplicate of the first. The first check is also unnecessary, since whether this field exists is tracked by the control_cnt variable. (At one point in development control_cnt did not exist, and control_machine was a 0-terminated array instead.)
-
Tim Wickberg authored
Otherwise cross-architecture failover will break in confusing ways.
-
Tim Wickberg authored
Bug 5256.
-
Danny Auble authored
plugin. Bug 5583
-
Brian Christiansen authored
Caught by verify_lock() annotation in job_submit_plugin_modify(). Bug 5578.
-
Brian Christiansen authored
Caught by verify_lock() annotations in validate_job_create_req() and job_allocate() respectively. Bug 5578.
-
Danny Auble authored
Brian approved
-
- 16 Aug, 2018 18 commits
-
-
Danny Auble authored
Brian approved
-
Danny Auble authored
Brian approved
-
Danny Auble authored
This is only to quiet Coverity. I don't think this is a real problem. Brian approved.
-
Danny Auble authored
Fix Coverity 187747
-
Danny Auble authored
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
(i.e. gres=gpu/tesla) it would get a count of 0.
-
Brian Christiansen authored
Bug 5570
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Dominik Bartkiewicz authored
Note that pipe2() is Linux-specific, but this whole cgroup plugin is already Linux-specific in design, and the eventfd() call itself is Linux-specific as well. Bug 5570.
-
Danny Auble authored
-
Danny Auble authored
Didn't realize CentOS 6 was so far behind.
-
Felip Moll authored
Bug 5503
-
Brian Christiansen authored
-
Danny Auble authored
-
Danny Auble authored
-