- 17 Aug, 2018 16 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
This is not a valid response here - backup and primary must always be running the same version, so do not attempt to handle this here.
-
Tim Wickberg authored
There's no point in pinging controllers with a lower priority than yourself - they'll already be pinging you. As we did nothing with that data, don't bother to collect it, especially as lower priority controllers being unavailable will delay the next pass through this loop.
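The rule described above (ping only the controllers that outrank you) can be sketched as follows. This is a minimal illustration, not the slurmctld code: `controllers_to_ping` is a hypothetical helper, and the convention that index 0 is the highest-priority controller is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill `out` with the indexes of the controllers that
 * the controller at `my_index` should ping. Assumes index 0 is the primary
 * (highest priority) and larger indexes are lower-priority backups.
 * Returns the number of indexes written to `out`.
 */
static size_t controllers_to_ping(size_t my_index, size_t control_cnt,
				  size_t *out)
{
	size_t n = 0;

	assert(my_index < control_cnt);

	/* Only peers that outrank us; lower-priority peers ping us instead. */
	for (size_t i = 0; i < my_index; i++)
		out[n++] = i;

	return n;
}
```

With three controllers, the primary (index 0) pings nobody and the second backup (index 2) pings indexes 0 and 1, so an unreachable lower-priority host never stalls the loop.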
-
Tim Wickberg authored
Reference backup_inx directly after startup, and exit much earlier if this host is not a valid controller. Return a non-zero exit code in this situation as well.
-
Tim Wickberg authored
-
Tim Wickberg authored
Collapse into a single function so we can appropriately warn if a mix of options is in use. This also avoids a confusing-looking xmalloc with the count padded by two, which was being used to build out space for ControlMachine if SlurmctldHost was not defined. This would have also masked off a series of off-by-one errors, and has led to attempts to connect to 0.0.0.0 instead of a segfault. (Some code was intentionally using this over-provisioning as a way to treat this as a NULL-terminated list, but this was then technically incorrect in cases where the old-style BackupController was set since the NULL would happen at the third position in the array, which is an invalid memory access.)
-
Tim Wickberg authored
And document why these are handled the way they are here.
-
Tim Wickberg authored
This results in an out-of-bounds access (if control_machine was not being intentionally over-alloced to avoid it), the wrong address, and other subtle problems. C's order of operations meant this was resolving as: i = (_backup_index() != -1); which is either 0 or 1. Through sheer luck, this still results in the correct answer for the primary (_backup_index() is -1, and then i = (-1 != -1) is still 0 which is correct), and first backup controller (_backup_index() is 1, and then i = (1 != -1) is still 1 which is also correct), but any further backup controllers will end up with the address of the first backup.
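The precedence problem can be reproduced in isolation. In the sketch below, `fake_backup_index` is a hypothetical stand-in for `_backup_index()`, and the fixed form (parenthesize the assignment, then map -1 to slot 0) is an assumption about what a correct version could look like, not necessarily the change made in this commit.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for _backup_index(): returns -1 for the primary,
 * otherwise this host's position in the controller list.
 */
static int fake_backup_index(int position)
{
	assert(position >= -1);
	return position;
}

/* Buggy form: != binds tighter than =, so i gets the comparison result. */
static int buggy_index(int position)
{
	int i;

	i = fake_backup_index(position) != -1;	/* i = (x != -1): 0 or 1 */
	return i;
}

/* Fixed form (assumed): assign first, then compare. */
static int fixed_index(int position)
{
	int i;

	if ((i = fake_backup_index(position)) == -1)
		i = 0;	/* the primary uses slot 0 */
	return i;
}
```

For the primary and the first backup both forms agree (0 and 1), which is why the bug went unnoticed; for a second backup `buggy_index(2)` yields 1 while `fixed_index(2)` yields 2.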
-
Tim Wickberg authored
Use the already-established slurmctld_primary bool instead.
-
Tim Wickberg authored
The second NULL check is a duplicate of the first. The first check is also unnecessary - this field existing is managed by the control_cnt variable. (At one point in development control_cnt did not exist, and control_machine was a 0-terminated array instead.)
-
Tim Wickberg authored
Otherwise cross-architecture failover will break in confusing ways.
-
Tim Wickberg authored
Bug 5256.
-
Danny Auble authored
plugin. Bug 5583
-
Brian Christiansen authored
Caught by verify_lock() annotation in job_submit_plugin_modify(). Bug 5578.
-
Brian Christiansen authored
Caught by verify_lock() annotations in validate_job_create_req() and job_allocate() respectively. Bug 5578.
-
Danny Auble authored
Brian approved
-
- 16 Aug, 2018 24 commits
-
-
Danny Auble authored
Brian approved
-
Danny Auble authored
Brian approved
-
Danny Auble authored
This is only to quiet Coverity. I don't think this is a real problem. Brian approved.
-
Danny Auble authored
Fix coverity 187747
-
Danny Auble authored
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
(i.e. gres=gpu/tesla) it would get a count of 0.
-
Brian Christiansen authored
Bug 5570
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Dominik Bartkiewicz authored
Note that pipe2() is Linux-specific, but this whole cgroup plugin is already Linux-specific in design, and the eventfd() call itself is Linux-specific as well. Bug 5570.
-
Danny Auble authored
-
Danny Auble authored
Didn't realize centos 6 was so far behind.
-
Felip Moll authored
Bug 5503
-
Brian Christiansen authored
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
"IN" tres_usage is reads. "OUT" tres_usage is writes. Page faults are only "ins".
-
Brian Christiansen authored
In 18.08 the jobacct_info stats were changed to tres_usage arrays, which are initialized to INFINITE64; before 18.08 they were initialized to 0. Bug 5554
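One way to bridge the two initialization conventions is to translate the sentinel when data crosses a version boundary. The sketch below is illustrative only: `usage_to_legacy` is a hypothetical helper, the direction of conversion (new format to old) is an assumption, and `EXAMPLE_INFINITE64` is assumed to be the all-ones 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the "unset" sentinel is the all-ones 64-bit value. */
#define EXAMPLE_INFINITE64 UINT64_MAX

/*
 * Hypothetical helper: rewrite a tres_usage-style array for a peer older
 * than 18.08, which expects unset entries to be 0 instead of the sentinel.
 */
static void usage_to_legacy(uint64_t *usage, size_t cnt)
{
	assert(usage || cnt == 0);

	for (size_t i = 0; i < cnt; i++) {
		if (usage[i] == EXAMPLE_INFINITE64)
			usage[i] = 0;
	}
}
```

Entries that carry a real measurement pass through unchanged; only unset slots are rewritten.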
-
Brian Christiansen authored
-
Brian Christiansen authored
In 18.08, usage values are stored as bytes but in <18.08 values were stored in KB, MB, etc. Also handle cputime adjustments. Related commit: efc161ef.
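A unit conversion of this kind might look like the sketch below. The names `legacy_unit` and `legacy_to_bytes` are hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption; the source does not state which convention the older records used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unit tags for values recorded before 18.08. */
enum legacy_unit { UNIT_KB = 1, UNIT_MB = 2, UNIT_GB = 3 };

/*
 * Hypothetical helper: convert a legacy value to bytes, assuming binary
 * multiples (1 KB = 1024 bytes). Saturates at UINT64_MAX on overflow.
 */
static uint64_t legacy_to_bytes(uint64_t value, enum legacy_unit unit)
{
	unsigned int shift;

	assert(unit >= UNIT_KB && unit <= UNIT_GB);
	shift = 10u * (unsigned int) unit;

	if (value > (UINT64_MAX >> shift))
		return UINT64_MAX;
	return value << shift;
}
```

For example, `legacy_to_bytes(4, UNIT_KB)` gives 4096, and a value too large to represent in bytes saturates instead of wrapping.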
-
Brian Christiansen authored
In 18.08, VMSIZE is stored as bytes and not kilos anymore. Related commit: c9448c80
-
Brian Christiansen authored
Since cputime isn't stored as a double anymore, but as a uint64_t, the cputime is adjusted so that precision isn't lost. Related commit: efc161ef
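Storing a fractional time in an integer without losing precision usually means scaling it first. The sketch below shows the idea with an assumed scale factor of 1000 (milliseconds); `EXAMPLE_CPU_TIME_ADJ` and both helper names are hypothetical and may not match the adjustment used in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scale factor: store cputime in thousandths of a second. */
#define EXAMPLE_CPU_TIME_ADJ 1000

/* Hypothetical helper: scale seconds to an integer, rounding to nearest. */
static uint64_t cputime_to_fixed(double seconds)
{
	assert(seconds >= 0.0);
	return (uint64_t) (seconds * EXAMPLE_CPU_TIME_ADJ + 0.5);
}

/* Hypothetical helper: recover seconds from the scaled integer. */
static double cputime_from_fixed(uint64_t fixed)
{
	return (double) fixed / EXAMPLE_CPU_TIME_ADJ;
}
```

With this scale, 1.5 seconds is stored as 1500 and reads back as 1.5; anything finer than a millisecond is rounded away.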
-