- 05 Dec, 2018 12 commits
-
-
Tim Wickberg authored
-
Felip Moll authored
When bf_continue is set and locks are released during a backfill cycle, other operations can make new resources available partway through the queue. When backfill resumes the cycle and evaluates further jobs, it may allocate some of these newly available resources to lower-priority jobs rather than to higher-priority jobs that were already considered in this backfill cycle. This patch introduces bf_ignore_newly_avail_nodes to SchedulerParameters to solve this issue. When the backfill scheduler resumes a cycle after yielding, this option makes it ignore nodes that became available during the yield. Bug 5279.
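The effect of the option can be pictured with a small sketch. This is an illustration only, not Slurm's backfill code: the bitmap type, the function names, and the idea of masking against a snapshot taken at cycle start are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per node, set while the node is idle. */
typedef uint64_t node_bitmap_t;

/*
 * Nodes the backfill cycle may consider after resuming from a yield.
 * avail_at_start: nodes idle when the cycle began.
 * avail_now:      nodes idle once the locks were re-acquired.
 * ignore_new:     stands in for bf_ignore_newly_avail_nodes.
 */
node_bitmap_t usable_nodes(node_bitmap_t avail_at_start,
			   node_bitmap_t avail_now, bool ignore_new)
{
	if (!ignore_new)
		return avail_now;
	/* Drop any node that only became idle during the yield. */
	return avail_now & avail_at_start;
}

/* Usage example: node 2 frees up while backfill has yielded. */
void example_yield(void)
{
	node_bitmap_t at_start = 0x3;	/* nodes 0 and 1 idle */
	node_bitmap_t now = 0x7;	/* node 2 became idle during the yield */

	assert(usable_nodes(at_start, now, false) == 0x7);
	assert(usable_nodes(at_start, now, true) == 0x3);
}
```

With the flag off, a lower-priority job evaluated after the yield could be placed on node 2; with it on, node 2 stays out of reach until the next cycle.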
-
Morris Jette authored
No new code, just a stubbed module
-
Michael Hinton authored
Bug 6018. Also see bug 5402.
-
Danny Auble authored
-
Morris Jette authored
Previously the gres/gpu plugin was removing non-gpu records from the gres list. Here is a specific example:
Sample gres.conf file:
Name=mps Type=fake Count=1 File=/dev/tty0
Name=mps Type=fake Count=1 File=/dev/tty1
Name=gpu Type=fake Count=1 File=/dev/tty0
Name=gpu Type=fake Count=1 File=/dev/tty1
The gres/mps counts were being reported to slurmctld as having a zero count.
-
Danny Auble authored
slurmd yet delivering its TRES list. Bug 6122 Co-authored-by: Marshall Garey <marshall@schedmd.com>
-
Morris Jette authored
-
Morris Jette authored
-
Alejandro Sanchez authored
Bug 5835.
-
Nate Rini authored
Bug 6008
-
Danny Auble authored
# Conflicts:
#	slurm/slurm_errno.h
#	src/common/slurm_errno.c
-
- 04 Dec, 2018 14 commits
-
-
Nate Rini authored
Bug 6008
-
Morris Jette authored
then an error is generated if more than one of those specifications contains KNL NUMA or MCDRAM modes. Bug 5846
-
Morris Jette authored
Bug 5846
-
Morris Jette authored
are down nodes. Bug 5846
-
Morris Jette authored
NODE_SET_REBOOT to continue. Bug 5846
-
Morris Jette authored
node change when possible. Bug 5846
-
Tim Wickberg authored
-
Tim Wickberg authored
Break out a list of Linux distributions as well.
-
Marshall Garey authored
Plugins reading in their own config files rely on the SLURM_CONF environment variable pointing to the appropriate directory; otherwise they will fall back to the built-in sysconfdir path. Set the environment variable early enough so that the -f flag operates correctly, but not before conf->conffile has definitely been set. Remove the setenv call that happens before the first slurmstepd is fork()'d as it is now redundant. Bug 4774.
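A minimal sketch of the idea, assuming a POSIX environment: once the config path is known, it is exported so that anything consulting SLURM_CONF later sees the same value. The helper names here are hypothetical and are not taken from the slurmd source; only setenv(), getenv(), and the SLURM_CONF variable name are real.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: export the resolved config path (for example the
 * value given with -f) so later readers of SLURM_CONF agree with it.
 * Returns 0 on success, -1 on failure.
 */
int export_conf_path(const char *conffile)
{
	assert(conffile != NULL);
	if (conffile[0] == '\0')
		return -1;
	/* Third argument 1: overwrite any value inherited from the parent. */
	return setenv("SLURM_CONF", conffile, 1);
}

/* Hypothetical check: does the environment currently match conffile? */
int conf_path_is_exported(const char *conffile)
{
	const char *cur = getenv("SLURM_CONF");

	return (cur != NULL) && (strcmp(cur, conffile) == 0);
}
```

Because child processes inherit the environment, exporting once before any fork() makes a second setenv call just before the fork unnecessary, which matches the redundancy the message describes.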
-
Alejandro Sanchez authored
sbatch sets these, but salloc did not. This should make srun behavior between the two consistent. Bug 3861.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
While here, fix up function declarations to match the current style guide. No functional changes.
-
Tim Wickberg authored
While here, fix up function declarations to match the current style guide. No functional changes.
-
- 03 Dec, 2018 5 commits
-
-
Marshall Garey authored
time that wasn't existent instead of just updating lines that have time with a lesser time.
-
Dominik Bartkiewicz authored
Slurm is going to replace internally. Bug 5800
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
slurmstepd exclusively accepts API connections through a unix socket. Before this patch, the client end (usually slurmd, but pam_slurm_adopt and scontrol both can use this) retrieved an auth cred via MUNGE and serialized that over the socket, after which the slurmstepd had to send that credential back to MUNGE for verification.

However, the only info used from that cred is the uid from the client side of the socket. That info can be retrieved via SO_PEERCRED (on Linux) - this is what MUNGE uses to authenticate its own credentials. And the client uid is only checked in half of the API calls since the info exposed is not considered sensitive.

So, rather than have every slurmd -> slurmstepd call involve a sequence of:
slurmd -> MUNGE for cred (authenticated using SO_PEERCRED internally)
slurmd -> slurmstepd over socket
slurmstepd -> MUNGE to validate credential

This can be simplified to:
slurmd -> slurmstepd over socket (auth using SO_PEERCRED directly)

This simplified call path removes two socket connections, plus the overhead from MUNGE's cryptographic operations, from the exchange. While performance is not critical for slurmd -> slurmstepd communication, this also improves performance for other system utilities such as pam_slurm_adopt (which needs to connect to half of the extern stepds on the node on average), or a future nss_slurm module which is expected to place an even higher load on this API.

The one caveat here is that the API was not built in a way that makes this restructuring easy. The slurmstepd protocol version, which may be one or two releases behind that of the slurmd, was only sent back to the slurmd _after_ the auth cred had been received and validated. So, to handle backwards compatibility, we change over to sending the SLURM_PROTOCOL_VERSION instead of SOCKET_CONNECT as the first int over the socket.

If the slurmstepd returns an error - since this value is not equal to SOCKET_CONNECT (zero) as was required in older versions - we allow that connection to close, and try to reconnect using the older RPC format instead. That fallback code should be removed two versions after 19.05 is released.
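The SO_PEERCRED lookup described above can be sketched as follows. This is a Linux-only illustration under stated assumptions, not the slurmstepd code: peer_uid() and the demo function are hypothetical names, and a socketpair() stands in for the real listening socket.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes struct ucred on glibc */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel who is on the other end of a connected unix socket.
 * Stores the peer's uid and returns 0, or returns -1 on error.
 */
int peer_uid(int fd, uid_t *uid_out)
{
	struct ucred cred;
	socklen_t len = sizeof(cred);

	assert(uid_out != NULL);
	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
		return -1;
	if (len != sizeof(cred))
		return -1;
	*uid_out = cred.uid;
	return 0;
}

/*
 * Demo: both ends of a socketpair belong to this process, so the peer
 * uid should equal our own effective uid. Returns 1 on a match, 0 on a
 * mismatch, -1 on error.
 */
int demo_peer_uid_matches_self(void)
{
	int sv[2], rc;
	uid_t uid = (uid_t) -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	rc = peer_uid(sv[0], &uid);
	close(sv[0]);
	close(sv[1]);
	if (rc < 0)
		return -1;
	return uid == geteuid();
}
```

The kernel fills in the credentials itself, so the server needs no extra round trip and the client cannot forge the uid, which is why the MUNGE exchange can be dropped on this path.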
-
- 02 Dec, 2018 2 commits
-
-
Tim Wickberg authored
Use __func__, and list the function name first in the message. Drop one redundant message printing the request number - all paths through the switch statement will print this out in some form. Remove a ternary used to print SLURM_SUCCESS/SLURM_FAILURE and print the raw return value. If you're staring at debug3 logs, you should hopefully know how to interpret these values. :)
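As a rough illustration of the logging style described, the sketch below puts the function name first via __func__ and prints the raw return value instead of mapping it to a name. FUNC_MSG and handle_request are hypothetical stand-ins writing into a buffer; they are not Slurm's debug3() or any function from the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a debug3()-style call: formats into buf with
 * the calling function's name first. Requires at least one argument
 * after fmt.
 */
#define FUNC_MSG(buf, len, fmt, ...) \
	snprintf((buf), (len), "%s: " fmt, __func__, __VA_ARGS__)

/* Logs the request number and the raw return code in one message. */
int handle_request(char *buf, size_t len, int req, int rc)
{
	assert(buf != NULL && len > 0);
	return FUNC_MSG(buf, len, "req=%d rc=%d", req, rc);
}
```

Since __func__ expands inside the caller, the prefix tracks renames automatically, unlike a function name typed into the format string.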
-
Tim Wickberg authored
Not used, so don't bother retrieving it from the cred in _handle_accept. Also, switch a printf format to %u instead of %d for uid_t.
-
- 29 Nov, 2018 5 commits
-
-
Morris Jette authored
-
Dominik Bartkiewicz authored
Bug 6121
-
Morris Jette authored
-
Nate Rini authored
Bug 6008
-
Morris Jette authored
Bug 6078
-
- 28 Nov, 2018 2 commits
-
-
Morris Jette authored
This change permits setting the type_ arrays in the job gres data structure without setting up the topo_ arrays. Bug 6078
-
Morris Jette authored
-