1. 04 Dec, 2018 8 commits
  2. 03 Dec, 2018 5 commits
    • When handling runaway jobs remove all usage before rollup to remove any · bf705c80
      Marshall Garey authored
      time that wasn't existent instead of just updating lines that have time
      with a lesser time.
    • Fix issue when job's environment is minimal and only contains variables · f1116c67
      Dominik Bartkiewicz authored
      Slurm is going to replace internally.
      
      Bug 5800
    • Remove a few missed references to ionodes. · aec91c8b
      Tim Wickberg authored
    • Fix long line from previous commit. · bd600cc7
      Tim Wickberg authored
    • Rework slurmstepd authentication. · 06863788
      Tim Wickberg authored
      slurmstepd exclusively accepts API connections through a unix socket.
      Before this patch, the client end (usually slurmd, but pam_slurm_adopt and
      scontrol both can use this) retrieves an auth cred via MUNGE, serializes
      that over the socket, after which the slurmstepd must send that credential
      back to MUNGE for verification.
      
      However, the only info used from that cred is the uid from the client side
      of the socket. That info can be retrieved via SO_PEERCRED (on Linux) - this
      is what MUNGE uses to authenticate its own credentials. And the client uid
      is only checked in half of the API calls since the info exposed is not
      considered sensitive.
      
      So, rather than have every slurmd -> slurmstepd call involve a sequence of:
      
          slurmd -> MUNGE for cred (authenticated using SO_PEERCRED internally)
          slurmd -> slurmstepd over socket
          slurmstepd -> MUNGE to validate credential
      
      This can be simplified to:
      
          slurmd -> slurmstepd over socket (auth using SO_PEERCRED directly)
      
      This simplified call path removes two socket connections, plus the overhead
      from MUNGE's cryptographic operations, from the exchange. While performance
      is not critical for slurmd -> slurmstepd communication, this also improves
      performance for other system utilities such as pam_slurm_adopt (which needs
      to connect to half of the extern stepds on the node on average), or a future
      nss_slurm module which is expected to place an even higher load on this API.
      
      The one caveat here is that the API was not built in a way that makes this
      restructuring easy. The slurmstepd protocol version, which may be one or two
      releases behind that of the slurmd, was only sent back to the slurmd _after_
      the auth cred had been received and validated. So, to handle backwards
      compatibility, we change over to sending the SLURM_PROTOCOL_VERSION instead
      of SOCKET_CONNECT as the first int over the socket. If the slurmstepd
      returns an error - since this value is not equal to SOCKET_CONNECT (zero)
      as was required in older versions - we allow that connection to close, and
      try to reconnect using the older RPC format instead. That fallback code
      should be removed two versions after 19.05 is released.
  3. 02 Dec, 2018 2 commits
    • Rework debug3 messages in _handle_request. · 78ea3e01
      Tim Wickberg authored
      Use __func__, and list the function name first in the message.
      
      Drop one redundant message printing the request number - all paths
      through the switch statement will print this out in some form.
      
      Remove a ternary used to print SLURM_SUCCESS/SLURM_FAILURE and
      print the raw return value. If you're staring at debug3 logs,
      you should hopefully know how to interpret these values. :)
    • Modify _handle_request to drop gid as an argument. · af96f57b
      Tim Wickberg authored
      Not used, so don't bother retrieving it from the cred in _handle_accept.
      
      Also, switch a printf format to %u instead of %d for uid_t.
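The logging conventions described in the two 02 Dec commits (function name first via `__func__`, the raw return value instead of a SLURM_SUCCESS/SLURM_FAILURE string, and `%u` for `uid_t`) might look like the sketch below. It uses plain `snprintf` rather than Slurm's `debug3`, and `format_request_msg`, `FORMAT_REQUEST_MSG`, and the message layout are hypothetical, chosen only to illustrate the pattern.

```c
#include <stdio.h>
#include <sys/types.h>

/* Build a log line with the function name first, the uid printed as
 * unsigned, and the return code printed as a raw integer. */
int format_request_msg(char *buf, size_t len, const char *func,
		       int req, uid_t uid, int rc)
{
	/* uid_t is unsigned on Linux; printing it with %d would show large
	 * uids as negative numbers, so cast and use %u. */
	return snprintf(buf, len, "%s: req=%d uid=%u rc=%d",
			func, req, (unsigned int)uid, rc);
}

/* Convenience wrapper that fills in the caller's name automatically. */
#define FORMAT_REQUEST_MSG(buf, len, req, uid, rc) \
	format_request_msg((buf), (len), __func__, (req), (uid), (rc))
```

With a uid above 2^31, such as 3000000000, the `%u` form prints the value unchanged, which is the case the switch away from `%d` guards against.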
  4. 29 Nov, 2018 5 commits
  5. 28 Nov, 2018 14 commits
  6. 27 Nov, 2018 6 commits