1. 06 May, 2015 8 commits
  2. 05 May, 2015 12 commits
  3. 04 May, 2015 4 commits
      Infrastructure to add SSH-launched procs in Slurm · 3153612e
      Ryan Cox authored
      Information about this patch and the reasons for it: http://tech.ryancox.net/2015/04/caller-id-handling-ssh-launched-processes-in-slurm.html
      
      As discussed previously, here is a patch against master (branched this morning at 709f6504).  It works, though I'm sure it has some rough edges that you'll find.  I had to export a few symbols from stepd_api.[ch] that weren't previously exported.  A lot of the code I had to modify is new territory for me, so it's likely I made many mistakes.
      
      There are a few minor things I might end up wanting to change (I'm not exactly in love with some of the variable or function names I chose, though I can live with them).  I might make a few minor tweaks to the PAM module as well, but they won't affect the RPC code.  Currently the README is written like a man page.  I might turn it into a man page and say "read the man page" in the README.
      
      Here is an excerpt from the README that states how decisions are made:
        1) Check the local stepds for a count of jobs owned by the non-root user
          a) If none, deny (option action_no_jobs)
          b) If only one, adopt the process into that job
          c) If multiple, continue
        2) Determine src/dst IP/port of socket
        3) Issue callerid RPC to slurmd at IP address of source
          a) If the remote slurmd can identify the source job, adopt into that job
          b) If not, continue
        4) Pick a random local job from the user to adopt into (option action_unknown)
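      The decision order in the excerpt above can be sketched as a small pure function. This is only an illustration: the enum, the function name, and its parameters are hypothetical and not taken from the patch, and it assumes the described default behavior for the action_no_jobs and action_unknown options (deny, and adopt into a random local job, respectively).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels; these are not names used in the patch. */
enum adopt_decision {
	ADOPT_DENY,		/* step 1a: user has no jobs on this node */
	ADOPT_ONLY_JOB,		/* step 1b: exactly one local job */
	ADOPT_CALLERID_JOB,	/* step 3a: remote slurmd named the job */
	ADOPT_RANDOM_JOB	/* step 4: fall back to a random local job */
};

/*
 * local_job_count: number of jobs the non-root user owns on this node
 *                  (step 1, as counted from the local stepds).
 * callerid_found_job: whether the callerid RPC to the source node's
 *                     slurmd identified the originating job (steps 2-3).
 */
static enum adopt_decision decide_adoption(size_t local_job_count,
					   bool callerid_found_job)
{
	if (local_job_count == 0)
		return ADOPT_DENY;
	if (local_job_count == 1)
		return ADOPT_ONLY_JOB;
	/* Multiple local jobs: only now does the RPC result matter. */
	if (callerid_found_job)
		return ADOPT_CALLERID_JOB;
	return ADOPT_RANDOM_JOB;
}
```

      In this sketch the RPC result is ignored whenever the local job count already settles the question, which mirrors the ordering of steps 1 through 4.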
      
      I tried to thoroughly document the code, so hopefully it makes sense.  Also, I noticed that one of the stepd functions returns a uid_t which is set to -1 on error.  The problem with that is that Linux's uid_t is uint32_t, so the -1 is stored as a large positive value rather than a negative one.
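      To illustrate the uid_t concern, here is a minimal sketch using a stand-in unsigned 32-bit type. All names are hypothetical and nothing here is taken from stepd_api.[ch]; it only shows that a -1 error value in an unsigned type never looks negative, so a caller has to compare against the cast sentinel explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an unsigned 32-bit uid_t (assumption for illustration). */
typedef uint32_t sketch_uid_t;

/* Hypothetical lookup that signals failure by returning -1. */
static sketch_uid_t failing_lookup(void)
{
	return (sketch_uid_t)-1;	/* wraps to 4294967295 */
}

/* A "negative means error" check: never true for an unsigned value. */
static bool looks_negative(sketch_uid_t uid)
{
	return (int64_t)uid < 0;
}

/* An explicit sentinel comparison does detect the error value. */
static bool is_error_sentinel(sketch_uid_t uid)
{
	return uid == (sketch_uid_t)-1;
}
```

      One consequence of the sentinel approach is that the value 4294967295 can no longer be told apart from an error, so a separate status return would be the less ambiguous design.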
      
      One area of concern is the stepd calls in pam_slurm_adopt.c.  I hope I'm doing enough error handling there, but maybe not.  What happens if a step is completing or if the step data is still around even though the step is actually dead?
      
      The code to actually adopt processes is currently a no-op.  That will depend on having the allocation step code added.  I haven't checked yet to see if all the relevant plugins (proctrack, jobacct_gather, etc.) have hooks to add a new process to the plugin.  If not, those hooks will have to be added as well.
      
      Lastly, I exceed 80 characters on lines with user-visible strings since Slurm follows the Linux kernel coding style.  Chapter 2 of https://www.kernel.org/doc/Documentation/CodingStyle says "never break user-visible strings... because that breaks the ability to grep for them" (which I have wished Slurm followed, by the way, since I have hit that issue).  I know in the past that you wanted even those lines to be wrapped but I figured I would ask if anything has changed :)
    • Morris Jette's avatar
      445f6c5b
    • Morris Jette's avatar
      5b6e0d2f
    • Alejandro Sanchez's avatar
      59f76825
  4. 02 May, 2015 1 commit
  5. 01 May, 2015 13 commits
  6. 30 Apr, 2015 2 commits