1. 04 May, 2015 4 commits
    • Infrastructure to add SSH-launched procs in Slurm · 3153612e
      Ryan Cox authored
      Here is information about this patch and the reasons for it: http://tech.ryancox.net/2015/04/caller-id-handling-ssh-launched-processes-in-slurm.html
      
      As discussed previously, here is a patch against master (branched this morning at 709f6504).  It works, though I'm sure it has some rough edges that you'll find.  I had to export a few symbols from stepd_api.[ch] that weren't previously exported.  A lot of the code I had to modify is new territory for me, so it's likely I made many mistakes.
      
      There are a few minor things I might end up wanting to change (I'm not exactly in love with some of the variable or function names I chose, though I can live with them).  I might make a few minor tweaks to the PAM module as well, but they won't affect the RPC code.  Currently the README is written like a man page.  I might turn it into a man page and say "read the man page" in the README.
      
      Here is an excerpt from the README that states how decisions are made:
        1) Check the local stepds for a count of jobs owned by the non-root user
          a) If none, deny (option action_no_jobs)
          b) If only one, adopt the process into that job
          c) If multiple, continue
        2) Determine src/dst IP/port of socket
        3) Issue callerid RPC to slurmd at IP address of source
          a) If the remote slurmd can identify the source job, adopt into that job
          b) If not, continue
        4) Pick a random local job from the user to adopt into (option action_unknown)
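A minimal sketch of the decision order in the excerpt above, not the code from the patch. The enum, the function name `decide_adoption`, and both parameters are hypothetical. Steps 2 and 3 (socket lookup and the callerid RPC) are reduced to a single boolean input, and the configurable options `action_no_jobs` and `action_unknown` are assumed to be set to "deny" and "pick a random job" as the excerpt describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for illustration; not names from the patch. */
typedef enum {
    ADOPT_DENY,        /* 1a: the user has no jobs on this node */
    ADOPT_ONLY_JOB,    /* 1b: exactly one local job, adopt into it */
    ADOPT_SOURCE_JOB,  /* 3a: the remote slurmd identified the source job */
    ADOPT_RANDOM_JOB   /* 4:  fall back to a random local job of the user */
} adopt_decision_t;

/*
 * local_job_count:  jobs owned by the non-root user on this node (step 1).
 * source_job_known: whether the callerid RPC to the source node's slurmd
 *                   identified the originating job (steps 2 and 3).
 */
adopt_decision_t decide_adoption(int local_job_count, bool source_job_known)
{
    assert(local_job_count >= 0);

    if (local_job_count == 0)
        return ADOPT_DENY;
    if (local_job_count == 1)
        return ADOPT_ONLY_JOB;
    /* Multiple local jobs: only now does the RPC result matter. */
    if (source_job_known)
        return ADOPT_SOURCE_JOB;
    return ADOPT_RANDOM_JOB;
}
```

Under these assumptions, `decide_adoption(3, false)` falls through to `ADOPT_RANDOM_JOB`, and the RPC result is never consulted when the user has zero or one local job.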
      
      I tried to thoroughly document the code, so hopefully it makes sense.  Also, I noticed that one of the stepd functions returns a uid_t which is set to -1 on error.  The problem with that is that Linux's uid_t is uint32_t, which is unsigned, so the -1 wraps around to 4294967295 and a signed check such as "< 0" never catches it.
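The uid_t concern can be illustrated with a small sketch. It assumes a Linux/glibc target where uid_t is a 32-bit unsigned type; the helper names are hypothetical and are not functions from stepd_api.[ch].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumption of this sketch: uid_t is unsigned, as it is on Linux. */
static_assert((uid_t)-1 > 0, "this sketch assumes an unsigned uid_t");

/* The value that a "return -1" actually carries in a 32-bit uid_t. */
uint32_t uid_error_as_u32(void)
{
    return (uint32_t)(uid_t)-1;
}

/* A caller has to compare against the cast sentinel explicitly. */
bool uid_is_error(uid_t uid)
{
    return uid == (uid_t)-1;
}

/* Widening to a signed type does not bring the sign back. */
bool widened_uid_is_negative(uid_t uid)
{
    return (int64_t)uid < 0;
}
```

With these helpers, `uid_error_as_u32()` yields `UINT32_MAX`, `uid_is_error((uid_t)-1)` is true, and `widened_uid_is_negative((uid_t)-1)` is false, which is why a caller that tests for a negative return value would miss the error.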
      
      One area of concern is the stepd calls in the pam_slurm_adopt.c code.  I hope I'm doing enough error handling there, but maybe not.  What happens if a step is completing or if the step data is still around even though it's actually dead?
      
      The code to actually adopt processes is currently a no-op.  That will depend on having the allocation step code added.  I haven't checked yet to see if all the relevant plugins (proctrack, jobacct_gather, etc.) have hooks to add a new process to the plugin.  If not, it will have to be added as well.
      
      Lastly, I exceed 80 characters on lines with user-visible strings since Slurm follows the Linux kernel coding style.  Chapter 2 of https://www.kernel.org/doc/Documentation/CodingStyle says "never break user-visible strings... because that breaks the ability to grep for them" (which I have wished Slurm followed, by the way, since I have hit that issue).  I know in the past that you wanted even those lines to be wrapped but I figured I would ask if anything has changed :)
    • Morris Jette's avatar
      445f6c5b
    • Morris Jette's avatar
      5b6e0d2f
    • Alejandro Sanchez's avatar
      59f76825
  2. 02 May, 2015 1 commit
  3. 01 May, 2015 12 commits
  4. 30 Apr, 2015 17 commits
  5. 29 Apr, 2015 6 commits
    • scancel job array ID parsing fixes · 5e815d03
      Morris Jette authored
      Previous logic would not recognize a job ID specification with a
        job array task ID of "*" (e.g. "123_*") to indicate all job
        array tasks.
      Previous logic would stop any parsing after the closing bracket
        on a job array specification (e.g. "123_[4-6] 234" would not
        see the "234").
      Improve logging of job ID specifications (i.e. use job array
        specification).
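The two parsing cases described above can be sketched as follows. This is a hedged illustration and not scancel's actual parser: `job_spec_t` and `parse_job_spec` are hypothetical names, and the task range inside the brackets is kept as an unparsed string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for illustration; not scancel's own structure. */
typedef struct {
    long job_id;
    bool all_tasks;      /* true for "123_*" */
    char task_spec[64];  /* "4" or "[4-6]"; empty if none or "*" */
} job_spec_t;

/*
 * Parse one job ID specification starting at *pp and advance *pp past it,
 * so a following specification (e.g. the "234" in "123_[4-6] 234") is
 * still seen by the next call. Returns false at end of input or on
 * malformed input; *pp is left unchanged in that case.
 */
bool parse_job_spec(const char **pp, job_spec_t *out)
{
    assert(pp && *pp && out);

    const char *p = *pp;
    char *end;

    while (isspace((unsigned char)*p))
        p++;
    if (!isdigit((unsigned char)*p))
        return false;
    out->job_id = strtol(p, &end, 10);
    out->all_tasks = false;
    out->task_spec[0] = '\0';
    p = end;

    if (*p == '_') {
        p++;
        if (*p == '*') {             /* "123_*": all tasks in the array */
            out->all_tasks = true;
            p++;
        } else {
            const char *start = p;
            if (*p == '[') {         /* "123_[4-6]": bracketed range */
                const char *close = strchr(p, ']');
                if (!close)
                    return false;
                p = close + 1;       /* keep parsing after the bracket */
            } else {                 /* "123_4": single task ID */
                while (isdigit((unsigned char)*p))
                    p++;
            }
            size_t len = (size_t)(p - start);
            if (len == 0 || len >= sizeof(out->task_spec))
                return false;
            memcpy(out->task_spec, start, len);
            out->task_spec[len] = '\0';
        }
    }
    if (*p != '\0' && !isspace((unsigned char)*p))
        return false;
    *pp = p;
    return true;
}
```

Calling `parse_job_spec` repeatedly on "123_[4-6] 234" yields job 123 with task specification "[4-6]" and then job 234, while "123_*" yields job 123 with `all_tasks` set.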
    • Merge branch 'elastisearch' · ea366396
      Morris Jette authored
      Conflicts:
      	NEWS
    • Forgot to add the elasticsearch Makefile.in · 2a4ebb0e
      Morris Jette authored
    • Merge branch 'slurm-14.11' · ba4607e4
      Morris Jette authored
    • Parse "#_*" as all tasks in job array · 4c9f70b0
      Morris Jette authored
      Modify slurmctld's parsing of a job_id string for the job_signal and
      job_requeue calls to treat a job ID value of "#_*" as representing
      all tasks in a job ID number "#". Previously treated as invalid input.
      
      Also set the last_job_update time so that if a pending job is killed,
      then that is reported immediately by "squeue -i#" (previously it
      might keep reporting stale data).
    • Minor update to mailing list page · 82586d86
      Morris Jette authored
      Trying to avoid having technical questions sent to "sales@schedmd.com"