1. 06 Feb, 2012 2 commits
    • Danny Auble's avatar
      The openpty(3) call used by slurmstepd to allocate a pseudo-terminal · 2a1c08b0
      Danny Auble authored
      is a convenience function in BSD and glibc that internally calls
      the equivalent of
      
          int masterfd = open("/dev/ptmx", flags);
          grantpt(masterfd);
          unlockpt(masterfd);
          int slavefd = open(slave, O_RDWR|O_NOCTTY);
      
      (in pseudocode)
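      A minimal, runnable version of that sequence, using the portable
      posix_openpt(3) interface (an illustration of the equivalent calls,
      not SLURM's code):

      ```c
      #define _XOPEN_SOURCE 600
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      /* Open a master pty; store the slave device's path in *slave_out.
       * Returns the master fd, or -1 on failure. */
      int alloc_pty(char **slave_out)
      {
          int masterfd = posix_openpt(O_RDWR | O_NOCTTY);
          if (masterfd < 0)
              return -1;

          /* grantpt() chowns the slave device to the caller's *real* uid;
           * this is the step that failed with EPERM in slurmstepd. */
          if (grantpt(masterfd) < 0 || unlockpt(masterfd) < 0) {
              close(masterfd);
              return -1;
          }
          *slave_out = ptsname(masterfd);
          return masterfd;
      }

      int main(void)
      {
          char *slave;
          int masterfd = alloc_pty(&slave);
          if (masterfd < 0) { perror("alloc_pty"); return 1; }

          int slavefd = open(slave, O_RDWR | O_NOCTTY);
          printf("master fd=%d, slave=%s\n", masterfd, slave);
          if (slavefd >= 0) close(slavefd);
          close(masterfd);
          return 0;
      }
      ```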
      
      On Linux, with some combinations of glibc/kernel (in this
      case glibc-2.14/Linux-3.1), the equivalent of grantpt(3) was failing
      in slurmstepd with EPERM, because the allocated pty was getting
      root ownership instead of the user running the slurm job.
      
      From the POSIX description of grantpt:
      
       "The grantpt() function shall change the mode and ownership of the
        slave pseudo-terminal device... The user ID of the slave shall
        be set to the real UID of the calling process..."
      
       http://pubs.opengroup.org/onlinepubs/007904875/functions/grantpt.html
      
      This means that for POSIX-compliance, the real user id of slurmstepd
      must be the user executing the SLURM job at the time openpty(3) is
      called. Unfortunately, the real user id of slurmstepd at this
      point is still root, and only the effective uid is set to the user.
      
      This patch is a work-around that uses the (non-portable) setresuid(2)
      system call to reset the real and effective uids of the slurmstepd
      process to the job user, but keep the saved uid of root. Then after
      the openpty(3) call, the previous credentials are reestablished
      using the same call.
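      A sketch of that uid dance (a hypothetical helper, not the actual
      patch; `job_uid` stands for the job user's uid, and the privileged
      branch is skipped when run unprivileged so the sketch stays
      demonstrable):

      ```c
      #define _GNU_SOURCE          /* for setresuid(2) on glibc */
      #include <fcntl.h>
      #include <stdlib.h>
      #include <unistd.h>

      /* Hypothetical sketch of the workaround, not the SLURM patch itself.
       * Assumes the caller starts with real uid 0, as slurmstepd does. */
      int alloc_pty_as_user(uid_t job_uid, int *masterfd)
      {
          int privileged = (getuid() == 0);

          /* real = effective = job user, saved = root (to switch back later) */
          if (privileged && setresuid(job_uid, job_uid, 0) < 0)
              return -1;

          int rc = 0;
          /* equivalent of the openpty(3) internals shown above */
          *masterfd = posix_openpt(O_RDWR | O_NOCTTY);
          if (*masterfd < 0 || grantpt(*masterfd) < 0 || unlockpt(*masterfd) < 0)
              rc = -1;

          /* re-establish previous credentials: real root, effective job user */
          if (privileged && setresuid(0, job_uid, 0) < 0)
              rc = -1;

          return rc;
      }
      ```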
      2a1c08b0
    • Danny Auble's avatar
      1b1e6196
  2. 03 Feb, 2012 1 commit
    • Morris Jette's avatar
      Fix for srun with --exclude and --nodes · a4551158
      Morris Jette authored
      Fix for srun running within an existing allocation when the --exclude
      option is used and the --nodes count is small enough that the excluded
      nodes are not needed.
      
          > salloc -N 8
          salloc: Granted job allocation 1000008
          > srun -N 2 -n 2 --exclude=tux3 hostname
          srun: error: Unable to create job step: Requested node configuration is not available
      
      Patch from Phil Eckert, LLNL.
      a4551158
  3. 02 Feb, 2012 1 commit
  4. 01 Feb, 2012 5 commits
  5. 31 Jan, 2012 6 commits
    • Danny Auble's avatar
      BLUEGENE - fix for not allowing jobs if all midplanes are drained and all · 1e40f647
      Danny Auble authored
      blocks are in an error state.
      1e40f647
    • Danny Auble's avatar
      Added MaxCPURunMins to sacctmgr --help · 0741a338
      Danny Auble authored
      0741a338
    • Danny Auble's avatar
      whitespace cleanup · a372fc29
      Danny Auble authored
      a372fc29
    • Morris Jette's avatar
      Note nature of latest change · 7189ecaa
      Morris Jette authored
      7189ecaa
    • Didier GAZEN's avatar
      Problem when using srun --uid in conjunction with --jobid (patch included) · e2b39c14
      Didier GAZEN authored
      Hi,
      
      With slurm 2.3.2 (or 2.3.3), I encounter the following error when
      trying, as root, to launch a command attached to a running user's job,
      even when I use the --uid=<user> option:
      
      sila@suse112:~> squeue
         JOBID PARTITION     NAME     USER    STATE      TIME TIMELIMIT  NODES   CPUS NODELIST(REASON)
           551     debug mysleep.     sila  RUNNING      0:02 UNLIMITED      1      1 n1
      
      root@suse112:~ # srun --jobid=551 hostname
      srun: error: Unable to create job step: Access/permission denied
      <--normal behaviour
      
      root@suse112:~ # srun --jobid=551 --uid=sila hostname
      srun: error: Unable to create job step: Invalid user id <--problem
      
      By increasing slurmctld verbosity, the log file displays the following
      error:
      
      slurmctld: debug2: Processing RPC: REQUEST_JOB_ALLOCATION_INFO_LITE from uid=0
      slurmctld: debug:  _slurm_rpc_job_alloc_info_lite JobId=551 NodeList=n1 usec=1442
      slurmctld: debug2: Processing RPC: REQUEST_JOB_STEP_CREATE from uid=0
      slurmctld: error: Security violation, JOB_STEP_CREATE RPC from uid=0 to run as uid 1001
      
      which occurs in the function _slurm_rpc_job_step_create()
      (src/slurmctld/proc_req.c).
      
      Here's my patch to prevent the command from failing (though I'm not
      sure that it has no side effects):
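      The rejection above is, in essence, an authorization test on the
      requesting uid. A simplified illustration of that kind of check, and
      of the exemption for root that the patch is after (hypothetical code,
      not SLURM's actual logic):

      ```c
      #include <stdbool.h>
      #include <sys/types.h>

      /* Simplified illustration only -- not the actual SLURM code.
       * A step-create request is normally allowed only when the requesting
       * uid matches the job's uid; the reported bug is that a request from
       * root (uid 0) on behalf of the job user was rejected as well. */
      bool step_create_allowed(uid_t request_uid, uid_t job_uid)
      {
          if (request_uid == job_uid)
              return true;          /* owner may manage their own job   */
          if (request_uid == 0)
              return true;          /* privileged user acting for owner */
          return false;             /* the "Security violation" path    */
      }
      ```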
      e2b39c14
    • Danny Auble's avatar
      Fix to the multifactor priority plugin to calculate effective usage earlier · 7d9e3ed2
      Danny Auble authored
      to give a correct priority on the first decay cycle after a restart of the
      slurmctld. Patch from Martin Perry, Bull.
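      The ordering issue can be pictured with a toy model (assumed names
      and formulas, not the plugin's code): priority is derived from
      effective usage, so usage must be decayed before priority is read,
      or the first cycle after a restart hands out priorities based on
      stale usage.

      ```c
      #include <stdio.h>

      /* Toy model of one decay cycle -- illustrative only. */
      double decayed_usage(double usage, double decay_factor)
      {
          return usage * decay_factor;     /* apply one decay cycle */
      }

      double priority_from_usage(double usage)
      {
          return 1.0 / (1.0 + usage);      /* toy formula: more usage, less priority */
      }

      int main(void)
      {
          double usage = 100.0;
          double stale = priority_from_usage(usage);                     /* wrong order */
          double fresh = priority_from_usage(decayed_usage(usage, 0.5)); /* right order */
          printf("stale=%.4f fresh=%.4f\n", stale, fresh);
          return 0;
      }
      ```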
      7d9e3ed2
  6. 27 Jan, 2012 7 commits
  7. 25 Jan, 2012 1 commit
    • Morris Jette's avatar
      Set DEFAULT flag in partition structure · 9f4ef925
      Morris Jette authored
      Set DEFAULT flag in partition structure when slurmctld reads the
      configuration file. Patch from Rémi Palancher. Note the flag is set
      when the information is sent via RPC for sinfo.
      9f4ef925
  8. 24 Jan, 2012 3 commits
  9. 23 Jan, 2012 3 commits
    • Morris Jette's avatar
      Add global variable as needed in priority/multifactor · cdcc4af9
      Morris Jette authored
      needed for test24.1
      cdcc4af9
    • Morris Jette's avatar
      e5ac37f1
    • Philip D. Eckert's avatar
      here it is · d254296c
      Philip D. Eckert authored
      Moe,
      
      Here it is, I have added a subroutine to env.c to
      unset the user's environment and then called it
      from sbatch in main. I also removed the comment
      from the sbatch man page indicating that it
      wasn't working the same for a regular user as
      it did for Moab. It should now be functionally
      the same.
      
      I think there is still a difference between how
      sbatch handles an environment read from a file
      and how it behaved when Moab execve'd the
      environment. However, I'm not sure what it would
      be at this point.
      
      Again,  for so many iterations....
      
      Phil
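      The subroutine Phil describes boils down to clearing the inherited
      environment; with glibc that can be as small as the following sketch
      (assumed shape, not the actual env.c code):

      ```c
      #define _GNU_SOURCE           /* for clearenv(3) on glibc */
      #include <stdio.h>
      #include <stdlib.h>

      /* Sketch of an "unset the user's environment" helper: clearenv(3)
       * empties the process environment so sbatch can rebuild it from a
       * known state. */
      void unset_user_env(void)
      {
          clearenv();
      }

      int main(void)
      {
          setenv("DEMO_VAR", "1", 1);
          unset_user_env();
          printf("DEMO_VAR is %s\n", getenv("DEMO_VAR") ? "still set" : "unset");
          return 0;
      }
      ```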
      d254296c
  10. 22 Jan, 2012 1 commit
    • Philip D. Eckert's avatar
      last one · 16e6fcd6
      Philip D. Eckert authored
      Moe,
      
      After doing more extensive testing, I came to realize
      that we had made a bad basic assumption. We believed
      that the user's environment should only be what was
      sent in the file via the --export-file option.
      
      However, that broke the previous behavior, especially
      in regard to Moab jobs. It also caused the SLURM-defined
      environment variables to be lost.
      
      This patch will enable the correct behavior for Moab
      on top of SLURM when using the --export-file option,
      but the behavior is less than perfect when using it
      stand-alone with sbatch. When using the option with
      sbatch as a user, the file environment is read in,
      and then when the env_array_merge is made, some
      variables may get overwritten. This is good for
      the SLURM and MPI variables, but not so good for
      others. The problem is that trying to reconcile two
      sources of environment is very problematic.
      
      I also added a caveat in the man page.
      
      I made changes in my branch of SchedMD SLURM
      for 2.3, here is the patch.
      
      Phil
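      The overwriting Phil mentions can be illustrated with a simplified
      merge of two NAME=VALUE arrays, where the second array wins on name
      clashes (hypothetical code; SLURM's env_array_merge() differs in
      detail):

      ```c
      #define _GNU_SOURCE
      #include <stdlib.h>
      #include <string.h>

      #define MAX_VARS 64

      /* Merge two NULL-terminated NAME=VALUE arrays into `out` (which must
       * hold MAX_VARS + 1 slots); entries from `override` replace same-named
       * entries from `base`.  Illustration only. */
      int env_merge(char **out, char *const *base, char *const *override)
      {
          int n = 0;
          for (char *const *p = base; *p && n < MAX_VARS; p++)
              out[n++] = strdup(*p);

          for (char *const *p = override; *p && n < MAX_VARS; p++) {
              size_t name_len = strcspn(*p, "=");
              int replaced = 0;
              for (int i = 0; i < n; i++) {
                  if (strncmp(out[i], *p, name_len) == 0 && out[i][name_len] == '=') {
                      free(out[i]);
                      out[i] = strdup(*p);   /* override wins */
                      replaced = 1;
                      break;
                  }
              }
              if (!replaced)
                  out[n++] = strdup(*p);
          }
          out[n] = NULL;
          return n;
      }
      ```

      Run on a file-provided environment and the SLURM-generated one, a
      merge like this silently replaces any same-named variable from the
      file, which is the effect described above.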
      16e6fcd6
  11. 20 Jan, 2012 2 commits
  12. 19 Jan, 2012 5 commits
  13. 18 Jan, 2012 3 commits