1. 18 Mar, 2012 1 commit
    • xcgroup: add xcgroup_move_process helper function · aa912e4a
      Mark A. Grondona authored
      This patch adds a helper function to common/xcgroup.c to aid
      in moving processes between cgroups. If the cgroup.procs file
      is writable, the PID is written to that file, because this
      method moves all threads in a process atomically.
      
      If cgroup.procs is not writable, each thread must be moved
      individually by walking the /proc/PID/task/ directory and writing
      each task ID to the 'tasks' file in the cgroup. The second method
      is racy if the process is concurrently creating threads, but it
      is better than the current method of moving just one of the
      process's threads.
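      The two strategies described above can be sketched roughly as follows. This is a minimal illustration, not the code from common/xcgroup.c: the names write_id() and move_process_sketch() are hypothetical, and the cgroup v1 file names cgroup.procs and tasks are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write one numeric id to <cgroup_path>/<file>.
 * Returns 0 on success, -1 on error. The file must already exist. */
static int write_id(const char *cgroup_path, const char *file, long id)
{
    char path[PATH_MAX];
    char buf[32];
    int n, len, fd;
    ssize_t rc;

    n = snprintf(path, sizeof(path), "%s/%s", cgroup_path, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    len = snprintf(buf, sizeof(buf), "%ld", id);
    assert(len > 0 && (size_t)len < sizeof(buf));

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    rc = write(fd, buf, (size_t)len);
    close(fd);
    return (rc == (ssize_t)len) ? 0 : -1;
}

/* Hypothetical sketch of the logic in the commit message; it is NOT the
 * actual xcgroup_move_process(). Returns 0 on success, -1 on error. */
static int move_process_sketch(const char *cgroup_path, pid_t pid)
{
    char path[PATH_MAX];
    struct dirent *ent;
    DIR *dir;
    int n, moved = 0;

    /* Preferred: one write to cgroup.procs moves every thread at once. */
    n = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_path);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if (access(path, W_OK) == 0)
        return write_id(cgroup_path, "cgroup.procs", (long)pid);

    /* Fallback: move each thread listed under /proc/PID/task one by one.
     * Threads created while we iterate can be missed (the race noted above). */
    n = snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        if (end == ent->d_name || *end != '\0')
            continue; /* skip "." and ".." and anything non-numeric */
        if (write_id(cgroup_path, "tasks", tid) == 0)
            moved++; /* a thread that already exited simply fails to move */
    }
    closedir(dir);
    return (moved > 0) ? 0 : -1;
}
```

      A caller would pass the cgroup directory and the target PID, for example move_process_sketch("/cgroup/cpuset/slurm/job_1", pid), where that path is only a placeholder. The fallback treats the move as successful if at least one thread was written, which is a design choice of this sketch rather than something the commit message specifies.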
  2. 16 Mar, 2012 9 commits
  3. 14 Mar, 2012 2 commits
    • Set Cray srun default job name · 0b24e690
      Morris Jette authored
      Cray: when the srun wrapper creates a job allocation, set the default job
      name to the executable file's name, ignoring leading directory names in the path.
    • Change read lock to write lock · d53b7c26
      Morris Jette authored
      This patch contains the parts of bad_dbtime.diff from CSCS that have
      not already been committed.
  4. 13 Mar, 2012 5 commits
  5. 12 Mar, 2012 1 commit
  6. 02 Mar, 2012 2 commits
    • cray/srun wrapper, don't use aprun -q by default · ea9adc17
      Morris Jette authored
      In the cray/srun wrapper, only include the aprun "-q" option when the
      srun "--quiet" option is used.
    • Fix for possible SEGV · ed56303c
      Morris Jette authored
      Here's what seems to have happened:
      
      - A job was pending, waiting for resources.
      - slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
      - As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
      - scontrol reconfigure was done again.
      - read_slurm_conf called _restore_job_dependencies.
      - _restore_job_dependencies called build_feature_list for each job in the job list.
      - When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.
      
      The problem was reported by a customer on Slurm 2.2.7.  I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies.  If the job state is JOB_FAILED, the job is skipped.
      
      Regards,
      Martin
      
      This is an alternative solution to bug316980fix.patch
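      The guard described in the message above can be sketched as follows. This is only an illustration under stated assumptions: the structs, the enum, and restore_dependencies_sketch() are hypothetical stand-ins, not Slurm's real job record or the real _restore_job_dependencies, and the extra NULL check is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Slurm's job structures. */
enum sketch_job_state { SKETCH_JOB_PENDING, SKETCH_JOB_RUNNING, SKETCH_JOB_FAILED };

struct sketch_job_details {
    const char *features;
};

struct sketch_job {
    enum sketch_job_state state;
    struct sketch_job_details *details; /* NULL once the details are deleted */
};

/* Walk the job list and rebuild per-job data, skipping failed jobs whose
 * details may already have been deleted. Returns the number of jobs processed. */
static int restore_dependencies_sketch(struct sketch_job *jobs, size_t njobs)
{
    int processed = 0;

    assert(jobs != NULL || njobs == 0);
    for (size_t i = 0; i < njobs; i++) {
        if (jobs[i].state == SKETCH_JOB_FAILED)
            continue; /* the check the patch adds: skip failed jobs */
        if (jobs[i].details == NULL)
            continue; /* extra defensive check, an assumption of this sketch */
        /* Stand-in for build_feature_list(): safe to dereference details here. */
        (void)jobs[i].details->features;
        processed++;
    }
    return processed;
}
```

      With one pending job that still has details and one failed job whose details pointer is NULL, the sketch processes only the pending job and never dereferences the deleted details.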
  7. 29 Feb, 2012 1 commit
  8. 28 Feb, 2012 5 commits
  9. 27 Feb, 2012 1 commit
    • Reduce gres error logging · 670be35a
      Morris Jette authored
      Only report "gres/<name> lacks File parameter" if some nodes define
      File AND this node does not AND (the new part) the GRES count on
      this node is non-zero.
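      The three-part condition above can be written as a single predicate. The function and parameter names below are hypothetical and do not come from the Slurm source; the sketch only restates the logic of the commit message.

```c
#include <stdbool.h>

/* Hypothetical predicate: should "gres/<name> lacks File parameter" be logged? */
static bool should_log_missing_file(bool some_nodes_define_file,
                                    bool this_node_defines_file,
                                    unsigned long gres_count_on_node)
{
    return some_nodes_define_file     /* File is in use somewhere */
        && !this_node_defines_file    /* but not on this node */
        && gres_count_on_node > 0;    /* new part: node actually has this GRES */
}
```

      A node with a GRES count of zero therefore no longer triggers the message, even when other nodes define File.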
  10. 24 Feb, 2012 9 commits
  11. 23 Feb, 2012 1 commit
  12. 22 Feb, 2012 3 commits