1. 16 Mar, 2012 6 commits
  2. 14 Mar, 2012 2 commits
    • Set Cray srun default job name · 0b24e690
      Morris Jette authored
      Cray: when the srun wrapper creates a job allocation, set the default job
      name to the executable file's name, ignoring leading directory names in the path.
      0b24e690
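The naming rule in this commit message can be sketched as follows. This is an illustration only, not the wrapper's actual code; the function name `default_job_name` and the `command` argument are hypothetical.

```python
import os


def default_job_name(command):
    """Return a default job name derived from the command to be launched.

    `command` is assumed to be the argument vector that follows the srun
    options, so command[0] is the executable path.
    """
    # Keep only the file name; leading directory names are dropped.
    return os.path.basename(command[0])
```

Under these assumptions, `default_job_name(["/usr/local/bin/a.out", "-n", "4"])` yields `a.out`.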
    • Change read lock to write lock · d53b7c26
      Morris Jette authored
      This patch contains the parts of bad_dbtime.diff from CSCS that have
      not already been committed.
      d53b7c26
  3. 13 Mar, 2012 5 commits
  4. 12 Mar, 2012 1 commit
  5. 02 Mar, 2012 2 commits
    • cray/srun wrapper, don't use aprun -q by default · ea9adc17
      Morris Jette authored
      In the cray/srun wrapper, include the aprun "-q" option only when the
      srun "--quiet" option is used.
      ea9adc17
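A minimal sketch of the option mapping described in this commit message. The function `build_aprun_args` and its parameters are hypothetical and do not reflect the wrapper's real structure; only the "-q" and "--quiet" options come from the message.

```python
def build_aprun_args(command, quiet=False):
    """Build an aprun argument vector for `command`.

    `quiet` is assumed to be True only when the user passed "--quiet"
    to srun.
    """
    args = ["aprun"]
    if quiet:
        # "-q" is no longer added by default, only on explicit request.
        args.append("-q")
    args.extend(command)
    return args
```

With `quiet` left at its default, the resulting argument vector contains no "-q".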
    • Fix for possible SEGV · ed56303c
      Morris Jette authored
      Here's what seems to have happened:
      
      - A job was pending, waiting for resources.
      - slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
      - As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
      - scontrol reconfigure was done again.
      - read_slurm_conf called _restore_job_dependencies.
      - _restore_job_dependencies called build_feature_list for each job in the job list.
      - When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.
      
      The problem was reported by a customer on Slurm 2.2.7.  I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies.  If the job state is JOB_FAILED, the job is skipped.
      
      Regards,
      Martin
      
      This is an alternative solution to bug316980fix.patch
      ed56303c
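The check described in the report above (skip jobs in state JOB_FAILED before touching their details) can be modeled in a few lines. This is a simplified Python model, not Slurm's C code: the `Job` and `JobDetails` classes and the "&"-separated feature string are assumptions made for illustration, and since the commit is described as an alternative solution, the committed change may differ from this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

JOB_PENDING = "JOB_PENDING"
JOB_FAILED = "JOB_FAILED"


@dataclass
class JobDetails:
    features: str  # hypothetical: feature names joined by "&"


@dataclass
class Job:
    job_id: int
    state: str
    details: Optional[JobDetails]  # None once the details have been deleted


def build_feature_list(job: Job) -> List[str]:
    # Dereferences job.details; fails if the details were already deleted.
    return job.details.features.split("&")


def restore_job_dependencies(jobs: List[Job]) -> Dict[int, List[str]]:
    rebuilt = {}
    for job in jobs:
        if job.state == JOB_FAILED:
            # Failed jobs have no details left to rebuild, so skip them.
            continue
        rebuilt[job.job_id] = build_feature_list(job)
    return rebuilt
```

In this model, a failed job whose details are `None` is skipped instead of being dereferenced, which mirrors the guard the report proposes.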
  6. 29 Feb, 2012 1 commit
  7. 28 Feb, 2012 5 commits
  8. 27 Feb, 2012 1 commit
    • Reduce gres error logging · 670be35a
      Morris Jette authored
      Only report "gres/<name> lacks File parameter" if some nodes define
      File AND this node does not AND (the new condition) the GRES count on
      this node is non-zero.
      670be35a
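The three-part condition in this commit message can be written as a single predicate. The function and parameter names below are hypothetical and chosen only to mirror the wording of the message.

```python
def should_report_missing_file(some_nodes_define_file,
                               node_defines_file,
                               node_gres_count):
    """Decide whether to log "gres/<name> lacks File parameter" for a node."""
    return (
        some_nodes_define_file      # some nodes define File
        and not node_defines_file   # this node does not
        and node_gres_count != 0    # new condition: GRES count is non-zero
    )
```

Under this sketch, a node with a GRES count of zero never triggers the message, which is the reduction in logging the commit describes.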
  9. 24 Feb, 2012 9 commits
  10. 23 Feb, 2012 1 commit
  11. 22 Feb, 2012 4 commits
  12. 21 Feb, 2012 1 commit
  13. 20 Feb, 2012 2 commits