1. 16 Mar, 2012 7 commits
  2. 14 Mar, 2012 2 commits
    • Set Cray srun default job name · 0b24e690
      Morris Jette authored
      Cray - For srun wrapper when creating a job allocation, set the default job
      name to the executable file's name. Ignore leading directory names in the path.
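The naming rule described in this commit can be sketched as follows. This is a minimal Python illustration only, not the wrapper's actual code; the function name `default_job_name` and the argument-list convention are assumptions made for the example.

```python
import os.path


def default_job_name(command):
    """Return a default job name for a command line (hypothetical helper).

    `command` is a list whose first element is the executable path.
    Leading directory names are ignored, so only the file name remains.
    """
    if not command:
        return None  # nothing to derive a name from
    return os.path.basename(command[0])
```

With this sketch, `default_job_name(["/opt/apps/bin/a.out", "-n", "4"])` yields `"a.out"`, and a bare command such as `["hostname"]` keeps its name unchanged.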
    • Change read lock to write lock · d53b7c26
      Morris Jette authored
      This patch contains the bits of bad_dbtime.diff from CSCS which have
      not already been committed
  3. 13 Mar, 2012 5 commits
  4. 12 Mar, 2012 1 commit
  5. 02 Mar, 2012 2 commits
    • cray/srun wrapper, don't use aprun -q by default · ea9adc17
      Morris Jette authored
      In the cray/srun wrapper, include the aprun "-q" option only when the
      srun "--quiet" option is used.
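The option mapping in this commit could look roughly like the sketch below. It is a simplified Python model, assuming a hypothetical helper `build_aprun_args`; the real wrapper translates many more srun options, and only the "-q" and "--quiet" relationship comes from the commit message.

```python
def build_aprun_args(executable, quiet=False):
    """Build an aprun argument list (hypothetical, heavily simplified).

    `quiet` stands for the user having passed srun's --quiet option.
    The -q flag is added only in that case, never by default.
    """
    args = ["aprun"]
    if quiet:
        args.append("-q")
    args.append(executable)
    return args
```

Calling `build_aprun_args("./a.out")` gives `["aprun", "./a.out"]`, while `build_aprun_args("./a.out", quiet=True)` gives `["aprun", "-q", "./a.out"]`.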
    • Fix for possible SEGV · ed56303c
      Morris Jette authored
      Here's what seems to have happened:
      
      - A job was pending, waiting for resources.
      - slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
      - As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
      - scontrol reconfigure was done again.
      - read_slurm_conf called _restore_job_dependencies.
      - _restore_job_dependencies called build_feature_list for each job in the job list.
      - When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.
      
      The problem was reported by a customer on Slurm 2.2.7.  I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies.  If the job state is JOB_FAILED, the job is skipped.
      
      Regards,
      Martin
      
      This is an alternative solution to bug316980fix.patch.
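The failure sequence and the guard described above can be modeled as a short sketch. The actual patch is C code inside slurmctld; the Python below only mirrors the control flow. The function names follow the commit message, but the `Job` and `JobDetails` classes, the string state constants, and the feature-string format are assumptions for illustration. A `None` details field stands in for the details freed by delete_job_details.

```python
from dataclasses import dataclass
from typing import Optional

JOB_PENDING = "JOB_PENDING"
JOB_FAILED = "JOB_FAILED"


@dataclass
class JobDetails:
    features: str  # assumed format: feature names joined by "&"


@dataclass
class Job:
    job_id: int
    state: str
    details: Optional[JobDetails]  # None models deleted job details


def build_feature_list(job):
    # Dereferences job.details unconditionally, like the code in the report.
    # With details set to None this raises AttributeError (the SEGV analogue).
    return job.details.features.split("&")


def restore_job_dependencies(jobs):
    """Rebuild feature lists, skipping jobs already marked JOB_FAILED."""
    rebuilt = {}
    for job in jobs:
        if job.state == JOB_FAILED:
            continue  # details may already be deleted; do not touch them
        rebuilt[job.job_id] = build_feature_list(job)
    return rebuilt
```

Without the `JOB_FAILED` check, the second reconfigure in the sequence above would reach `build_feature_list` for the failed job and fault on the missing details.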
  6. 29 Feb, 2012 1 commit
  7. 28 Feb, 2012 5 commits
  8. 27 Feb, 2012 1 commit
    • Reduce gres error logging · 670be35a
      Morris Jette authored
      Only report "gres/<name> lacks File parameter" if some nodes define
      File AND this node does not AND (new part here) the GRES count on
      this node is non-zero
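The three-part condition in this commit can be written as a single predicate. This is a sketch in Python with hypothetical parameter names; the real check lives in Slurm's C gres code and works on its own data structures.

```python
def should_log_missing_file(some_nodes_define_file, node_defines_file, node_gres_count):
    """Decide whether to report "gres/<name> lacks File parameter".

    All three conditions must hold:
      1. some nodes define File for this GRES,
      2. this node does not,
      3. this node's GRES count is non-zero (the condition added here).
    """
    return bool(
        some_nodes_define_file
        and not node_defines_file
        and node_gres_count != 0
    )
```

A node with a GRES count of zero therefore no longer triggers the message, even when other nodes define File and this node does not.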
  9. 24 Feb, 2012 9 commits
  10. 23 Feb, 2012 1 commit
  11. 22 Feb, 2012 4 commits
  12. 21 Feb, 2012 1 commit
  13. 20 Feb, 2012 1 commit
    • slurm init script : bug correction in stop target error code mgmt · 15793b0a
      jette authored
      In the current version of the slurm init script, a stop action returns
      a non-zero exit code even though the daemons are stopped, because the
      slurmstatus exit code is used directly.
      Ensure that when slurmstatus is called from slurmstop, its exit code
      is inverted so that it matches the exit code expected from the stop
      stage.
      Port of v2.4 commit a09bffa5
      Matthieu Hautreux authored 3 months ago
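The exit-code inversion described in this commit can be illustrated with a small sketch. The real fix is in the shell init script; the Python below only models the mapping. It assumes that the status check returns 0 when the daemons are running and non-zero when they are not, and the helper name `stop_exit_code` and the failure value 1 are hypothetical.

```python
def stop_exit_code(status_rc):
    """Map a status-check exit code to the exit code of a stop action.

    Assumption: status_rc == 0 means the daemons are still running.
    For a stop action, "not running" is the successful outcome, so the
    result is inverted rather than passed through unchanged.
    """
    return 0 if status_rc != 0 else 1
```

Passing the status code through unchanged would report failure exactly when the stop succeeded, which is the behavior the commit corrects.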