02 Mar, 2012 7 commits
    • Mods in priority/multifactor for prio=1 · b223af49
      Morris Jette authored
      In SLURM version 2.4, we now schedule jobs at priority=1 and no longer
      treat it as a special case.
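      A minimal, hypothetical illustration of what dropping the special case
      amounts to (the identifiers and the guard are illustrative, not the
      actual plugin code):

          /* Hypothetical pre-2.4 guard: jobs holding priority 1 were
           * passed over in the scheduling loop. Removing a check of
           * this shape lets priority=1 jobs schedule like any other. */
          if (job_ptr->priority == 1)
              continue;        /* guard removed in SLURM 2.4 */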
    • Cosmetic mods to priority logic · 0810353e
      Morris Jette authored
    • Merge branch 'slurm-2.3' · ec372e00
      Morris Jette authored
    • cray/srun wrapper, don't use aprun -q by default · ea9adc17
      Morris Jette authored
      In the cray/srun wrapper, only include the aprun "-q" option when the
      srun "--quiet" option is used (see the sketch below).
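      A minimal sketch of the option mapping, using hypothetical variable
      names (the real wrapper lives in contribs/cray and may differ):

          /* Translate srun options into the aprun command line.
           * Only forward aprun's -q (quiet) flag when the user
           * explicitly passed --quiet to srun. */
          if (opt_quiet)
              aprun_argv[aprun_argc++] = "-q";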
    • Change a slurmd msg from info() to debug() · 73f915bf
      Morris Jette authored
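      For context, a message logged through SLURM's info() appears at the
      default log level, while debug() output only shows up when slurmd runs
      at a higher verbosity (e.g. a raised SlurmdDebug). The change is of
      this shape (message text hypothetical):

          - info("hypothetical slurmd message");
          + debug("hypothetical slurmd message");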
    • Merge branch 'slurm-2.3' · c06064bc
      Morris Jette authored
    • Fix for possible SEGV · ed56303c
      Morris Jette authored
      Here's what seems to have happened:
      
      - A job was pending, waiting for resources.
      - slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
      - As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
      - scontrol reconfigure was done again.
      - read_slurm_conf called _restore_job_dependencies.
      - _restore_job_dependencies called build_feature_list for each job in the job list.
      - When build_feature_list tried to reference the now-deleted job details for the failed job, it got a segmentation fault.
      
      The problem was reported by a customer on Slurm 2.2.7. I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same; there may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies: if the job state is JOB_FAILED, the job is skipped (see the sketch below).
      
      Regards,
      Martin
      
      This is an alternative solution to bug316980fix.patch.
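      A sketch of the described check, using SLURM-style identifiers; the
      actual patch may differ in detail:

          /* _restore_job_dependencies(): walk the job list, but skip
           * jobs that already reached JOB_FAILED, since their details
           * were freed by delete_job_details() and build_feature_list()
           * would dereference freed memory. */
          job_iterator = list_iterator_create(job_list);
          while ((job_ptr = (struct job_record *) list_next(job_iterator))) {
              if (IS_JOB_FAILED(job_ptr))
                  continue;               /* details already deleted */
              if (job_ptr->details && job_ptr->details->features)
                  build_feature_list(job_ptr);
          }
          list_iterator_destroy(job_iterator);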