1. 28 May, 2015 1 commit
  2. 27 May, 2015 1 commit
    • Map job --mem-per-cpu=0 to --mem=0. · 33c77302
      Morris Jette authored
      However, --mem=0 now reflects the appropriate amount of memory in the
      system, while --mem-per-cpu=0 has not changed. This allows all the memory
      to be allocated in a cgroup without being "consumed", so it remains
      available for other jobs running on the same host.
      Eric Martin, Washington University School of Medicine
  3. 26 May, 2015 3 commits
  4. 22 May, 2015 3 commits
  5. 21 May, 2015 2 commits
  6. 20 May, 2015 2 commits
  7. 19 May, 2015 1 commit
  8. 16 May, 2015 1 commit
  9. 15 May, 2015 2 commits
  10. 14 May, 2015 3 commits
  11. 13 May, 2015 4 commits
  12. 12 May, 2015 2 commits
  13. 11 May, 2015 1 commit
    • Purge old step data on job requeue · beecc7b0
      Morris Jette authored
      Make sure that old step data is purged when a job is requeued.
      Without this logic, if a job terminates abnormally then old step
      data may be left in slurmctld. If the job is then requeued and
      started on a different node, referencing that old job step data
      can result in abnormal events. One specific failure mode is if
      the job is requeued on a node with a different number of cores,
      and the step terminated RPC arrives later, the job and step
      bitmaps of allocated cores can differ in size generating an
      abort.
      bug 1660
  14. 08 May, 2015 4 commits
  15. 07 May, 2015 1 commit
  16. 06 May, 2015 4 commits
  17. 05 May, 2015 1 commit
  18. 04 May, 2015 1 commit
  19. 01 May, 2015 1 commit
  20. 30 Apr, 2015 2 commits
    • Change slurmctld agent timeout · 98e08216
      Morris Jette authored
      In the slurmctld communication agent, set the thread timeout to the
      configured value of MessageTimeout or 30 seconds, whichever is larger,
      rather than a fixed 30 seconds.
    • Fix scancel step cancel bug · 5cb067fc
      Morris Jette authored
      Fix an scancel bug that could return an error on an attempt to signal a
      job step. A simple "scancel 12.3" to signal a specific job step would
      fail; adding another option (say "-i", "--partition=", etc.) avoided the
      failure.
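The timeout rule in the agent timeout commit (98e08216) reduces to taking the larger of two values. A minimal Python sketch, assuming MessageTimeout is expressed in seconds; the function and constant names are hypothetical and not part of Slurm:

```python
# Sketch of the agent thread-timeout rule from commit 98e08216.
# Assumption: MessageTimeout is in seconds. agent_thread_timeout and
# MIN_AGENT_TIMEOUT are hypothetical names, not Slurm APIs.

MIN_AGENT_TIMEOUT = 30  # seconds; the previous fixed timeout


def agent_thread_timeout(message_timeout: int) -> int:
    """Return the agent thread timeout in seconds.

    Before the change the timeout was always 30 seconds; after it, the
    configured MessageTimeout is used unless it is smaller than 30.
    """
    return max(message_timeout, MIN_AGENT_TIMEOUT)


if __name__ == "__main__":
    for configured in (10, 30, 60):
        print(configured, "->", agent_thread_timeout(configured))
```

Because of the `max`, a short MessageTimeout can never push the agent timeout below its old fixed value of 30 seconds, while a longer one now extends it.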
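The failure mode described in the requeue commit (beecc7b0) can be illustrated with a toy model. This Python sketch is hypothetical throughout (the JobRecord and StepRecord classes and the requeue, release_step_cores, and make_job functions are not Slurm code). It contrasts a requeue that purges step data with one that does not, when a late step-terminated message arrives after the job has moved to a node with a different core count:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StepRecord:
    step_id: int
    core_bitmap: list[bool]  # cores the step used on the node it ran on


@dataclass
class JobRecord:
    job_id: int
    core_bitmap: list[bool]  # cores allocated to the job on its current node
    steps: dict[int, StepRecord] = field(default_factory=dict)


def requeue(job: JobRecord, new_core_count: int, purge: bool = True) -> None:
    """Restart the job on a node with new_core_count cores (hypothetical)."""
    if purge:
        job.steps.clear()  # drop step data left over from the previous run
    job.core_bitmap = [True] * new_core_count


def release_step_cores(job: JobRecord, step_id: int) -> bool:
    """Handle a (possibly late) step-terminated message.

    Returns False if the step is unknown, e.g. because it was purged on requeue.
    """
    step = job.steps.pop(step_id, None)
    if step is None:
        return False  # the stale message is ignored
    if len(step.core_bitmap) != len(job.core_bitmap):
        # Stands in for the abort described in the commit message.
        raise RuntimeError("job and step core bitmaps differ in size")
    # A real implementation would now update per-core accounting
    # using both bitmaps, which requires them to be the same size.
    return True


def make_job() -> JobRecord:
    # A job whose step 0 ran on a 4-core node.
    step = StepRecord(step_id=0, core_bitmap=[True, True, False, False])
    return JobRecord(job_id=12, core_bitmap=[True] * 4, steps={0: step})


if __name__ == "__main__":
    purged = make_job()
    requeue(purged, new_core_count=8)  # step data purged
    print("purged:", release_step_cores(purged, 0))

    stale = make_job()
    requeue(stale, new_core_count=8, purge=False)  # behavior without the fix
    try:
        release_step_cores(stale, 0)
    except RuntimeError as err:
        print("stale:", err)
```

With purging, the late message finds no step record and is ignored; without it, the 4-core step bitmap is compared against the 8-core job bitmap and the size mismatch is reported.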