  1. 31 Jan, 2014 3 commits
    • David Bigagli's avatar
      31d409b7
    • Make sure node limits get assessed if no node count was given in request. · 5b0f9c39
      Danny Auble authored
      For example, salloc -n32 does not specify a node count. With the previous
      code, if such a request needed 4 nodes and only 1 was left in GrpNodes, it
      would run anyway, because the limits were checked before the number of
      nodes was selected.
      
      Now the check happens afterwards, so the limits on nodes, CPUs and memory
      are always applied to what the job will actually use.
    • Fix step allocation failure due to memory use · 8b76b93c
      Morris Jette authored
      Fix step allocation when some CPUs are not available due to memory limits.
      This happens when one step is active and using memory that blocks the
      scheduling of another step on a portion of the CPUs needed. The new step
      is now delayed rather than aborting with "Requested node configuration is
      not available".
      bug 577
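The ordering problem described in commit 5b0f9c39 can be sketched as follows. This is a minimal illustration, not Slurm's code: `struct grp_limits`, `NO_NODE_COUNT` and both function names are hypothetical, and only the GrpNodes case is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a group node limit (not a Slurm structure). */
struct grp_limits {
	uint32_t grp_nodes;       /* most nodes the group may use at once */
	uint32_t grp_used_nodes;  /* nodes the group is using right now */
};

#define NO_NODE_COUNT 0  /* assumed marker: request gave no node count */

/* Check made before node selection: it can only test what the request
 * states, so a request without a node count always passes. */
static bool limits_ok_before_select(const struct grp_limits *lim,
				    uint32_t requested_nodes)
{
	if (requested_nodes == NO_NODE_COUNT)
		return true;  /* nothing to compare against yet */
	return lim->grp_used_nodes + requested_nodes <= lim->grp_nodes;
}

/* Check made after node selection: it tests the node count that the
 * job would actually occupy. */
static bool limits_ok_after_select(const struct grp_limits *lim,
				   uint32_t selected_nodes)
{
	return lim->grp_used_nodes + selected_nodes <= lim->grp_nodes;
}
```

With one node left in the group, a request without a node count passes the early check, while the late check rejects it once four nodes have been selected.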
  2. 28 Jan, 2014 1 commit
  3. 23 Jan, 2014 2 commits
  4. 21 Jan, 2014 2 commits
  5. 18 Jan, 2014 1 commit
  6. 16 Jan, 2014 2 commits
  7. 15 Jan, 2014 1 commit
  8. 13 Jan, 2014 2 commits
  9. 08 Jan, 2014 3 commits
  10. 07 Jan, 2014 2 commits
  11. 06 Jan, 2014 2 commits
    • Reset job priority on manual resume · 65d9196c
      Morris Jette authored
      If a job is explicitly suspended, its priority is set to zero.
      This resets the priority when requeued and also documents that
      if the job is requeued (e.g. due to a node failure), then it
      is placed in a held state.
    • Correct job RunTime if requeued from suspend state · bc3d8828
      Morris Jette authored
      Without this patch, the job's RunTime includes its RunTime from
      before its prior suspend (i.e. the job's full RunTime rather than
      just the RunTime of the requeued job).
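The RunTime correction in commit bc3d8828 can be illustrated with a small accounting sketch. The structure and function names below are hypothetical and do not correspond to Slurm's job record; the point is only that time accrued before a suspend has to be cleared when the job is requeued.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical run-time bookkeeping for one job (not Slurm's job record). */
struct job_clock {
	time_t run_start;  /* when the current stretch of running began */
	time_t accrued;    /* run time accumulated before the last suspend */
	bool suspended;    /* true while the job is suspended */
};

static void job_start(struct job_clock *job, time_t now)
{
	job->run_start = now;
	job->suspended = false;
}

static void job_suspend(struct job_clock *job, time_t now)
{
	job->accrued += now - job->run_start;  /* bank the time run so far */
	job->suspended = true;
}

/* Requeue from the suspended state. Clearing `accrued` is the step that
 * keeps the earlier run out of the requeued job's run time. */
static void job_requeue(struct job_clock *job)
{
	job->accrued = 0;
	job->run_start = 0;
	job->suspended = false;
}

static time_t job_run_time(const struct job_clock *job, time_t now)
{
	if (job->suspended || job->run_start == 0)
		return job->accrued;  /* not running: only banked time counts */
	return job->accrued + (now - job->run_start);
}
```

If `job_requeue()` left `accrued` untouched, a job that ran 60 seconds, was suspended, and was then requeued would report those 60 seconds on top of its new run.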
  12. 27 Dec, 2013 1 commit
    • Fix sched/backfill bug that could starve jobs · 2bae8bd6
      Filip Skalski authored
      Hello,
      
      I think I found another bug in the code (I'm using 2.6.3 but I checked the 2.6.5 and 14.03 versions and it's the same there).
      
      In file sched/backfill/backfill.c:
      
      1)
      _add_reservation function, from line 1172:
      
      if (placed == true) {
              j = node_space[j].next;
              if (j && (end_reserve < node_space[j].end_time)) {
                      /* insert end entry record */
                      i = *node_space_recs;
                      node_space[i].begin_time = end_reserve;
                      node_space[i].end_time = node_space[j].end_time;
                      node_space[j].end_time = end_reserve;
                      node_space[i].avail_bitmap =
                              bit_copy(node_space[j].avail_bitmap);
                      node_space[i].next = node_space[j].next;
                      node_space[j].next = i;
                      (*node_space_recs)++;
              }
              break;
      }
      I drew a picture of the `node_space` state after 2 iterations (see attachment).
      
      In case where the new reservation i...
  13. 23 Dec, 2013 2 commits
  14. 20 Dec, 2013 2 commits
  15. 19 Dec, 2013 1 commit
    • scontrol show job - Correct NumNodes value · b31e2176
      Morris Jette authored
      It has been changed to improve the calculated value for pending
      jobs and use the actual node count value for jobs that have been
      started (including suspended, completed, etc.)
      bug 549
  16. 18 Dec, 2013 1 commit
  17. 17 Dec, 2013 2 commits
  18. 16 Dec, 2013 1 commit
  19. 14 Dec, 2013 1 commit
  20. 13 Dec, 2013 2 commits
  21. 12 Dec, 2013 1 commit
    • slurmstepd variable initialization · 06b41cdc
      Morris Jette authored
      Without this patch, free() is called on a random memory location
      (i.e. whatever is on the stack), which can result in slurmstepd
      dying and a completed job not being purged in a timely fashion.
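The class of bug fixed in commit 06b41cdc can be shown with a generic C sketch. It is not the slurmstepd code: `build_message` is a hypothetical helper, used only to show why a pointer that reaches `free()` on every exit path has to start out as NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicates `src` into `*out`, returns 0 on success. */
static int build_message(const char *src, char **out)
{
	char *buf = NULL;  /* initialized so the cleanup path is always safe */
	int rc = -1;

	if (src == NULL)
		goto cleanup;  /* early exit: buf was never allocated */

	buf = malloc(strlen(src) + 1);
	if (buf == NULL)
		goto cleanup;

	strcpy(buf, src);
	*out = buf;
	buf = NULL;  /* ownership has moved to the caller */
	rc = 0;

cleanup:
	free(buf);  /* free(NULL) is a no-op; an uninitialized buf is not */
	return rc;
}
```

Without the `= NULL` initializer, the early exit would pass whatever value happened to be on the stack to `free()`, which is undefined behavior and can crash the process.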
  22. 11 Dec, 2013 2 commits
  23. 09 Dec, 2013 2 commits
    • Modify squeue to support longer job ID values · 17f27007
      Morris Jette authored
      This is needed for job arrays with discontiguous task ID values
      (e.g. "123_[1,3,5,...99999]")
    • Improve sview support for job arrays · d998640f
      Morris Jette authored
      Previously, job array tasks were listed only with their native job ID
      (e.g. 123_0 listed as 123, 123_1 as 124, etc). Now the job ID is listed
      in both formats (e.g. "123_1 (124)"). The same format is used
      for job step IDs (e.g. "123_1.2 (124.2)").
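The display format described in commit d998640f can be reproduced with a short formatting sketch. The function names, parameter names and the `NOT_ARRAY_TASK` marker are assumptions made for illustration, not sview's actual code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NOT_ARRAY_TASK UINT32_MAX  /* assumed marker for a non-array job */

/* Writes "123_1 (124)" for an array task, or "124" for a plain job. */
static int format_job_id(char *buf, size_t len, uint32_t job_id,
			 uint32_t array_job_id, uint32_t array_task_id)
{
	if (array_task_id == NOT_ARRAY_TASK)
		return snprintf(buf, len, "%" PRIu32, job_id);
	return snprintf(buf, len, "%" PRIu32 "_%" PRIu32 " (%" PRIu32 ")",
			array_job_id, array_task_id, job_id);
}

/* Writes "123_1.2 (124.2)" for a step of an array task. */
static int format_step_id(char *buf, size_t len, uint32_t job_id,
			  uint32_t array_job_id, uint32_t array_task_id,
			  uint32_t step_id)
{
	return snprintf(buf, len,
			"%" PRIu32 "_%" PRIu32 ".%" PRIu32
			" (%" PRIu32 ".%" PRIu32 ")",
			array_job_id, array_task_id, step_id,
			job_id, step_id);
}
```

Both functions return the `snprintf` result, so a caller can detect truncation by comparing it with the buffer length.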
  24. 08 Dec, 2013 1 commit