1. 25 Jan, 2014 1 commit
    • Split job "shared" field · f21d21d6
      Morris Jette authored
      Split slurmctld's job record "shared" field into "share_res"
      (share resources) and "whole_node" fields. This is needed to better
      manage allocation of whole nodes for core specialization without
      disabling gang scheduling of such jobs.
  2. 23 Jan, 2014 5 commits
  3. 22 Jan, 2014 1 commit
  4. 21 Jan, 2014 2 commits
  5. 20 Jan, 2014 1 commit
  6. 18 Jan, 2014 1 commit
  7. 17 Jan, 2014 1 commit
  8. 16 Jan, 2014 5 commits
  9. 15 Jan, 2014 2 commits
  10. 13 Jan, 2014 2 commits
  11. 11 Jan, 2014 1 commit
  12. 10 Jan, 2014 1 commit
  13. 09 Jan, 2014 2 commits
  14. 08 Jan, 2014 4 commits
  15. 07 Jan, 2014 3 commits
  16. 06 Jan, 2014 2 commits
    • Reset job priority on manual resume · 65d9196c
      Morris Jette authored
      If a job is explicitly suspended, its priority is set to zero.
      This patch resets the priority when the job is requeued and also
      documents that a job requeued for another reason (e.g. due to a
      node failure) is placed in a held state.
    • Correct job RunTime if requeued from suspend state · bc3d8828
      Morris Jette authored
      Without this patch, the job's RunTime includes its RunTime from
      before its prior suspend (i.e. the job's full RunTime rather than
      just the RunTime of the requeued job).
  17. 27 Dec, 2013 1 commit
    • Fix sched/backfill bug that could starve jobs · 2bae8bd6
      Filip Skalski authored
      Hello,
      
      I think I found another bug in the code (I'm using 2.6.3 but I checked the 2.6.5 and 14.03 versions and it's the same there).
      
      In file sched/backfill/backfill.c:
      
      1)
      In the _add_reservation function, starting at line 1172:
      
      if (placed == true) {
              j = node_space[j].next;
              if (j && (end_reserve < node_space[j].end_time)) {
                      /* insert end entry record */
                      i = *node_space_recs;
                      node_space[i].begin_time = end_reserve;
                      node_space[i].end_time = node_space[j].end_time;
                      node_space[j].end_time = end_reserve;
                      node_space[i].avail_bitmap =
                              bit_copy(node_space[j].avail_bitmap);
                      node_space[i].next = node_space[j].next;
                      node_space[j].next = i;
                      (*node_space_recs)++;
              }
              break;
      }
      I drew a picture of the `node_space` state after 2 iterations (see attachment).
      
      In the case where the new reservation lies fully inside another reservation,
      everything is OK.
      But if the new reservation spans multiple existing reservations, the `end entry record` is not created,
      because only the newly created `start entry record` is checked.
      
      An easy fix would be to change the if into a loop, for example:
      
      if (placed == true) {
              while ((j = node_space[j].next) > 0) {
                      if (end_reserve < node_space[j].end_time) {
                              /* insert end entry record, same as above */
                              break;
                      }
              }
              break;
      }
      
      2)
      You could also change line 612:
              node_space = xmalloc(sizeof(node_space_map_t) *
                                   (max_backfill_job_cnt + 3));
      to `(max_backfill_job_cnt * 2 + 1)`, since each reservation can add at most two entries (the check at line 982 should then never execute). At the moment, in the worst case, only half of `max_backfill_job_cnt` jobs are checked.
      
      NOTE: However, all of this assumes that the current behavior is not intentional, i.e. that accuracy is not being traded deliberately for faster calculations (especially in point 2).
      
      Best regards,
      Filip Skalski
  18. 23 Dec, 2013 4 commits
  19. 20 Dec, 2013 1 commit
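
The backfill problem reported in commit 2bae8bd6 can be reproduced in a small stand-alone model. The sketch below is not Slurm code: `space_rec_t`, `split_at`, `add_reservation`, `has_boundary` and `demo` are hypothetical names, availability bitmaps are omitted, and times are plain integers. It assumes a record list that starts as a single window and is split at the start and end of each reservation, using either the single check quoted in the report or the proposed loop.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RECS 16

/* Simplified stand-in for one node_space record: a time window plus the
 * index of the next window in time order (0 terminates the list). */
typedef struct {
	int begin_time;
	int end_time;
	int next;
} space_rec_t;

/* Split record j at time t: j keeps [begin, t), a new record gets [t, end). */
static void split_at(space_rec_t *space, int *recs, int j, int t)
{
	int i = *recs;

	assert(i < MAX_RECS);
	space[i].begin_time = t;
	space[i].end_time = space[j].end_time;
	space[j].end_time = t;
	space[i].next = space[j].next;
	space[j].next = i;
	(*recs)++;
}

/* Add a reservation [start, end). With use_loop == false only the record
 * right after the start split is checked (the reported behavior); with
 * use_loop == true the list is walked until a record ending after `end`
 * is found (the proposed fix). */
static void add_reservation(space_rec_t *space, int *recs,
			    int start, int end, bool use_loop)
{
	int j = 0;

	for (;;) {
		if (space[j].end_time > start) {
			split_at(space, recs, j, start); /* start entry record */
			if (use_loop) {
				while ((j = space[j].next) > 0) {
					if (end < space[j].end_time) {
						split_at(space, recs, j, end);
						break;
					}
				}
			} else {
				j = space[j].next;
				if (j && (end < space[j].end_time))
					split_at(space, recs, j, end);
			}
			break;
		}
		if ((j = space[j].next) == 0)
			break;
	}
}

/* Return true if some record in the list begins exactly at time t. */
static bool has_boundary(const space_rec_t *space, int t)
{
	int j = 0;

	do {
		if (space[j].begin_time == t)
			return true;
	} while ((j = space[j].next) != 0);
	return false;
}

/* Hypothetical scenario: two disjoint reservations, then a third one
 * ([15, 35)) that spans several existing records. Reports whether an
 * end boundary was created at time 35 and how many records exist. */
bool demo(bool use_loop, int *recs_out)
{
	space_rec_t space[MAX_RECS] = { { 0, 100, 0 } };
	int recs = 1;

	add_reservation(space, &recs, 10, 20, use_loop);
	add_reservation(space, &recs, 30, 40, use_loop);
	add_reservation(space, &recs, 15, 35, use_loop);
	*recs_out = recs;
	return has_boundary(space, 35);
}
```

In this model, `demo(false, &n)` returns false with `n == 6` (no end record at time 35), while `demo(true, &n)` returns true with `n == 7`, which matches the `2 * 3 + 1` upper bound suggested in point 2 for three reservations. The sketch follows the proposed loop literally, so a reservation that ends exactly on an existing boundary would produce a zero-length record here; it makes no claim about how the actual commit handles that case.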