1. 27 Jul, 2012 1 commit
    • Enhancements to sinfo reservation output · 3fd84252
      Morris Jette authored
      I would like to make two changes to this:
      
      1) Since the reservation name can easily exceed 9 characters, I would like the field to be as wide as it needs to be so the name is never truncated. I did this by scanning the names and setting the field width to the longest one.
      
      2) The other headers are in capitals, so I changed
      
       ResName    State       StartTime             EndTime           Duration  Nodelist
      
      to
      
      RESV_NAME        STATE           START_TIME            END_TIME     DURATION  NODELIST
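      The width calculation described in 1) can be sketched as follows. This is a minimal illustration, not the sinfo code from this commit: `struct resv`, `name_width()` and `format_row()` are hypothetical names, and only the `%-*s` printf conversion is standard C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a reservation record; not Slurm's real struct. */
struct resv {
    const char *name;
    const char *state;
};

/* Return the widest name, but never narrower than the column header. */
static int name_width(const struct resv *r, size_t n, const char *header)
{
    size_t width = strlen(header);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(r[i].name);
        if (len > width)
            width = len;
    }
    return (int)width;
}

/* Format one left-justified row into buf using the computed width. */
static int format_row(char *buf, size_t size, int width,
                      const char *name, const char *state)
{
    assert(width >= 0);
    /* "%-*s" takes the field width from the argument list at run time. */
    return snprintf(buf, size, "%-*s  %s", width, name, state);
}
```

      Passing the same width to the header row and to every data row keeps the columns aligned no matter how long the longest name is.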
  2. 26 Jul, 2012 2 commits
  3. 24 Jul, 2012 1 commit
  4. 23 Jul, 2012 1 commit
  5. 19 Jul, 2012 2 commits
  6. 16 Jul, 2012 1 commit
  7. 13 Jul, 2012 2 commits
  8. 12 Jul, 2012 4 commits
  9. 11 Jul, 2012 3 commits
  10. 10 Jul, 2012 1 commit
    • Correct job node_cnt value for job completion plugin · 97ce2e19
      Morris Jette authored
      When using the jobcomp/script interface, we have noticed the NODECNT
      environment variable is off-by-one when logging completed jobs in
      the NODE_FAIL state (though the NODELIST is correct).
      
      This appears to be because in many places job_completion_logger()
      is called after deallocate_nodes(), which appears to decrement
      job->node_cnt for DOWN nodes.
      
      If job_completion_logger() only called the job completion plugin,
      then I would guess that it might be safe to move this call ahead
      of deallocate_nodes(). However, it seems like job_completion_logger()
      also does a bunch of accounting stuff (?), so perhaps that would
      need to be split out first?
      
      Also, there is the possibility that this is working as designed,
      though if so a well placed comment in the code might be appreciated.
      If the decreased nodecount is intended, though, should the DOWN
      nodes also be removed from the job's NODELIST? - Mark Grondona
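      The ordering problem in the report can be reproduced with a toy model. Everything here is an assumption made for illustration: `struct toy_job` and the `toy_*` functions only mimic the behavior described above and are not Slurm's job_completion_logger() or deallocate_nodes(). Taking a snapshot of the count is one possible remedy, not necessarily what this commit does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy job record; field names are illustrative, not Slurm's job record. */
struct toy_job {
    int node_cnt;        /* nodes currently counted for the job */
    int logged_node_cnt; /* value seen by the completion logger */
};

/* Model of deallocation: each DOWN node is dropped from the count. */
static void toy_deallocate_nodes(struct toy_job *job,
                                 const bool *node_down, int n)
{
    assert(n == job->node_cnt);
    for (int i = 0; i < n; i++) {
        if (node_down[i])
            job->node_cnt--;
    }
}

/* Model of the completion logger: records whatever count it is given. */
static void toy_completion_logger(struct toy_job *job, int node_cnt)
{
    job->logged_node_cnt = node_cnt;
}

/* Ordering described in the report: log after deallocation. */
static int log_after_dealloc(struct toy_job *job, const bool *down, int n)
{
    toy_deallocate_nodes(job, down, n);
    toy_completion_logger(job, job->node_cnt);
    return job->logged_node_cnt;
}

/* Alternative: save the allocated count first, then log the saved value. */
static int log_with_snapshot(struct toy_job *job, const bool *down, int n)
{
    int allocated = job->node_cnt;
    toy_deallocate_nodes(job, down, n);
    toy_completion_logger(job, allocated);
    return job->logged_node_cnt;
}
```

      With four nodes and one of them DOWN, the first ordering logs 3 and the second logs 4, which matches the off-by-one seen for NODE_FAIL jobs.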
  11. 09 Jul, 2012 1 commit
  12. 06 Jul, 2012 1 commit
    • Fix for incorrect partition point for job · dd1d573f
      Carles Fenoy authored
      If a job is submitted to more than one partition, its partition pointer can
      be set to an invalid value. This can result in the count of CPUs allocated
      on a node being bad, resulting in over- or under-allocation of its CPUs.
      Patch by Carles Fenoy, BSC.
      
      Hi all,
      
      After a tough day I've finally found the problem and a solution for 2.4.1.
      I was able to reproduce the described behavior by submitting jobs to 2 partitions.
      The job is allocated in one partition, but in the schedule function the job's partition is then changed to the one it was NOT allocated in. As a result, the resources cannot be freed at the end of the job.

      I've solved this by moving the IS_JOB_PENDING test a few lines up in the schedule function (in job_scheduler.c).

      This is the code from the git HEAD (line 801). As this file has changed a lot since 2.4.x I have not made a patch, but I'm describing the solution here.
      I've moved the if (!IS_JOB_PENDING) test to just after the 2nd line (part_ptr...). This prevents the job's partition from being changed if the job is already starting in another partition.
      
      job_ptr = job_queue_rec->job_ptr;
      part_ptr = job_queue_rec->part_ptr;
      job_ptr->part_ptr = part_ptr;
      xfree(job_queue_rec);
      if (!IS_JOB_PENDING(job_ptr))
          continue; /* started in other partition */
      
      Hope this is enough information to solve it.
      
      I've just realized (while writing this mail) that my solution has a memory leak as job_queue_rec is not freed.
      
      Regards,
      Carles Fenoy
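      The reordering described in the message, with the queue record freed before the early `continue` so that the leak mentioned above does not occur, might look like the sketch below. It is a self-contained toy and not the code from job_scheduler.c: the `toy_*` types, `make_rec()` and `toy_schedule()` are hypothetical, and plain `malloc`/`free` stand in for Slurm's allocation helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy types; names echo the message but are not Slurm's definitions. */
struct toy_part { const char *name; };
struct toy_job { bool pending; struct toy_part *part_ptr; };
struct toy_queue_rec { struct toy_job *job_ptr; struct toy_part *part_ptr; };

/* Build a heap-allocated queue record for one (job, partition) pair. */
static struct toy_queue_rec *make_rec(struct toy_job *job,
                                      struct toy_part *part)
{
    struct toy_queue_rec *rec = malloc(sizeof(*rec));
    assert(rec != NULL);
    rec->job_ptr = job;
    rec->part_ptr = part;
    return rec;
}

/* Process every record; returns the number of jobs started. */
static int toy_schedule(struct toy_queue_rec **queue, int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        struct toy_job *job_ptr = queue[i]->job_ptr;
        struct toy_part *part_ptr = queue[i]->part_ptr;
        free(queue[i]);           /* freed before any continue: no leak */
        queue[i] = NULL;
        if (!job_ptr->pending)
            continue;             /* started in other partition */
        job_ptr->part_ptr = part_ptr; /* only pending jobs are updated */
        job_ptr->pending = false;     /* pretend the job starts here */
        started++;
    }
    return started;
}
```

      A job queued for partitions A and B starts in A on its first record. Its second record is freed and skipped, so the job's partition pointer still refers to A.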
  13. 03 Jul, 2012 3 commits
  14. 02 Jul, 2012 1 commit
  15. 29 Jun, 2012 2 commits
  16. 28 Jun, 2012 2 commits
  17. 26 Jun, 2012 4 commits
  18. 25 Jun, 2012 3 commits
  19. 22 Jun, 2012 3 commits
  20. 20 Jun, 2012 2 commits