1. 06 Jul, 2012 1 commit
    • Fix for incorrect partition point for job · dd1d573f
      Carles Fenoy authored
      If a job is submitted to more than one partition, its partition pointer can
      be set to an invalid value. This can make the count of CPUs allocated
      on a node incorrect, resulting in over- or under-allocation of its CPUs.
      Patch by Carles Fenoy, BSC.
      
      Hi all,
      
      After a tough day I've finally found the problem and a solution for 2.4.1.
      I was able to reproduce the described behavior by submitting jobs to 2 partitions.
      The job is allocated in one partition, but in the schedule function the job's partition is then changed to the one it was NOT allocated in. As a result, the resources cannot be freed at the end of the job.
      
      I've solved this by moving the IS_JOB_PENDING test a few lines up in the schedule function (in job_scheduler.c).
      
      This is the code from the git HEAD (line 801). As this file has changed a lot since 2.4.x, I have not made a patch; I'm describing the solution here instead.
      I've moved the if (!IS_JOB_PENDING(job_ptr)) test to just after the 2nd line (part_ptr = ...). This prevents the job's partition from being changed if the job is already starting in another partition.
      
      job_ptr = job_queue_rec->job_ptr;
      
      part_ptr = job_queue_rec->part_ptr;
      job_ptr->part_ptr = part_ptr;
      xfree(job_queue_rec);
      
      if (!IS_JOB_PENDING(job_ptr))
              continue; /* started in other partition */
      
      Hope this is enough information to solve it.
      
      I've just realized (while writing this mail) that my solution has a memory leak, as job_queue_rec is not freed when the loop continues early.
      
      Regards,
      Carles Fenoy
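
The reordering described in the message above, together with a fix for the leak it mentions, can be sketched as follows. This is a minimal, self-contained illustration and not the actual Slurm code: the struct layouts, the `pending` field, the `make_rec` helper, and the `schedule_sketch` function are all hypothetical stand-ins, and plain `malloc`/`free` replace Slurm's `xmalloc`/`xfree`. The point is only the ordering: free the queue record first, test `IS_JOB_PENDING` next, and assign the partition pointer last.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for Slurm's internal records. */
struct part_record { const char *name; };

struct job_record {
    int pending;                   /* 1 while the job has not started */
    struct part_record *part_ptr;  /* partition the job runs in */
};

struct job_queue_rec {
    struct job_record *job_ptr;
    struct part_record *part_ptr;  /* candidate partition for this entry */
};

#define IS_JOB_PENDING(job) ((job)->pending)

/* Illustrative helper: build one heap-allocated queue entry. */
static struct job_queue_rec *make_rec(struct job_record *job,
                                      struct part_record *part)
{
    struct job_queue_rec *rec = malloc(sizeof(*rec));
    assert(rec != NULL);
    rec->job_ptr = job;
    rec->part_ptr = part;
    return rec;
}

/* Walk the queue and "start" each pending job in the first partition
 * that reaches it. Returns the number of jobs started. */
static int schedule_sketch(struct job_queue_rec **queue, int count)
{
    int started = 0;

    for (int i = 0; i < count; i++) {
        struct job_queue_rec *rec = queue[i];
        struct job_record *job_ptr = rec->job_ptr;
        struct part_record *part_ptr = rec->part_ptr;

        free(rec);        /* freed before any continue, so no leak */
        queue[i] = NULL;

        if (!IS_JOB_PENDING(job_ptr))
            continue;     /* started in other partition: keep its part_ptr */

        job_ptr->part_ptr = part_ptr;
        job_ptr->pending = 0;  /* pretend the job starts here */
        started++;
    }
    return started;
}
```

With one job queued for two partitions, the second queue entry no longer overwrites the partition the job actually started in, and both queue records are released.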
  2. 03 Jul, 2012 1 commit
  3. 02 Jul, 2012 1 commit
  4. 28 Jun, 2012 1 commit
  5. 26 Jun, 2012 4 commits
  6. 25 Jun, 2012 3 commits
  7. 22 Jun, 2012 3 commits
  8. 20 Jun, 2012 2 commits
  9. 18 Jun, 2012 2 commits
  10. 13 Jun, 2012 2 commits
  11. 12 Jun, 2012 1 commit
  12. 05 Jun, 2012 1 commit
  13. 01 Jun, 2012 2 commits
  14. 31 May, 2012 1 commit
  15. 30 May, 2012 3 commits
  16. 29 May, 2012 1 commit
  17. 25 May, 2012 2 commits
    • Change SchedulerParameters option from "bf_res=" to "bf_resolution=" · 0f590296
      Rod Schultz authored
      This change makes the code consistent with the documentation.
      Note that "bf_res=" will continue to be recognized for now.
      Patch from Rod Schultz, Bull.
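
One way to accept both spellings during such a transition is sketched below. This is an assumption-laden illustration, not the code from the patch: the function name `parse_bf_resolution`, the caller-supplied default, and the simple substring matching are all hypothetical. The new name is checked first, and the legacy "bf_res=" is used only as a fallback.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the backfill resolution from a
 * SchedulerParameters string, preferring the new option name and
 * falling back to the legacy one. Returns default_value if neither
 * option is present. */
static int parse_bf_resolution(const char *sched_params, int default_value)
{
    static const char new_key[] = "bf_resolution=";
    static const char old_key[] = "bf_res=";
    const char *p;

    if (sched_params == NULL)
        return default_value;

    if ((p = strstr(sched_params, new_key)) != NULL)
        return atoi(p + strlen(new_key));

    /* "bf_res=" cannot match inside "bf_resolution=", because the
     * character after "bf_res" there is 'o', not '='. */
    if ((p = strstr(sched_params, old_key)) != NULL)
        return atoi(p + strlen(old_key));

    return default_value;
}
```

A production parser would also want to check option boundaries (for example, that the match starts at the beginning of the string or right after a comma); that is omitted here to keep the sketch short.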
    • Modify scontrol show job to require -dd option to print batch script. · 8ed1b303
      Don Albert authored
      I have implemented the changes as you suggested: using a "-dd" option to indicate that the display of the script is wanted, and setting both the "SHOW_DETAIL" flag and a new "SHOW_DETAIL2" flag.
      
      Since "scontrol" can be run interactively as well, I added a new "script" option to indicate that display of both the script and the details is wanted if the job is a batch job.
      
      Here are the man page updates for "man scontrol". For the "-d, --details" option:
      
             -d, --details
                    Causes  the  show command to provide additional details where available.  Repeating the option more than
                    once (e.g., "-dd") will cause the show job command to also list the batch script, if the job was a batch
                    job.
      
      For the interactive "details" option:
      
             details
                    Causes  the  show  command  to provide additional details where available.  Job information will include
                    CPUs and NUMA memory allocated on each node.  Note that on computers  with  hyperthreading  enabled  and
                    SLURM  configured  to allocate cores, each listed CPU represents one physical core.  Each hyperthread on
                    that core can be allocated a separate task, so a job's CPU count and task count  may  differ.   See  the
                    --cpu_bind  and  --mem_bind  option  descriptions  in  srun man pages for more information.  The details
                    option is currently only supported for the show job command. To also list the  batch  script  for  batch
                    jobs, in addition to the details, use the script option described below instead of this option.
      
      And for the new interactive "script" option:
      
             script Causes the show job command to list the batch script for batch jobs in addition to the detail
                    information described under the details option above.
      
      Attached are the patch file for the changes and a text file with the results of the tests I did to check out the changes. The patches are against SLURM 2.4.0-rc1.
      
              -Don Albert-
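
The mapping from a repeated "-d" option to the two flags described above might look roughly like this. It is a sketch under stated assumptions rather than the scontrol implementation: the bit values given to SHOW_DETAIL and SHOW_DETAIL2 are placeholders, the function name `show_flags_from_args` is hypothetical, and the argument scan is hand-rolled instead of using scontrol's real option parser.

```c
#include <string.h>

/* Placeholder bit values; the real definitions in Slurm may differ. */
#define SHOW_DETAIL   0x0002
#define SHOW_DETAIL2  0x0004

/* Hypothetical helper: count how many times the details option was
 * given ("-d", "-dd", "--details") and translate that into flags.
 * Once  -> SHOW_DETAIL
 * Twice -> SHOW_DETAIL | SHOW_DETAIL2 (also list the batch script) */
static unsigned show_flags_from_args(int argc, char **argv)
{
    int detail_level = 0;
    unsigned flags = 0;

    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];

        if (strcmp(arg, "--details") == 0) {
            detail_level++;
        } else if (arg[0] == '-' && arg[1] != '-' && arg[1] != '\0') {
            /* Short option cluster such as "-d" or "-dd". */
            for (const char *p = arg + 1; *p != '\0'; p++) {
                if (*p == 'd')
                    detail_level++;
            }
        }
    }

    if (detail_level >= 1)
        flags |= SHOW_DETAIL;
    if (detail_level >= 2)
        flags |= SHOW_DETAIL2;
    return flags;
}
```

Under these assumptions, `scontrol -d show job` yields only SHOW_DETAIL, while `scontrol -dd show job` yields both flags, which is the condition for printing the batch script.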
  18. 24 May, 2012 3 commits
  19. 23 May, 2012 3 commits
  20. 22 May, 2012 1 commit
  21. 16 May, 2012 2 commits