1. 31 Jul, 2012 5 commits
    • Use mount and umount syscalls when handling cgroup namespaces. · 485c80bc
      Janne Blomqvist authored
      Using the syscalls directly rather than calling bin/(u)mount via
      system() avoids a few fork + exec calls, and provides better error
      handling if something goes wrong.
      
      Users of this functionality are also updated to use slurm_strerror in
      order to provide a more informative error message.
      
      The mount and umount syscalls are Linux-specific, but so are cgroups
      so no portability is lost.
    • remove last patch to give author credit · 557c52d1
      Danny Auble authored
    • Use mount and umount syscalls when handling cgroup namespaces. · c3889ec4
      Danny Auble authored
      Using the syscalls directly rather than calling bin/(u)mount via
      system() avoids a few fork + exec calls, and provides better error
      handling if something goes wrong.
      
      Users of this functionality are also updated to use slurm_strerror in
      order to provide a more informative error message.
      
      The mount and umount syscalls are Linux-specific, but so are cgroups
      so no portability is lost.
    • Use mount and umount syscalls when handling cgroup namespaces. · b4c1d3d7
      Danny Auble authored
      Using the syscalls directly rather than calling bin/(u)mount via
      system() avoids a few fork + exec calls, and provides better error
      handling if something goes wrong.
      
      Users of this functionality are also updated to use slurm_strerror in
      order to provide a more informative error message.
      
      The mount and umount syscalls are Linux-specific, but so are cgroups
      so no portability is lost.
    • BGQ - added version string to the load of the runjob_mux plugin to verify the current plugin has been loaded when using runjob_mux_refresh_config · 610cfe65
      Danny Auble authored
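      The mount/umount commits above replace system() calls to the mount binaries with direct syscalls. A minimal sketch of that idea follows, assuming a cgroup v1 hierarchy on Linux. The function names are hypothetical and are not Slurm's own, and plain strerror() stands in for the slurm_strerror mentioned in the commit message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper (not Slurm's actual function): mount a cgroup v1
 * hierarchy at `target` with the given subsystem list, e.g. "cpuset".
 * Calling mount(2) directly avoids the fork + exec of a mount binary and
 * leaves the failure reason in errno. */
int cgroup_ns_mount_sketch(const char *target, const char *subsystems)
{
    unsigned long flags = MS_NOSUID | MS_NOEXEC | MS_NODEV;

    if (mount("cgroup", target, "cgroup", flags, subsystems) != 0) {
        int saved = errno;  /* fprintf may overwrite errno */
        fprintf(stderr, "mount(%s) failed: %s\n", target, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}

/* Hypothetical counterpart: unmount the hierarchy with umount(2). */
int cgroup_ns_umount_sketch(const char *target)
{
    if (umount(target) != 0) {
        int saved = errno;
        fprintf(stderr, "umount(%s) failed: %s\n", target, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

      Both helpers return 0 on success and -1 on failure with errno preserved, so a caller can report the specific cause (for example EPERM or ENOENT) instead of only the exit status that system() would provide.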
  2. 26 Jul, 2012 1 commit
  3. 24 Jul, 2012 1 commit
  4. 23 Jul, 2012 1 commit
  5. 19 Jul, 2012 2 commits
  6. 13 Jul, 2012 2 commits
  7. 12 Jul, 2012 4 commits
  8. 11 Jul, 2012 3 commits
  9. 09 Jul, 2012 1 commit
  10. 06 Jul, 2012 1 commit
    • Fix for incorrect partition pointer for job · dd1d573f
      Carles Fenoy authored
      If a job is submitted to more than one partition, its partition pointer can
      be set to an invalid value. This can make the count of CPUs allocated
      on a node incorrect, resulting in over- or under-allocation of its CPUs.
      Patch by Carles Fenoy, BSC.
      
      Hi all,
      
      After a tough day I've finally found the problem and a solution for 2.4.1.
      I was able to reproduce the described behavior by submitting jobs to 2 partitions.
      The job is allocated in one partition, but in the schedule function the job's partition is changed to the non-allocated one. As a result, the resources cannot be freed at the end of the job.
      
      I've solved this by moving the IS_JOB_PENDING test a few lines up in the schedule function (in job_scheduler.c).
      
      This is the code from the git HEAD (line 801). As this file has changed a lot since 2.4.x, I have not made a patch, but I'm describing the solution here.
      I've moved the if (!IS_JOB_PENDING) test to just after the second line (part_ptr...). This prevents the job's partition from being changed if it is already starting in another partition.
      
          job_ptr = job_queue_rec->job_ptr;
          part_ptr = job_queue_rec->part_ptr;
          job_ptr->part_ptr = part_ptr;
          xfree(job_queue_rec);
          if (!IS_JOB_PENDING(job_ptr))
              continue; /* started in other partition */
      
      Hope this is enough information to solve it.
      
      I've just realized (while writing this mail) that my solution has a memory leak as job_queue_rec is not freed.
      
      Regards,
      Carles Fenoy
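      The reordering described in the message above, together with the memory leak it mentions, can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Slurm code: the struct and function names are hypothetical stand-ins, a plain boolean replaces IS_JOB_PENDING, and free() replaces xfree().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for Slurm's job, partition,
 * and job queue record structures. */
struct part_sketch { const char *name; };
struct job_sketch { bool pending; struct part_sketch *part_ptr; };
struct queue_rec_sketch {
    struct job_sketch *job_ptr;
    struct part_sketch *part_ptr;
};

/* Consume one queue record. Returns true if the job is still pending and
 * should be considered in the record's partition, false if it should be
 * skipped. The record is freed on every path, which avoids the leak that
 * appears when the pending test is moved above the free. */
bool take_queue_rec_sketch(struct queue_rec_sketch *rec,
                           struct job_sketch **job_out)
{
    struct job_sketch *job = rec->job_ptr;
    struct part_sketch *part = rec->part_ptr;

    free(rec);

    /* Test before touching job->part_ptr: a job already started in
     * another partition keeps the partition it was allocated in. */
    if (!job->pending)
        return false;

    job->part_ptr = part;
    *job_out = job;
    return true;
}
```

      In a scheduling loop, a false return corresponds to the `continue` in the quoted snippet. Copying both pointers out of the record before freeing it lets the pending test come first without leaving the record allocated.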
  11. 03 Jul, 2012 1 commit
  12. 02 Jul, 2012 1 commit
  13. 28 Jun, 2012 1 commit
  14. 26 Jun, 2012 4 commits
  15. 25 Jun, 2012 3 commits
  16. 22 Jun, 2012 3 commits
  17. 20 Jun, 2012 2 commits
  18. 18 Jun, 2012 2 commits
  19. 13 Jun, 2012 2 commits