1. 10 Jul, 2012 1 commit
  2. 09 Jul, 2012 1 commit
  3. 07 Jul, 2012 1 commit
  4. 06 Jul, 2012 4 commits
    • Add web page with information about SLURM/POE interface · 5a68d9ab
      Morris Jette authored
      The document still needs work, but is a decent start
    • Move srun loading of plugins earlier in the logic · 1b2ff838
      Morris Jette authored
      This move reduces the risk of srun failing horribly due to code that
      is inconsistent with the plugins if srun is running during a SLURM
      upgrade, especially a major upgrade in which the plugin function
      arguments can change
    • Merge branch 'slurm-2.4' · 76a0e82e
      Morris Jette authored
      Conflicts:
      	src/slurmctld/job_scheduler.c
    • Fix for incorrect partition pointer for job · dd1d573f
      Carles Fenoy authored
      If a job is submitted to more than one partition, its partition pointer can
      be set to an invalid value. This can result in the count of CPUs allocated
      on a node being bad, resulting in over- or under-allocation of its CPUs.
      Patch by Carles Fenoy, BSC.
      
      Hi all,
      
      After a tough day I've finally found the problem and a solution for 2.4.1.
      I was able to reproduce the described behavior by submitting jobs to 2 partitions.
      The job is allocated in one partition, but in the schedule function the job's partition is then changed to the one where it was NOT allocated. As a result, the resources cannot be freed at the end of the job.

      I've solved this by moving the IS_JOB_PENDING test a few lines up in the schedule function (in job_scheduler.c).

      This is the code from the git HEAD (line 801). As this file has changed a lot since 2.4.x, I have not made a patch; I'm describing the solution here instead.
      I've moved the if (!IS_JOB_PENDING) test to right after the 2nd line (part_ptr...). This prevents the job's partition from being changed if the job is already starting in another partition.
      
      job_ptr = job_queue_rec->job_ptr;
      part_ptr = job_queue_rec->part_ptr;
      job_ptr->part_ptr = part_ptr;
      xfree(job_queue_rec);
      if (!IS_JOB_PENDING(job_ptr))
      	continue; /* started in other partition */
      
      Hope this is enough information to solve it.
      
      I've just realized (while writing this mail) that my solution has a memory leak as job_queue_rec is not freed.
      
      Regards,
      Carles Fenoy
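
The reordering described in the message above can be sketched in isolation. This is a minimal illustration, not the committed patch: the struct layouts, the `pending` flag (standing in for `IS_JOB_PENDING`), the use of `free()` in place of `xfree()`, and the helper name `take_queue_entry` are all assumptions made so the example compiles on its own. It frees the queue record before the pending test, which is one way to avoid the leak mentioned at the end of the message.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the slurmctld types; not the real definitions. */
struct part_record {
	const char *name;
};

struct job_record {
	bool pending;                 /* stands in for IS_JOB_PENDING(job_ptr) */
	struct part_record *part_ptr; /* partition the job is (or will be) run in */
};

typedef struct {
	struct job_record *job_ptr;
	struct part_record *part_ptr;
} job_queue_rec_t;

/*
 * Consume one queue entry (hypothetical helper). Returns true if the job is
 * still pending and should be considered for scheduling in this partition.
 * The record is released on every path, so the early return does not leak it.
 */
static bool take_queue_entry(job_queue_rec_t *job_queue_rec)
{
	struct job_record *job_ptr = job_queue_rec->job_ptr;
	struct part_record *part_ptr = job_queue_rec->part_ptr;

	free(job_queue_rec); /* the snippet in the message uses xfree() */

	if (!job_ptr->pending)
		return false; /* started in other partition: keep its part_ptr */

	/* Only a job that is still pending gets its partition pointer updated. */
	job_ptr->part_ptr = part_ptr;
	return true;
}
```

With this ordering, a job that already started in its first partition keeps the partition pointer it was allocated under when its second queue entry is processed.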
  5. 05 Jul, 2012 6 commits
  6. 04 Jul, 2012 2 commits
  7. 03 Jul, 2012 13 commits
  8. 02 Jul, 2012 7 commits
  9. 29 Jun, 2012 5 commits