- 11 Jul, 2012 1 commit
Danny Auble authored
for a job to finish on it the number of unused cpus wasn't updated correctly.
-
- 09 Jul, 2012 1 commit
Martin Perry authored
See Bugzilla #73 for a more complete description of the problem. Patch by Martin Perry, Bull.
-
- 06 Jul, 2012 1 commit
Carles Fenoy authored
If a job is submitted to more than one partition, its partition pointer can be set to an invalid value. This can result in the count of CPUs allocated on a node being wrong, resulting in over- or under-allocation of its CPUs. Patch by Carles Fenoy, BSC.

Hi all,

After a tough day I've finally found the problem and a solution for 2.4.1. I was able to reproduce the explained behavior by submitting jobs to 2 partitions. The job is allocated in one partition, but in the schedule function the job's partition is changed to the non-allocated one. As a result, the resources cannot be freed at the end of the job.

I've solved this by moving the IS_JOB_PENDING test some lines up in the schedule function (in job_scheduler.c). This is the code from the git HEAD (line 801). As this file has changed a lot since 2.4.x, I have not made a patch, but I'm describing the solution here. I've moved the if (!IS_JOB_PENDING) after the 2nd line (part_ptr = ...). This prevents the job's partition from being changed if the job is already starting in another partition.

    job_ptr = job_queue_rec->job_ptr;
    part_ptr = job_queue_rec->part_ptr;
    job_ptr->part_ptr = part_ptr;
    xfree(job_queue_rec);
    if (!IS_JOB_PENDING(job_ptr))
        continue; /* started in other partition */

Hope this is enough information to solve it. I've just realized (while writing this mail) that my solution has a memory leak, as job_queue_rec is not freed.

Regards,
Carles Fenoy
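To make the ordering issue concrete, here is a minimal sketch in C of one way to combine the reordering described above with freeing the queue record on every path. Everything in it is hypothetical and simplified: `struct job_record`, `struct part_record`, `struct job_queue_rec`, `start_job()` and this `IS_JOB_PENDING` macro are stand-ins, not slurmctld's actual definitions, and plain `free()` replaces `xfree()`. A job submitted to two partitions appears twice in the queue; once it has started in the first partition, the second entry should neither overwrite its partition pointer nor leak its record.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the scheduler's records. */
enum job_state { JOB_PENDING, JOB_RUNNING };

struct part_record {
	const char *name;
};

struct job_record {
	enum job_state state;
	struct part_record *part_ptr;	/* partition the job is charged to */
};

struct job_queue_rec {
	struct job_record *job_ptr;
	struct part_record *part_ptr;	/* candidate partition for this entry */
};

#define IS_JOB_PENDING(j) ((j)->state == JOB_PENDING)

/* Hypothetical allocation attempt: in this sketch it always succeeds. */
static void start_job(struct job_record *job_ptr)
{
	job_ptr->state = JOB_RUNNING;
}

/* Walk the queue in order. Both fields are copied out of the record so it
 * can be freed before the pending test (no leak on "continue"), and the
 * test happens before job_ptr->part_ptr is touched, so a job that already
 * started in an earlier partition keeps that partition. */
static void schedule(struct job_queue_rec **queue, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct job_queue_rec *rec = queue[i];
		assert(rec != NULL);
		struct job_record *job_ptr = rec->job_ptr;
		struct part_record *part_ptr = rec->part_ptr;
		free(rec);		/* freed on every path */
		queue[i] = NULL;
		if (!IS_JOB_PENDING(job_ptr))
			continue;	/* started in other partition */
		job_ptr->part_ptr = part_ptr;
		start_job(job_ptr);
	}
}
```

Copying both fields out of the record before freeing it is what allows the pending test to come first without skipping the free.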
-
- 04 Jul, 2012 1 commit
Morris Jette authored
-
- 03 Jul, 2012 4 commits
Danny Auble authored
there are jobs running on that hardware.
-
Morris Jette authored
-
Lipari, Don authored
-
Tim Wickberg authored
-
- 02 Jul, 2012 5 commits
Danny Auble authored
-
Danny Auble authored
-
Carles Fenoy authored
correctly when transitioning. This also applies to 2.4.0 -> 2.4.1; no state will be lost. (Thanks to Carles Fenoy)
-
Morris Jette authored
-
Morris Jette authored
-
- 29 Jun, 2012 2 commits
Morris Jette authored
-
Morris Jette authored
-
- 28 Jun, 2012 2 commits
Danny Auble authored
-
Janne Blomqvist authored
janne.blomqvist@aalto.fi
-
- 27 Jun, 2012 3 commits
Mark Nelson authored
-
Morris Jette authored
-
Morris Jette authored
-
- 26 Jun, 2012 10 commits
Danny Auble authored
(via code from Martin Pool <mbp sourcefrog net>) so we can get a correct alphanumeric sort of hostnames.
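A natural (alphanumeric) comparison treats each run of digits as a number, so `node7` sorts before `node10`, whereas `strcmp()` puts `node10` first. The sketch below is not Martin Pool's strnatcmp code; it is a simplified comparator written for illustration, assuming ASCII hostnames and treating leading zeros as insignificant. The names `natural_cmp()` and `cmp_hostnames()` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Compare two strings in natural order: runs of digits are compared by
 * numeric value, all other characters byte by byte. */
static int natural_cmp(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	while (*a && *b) {
		if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
			/* Skip leading zeros, so "node007" and "node7" compare equal. */
			while (*a == '0')
				a++;
			while (*b == '0')
				b++;
			const char *sa = a, *sb = b;
			while (isdigit((unsigned char)*a))
				a++;
			while (isdigit((unsigned char)*b))
				b++;
			size_t la = (size_t)(a - sa), lb = (size_t)(b - sb);
			/* More significant digits means a larger number. */
			if (la != lb)
				return la < lb ? -1 : 1;
			/* Same length: digit-by-digit order equals numeric order. */
			int c = memcmp(sa, sb, la);
			if (c != 0)
				return c < 0 ? -1 : 1;
			continue;
		}
		if (*a != *b)
			return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
		a++;
		b++;
	}
	/* A string that is a prefix of the other sorts first. */
	if (*a)
		return 1;
	if (*b)
		return -1;
	return 0;
}

/* qsort() adapter for an array of const char * hostnames. */
static int cmp_hostnames(const void *pa, const void *pb)
{
	return natural_cmp(*(const char *const *)pa, *(const char *const *)pb);
}
```

Usage: `qsort(hosts, n, sizeof hosts[0], cmp_hostnames);` sorts an array of `const char *` hostnames so that `node2` comes before `node10`.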
-
Danny Auble authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
bg.properties in order for the runjob_mux to run correctly. Signed-off-by: Danny Auble <da@schedmd.com>
-
Danny Auble authored
c++
-
Danny Auble authored
but the job is going to be canceled because it is interactive or for some other reason, it now receives the grace time.
-
Morris Jette authored
-
- 25 Jun, 2012 6 commits
Danny Auble authored
check if a block is still makable if the cable wasn't in error.
-
Danny Auble authored
-
Danny Auble authored
removal of the job on the block failed.
-
Danny Auble authored
-
Danny Auble authored
-
Rod Schultz authored
-
- 22 Jun, 2012 4 commits
Danny Auble authored
29d79ef8
-
Danny Auble authored
-
Danny Auble authored
same time a block is destroyed and that block just happens to be the smallest overlapping block over the bad hardware.
-
Danny Auble authored
-