- 06 Jul, 2012 3 commits
-
-
Morris Jette authored
This move reduces the risk of srun failing horribly due to code that is inconsistent with its plugins when srun is running during a SLURM upgrade, especially a major upgrade in which the plugin function arguments can change.
-
Morris Jette authored
Conflicts: src/slurmctld/job_scheduler.c
-
Carles Fenoy authored
If a job is submitted to more than one partition, its partition pointer can be set to an invalid value. This can result in a bad count of CPUs allocated on a node, and thus over- or under-allocation of its CPUs. Patch by Carles Fenoy, BSC.

Hi all,

After a tough day I have finally found the problem and a solution for 2.4.1. I was able to reproduce the explained behavior by submitting jobs to 2 partitions. The job is allocated in one partition, but in the schedule function the job's partition is changed to the NON-allocated one, so the resources cannot be freed at the end of the job.

I have solved this by changing the IS_PENDING test some lines above in the schedule function (job_scheduler.c). The code below is from the git HEAD (line 801). As this file has changed a lot since 2.4.x I have not made a patch, but I am describing the solution here. I moved the if (!IS_JOB_PENDING) test to after the 2nd line (part_ptr ...). This prevents the job's partition from being changed if it is already starting in another partition.

    job_ptr  = job_queue_rec->job_ptr;
    part_ptr = job_queue_rec->part_ptr;
    job_ptr->part_ptr = part_ptr;
    xfree(job_queue_rec);
    if (!IS_JOB_PENDING(job_ptr))
        continue;   /* started in other partition */

Hope this is enough information to solve it. I have just realized (while writing this mail) that my solution has a memory leak, as job_queue_rec is not freed.

Regards,
Carles Fenoy
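A minimal sketch of the reordering Carles describes, with the memory leak he notes at the end fixed by freeing job_queue_rec before the continue (that adjustment is ours, not part of the original mail):

    job_ptr  = job_queue_rec->job_ptr;
    part_ptr = job_queue_rec->part_ptr;
    xfree(job_queue_rec);           /* freed on every path: no leak */
    if (!IS_JOB_PENDING(job_ptr))
        continue;                   /* started in another partition */
    job_ptr->part_ptr = part_ptr;   /* only update a still-pending job */

With the test placed before the pointer assignment, a job that has already started elsewhere keeps its original part_ptr, so its CPUs are counted against the correct partition when it completes.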
-
- 05 Jul, 2012 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
This code change is completely different from IBM's example code, but eliminates memory leaks that exist in IBM's sample code.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 04 Jul, 2012 2 commits
-
-
Morris Jette authored
Conflicts: NEWS
-
Morris Jette authored
-
- 03 Jul, 2012 13 commits
-
-
Morris Jette authored
-
Nathan Yee authored
-
Morris Jette authored
-
Danny Auble authored
there are jobs running on that hardware.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Conflicts: META, NEWS
-
Morris Jette authored
-
Lipari, Don authored
-
Tim Wickberg authored
-
Alejandro Lucero Palau authored
Add support for advanced reservations of specific cores rather than whole nodes. Current limitations: homogeneous cluster, nodes must be idle when the reservation is created, and no more than one reservation per node. Code is still under development. Work by Alejandro Lucero Palau, et al., BSC.
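As a rough illustration only, a core-level reservation might eventually be requested along these lines (the CoreCnt option and exact syntax are assumptions, not confirmed by this commit, which notes the code is still under development):

    scontrol create reservation ReservationName=core_resv \
        StartTime=now Duration=60 Users=alice \
        Nodes=tux[0-1] CoreCnt=4,4

That is, two nodes each contribute four cores to the reservation instead of being reserved whole.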
-
- 02 Jul, 2012 7 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Carles Fenoy authored
correctly when transitioning. This also applies to 2.4.0 -> 2.4.1; no state will be lost. (Thanks to Carles Fenoy)
-
Morris Jette authored
-
Morris Jette authored
-
- 29 Jun, 2012 6 commits
-
-
Bill Brophy authored
Add reservation flag of Part_Nodes to allocate all nodes in a partition to a reservation and automatically change the reservation when nodes are added to or removed from the partition. Based upon work by Bill Brophy, Bull.
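For illustration, a partition-wide reservation of this kind might be created roughly as follows (the surrounding options are assumptions based on typical reservation usage, not part of this commit):

    scontrol create reservation ReservationName=part_maint \
        Flags=PART_NODES,IGNORE_JOBS PartitionName=batch \
        StartTime=now Duration=infinite Users=root

With Part_Nodes set, the reservation tracks the partition's node list rather than a fixed set of nodes.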
-
Morris Jette authored
-
Morris Jette authored
Conflicts: META, NEWS
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
When running with multiple slurmd daemons per node, enable specifying a range of ports on a single line of the node configuration in slurm.conf, for example:

    NodeName=tux[0-999] NodeAddr=localhost Port=9000-9999 ...

The ports are matched to the node names in order, so here tux0 would listen on port 9000, tux1 on 9001, and so on.
-
- 28 Jun, 2012 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-