- 22 Oct, 2013 7 commits
-
-
Morris Jette authored
I did the merge improperly
-
-
Morris Jette authored
If a node has GRES and multiple threads per core, the select/cons_res plugin can get stuck in an infinite loop. See bug 475. Contributed by: PREVOST Ludovic, NEC HPC Europe.
-
Morris Jette authored
-
Morris Jette authored
-
Thomas Cadeau authored
If slurmd fails to get the IPMI value, wait 1 second instead of asking the BMC again (part 3/4 of the patch). If IPMI init fails when slurmd forces an update of the value, do not update the value (part 4/4 of the patch). Parts 1/4 and 2/4 add a safeguard to IPMI init because the function can be called several times: it now returns SLURM_FAILURE if the first call failed, since the other calls will not do anything. Bug 469.
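The behavior described in this commit message can be illustrated with a short Python sketch. This is not Slurm's C code: `IpmiSensor`, its `connect` and `read` callables, and the `sleep` hook are hypothetical names chosen for the illustration. Only the 1-second wait and the `SLURM_FAILURE` return come from the message itself.

```python
import time

SLURM_SUCCESS = 0
SLURM_FAILURE = -1


class IpmiSensor:
    """Hypothetical stand-in for the IPMI plugin state (not Slurm's API)."""

    def __init__(self, connect, read, sleep=time.sleep):
        self._connect = connect   # callable returning True on success
        self._read = read         # callable returning a value, or None on error
        self._sleep = sleep       # injectable so callers can avoid real waits
        self._init_rc = None      # cached result of the first init() call
        self.last_value = None

    def init(self):
        # Only the first call does real work. Later calls return the cached
        # result, so a failed first call keeps reporting SLURM_FAILURE.
        if self._init_rc is None:
            self._init_rc = SLURM_SUCCESS if self._connect() else SLURM_FAILURE
        return self._init_rc

    def update(self):
        # If init failed, leave the stored value untouched.
        if self.init() != SLURM_SUCCESS:
            return SLURM_FAILURE
        value = self._read()
        if value is None:
            # Wait one second instead of asking the BMC again immediately.
            self._sleep(1)
            return SLURM_FAILURE
        self.last_value = value
        return SLURM_SUCCESS
```

Caching the first init result keeps repeated callers from retrying an initialization that is already known to have failed.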
-
Morris Jette authored
Previously a node failure would always requeue the job
-
- 21 Oct, 2013 1 commit
-
-
Morris Jette authored
Restore default behavior of allocating cores to jobs on a cyclic basis across the sockets unless SelectTypeParameters=CR_CORE_DEFAULT_DIST_BLOCK or user specifies other distribution options. Reverts commit 7fcdc7e5. Bug 466.
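To make the difference between cyclic and block distribution concrete, here is a minimal Python sketch. It is a simplified model on an idle node, not the select/cons_res implementation; the function name `distribute_cores` and its parameters are hypothetical.

```python
def distribute_cores(sockets, cores_per_socket, count, mode="cyclic"):
    """Return `count` (socket, core) pairs chosen in cyclic or block order."""
    if count > sockets * cores_per_socket:
        raise ValueError("not enough cores")
    if mode == "block":
        # Fill every core of socket 0 before moving on to socket 1, and so on.
        order = [(s, c) for s in range(sockets) for c in range(cores_per_socket)]
    elif mode == "cyclic":
        # Take one core from each socket in turn.
        order = [(s, c) for c in range(cores_per_socket) for s in range(sockets)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return order[:count]
```

With 2 sockets of 4 cores each and a 4-core request, the cyclic order picks two cores on each socket, while the block order picks all four cores on socket 0.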
-
- 20 Oct, 2013 3 commits
-
-
jette authored
Change Sockets to SocketsPerBoard and Procs to CPUs
-
jette authored
If the backfill scheduler relinquishes locks and the normal job scheduler starts a job that the backfill scheduler was actively working on, the backfill scheduler will try to re-schedule that same job, possibly resulting in an invalid memory reference or other badness.
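The commit message describes the race but not how it was fixed. One common guard against this kind of race is to re-check the job's state after re-acquiring the lock; the Python sketch below illustrates that idea under that assumption. `Job`, `backfill_pass`, and `yield_point` are hypothetical names, not Slurm functions.

```python
import threading

PENDING, RUNNING = "PENDING", "RUNNING"


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.state = PENDING


def backfill_pass(jobs, lock, yield_point):
    """Start each still-pending job, releasing the lock between jobs."""
    started = []
    lock.acquire()
    try:
        for job in list(jobs):
            # Relinquish the lock so other schedulers can make progress.
            lock.release()
            try:
                yield_point()
            finally:
                lock.acquire()
            # Another scheduler may have started this job while the lock
            # was released, so re-check its state before acting on it.
            if job.state != PENDING:
                continue
            job.state = RUNNING
            started.append(job.job_id)
    finally:
        lock.release()
    return started
```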
-
jette authored
-
- 19 Oct, 2013 3 commits
-
-
Morris Jette authored
Fix for --cpu_bind=map_cpu/mask_cpu/map_ldom/mask_ldom plus --mem_bind=map_mem/mask_mem options, broken in 2.6.2. See commit 718382da
-
Morris Jette authored
Expect was failing periodically due to apparent timing problems
-
David Bigagli authored
-
- 18 Oct, 2013 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This message type: "Warning: Note very large processing time from schedule: usec=9467365 began=11:06:23.003" is reporting the end time as the began value
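The bug above amounts to formatting the wrong timestamp. As a minimal Python sketch of the intended behavior (not Slurm's C timer macros), the start time is captured before the work runs and that same value is what gets printed as `began`. The function `timed`, its parameters, and the 1-second threshold are assumptions made for the illustration.

```python
import time


def timed(label, work, threshold_usec=1_000_000, clock=time.time, log=print):
    """Run work() and warn if it took longer than threshold_usec."""
    begin = clock()   # capture the start time BEFORE doing the work
    result = work()
    end = clock()
    usec = int(round((end - begin) * 1_000_000))
    if usec > threshold_usec:
        # Format the begin time, not the end time.
        stamp = time.strftime("%H:%M:%S", time.localtime(begin))
        msec = int((begin % 1) * 1000)
        log(f"Warning: Note very large processing time from {label}: "
            f"usec={usec} began={stamp}.{msec:03d}")
    return result
```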
-
Danny Auble authored
-
- 17 Oct, 2013 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
This prevents premature re-sending of job kill RPC (e.g. "Resending TERMINATE_JOB request JobId=#")
-
Danny Auble authored
-
David Bigagli authored
-
- 16 Oct, 2013 3 commits
-
-
Chrysovalantis Paschoulas authored
-
jette authored
-
Morris Jette authored
If the default partition has shared=force, then each job is allocated whole nodes and core reservation tests are not valid
-
- 15 Oct, 2013 5 commits
-
-
Morris Jette authored
-
Trofinoff, Stephen authored
-
Martin Perry authored
-
Danny Auble authored
-
Filip Skalski authored
This fixes another error in job priority calculations
-
- 14 Oct, 2013 5 commits
-
-
Filip Skalski authored
-
Filip Skalski authored
-
Nathan Yee authored
-
jette authored
-
jette authored
The pending jobs will have their reservation info removed. Bug 455.
-
- 11 Oct, 2013 5 commits
-
-
Morris Jette authored
Increase maximum number of hostlist ranges from 12k to 64k and use malloc to allocate memory rather than using the stack. Bug 458.
-
Morris Jette authored
Initiate jobs pending to run in a reservation as soon as the reservation becomes active. Partial fix for bug 455
-
Morris Jette authored
Revert commit 626be3ea. It was causing stack overflow and memory corruption.
-
Martin Perry authored
-
Morris Jette authored
Previous logic only reported the un-reserved node map. New logging adds information about each job tested and where and when it is scheduled to receive resources.
-