- 13 Aug, 2012 1 commit
-
-
Danny Auble authored
-
- 10 Aug, 2012 3 commits
-
-
Danny Auble authored
and a job comes through backfill and can fit on the block without ending jobs, don't set an end_time for the running jobs, since they don't need to end to start the job.
-
Danny Auble authored
-
Morris Jette authored
Return ESLURM_NODES_BUSY rather than ESLURM_NODE_NOT_AVAIL error on job submit when required nodes are up, but completing a job or in exclusive job allocation.
-
- 09 Aug, 2012 2 commits
-
-
Matthieu Hautreux authored
previous 20 minute time limit. The previous behavior would fail for large files 20 minutes into the transfer.
-
Morris Jette authored
Close the batch job's environment file when it contains no data to avoid leaking file descriptors.
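The descriptor leak described above typically comes from an early return on the "no data" path. The following is a minimal sketch of the pattern, not Slurm's actual code: `read_env_file` is a hypothetical helper, and plain `malloc`/`free` stand in for Slurm's allocators.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a Slurm function): read a whole file into a
 * malloc'd, NUL-terminated buffer. Returns NULL if the file is empty or
 * on any error. The descriptor is closed on every path. */
static char *read_env_file(const char *path)
{
    struct stat st;
    char *buf = NULL;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);              /* the empty-file return must not skip this */
        return NULL;
    }
    buf = malloc((size_t)st.st_size + 1);
    if (buf) {
        ssize_t n = read(fd, buf, (size_t)st.st_size);
        if (n < 0) {
            free(buf);
            buf = NULL;
        } else {
            buf[n] = '\0';
        }
    }
    close(fd);
    return buf;
}
```

Without the `close(fd)` in the empty-file branch, every batch job with an empty environment file would cost the daemon one descriptor until it hit its open-file limit.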
-
- 08 Aug, 2012 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 07 Aug, 2012 1 commit
-
-
Brian Gilmer authored
-
- 06 Aug, 2012 1 commit
-
-
Danny Auble authored
to operate the same way salloc or sbatch did and assign a task per cpu by default instead of task per node.
-
- 03 Aug, 2012 1 commit
-
-
Danny Auble authored
correctly. Before, the errno wasn't being checked correctly.
-
- 01 Aug, 2012 4 commits
-
-
Danny Auble authored
-
Morris Jette authored
-
Danny Auble authored
correctly as of IBM driver V1R1M1 efix 008.
-
Danny Auble authored
configured correctly.
-
- 31 Jul, 2012 7 commits
-
-
Danny Auble authored
current or in the past.
-
Mark Nelson authored
from Mark Nelson
-
Janne Blomqvist authored
Using the syscalls directly rather than calling bin/(u)mount via system() avoids a few fork + exec calls, and provides better error handling if something goes wrong. Users of this functionality are also updated to use slurm_strerror in order to provide a more informative error message. The mount and umount syscalls are Linux-specific, but so are cgroups so no portability is lost.
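As a rough illustration of the approach described above, the sketch below calls `mount(2)` and `umount(2)` directly and reports failures through `errno`. The wrapper names `cgroup_mount` and `cgroup_umount` and the chosen mount flags are assumptions made for this example, and libc's `strerror` stands in for `slurm_strerror`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical wrapper: mount a cgroup hierarchy at `target` with the
 * given comma-separated subsystem list. Returns 0 on success, or the
 * errno value on failure. Linux-specific. */
static int cgroup_mount(const char *target, const char *subsystems)
{
    if (mount("cgroup", target, "cgroup",
              MS_NOSUID | MS_NOEXEC | MS_NODEV, subsystems) != 0) {
        int err = errno;        /* save before fprintf can clobber it */
        fprintf(stderr, "mount(%s, %s) failed: %s\n",
                target, subsystems, strerror(err));
        return err;
    }
    return 0;
}

/* Hypothetical wrapper: unmount `target`. Returns 0 or the errno value. */
static int cgroup_umount(const char *target)
{
    if (umount(target) != 0) {
        int err = errno;
        fprintf(stderr, "umount(%s) failed: %s\n", target, strerror(err));
        return err;
    }
    return 0;
}
```

Compared with `system("/bin/mount ...")`, the caller gets a specific error code (for example `EPERM` or `ENOENT`) instead of only a shell exit status, and no child process is created.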
-
Danny Auble authored
-
Danny Auble authored
Using the syscalls directly rather than calling bin/(u)mount via system() avoids a few fork + exec calls, and provides better error handling if something goes wrong. Users of this functionality are also updated to use slurm_strerror in order to provide a more informative error message. The mount and umount syscalls are Linux-specific, but so are cgroups so no portability is lost.
-
Danny Auble authored
Using the syscalls directly rather than calling bin/(u)mount via system() avoids a few fork + exec calls, and provides better error handling if something goes wrong. Users of this functionality are also updated to use slurm_strerror in order to provide a more informative error message. The mount and umount syscalls are Linux-specific, but so are cgroups so no portability is lost.
-
Danny Auble authored
the current plugin has been loaded when using runjob_mux_refresh_config
-
- 26 Jul, 2012 1 commit
-
-
Morris Jette authored
Correct parsing of srun/sbatch input/output/error file names so that only the name "none" is mapped to /dev/null and not any file name starting with "none" (e.g. "none.o"). This fixes bug #98.
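The difference between the two behaviors can be sketched as an exact comparison versus a prefix comparison. `map_io_filename` is a hypothetical name used only for this example, and the sketch assumes a case-sensitive match, which the commit message does not specify.

```c
#include <string.h>

/* Hypothetical helper (not Slurm's real parser): map the special name
 * "none" to /dev/null and leave every other name untouched.
 *
 * A prefix test such as strncmp(name, "none", 4) == 0 would also match
 * "none.o", which is the bug described above; strcmp requires the whole
 * string to be exactly "none". */
static const char *map_io_filename(const char *name)
{
    if (name && strcmp(name, "none") == 0)
        return "/dev/null";
    return name;
}
```

With this check, `map_io_filename("none")` yields `/dev/null`, while `map_io_filename("none.o")` returns the name unchanged.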
-
- 24 Jul, 2012 1 commit
-
-
Morris Jette authored
Gres: If a gres has a count of one and an associated file, then when doing a reconfiguration the node's bitmap was not cleared, resulting in an underflow upon job termination or removal from the scheduling matrix by the backfill scheduler.
-
- 23 Jul, 2012 1 commit
-
-
Morris Jette authored
Cray and BlueGene - Do not treat lack of usable front-end nodes when the slurmctld daemon starts as a fatal error. Also preserve the correct front-end node for jobs when there is more than one front-end node and the slurmctld daemon restarts.
-
- 19 Jul, 2012 2 commits
-
-
Danny Auble authored
while it is attempting to free underlying hardware is marked in error making small blocks overlapping with the freeing block. This only applies to dynamic layout mode.
-
Alejandro Lucero Palau authored
-
- 13 Jul, 2012 2 commits
-
-
Danny Auble authored
is always set when sending or receiving a message.
-
Tim Wickberg authored
-
- 12 Jul, 2012 4 commits
-
-
Danny Auble authored
than 1 midplane but not the entire allocation.
-
Danny Auble authored
multi midplane block allocation.
-
Danny Auble authored
-
Danny Auble authored
where other blocks on an overlapping midplane are running jobs.
-
- 11 Jul, 2012 3 commits
-
-
Danny Auble authored
hardware is marked bad, remove the larger block and create a block over just the bad hardware, making the other hardware available to run on.
-
Danny Auble authored
allocation.
-
Danny Auble authored
for a job to finish on it, the number of unused CPUs wasn't updated correctly.
-
- 09 Jul, 2012 1 commit
-
-
Martin Perry authored
See Bugzilla #73 for a more complete description of the problem. Patch by Martin Perry, Bull.
-
- 06 Jul, 2012 1 commit
-
-
Carles Fenoy authored
If a job is submitted to more than one partition, its partition pointer can be set to an invalid value. This can result in the count of CPUs allocated on a node being bad, resulting in over- or under-allocation of its CPUs. Patch by Carles Fenoy, BSC.

Hi all,

After a tough day I've finally found the problem and a solution for 2.4.1. I was able to reproduce the explained behavior by submitting jobs to 2 partitions. This causes the job to be allocated in one partition, but in the schedule function the partition of the job is changed to the NON allocated one. As a result, the resources cannot be freed at the end of the job.

I've solved this by changing the IS_PENDING test some lines above in the schedule function (in job_scheduler.c). This is the code from the git HEAD (Line 801). As this file has changed a lot from 2.4.x I have not done a patch, but I'm commenting the solution here. I've moved the if(!IS_JOB_PENDING) after the 2nd line (part_ptr...). This prevents the partition of the job from being changed if it is already starting in another partition.

    job_ptr = job_queue_rec->job_ptr;
    part_ptr = job_queue_rec->part_ptr;
    job_ptr->part_ptr = part_ptr;
    xfree(job_queue_rec);
    if (!IS_JOB_PENDING(job_ptr))
        continue; /* started in other partition */

Hope this is enough information to solve it. I've just realized (while writing this mail) that my solution has a memory leak as job_queue_rec is not freed.

Regards,
Carles Fenoy
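The reordering described in this commit message, together with the leak the author mentions, can be sketched as follows. This is an illustrative mock under stated assumptions, not the code that was committed: the structs are simplified stand-ins for Slurm's real types, `take_queue_rec` is a hypothetical function, a `pending` flag replaces the `IS_JOB_PENDING` macro, and `free` replaces `xfree`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration only, not Slurm's structures. */
struct part { const char *name; };
struct job { bool pending; struct part *part_ptr; };
struct job_queue_rec { struct job *job_ptr; struct part *part_ptr; };

/* Hypothetical helper: consume one queue record. Returns true and sets
 * *job_out if the job is still pending and should be tried in the
 * record's partition; returns false if it already started elsewhere. */
static bool take_queue_rec(struct job_queue_rec *rec, struct job **job_out)
{
    struct job *job_ptr = rec->job_ptr;
    struct part *part_ptr = rec->part_ptr;

    free(rec);                      /* freed before any early return: no leak */
    if (!job_ptr->pending)
        return false;               /* started in other partition; its
                                     * part_ptr is left untouched */
    job_ptr->part_ptr = part_ptr;   /* only pending jobs are re-pointed */
    *job_out = job_ptr;
    return true;
}
```

The two points of the sketch are the order of operations: the record is released before the pending test so the early exit cannot leak it, and the partition pointer is assigned only after the test so a running job keeps the partition it was actually allocated in.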
-
- 03 Jul, 2012 1 commit
-
-
Danny Auble authored
there are jobs running on that hardware.
-
- 02 Jul, 2012 1 commit
-
-
Carles Fenoy authored
correctly when transitioning. This also applies for 2.4.0 -> 2.4.1, no state will be lost. (Thanks to Carles Fenoy)
-