- 01 Aug, 2012 2 commits
-
-
Danny Auble authored
configured correctly.
-
Danny Auble authored
Move code to handle multiple clusters. Previous code could have issues where job_db_inx space could overlap.
-
- 31 Jul, 2012 8 commits
-
-
Danny Auble authored
current or in the past.
-
Mark Nelson authored
from Mark Nelson
-
Janne Blomqvist authored
Using the syscalls directly rather than calling bin/(u)mount via system() avoids a few fork + exec calls, and provides better error handling if something goes wrong. Users of this functionality are also updated to use slurm_strerror in order to provide a more informative error message. The mount and umount syscalls are Linux-specific, but so are cgroups so no portability is lost.
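A minimal sketch of the approach described above, assuming a Linux host. The function names `cgroup_mount_sketch` and `cgroup_umount_sketch`, the mount flags, and the use of plain `strerror` (standing in for `slurm_strerror`) are illustrative assumptions, not the code from this commit.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical wrapper: mount a cgroup hierarchy with mount(2) directly
 * instead of running the mount binary through system().
 * Returns 0 on success, or the errno value on failure. */
int cgroup_mount_sketch(const char *target, const char *subsystems)
{
	/* The flag set here is an assumption chosen for illustration. */
	unsigned long flags = MS_NOSUID | MS_NOEXEC | MS_NODEV;

	if (mount("cgroup", target, "cgroup", flags, subsystems) == -1) {
		int err = errno;
		/* errno gives the precise failure reason; no exit status
		 * of a forked helper has to be decoded. */
		fprintf(stderr, "mount(%s): %s\n", target, strerror(err));
		return err;
	}
	return 0;
}

/* Hypothetical wrapper around umount(2), same return convention. */
int cgroup_umount_sketch(const char *target)
{
	if (umount(target) == -1) {
		int err = errno;
		fprintf(stderr, "umount(%s): %s\n", target, strerror(err));
		return err;
	}
	return 0;
}
```

Because the syscall reports its failure through errno, the caller can log the specific cause (for example a missing mount point or insufficient privileges) without any fork + exec.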
-
Danny Auble authored
-
Danny Auble authored
Using the syscalls directly rather than calling bin/(u)mount via system() avoids a few fork + exec calls, and provides better error handling if something goes wrong. Users of this functionality are also updated to use slurm_strerror in order to provide a more informative error message. The mount and umount syscalls are Linux-specific, but so are cgroups so no portability is lost.
-
Danny Auble authored
Using the syscalls directly rather than calling bin/(u)mount via system() avoids a few fork + exec calls, and provides better error handling if something goes wrong. Users of this functionality are also updated to use slurm_strerror in order to provide a more informative error message. The mount and umount syscalls are Linux-specific, but so are cgroups so no portability is lost.
-
Danny Auble authored
the current plugin has been loaded when using runjob_mux_refresh_config
-
Don Lipari authored
These comments were orphaned with this commit: 874f797f Move the start time calculation of pending jobs into a separate pthread on Nov 3, 2010.
-
- 26 Jul, 2012 1 commit
-
-
Morris Jette authored
Correct parsing of srun/sbatch input/output/error file names so that only the name "none" is mapped to /dev/null and not any file name starting with "none" (e.g. "none.o"). This fixes bug #98.
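The difference between the two behaviours can be sketched as follows. Both helper names are hypothetical, and the case-sensitive comparison is an assumption; this is not the actual srun/sbatch parsing code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only when the whole file name is "none",
 * which is the only case that should be mapped to /dev/null. */
bool fname_is_none(const char *fname)
{
	return fname != NULL && strcmp(fname, "none") == 0;
}

/* The kind of check that causes the bug: a prefix comparison also
 * matches ordinary file names such as "none.o". */
bool fname_has_none_prefix(const char *fname)
{
	return fname != NULL && strncmp(fname, "none", 4) == 0;
}
```

With the exact comparison, `none.o` is treated as a regular output file, while the prefix check would have silently redirected it to /dev/null.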
-
- 25 Jul, 2012 3 commits
-
-
Morris Jette authored
-
Janne Blomqvist authored
-
Janne Blomqvist authored
-
- 24 Jul, 2012 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Gres: If a gres has a count of one and an associated file then when doing a reconfiguration, the node's bitmap was not cleared resulting in an underflow upon job termination or removal from scheduling matrix by the backfill scheduler.
-
Danny Auble authored
-
- 23 Jul, 2012 1 commit
-
-
Morris Jette authored
Cray and BlueGene - Do not treat lack of usable front-end nodes when slurmctld daemon starts as a fatal error. Also preserve correct front-end node for jobs when there is more than one front-end node and the slurmctld daemon restarts.
-
- 19 Jul, 2012 8 commits
-
-
Danny Auble authored
-
Danny Auble authored
while it is attempting to free underlying hardware is marked in error making small blocks overlapping with the freeing block. This only applies to dynamic layout mode.
-
Bill Brophy authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Francois Diakhate authored
-
Alejandro Lucero Palau authored
-
- 17 Jul, 2012 3 commits
-
-
Morris Jette authored
This corresponds to commit dd2dce54 from Mark Grondona's work in squeue, but applied to the sview command.
-
Morris Jette authored
Slurm 2.4 minor fixes
-
Morris Jette authored
-
- 16 Jul, 2012 1 commit
-
-
Morris Jette authored
This addresses trouble ticket 85
-
- 13 Jul, 2012 9 commits
-
-
Danny Auble authored
runjob_mux
-
Danny Auble authored
is always set when sending or receiving a message.
-
Tim Wickberg authored
-
Mark A. Grondona authored
Set SLURM_CONF in default prolog/epilog environment instead of only in spank prolog/epilog environment. This change fixes a potential hang during spank prolog/epilog execution due to the possibility of memory allocation after fork(2) and before exec(2) when invoking slurmstepd spank prolog|epilog. This also has the benefit that SLURM commands used in prolog and epilog scripts will use the correct slurm.conf file.
-
Mark A. Grondona authored
If exec_wait_child_wait_for_parent() fails for any reason, it is safer to abort immediately rather than proceed to execute the user's job.
-
Mark A. Grondona authored
On a failure of fork(2), slurmstepd would print an error and exit, possibly leaving previously forked children waiting. Ensure a better cleanup by killing all active children on fork failure before exiting slurmstepd.
-
Mark A. Grondona authored
Close the read end of the pipe slurmstepd uses to notify children it is time to call exec(2) in order to save one file descriptor per task. (Previously, the read side of the pipe wasn't closed until exec_wait_info was destroyed.)
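The exec-wait pattern touched by these slurmstepd commits can be sketched as below. This is a simplified illustration under stated assumptions: the function name `run_after_signal` is hypothetical, a single child stands in for the per-task children, and EINTR handling is omitted. It is not slurmstepd's implementation.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: fork a child that blocks on a pipe until the
 * parent signals that it may call exec. Returns the child's exit
 * status, or -1 on error. */
int run_after_signal(const char *path, char *const argv[])
{
	int pfd[2];

	if (pipe(pfd) == -1)
		return -1;

	pid_t pid = fork();
	if (pid == -1) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	if (pid == 0) {
		char c;

		close(pfd[1]);	/* the child never writes */
		/* Block until the parent writes one byte. If the wait
		 * fails (error or EOF), abort instead of running the job. */
		if (read(pfd[0], &c, 1) != 1)
			_exit(127);
		close(pfd[0]);
		execv(path, argv);
		_exit(127);	/* only reached if execv failed */
	}

	/* Parent: the read end is not needed here, so close it right
	 * away rather than holding one extra descriptor per task. */
	close(pfd[0]);

	char go = 1;
	ssize_t n = write(pfd[1], &go, 1);
	/* Closing the write end makes a child that was never signalled
	 * see EOF and exit instead of waiting forever. */
	close(pfd[1]);
	(void) n;

	int status;
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing the parent's copy of the read end immediately after fork(2) is what saves the descriptor; the child keeps its own copy until just before exec(2).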
-
Mark A. Grondona authored
For some reason squeue was treating completing jobs the same as pending jobs, and reported the number of nodes as the maximum of requested nodelist, requested node count or CPUs (divided into nodes?). This is in contrast to the squeue manpage, which explicitly states that the number of nodes reported for completing jobs should be only the nodes that are still allocated to the job. This patch removes the special handling of completing jobs in src/squeue/print.c:_get_node_cnt(), so that the squeue output for completing jobs matches the documentation. A comment is also added so that developers looking at the code understand what is going on.
-
Morris Jette authored
-
- 12 Jul, 2012 1 commit
-
-
Danny Auble authored
-