- 08 Jan, 2013 1 commit
-
-
Danny Auble authored
instead of a large array. This appears to speed up the process a great deal. Before this change we were seeing times of over 6000 usecs just to memset the array for a 5D system. With this patch the whole process takes around 1000 usecs on average, with many runs well under that.
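A minimal sketch of the general technique, assuming the replacement named in the (truncated) commit title tracks only the cells actually used; none of these names or sizes are from the SLURM source:

```c
/* Sketch: avoid a full memset of the 5-D array on every pass by
 * remembering which cells were touched and clearing only those.
 * All names and sizes here are illustrative, not SLURM code. */
#define DIM 16                          /* hypothetical per-dimension size */
#define NCELLS (DIM*DIM*DIM*DIM*DIM)    /* full 5-D coordinate space */

typedef struct {
	char used[NCELLS];      /* the large array from the message above */
	int  touched[NCELLS];   /* indices written during this pass */
	int  ntouched;
} space_t;

static void mark_used(space_t *s, int idx)
{
	if (!s->used[idx]) {
		s->used[idx] = 1;
		s->touched[s->ntouched++] = idx;
	}
}

static void reset_pass(space_t *s)
{
	/* O(cells touched) instead of an O(NCELLS) memset */
	for (int i = 0; i < s->ntouched; i++)
		s->used[s->touched[i]] = 0;
	s->ntouched = 0;
}
```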
-
- 07 Jan, 2013 1 commit
-
-
Danny Auble authored
-
- 04 Jan, 2013 5 commits
-
-
jette authored
Make sure out-of-memory errors get logged properly for slurmctld when run in the foreground. Fix slurmd and slurmdbd to log out-of-memory errors to stdout when run in the foreground.
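A hedged sketch of the intent (illustrative names, not the actual daemon code): when running in the foreground, write the message to stdout rather than only to syslog.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

static int foreground = 1;	/* would be set from the daemon's options */

/* Route a fatal out-of-memory message so it is visible either way. */
static void log_oom(const char *fmt, ...)
{
	va_list ap;
	va_start(ap, fmt);
	if (foreground) {
		vfprintf(stdout, fmt, ap);
		fputc('\n', stdout);
	} else {
		vsyslog(LOG_ERR, fmt, ap);
	}
	va_end(ap);
}
```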
-
jette authored
-
Mark A. Grondona authored
The MPIRUN_PROCESSES variable set by the mpi/mvapich plugin is probably not needed for most, if not all, recent versions of mvapich. This environment variable also hurts job scalability, since its length is proportional to the number of tasks in a job; for very large jobs, the increased environment size can even lead to failures in execve(2). Since MPIRUN_PROCESSES *might* be required by some older versions of mvapich, this patch disables the setting of that variable only when SLURM_NEED_MVAPICH_MPIRUN_PROCESSES is not set in the job's environment. (Thus, MPIRUN_PROCESSES is disabled by default, but the old behavior may be restored by setting the environment variable above.)
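A minimal sketch of the gating described above; the helper name is hypothetical, and only the two environment variable names come from the message:

```c
#include <stdlib.h>

/* Hypothetical helper: set MPIRUN_PROCESSES only when the user has
 * opted back into the old behavior via the gate variable. */
static void maybe_set_mpirun_processes(const char *value)
{
	if (getenv("SLURM_NEED_MVAPICH_MPIRUN_PROCESSES") == NULL)
		return;	/* default: keep the job environment small */
	setenv("MPIRUN_PROCESSES", value, 1);
}
```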
-
jette authored
-
jette authored
-
- 03 Jan, 2013 16 commits
-
-
Morris Jette authored
Conflicts:
	META
	NEWS
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Nathan Yee authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
A command line argument would not be processed; scontrol would instead exit immediately
-
Morris Jette authored
Conflicts:
	src/scontrol/scontrol.c
-
jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
jette authored
-
- 02 Jan, 2013 1 commit
-
-
Morris Jette authored
The original patch works fine to avoid cancelling a job when all of its nodes go unresponsive, but I don't see any easy way to address nodes coming back into service. We want to cancel jobs that have some nodes up and some nodes down, but the nodes will come back into service individually rather than all at once.
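A sketch of the policy being discussed, with made-up structures: abort only on a mixed up/down state, since an all-down state probably means a global failure.

```c
#include <stdbool.h>

struct node_state {
	bool responding;
};

/* Abort a job only when some of its nodes are down and some are up;
 * if every node is unresponsive, assume a global (network) failure. */
static bool should_abort_job(const struct node_state *nodes, int nnodes)
{
	int down = 0;
	for (int i = 0; i < nnodes; i++) {
		if (!nodes[i].responding)
			down++;
	}
	return (down > 0) && (down < nnodes);
}
```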
-
- 31 Dec, 2012 1 commit
-
-
jette authored
The job will be aborted if any node is set DOWN while responding, or when "scontrol reconfig" is executed or the slurmctld restarts, but it should respond better to global failures, such as the network going down.
-
- 29 Dec, 2012 3 commits
-
-
jette authored
-
Danny Auble authored
Fix broken build when HAVE_READLINE is false
-
Ralph Castain authored
-
- 28 Dec, 2012 8 commits
-
-
Morris Jette authored
There are far fewer RPCs that are not supported than are supported, so this should be faster and easier to maintain.
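A sketch of the inversion described above (the message-type values are invented): keep the short deny-list and treat everything else as supported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static const uint16_t unsupported_rpcs[] = { 1001, 1017 };	/* hypothetical */

static bool rpc_supported(uint16_t msg_type)
{
	/* Scan the short list of exceptions instead of enumerating
	 * every supported RPC type. */
	for (size_t i = 0;
	     i < sizeof(unsupported_rpcs) / sizeof(unsupported_rpcs[0]); i++) {
		if (unsupported_rpcs[i] == msg_type)
			return false;
	}
	return true;
}
```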
-
Morris Jette authored
Conflicts:
	src/common/slurm_protocol_util.c
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
jette authored
-
jette authored
-
jette authored
-
- 27 Dec, 2012 4 commits