- 27 Aug, 2013 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
- 26 Aug, 2013 1 commit
-
-
Morris Jette authored
Used job terminations due to failure to boot its allocated nodes or BlueGene block. bug 213
-
- 24 Aug, 2013 4 commits
-
-
Danny Auble authored
Conflicts: src/slurmd/slurmd/slurmd.c
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Perform file name substitutions for scontrol show job stdin/out/err:
%A - Job array's master job allocation number.
%a - Job array ID (index) number.
%j - Job ID
%u - User name
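The substitution described in this commit can be illustrated with a minimal Python sketch. This is not Slurm's C implementation: the function name `expand_job_filename` and its parameters are hypothetical, and only the four specifiers listed in the commit message are handled.

```python
import re


def expand_job_filename(pattern, job_id, user_name,
                        array_job_id=None, array_task_id=None):
    """Expand %A, %a, %j and %u in a stdin/stdout/stderr file name pattern.

    Hypothetical helper for illustration only; unknown specifiers (and
    array specifiers for non-array jobs) are left untouched.
    """
    values = {"j": str(job_id), "u": user_name}
    if array_job_id is not None:
        values["A"] = str(array_job_id)   # job array's master job allocation number
    if array_task_id is not None:
        values["a"] = str(array_task_id)  # job array ID (index) number

    def replace(match):
        # Fall back to the original text when the specifier is not known.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"%([A-Za-z])", replace, pattern)


if __name__ == "__main__":
    print(expand_job_filename("slurm-%A_%a.out", 1005, "alice", 1000, 5))
    print(expand_job_filename("/home/%u/job-%j.err", 42, "alice"))
```

Leaving unrecognized specifiers in place is a design choice of this sketch, not something the commit message states.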
-
- 23 Aug, 2013 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This corrects a bug introduced in commit https://github.com/SchedMD/slurm/commit/ac44db862c8d1f460e55ad09017d058942ff6499. That commit eliminated the need for squeue to read the node state information, for performance reasons (mostly for large parallel systems in which the Prolog ran squeue, which generates a lot of simultaneous RPCs, slowing down the job launch process). It also assumed 1 CPU per node. If a pending job specified a node count of 1 and a task count larger than one, squeue reported the job's node count as equal to the task count. This patch moves the calculation of a pending job's minimum node count into slurmctld, so squeue still does not need to read the node information, but can report the correct node count for pending jobs with minimal overhead.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
- 22 Aug, 2013 19 commits
-
-
Danny Auble authored
-
Danny Auble authored
to avoid it thinking we don't have a cluster name.
-
Danny Auble authored
in slurmctld.h which is included by slurm_accounting_storage.h which is included by slurmdbd.c which would cause confusion at the very least.
-
Nathan Yee authored
-
Danny Auble authored
to avoid it thinking we don't have a cluster name.
-
Nathan Yee authored
-
Nathan Yee authored
%o and %Z respectively
-
Morris Jette authored
-
jette authored
-
jette authored
Previously there was a sleep(5) during which the backup controller was unresponsive during its startup mode or when returning from primary mode.
-
jette authored
This will prevent possible confusion for the backup controller when it switches from primary back to backup mode, since those pthread IDs are no longer valid. Note that thread_id_rpc could be used by the backup controller after returning to backup mode.
-
Morris Jette authored
-
David Bigagli authored
-
David Bigagli authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
they are coordinators over.
-
David Bigagli authored
-
David Gloe authored
-
- 21 Aug, 2013 8 commits
-
-
Danny Auble authored
-
Morris Jette authored
Replace fixed size buffer with a buffer that can grow as needed.
-
Danny Auble authored
and CLUSTER_FLAG_CRAY_N
-
Danny Auble authored
a Native Cray not making a correct hostlist as well.
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Hongjia Cao authored
If there are completing jobs, a reconfigure will set the wrong job/node state: all nodes of the completing job will be set to allocated, and the job will not be removed even if the completing nodes are released. The state can only be restored by restarting slurmctld after the completing nodes are released.
-