- 14 Dec, 2013 1 commit
-
-
David Bigagli authored
channel with the slurmstepd on that node.
-
- 13 Dec, 2013 1 commit
-
-
Morris Jette authored
Fix slurmstepd race condition when separate threads are reading and modifying the job's environment, which can result in the slurmstepd failing with an invalid memory reference. Observed at shutdown when trying to run the task epilog and trying to read the env var: SLURM_STEP_KILLED_MSG_NODE_ID
-
- 12 Dec, 2013 2 commits
-
-
Morris Jette authored
Without this flag, if the configuration changes or is inconsistent between nodes, then the pack and unpack can be out of sync in terms of what data is expected. This lets the server tell the client what data is packed.
-
Morris Jette authored
Without this patch, free() is called on a random memory location (i.e. whatever is on the stack), which can result in slurmstepd dying and a completed job not being purged in a timely fashion.
-
- 11 Dec, 2013 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
Fix race condition in authentication credential creation that could corrupt memory. (NOTE: This race condition has existed since 2003 and would be exceedingly rare.)
-
- 10 Dec, 2013 1 commit
-
-
Morris Jette authored
This permits a single gres.conf file to be used for a heterogeneous cluster.
-
- 09 Dec, 2013 2 commits
-
-
Morris Jette authored
This is needed for job arrays with discontiguous task ID values (e.g. "123_[1,3,5,...99999]")
-
Morris Jette authored
Previously job arrays were only listed with their native job ID (e.g. 123_0 listed as 123, 123_1 as 124, etc). Now lists the job ID using both formats (e.g. "123_1 (124)"). The same format is used for job step IDs (e.g. "123_1.2 (124.2)").
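The combined format described above can be sketched as follows. This is only an illustration of the string layout, not the code from the commit; the function name `format_job_id` and its parameters are hypothetical.

```python
def format_job_id(array_job_id, task_id, native_job_id, step_id=None):
    """Render a job array element as 'ARRAY_TASK (NATIVE)'.

    With a step ID, the step suffix is appended to both forms,
    e.g. '123_1.2 (124.2)'. All names here are hypothetical.
    """
    array_part = f"{array_job_id}_{task_id}"
    native_part = str(native_job_id)
    if step_id is not None:
        array_part += f".{step_id}"
        native_part += f".{step_id}"
    return f"{array_part} ({native_part})"
```

For example, `format_job_id(123, 1, 124)` gives `123_1 (124)`, and passing `step_id=2` gives `123_1.2 (124.2)`.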
-
- 08 Dec, 2013 1 commit
-
-
jette authored
-
- 07 Dec, 2013 3 commits
-
-
Danny Auble authored
-
Philip D. Eckert authored
-
David Bigagli authored
the slurmctld throws a fatal error.
-
- 06 Dec, 2013 2 commits
-
-
Trofinoff Stephen authored
This adds a mechanism to kill a hung apbasil command
-
Jason Bacon authored
-
- 05 Dec, 2013 3 commits
-
-
Danny Auble authored
news.html.
-
Taras Shapovalov authored
instead of when running on the node for the first time.
-
Morris Jette authored
Add SLURM_CLUSTER_NAME to environment variables passed to PrologSlurmctld, Prolog, EpilogSlurmctld, and Epilog.
-
- 04 Dec, 2013 1 commit
-
-
Morris Jette authored
Previous logic never reopened the file, preventing proper functioning of logrotate.
-
- 03 Dec, 2013 3 commits
-
-
Morris Jette authored
Use hash function to locate job records for improved performance.
-
Morris Jette authored
Change partition write lock to a read lock as we use a different mechanism for hidden partitions in getting individual jobs.
-
Morris Jette authored
Correct logic returning remaining job dependencies in job information reported by scontrol and squeue. Eliminates vestigial descriptors with no job ID values (e.g. "afterany"). As dependencies are removed, the job ID values were removed from the strings, but not the descriptors. This eliminates both. It also checks the full job ID to make sure we do not remove "afterany:1234" when job "123" completes.
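The behaviour described above can be illustrated with a minimal sketch. It assumes a simplified dependency string of comma-separated `descriptor:jobid[:jobid...]` terms; the function `remove_completed_dependency` is hypothetical and is not the slurmctld implementation.

```python
def remove_completed_dependency(dependency, completed_job_id):
    """Drop completed_job_id from a dependency string.

    Assumes a simplified format such as 'afterany:123:1234,afterok:56'.
    A descriptor left with no job IDs is removed along with the ID.
    """
    done = str(completed_job_id)
    kept_terms = []
    for term in dependency.split(","):
        descriptor, _, id_list = term.partition(":")
        # Compare whole job IDs, never prefixes, so "123" does not match "1234".
        remaining = [job_id for job_id in id_list.split(":")
                     if job_id and job_id != done]
        if remaining:
            kept_terms.append(descriptor + ":" + ":".join(remaining))
    return ",".join(kept_terms)
```

With this sketch, completing job 123 turns `afterany:123:1234,afterok:123` into `afterany:1234`: the `afterok` descriptor disappears because no job IDs remain for it, while job 1234 is untouched.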
-
- 02 Dec, 2013 4 commits
-
-
David Bigagli authored
-
Morris Jette authored
Fix race condition on batch job termination that could result in a job exit code of 0xfffffffe if the slurmd on node zero registers its active jobs at the same time that slurmstepd is recording the job's exit code. bug 535
-
David Bigagli authored
-
David Bigagli authored
-
- 29 Nov, 2013 2 commits
-
-
Morris Jette authored
proctrack/cgroup - Add locking to prevent race condition where one job step is ending for a user or job at the same time another job step is starting and the user or job container is deleted from under the starting job step. bug 447
-
David Bigagli authored
Substantial performance improvement for systems with Shared=YES or FORCE and large numbers of running jobs (replace bubble sort with quick sort). Bug 525
-
- 27 Nov, 2013 2 commits
-
-
Morris Jette authored
Original code worked only for Cray systems. For other systems it set gres_alloc to the total number of each GRES allocated on each node to any job
-
Morris Jette authored
Original code worked only for Cray systems. For other systems it set gres_alloc to the total number of each GRES allocated on each node to any job
-
- 26 Nov, 2013 3 commits
-
-
Chris Scheller authored
-
Morris Jette authored
-
David Bigagli authored
-
- 25 Nov, 2013 1 commit
-
-
Danny Auble authored
-
- 22 Nov, 2013 1 commit
-
-
David Bigagli authored
-
- 16 Nov, 2013 2 commits
-
-
Phil Eckert authored
-
Chrysovalantis Paschoulas authored
-
- 15 Nov, 2013 3 commits
-
-
Rod Schultz authored
limits are configured as 0.
-
Morris Jette authored
bug 511
-
Morris Jette authored
Add ability to clear a node's DRAIN flag using scontrol or sview by setting its state to "UNDRAIN". The node's base state (e.g. "DOWN" or "IDLE") will not be changed. bug 514
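The idea of clearing a flag while leaving the base state alone can be sketched with a bitmask. The constants below are hypothetical values chosen for illustration and are not taken from the Slurm headers; only the masking technique is the point.

```python
# Hypothetical layout: low bits hold the base state, higher bits hold flags.
BASE_MASK = 0x000F
STATE_DOWN = 0x0001
STATE_IDLE = 0x0002
FLAG_DRAIN = 0x0200


def undrain(node_state):
    """Clear the DRAIN flag; the base state bits are left untouched."""
    return node_state & ~FLAG_DRAIN
```

A node in state `STATE_DOWN | FLAG_DRAIN` becomes plain `STATE_DOWN` after `undrain`, and a node that was never draining is returned unchanged.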
-