- 20 Dec, 2013 2 commits
-
-
Danny Auble authored
for better debug
-
Danny Auble authored
midplane block that starts on a higher coordinate than it ends (i.e., if a block has midplanes [0010,0013], 0013 is the start even though it is listed second in the hostlist).
-
- 19 Dec, 2013 1 commit
-
-
Morris Jette authored
It has been changed to improve the calculated value for pending jobs and to use the actual node count value for jobs that have been started (including suspended, completed, etc.). bug 549
-
- 18 Dec, 2013 7 commits
-
-
Danny Auble authored
that spans multiple midplanes, the cnodes that are in error are correctly accounted for.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
being in error.
-
Danny Auble authored
-
Morris Jette authored
Note that each job's node allocation is counted separately. bug 548
-
- 17 Dec, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
will return ENOTCONN and not initialize the addr_str causing valgrind errors.
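The failure mode noted above, a socket call that fails with ENOTCONN and leaves the address string unset, can be avoided by giving the buffer a defined value before the call. The following is a minimal sketch, assuming the call behaves like `getpeername()`; the helper name `peer_addr_str` and the "unknown" placeholder are hypothetical and not taken from the Slurm source.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: write the peer's IPv4 address into addr_str.
 * The buffer gets a defined value first, so callers may log it even
 * when getpeername() fails (for example with ENOTCONN). */
int peer_addr_str(int fd, char *addr_str, size_t len)
{
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);

    if (!addr_str || len == 0)
        return -1;
    snprintf(addr_str, len, "unknown");  /* never left uninitialized */

    memset(&addr, 0, sizeof(addr));
    if (getpeername(fd, (struct sockaddr *) &addr, &addr_len) < 0)
        return -1;  /* errno holds the cause, e.g. ENOTCONN or EBADF */
    if (addr.sin_family != AF_INET)
        return -1;  /* this sketch only formats IPv4 peers */
    if (!inet_ntop(AF_INET, &addr.sin_addr, addr_str, len))
        return -1;
    return 0;
}
```

With this shape, a caller that ignores the return value still prints a valid string, which is the property valgrind was checking for.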
-
- 16 Dec, 2013 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Hughes, Doug authored
This allows multiple job IDs to be given to hold, uhold, resume, suspend, release, etc.
-
Morris Jette authored
-
- 14 Dec, 2013 3 commits
-
-
Danny Auble authored
226b49a3
-
Morris Jette authored
Test would periodically fail due to expect timing. This seems to fix the problem.
-
Danny Auble authored
-
- 13 Dec, 2013 4 commits
-
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
Fix slurmstepd race condition when separate threads are reading and modifying the job's environment, which can result in the slurmstepd failing with an invalid memory reference. Observed at shutdown when trying to run the task epilog and trying to read the env var: SLURM_STEP_KILLED_MSG_NODE_ID
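A race of the kind described above is commonly closed by serializing every read and write of the environment behind one lock and by handing readers a private copy. The sketch below illustrates that pattern only; `struct step_env`, its functions, and the fixed capacity are hypothetical and are not the slurmstepd data structures or the actual fix.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define ENV_MAX 16  /* hypothetical fixed capacity for the sketch */

struct step_env {
    pthread_mutex_t lock;  /* guards every field below */
    char *names[ENV_MAX];
    char *values[ENV_MAX];
    int cnt;
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

void step_env_init(struct step_env *e)
{
    memset(e, 0, sizeof(*e));
    pthread_mutex_init(&e->lock, NULL);
}

/* Add or replace name=value while holding the lock. Returns 0 on success. */
int step_env_set(struct step_env *e, const char *name, const char *value)
{
    int i, rc = -1;
    char *n, *v = dup_str(value);

    if (!v)
        return -1;
    pthread_mutex_lock(&e->lock);
    for (i = 0; i < e->cnt; i++) {
        if (strcmp(e->names[i], name) == 0)
            break;
    }
    if (i < e->cnt) {
        free(e->values[i]);  /* safe: readers only ever hold copies */
        e->values[i] = v;
        rc = 0;
    } else if (e->cnt < ENV_MAX && (n = dup_str(name))) {
        e->names[i] = n;
        e->values[i] = v;
        e->cnt++;
        rc = 0;
    }
    pthread_mutex_unlock(&e->lock);
    if (rc)
        free(v);
    return rc;
}

/* Return a private copy of the value (caller frees), or NULL if unset. */
char *step_env_get_copy(struct step_env *e, const char *name)
{
    char *copy = NULL;
    int i;

    pthread_mutex_lock(&e->lock);
    for (i = 0; i < e->cnt; i++) {
        if (strcmp(e->names[i], name) == 0) {
            copy = dup_str(e->values[i]);
            break;
        }
    }
    pthread_mutex_unlock(&e->lock);
    return copy;
}
```

Returning a copy rather than a pointer into the table is the design choice that matters here: a reader such as the task epilog can keep using its string even if another thread replaces the entry afterwards.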
-
Morris Jette authored
We do not want to look at the core file, so avoid generating it and then having to manually clear it later.
-
- 12 Dec, 2013 5 commits
-
-
Morris Jette authored
Without this change, sstat would try to unpack accounting data that was never packed, resulting in message unpack errors.
-
Morris Jette authored
There were some parsing issues and the test was not as general as it should have been.
-
Danny Auble authored
-
Danny Auble authored
throw away initialized variable.
-
Morris Jette authored
Without this patch, free() is called on a random memory location (i.e. whatever is on the stack), which can result in slurmstepd dying and a completed job not being purged in a timely fashion.
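The bug class described above, calling free() on a pointer that was never assigned, usually appears when an early exit jumps to a shared cleanup path. A minimal sketch of the defensive pattern follows; `build_message` is a hypothetical function for illustration and is not the patched slurmstepd code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function with a single cleanup path. */
int build_message(const char *src, char *out, size_t out_len)
{
    char *buf = NULL;  /* without "= NULL", the early-exit free() below
                        * would act on whatever happened to be on the stack */
    int rc = -1;

    if (!src || !out || out_len == 0)
        goto cleanup;  /* buf was never allocated */

    buf = malloc(strlen(src) + 1);
    if (!buf)
        goto cleanup;
    strcpy(buf, src);
    snprintf(out, out_len, "%s", buf);
    rc = 0;

cleanup:
    free(buf);  /* free(NULL) is defined to do nothing */
    return rc;
}
```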
-
- 11 Dec, 2013 3 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
Fix race condition in authentication credential creation that could corrupt memory. (NOTE: This race condition has existed since 2003 and would be exceedingly rare.)
-
- 10 Dec, 2013 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 09 Dec, 2013 2 commits
-
-
Morris Jette authored
This is needed for job arrays with discontiguous task ID values (e.g. "123_[1,3,5,...99999]")
-
Morris Jette authored
Previously job arrays were only listed with their native job ID (e.g. 123_0 listed as 123, 123_1 as 124, etc). Now lists the job ID using both formats (e.g. "123_1 (124)"). The same format is used for job step IDs (e.g. "123_1.2 (124.2)").
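The dual format described above can be sketched with a small formatting helper. The function name, its parameters, and the `NO_VAL_ID` sentinel are hypothetical; the output for jobs that are not part of an array is an assumption, since the commit message only gives the array cases.

```c
#include <stdio.h>

/* Hypothetical sentinel: "no array task" or "no step". */
#define NO_VAL_ID (-1L)

/* Format a job (or job step) ID into buf.
 * Array jobs use "ARRAYJOB_TASK (JOBID)", steps append ".STEP" to both
 * halves. Returns the snprintf() result. */
int format_job_id(char *buf, size_t len, long array_job_id,
                  long array_task_id, long job_id, long step_id)
{
    if (array_task_id == NO_VAL_ID) {
        /* Assumed behaviour for non-array jobs: native ID only. */
        if (step_id == NO_VAL_ID)
            return snprintf(buf, len, "%ld", job_id);
        return snprintf(buf, len, "%ld.%ld", job_id, step_id);
    }
    if (step_id == NO_VAL_ID)
        return snprintf(buf, len, "%ld_%ld (%ld)",
                        array_job_id, array_task_id, job_id);
    return snprintf(buf, len, "%ld_%ld.%ld (%ld.%ld)",
                    array_job_id, array_task_id, step_id, job_id, step_id);
}
```

For example, `format_job_id(buf, sizeof(buf), 123, 1, 124, 2)` yields the step form quoted in the commit message, "123_1.2 (124.2)".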
-
- 08 Dec, 2013 2 commits
-
-
jette authored
-
jette authored
If the GRES is associated with specific files AND the GRES count is reset using scontrol AND the slurmd is restarted either without a gres.conf file or with a count and no specific files AND the GRES count is then increased using scontrol, the GRES bitmap will not match its count. This fixes the root cause of the mismatch between bitmap size and GRES count and should render the rebuilding of the bitmap unnecessary. The rebuilding was handled in the following commits: commit ec4df3bf, commit 1712d619.
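The invariant at stake above is that the bitmap size always equals the GRES count. One way to hold that invariant is to change both through a single function, as in the sketch below. This is an illustration under stated assumptions, not the actual fix: `struct gres_state` and `gres_set_count` are hypothetical, and one byte per device stands in for a real bitmap.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-node GRES record. */
struct gres_state {
    unsigned count;         /* configured GRES count */
    unsigned char *in_use;  /* one entry per device (stand-in for a bitmap) */
    unsigned in_use_size;   /* must always equal count */
};

/* Change the count and resize the per-device map in the same step, so
 * the two can never disagree. Existing entries are preserved and new
 * entries start cleared. Returns 0 on success, -1 on allocation failure. */
int gres_set_count(struct gres_state *g, unsigned new_count)
{
    unsigned char *map;

    if (new_count == 0) {
        free(g->in_use);
        g->in_use = NULL;
        g->in_use_size = 0;
        g->count = 0;
        return 0;
    }
    map = realloc(g->in_use, new_count);
    if (!map)
        return -1;  /* old map and count are left untouched */
    if (new_count > g->in_use_size)
        memset(map + g->in_use_size, 0, new_count - g->in_use_size);
    g->in_use = map;
    g->in_use_size = new_count;
    g->count = new_count;
    return 0;
}
```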
-
- 07 Dec, 2013 2 commits
-
-
Danny Auble authored
-
Philip D. Eckert authored
-