- 14 Jun, 2016 1 commit
-
-
Alejandro Sanchez authored
Initially I wanted to record how long a job spends in PD state before it actually starts. To do so, I derived an eligible_time field calculated as start_time - begin_time. This was not correct: a user of the plugin reported that the field underflowed if the job remained in PD state for a long time. A more accurate approach would be to calculate the field as start_time - eligible_time (the actual Slurm eligible_time). Since that field is not accessible within the log_record function, which receives a job_record struct as its parameter, I've decided to simply stop logging this field.
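For illustration only, a guarded version of the start_time - eligible_time calculation might look like the sketch below. The helper name `pending_seconds` and the unsigned 32-bit result type are assumptions made for this example; they are not taken from the plugin or from Slurm.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not part of Slurm): seconds a job spent pending,
 * computed as start_time - eligible_time. The result is clamped at zero
 * so that an eligible time later than the start time cannot wrap the
 * unsigned result around to a huge value.
 */
static uint32_t pending_seconds(time_t eligible_time, time_t start_time)
{
	if (start_time <= eligible_time)
		return 0;
	return (uint32_t)(start_time - eligible_time);
}
```

Clamping hides inconsistent timestamps rather than reporting them, so a real implementation might prefer to log such cases instead.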
-
- 13 Jun, 2016 1 commit
-
-
Morris Jette authored
Prior logic was getting confused with NUMA containing no cores. bug 2745
-
- 10 Jun, 2016 5 commits
-
-
Jason Bacon authored
-
Morris Jette authored
This adds support for burst buffer re-issue of pre-load operation when slurmctld restarts with job in configuring state. Also copy NEWS item to v16.05.
-
Morris Jette authored
Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. bugs 2789 and 2810
-
Danny Auble authored
can work on a BGAS node. Bug 2806
-
Danny Auble authored
of continuously printing the message over and over as the problem will most likely not resolve itself. Bug 2797
-
- 09 Jun, 2016 1 commit
-
-
Danny Auble authored
account or wckey.
-
- 08 Jun, 2016 1 commit
-
-
Danny Auble authored
-
- 07 Jun, 2016 3 commits
-
-
Andy Riebs authored
-
Morris Jette authored
Fix for tracking job resource allocation when slurmctld is reconfigured while Cray Node Health Check (NHC) is running. Previous logic would fail to record the job's allocation and then perform a release operation upon NHC completion, resulting in underflow error messages. bug 2353
-
Dominik Bartkiewicz authored
While here, mark options const, and add leading underscore to denote this as a static function (only called within hostlist.c). Also change strcmp to xstrcmp. Commit a6ffef22 changed this function and would alter the input hn, which led to subsequent calls to the function having wrong prefix lengths for that hostrange precluding it from matching correctly. Bug 2558.
-
- 06 Jun, 2016 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
and error files.
-
Morris Jette authored
The buffer to be used for reading the system /proc/*/stat files is moved from the stack to the heap (i.e. malloc'ed memory) and initialized to zero, and increased in size from 1k to 4k. I don't see how this could make any difference, but this anomaly was reported by valgrind. bug 2234
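A minimal sketch of the pattern described, assuming a helper named `read_stat_file` (hypothetical, not the actual Slurm function): the buffer is allocated on the heap, zero-filled, and sized at 4k.

```c
#include <stdio.h>
#include <stdlib.h>

#define STAT_BUF_SIZE 4096	/* 4k, the size named in the commit message */

/*
 * Hypothetical helper: read the start of a stat file into a zeroed heap
 * buffer. Returns a NUL-terminated string the caller must free(), or
 * NULL if the file cannot be opened, allocated for, or read.
 */
static char *read_stat_file(const char *path)
{
	FILE *fp = fopen(path, "r");
	char *buf;

	if (!fp)
		return NULL;

	buf = calloc(1, STAT_BUF_SIZE);	/* heap memory, initialized to zero */
	if (!buf) {
		fclose(fp);
		return NULL;
	}

	/* Read at most size - 1 bytes so the zero fill leaves a terminator. */
	if (fread(buf, 1, STAT_BUF_SIZE - 1, fp) == 0) {
		free(buf);
		buf = NULL;
	}
	fclose(fp);
	return buf;
}
```

Because calloc zero-fills the buffer, any bytes that the read does not overwrite are defined, which is the kind of condition valgrind checks for.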
-
- 03 Jun, 2016 3 commits
-
-
Morris Jette authored
The #define of SLURMSTEPD_MEMCHECK in src/slurmd/common/slurmstepd_init.h must be changed to enable memcheck or valgrind. Also change a #if in src/slurmd/slurmd/req.c near where you find the "valgrind" references. Bug 2334 diagnostics
-
Tim Wickberg authored
If the QOS includes a time limit, skip checking the partition limit. The QOS limit is checked separately elsewhere.
-
Tim Wickberg authored
'the the' is is a a mistake mistake.
-
- 02 Jun, 2016 8 commits
-
-
Tim Wickberg authored
Wrong order of operations results in the return code being 0/1.
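The affected code is not shown here, but one common way this class of bug arises in C is an assignment combined with a comparison, where the comparison binds tighter than the assignment. The sketch below uses hypothetical names (`do_send`, `buggy`, `fixed`) to show the difference.

```c
/* Hypothetical stand-in for a call that fails with a specific error code. */
static int do_send(void)
{
	return -5;
}

static int buggy(void)
{
	int rc;

	/* "<" binds tighter than "=", so rc receives the comparison result. */
	if ((rc = do_send() < 0))
		return rc;	/* returns 1, the original -5 is lost */
	return 0;
}

static int fixed(void)
{
	int rc;

	/* Parenthesize the assignment so rc keeps the real return code. */
	if ((rc = do_send()) < 0)
		return rc;	/* returns -5 */
	return 0;
}
```

With the misplaced parentheses the caller can still tell that something failed, but not what, since every failure collapses to 1.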
-
Morris Jette authored
Fix for "scontrol -dd show job" with respect to displaying the specific CPUs allocated to a job on each node. Prior logic would only display the CPU information for the first node in the job allocation. Bug introduced in commit 0f826c0b due to misplaced parenthesis
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Wrong order of operations results in the return code being 0/1.
-
Danny Auble authored
If the plugin ever returns an error the variables weren't initialized so when they were freed they could corrupt memory. Bug 2790
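A sketch of the general fix, assuming a plugin call that leaves its output argument untouched on error. The names `plugin_get_name` and `lookup` are hypothetical and do not correspond to the actual Slurm code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical plugin call: fills *name on success and leaves it
 * untouched when it returns an error.
 */
static int plugin_get_name(int fail, char **name)
{
	if (fail)
		return -1;
	*name = malloc(5);
	if (!*name)
		return -1;
	memcpy(*name, "node", 5);	/* copies the terminating NUL too */
	return 0;
}

static int lookup(int fail)
{
	char *name = NULL;	/* initialized so the error path frees NULL */
	int rc = plugin_get_name(fail, &name);

	free(name);		/* free(NULL) is a no-op */
	return rc;
}
```

Without the `= NULL` initializer, the error path would pass an indeterminate pointer to free(), which is the memory corruption risk the message describes.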
-
Morris Jette authored
Rename "in" to "input" in slurm_step_io_fds data structure defined in slurm.h. This is needed to avoid breaking Python by using one of its keywords in a Slurm data structure. bug 2755
-
Artem Polyakov authored
I had never before tried to set the pmix plugin as the default in slurm.conf. Setting it in my new test installation exposed the problem named in the subject. bug 2786
-
- 01 Jun, 2016 2 commits
-
-
Tim Wickberg authored
Bug 2787
-
Tim Wickberg authored
C++ compilation against SPANK would fail otherwise.
-
- 31 May, 2016 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
>= 15.08 for some reports.
-
Morris Jette authored
Needed to test ntasks_per_socket against both NO_VAL and INFINITE for job allocations spanning multiple sockets. bug 2766
-
Tim Wickberg authored
This prevented correct error handling because rc was 0/1 instead of the original return code. Also fix slurm_send_only_controller_msg and slurm_send_only_node_msg, although these only result in bad printed values in the debug message.
-
Artem Polyakov authored
Bug 2120
-
- 27 May, 2016 6 commits
-
-
Morris Jette authored
This bug was introduced by commit 21c52d2f, which fixed a different problem tracking resources associated with suspended jobs. There are subtle differences between jobs that are suspended by a user/administrator and jobs suspended by gang scheduling, which resulted in undercounting allocated CPUs when a job suspended by gang scheduling was active at the time of a slurmctld reconfiguration request. See bugs 2353 (original bug related to commit 21c52d2f) and 2765.
-
Danny Auble authored
accounts) no default account is printed; previously NULL was printed. This change just stops printing it, but the whole function should probably be revisited, as the rigmarole can probably be avoided: we always know what the default is going to be if none is specified (first off the list). The problem with that, though, is when a user has already been added to a cluster and has a default, but is then added to a new cluster where they don't have one. In this case you want to keep the first cluster's default, but set the default for the second cluster. Bug 2725
-
Danny Auble authored
-
Tim Wickberg authored
Add missing unlock before return. Coverity 44888.
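The pattern behind this kind of fix can be sketched as follows, using a plain pthread mutex. The names `counter_lock`, `counter`, and `bump` are hypothetical and are not taken from the code Coverity flagged.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Increment the counter unless it has reached limit; -1 means refused. */
static int bump(int limit)
{
	pthread_mutex_lock(&counter_lock);
	if (counter >= limit) {
		/* The early return must release the lock as well. */
		pthread_mutex_unlock(&counter_lock);
		return -1;
	}
	counter++;
	pthread_mutex_unlock(&counter_lock);
	return 0;
}
```

If the unlock on the early-return path were missing, the next caller would block forever on the mutex.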
-
Morris Jette authored
This reverts commit cc242de3. That patch fixed bug 2745, but breaks tests 1.89 and 1.91 on typical Xeon processors.
-
Morris Jette authored
This bug was introduced by commit 21c52d2f, which fixed a different problem tracking resources associated with suspended jobs. There are subtle differences between jobs that are suspended by a user/administrator and jobs suspended by gang scheduling, which resulted in undercounting allocated CPUs when a job suspended by gang scheduling was active at the time of a slurmctld reconfiguration request. See bugs 2353 (original bug related to commit 21c52d2f) and 2765.
-