- 06 Aug, 2014 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Apply BatchStartTimeout configuration to task launch and avoid aborting srun commands due to long-running Prolog scripts. bug 978
-
Morris Jette authored
When nodes are scheduled for reboot, set their state to DOWN rather than FUTURE so they remain visible to sinfo. The state is set to IDLE after the reboot completes. bug 1007
-
- 05 Aug, 2014 3 commits
-
-
Morris Jette authored
Srun executable names beginning with "." will be resolved based upon the working directory and path on the compute node rather than the submit node.
-
David Bigagli authored
-
Morris Jette authored
Always call select_g_step_finish() when terminating a job step, even if the job is also being terminated. This is needed for Cray systems. bug 1012
-
- 04 Aug, 2014 1 commit
-
-
Morris Jette authored
Fix race condition in CPU frequency set with job preemption. When the preemptor job completed, it would notify the srun, which would notify the slurmctld, which could resume a preempted job. That preempted job could reset the CPU frequency before the preemptor. This change has the slurmstepd resetting a job's CPU frequency prior to notifying srun of completion, which eliminates the race condition. bug 1011
-
- 01 Aug, 2014 3 commits
-
-
David Bigagli authored
"job_comp/mysql" setting an incorrect default database.
-
David Bigagli authored
-
David Bigagli authored
database index for the array elements avoiding duplicate database values.
-
- 31 Jul, 2014 2 commits
-
-
Franco Broi authored
-
Morris Jette authored
scontrol now prints separate error messages for job arrays whose tasks return different exit codes. Applies to job suspend and resume operations.
-
- 30 Jul, 2014 2 commits
-
-
Morris Jette authored
This will set/export only specific environment variables
-
David Bigagli authored
job elapsed time.
-
- 29 Jul, 2014 1 commit
-
-
David Bigagli authored
the i/o thread.
-
- 28 Jul, 2014 3 commits
-
-
David Bigagli authored
-
David Bigagli authored
exit code.
-
Morris Jette authored
Test 3.11 was failing in some configurations without this, as the CPU count in the RPC was lower than the number of nodes in the required node list.
-
- 25 Jul, 2014 2 commits
-
-
Danny Auble authored
similar wording.
-
Danny Auble authored
-
- 24 Jul, 2014 1 commit
-
-
Danny Auble authored
information wasn't stored in accounting.
-
- 23 Jul, 2014 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
job/step completion.
-
Danny Auble authored
bit_unfmt. Signed-off-by: Danny Auble <da@schedmd.com>
-
Morris Jette authored
In the job_submit plugin: Remove all slurmctld locks prior to job_submit() being called for improved performance. If any slurmctld data structures are read or modified, add locks directly in the plugin.
-
- 22 Jul, 2014 5 commits
-
-
David Linden authored
-
David Bigagli authored
HealthCheckProgram for nodes in any other states than IDLE. #978
-
Morris Jette authored
Unload job tables rather than windows at job end. The table unload also unloads the job's windows.
-
Morris Jette authored
Added new internal Slurm functions xmalloc_nz() and xrealloc_nz(), which do not initialize the allocated memory to zero for improved performance.
-
Morris Jette authored
switch/nrt - Do not explicitly unload windows for a job on termination, only unload its table (which automatically unloads its windows).
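
The xmalloc_nz() and xrealloc_nz() entry above describes allocators that skip zero-initialization for speed. The following is a minimal sketch of that trade-off in plain C, not Slurm's implementation: `alloc_zeroed()`, `alloc_nz()`, `copy_string()`, and `new_counters()` are hypothetical names invented for illustration, built only on the standard `calloc()` and `malloc()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a zero-initializing allocator (assumption,
 * not Slurm code): every byte of the returned buffer is 0. */
static void *alloc_zeroed(size_t size)
{
    void *p = calloc(1, size);
    if (p == NULL) {
        perror("calloc");
        abort();
    }
    return p;
}

/* Hypothetical stand-in for a "_nz" allocator (assumption, not Slurm
 * code): the buffer contents are indeterminate until the caller writes. */
static void *alloc_nz(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        perror("malloc");
        abort();
    }
    return p;
}

/* Every byte of the new buffer is overwritten by memcpy(), so zeroing it
 * first would be wasted work: a reasonable case for the non-zeroing call. */
static char *copy_string(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src) + 1;   /* include the terminating NUL */
    char *dst = alloc_nz(len);
    memcpy(dst, src, len);
    return dst;
}

/* The caller expects all counters to start at 0, so this case needs the
 * zero-initializing allocator. */
static int *new_counters(size_t n)
{
    return alloc_zeroed(n * sizeof(int));
}
```

A caller would use `copy_string()` where the buffer is fully written right away, and `new_counters()` where code reads the memory before writing to it. Reading a buffer from the non-zeroing allocator before it has been written is a bug.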
-
- 19 Jul, 2014 1 commit
-
-
Morris Jette authored
-
- 18 Jul, 2014 8 commits
-
-
Morris Jette authored
-
David Bigagli authored
lost should the slurmctld restart.
-
David Bigagli authored
-
Morris Jette authored
Correct NumCPUs count for jobs with --exclusive option. bug 909
-
David Bigagli authored
lost should the slurmctld restart.
-
David Bigagli authored
-
Morris Jette authored
Correct NumCPUs count for jobs with --exclusive option. bug 909
-
Morris Jette authored
This probably only happens on native Cray systems due to the deallocation delays related to node health check. In any case, the symptom is an error message of this sort: "job # dealloc of node ... bad node_offset 0 count is 0". It then fails to release the nodes' GRES for use by other jobs. bug 973
-