- 22 Nov, 2017 3 commits
-
-
Dominik Bartkiewicz authored
bug 4256
-
Morris Jette authored
-
Dominik Bartkiewicz authored
Bug 4379
-
- 21 Nov, 2017 11 commits
-
-
Dominik Bartkiewicz authored
Can cause slurmstepd to crash, as rlimit_name was pointing to part of the free'd env_name variable. Bug 4409.
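The failure mode described above (a pointer into a buffer outliving that buffer) can be sketched as follows. This is a minimal illustration, not the actual slurmstepd code: the `SLURM_RLIMIT_` prefix and the helpers `build_env_name()` and `rlimit_name_copy()` are assumptions introduced for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ENV_PREFIX "SLURM_RLIMIT_"	/* assumed prefix, for illustration only */

/* Hypothetical helper: builds "<ENV_PREFIX><name>" on the heap. */
static char *build_env_name(const char *name)
{
	size_t len = strlen(ENV_PREFIX) + strlen(name) + 1;
	char *env_name = malloc(len);

	if (env_name)
		snprintf(env_name, len, "%s%s", ENV_PREFIX, name);
	return env_name;
}

/*
 * Hypothetical helper: returns a heap copy of the limit name that stays
 * valid after env_name has been freed. The caller must free() the result.
 */
static char *rlimit_name_copy(const char *name)
{
	char *env_name = build_env_name(name);
	char *rlimit_name, *copy;
	size_t n;

	if (!env_name)
		return NULL;

	/* rlimit_name is only an alias into env_name, not its own buffer. */
	rlimit_name = env_name + strlen(ENV_PREFIX);

	/* Copy the suffix while env_name is still alive. */
	n = strlen(rlimit_name) + 1;
	copy = malloc(n);
	if (copy)
		memcpy(copy, rlimit_name, n);

	/* After this free(), rlimit_name dangles; copy does not. */
	free(env_name);
	return copy;
}
```

Calling `rlimit_name_copy("NOFILE")` yields a separately allocated string, so later logging or lookups do not touch freed memory. An alternative with the same effect is to delay `free(env_name)` until the alias is no longer used.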
-
Artem Polyakov authored
This patch has fixed the problem for me. We are going to do some more verification later today and update. But I would appreciate it if somebody else could test it as well. Signed-off-by: Danny Auble <da@schedmd.com>
-
Morris Jette authored
There was a list of pending pack job records under consideration for scheduling by the backfill plugin that was not being cleared between iterations of the backfill scheduler, resulting in various scheduling anomalies. bugs 4371, 4400
-
Morris Jette authored
An abort was triggered here due to a pack job start failure.
-
Danny Auble authored
code dealing with how we need to keep track of it. Bug 4405
-
Morris Jette authored
For heterogeneous job steps, the srun --open-mode option default value will be set to "append".
-
Morris Jette authored
Previous logic would fail if more than 2 pack groups were implicitly specified in the srun command.
-
Morris Jette authored
Previous logic would delay the initiation of pack jobs until all components were submitted. The new logic will defer pack job scheduling based upon a new "pack_job_offset" field in the job submit request and NOT set a begin_time in the future. This will eliminate the pack job scheduling reason value of "BEGIN_TIME". bugs 4369, 4400
-
Morris Jette authored
-
Morris Jette authored
Fix for bug introduced in commit 9e0b976a. bug 4400
-
Patrice Peterson authored
The regex in x11_set_xauth() did not match FQDNs because it was missing a dot. Bug 4398.
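The effect of a missing dot in a hostname character class can be illustrated with POSIX regular expressions. The two patterns and the sample display strings below are assumptions made up for this sketch; they are not the expressions used in x11_set_xauth().

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical patterns: the first omits '.', so it stops at the first dot. */
#define PATTERN_NO_DOT   "^[[:alnum:]-]+/unix:[[:digit:]]+"
#define PATTERN_WITH_DOT "^[[:alnum:].-]+/unix:[[:digit:]]+"

/*
 * Returns 1 if line matches pattern, 0 if it does not,
 * and -1 if the pattern fails to compile.
 */
static int host_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With these patterns, a short name such as `node01/unix:10` matches both expressions, while a fully qualified name such as `node01.example.com/unix:10` matches only the one whose bracket expression includes the dot.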
-
- 20 Nov, 2017 6 commits
-
-
Morris Jette authored
If a node's physical Boards, Sockets, Cores, Threads, etc. differ from the configuration, log using error() rather than info(). bug 4394
-
Morris Jette authored
Add SchedulerParameters=whole_pack configuration parameter. If set, then hold, release and cancel operations on any component of a heterogeneous job will be applied to all components. bug 4374
-
Felip Moll authored
Bug 4393.
-
Morris Jette authored
Running "scontrol reconfig" re-initialized some arrays/lists without clearing the previous values, resulting in a memory leak.
-
Morris Jette authored
As reported by Valgrind
-
Morris Jette authored
Previous logic would continuously report that the command was not responding, but not exit the wait loop. Only test15.# checked so far.
-
- 17 Nov, 2017 3 commits
-
-
Morris Jette authored
Bug 4366
-
Morris Jette authored
bug 4366
-
Morris Jette authored
-
- 16 Nov, 2017 6 commits
-
-
Morris Jette authored
Coverity CID 179255
-
Morris Jette authored
Fix logic so that we can find all components of a pack job if the submit fails. Populate the "pack_job_list" field on a pack job leader if some later component can't be submitted.
-
Morris Jette authored
Correct printing error type based upon errno rather than returned rc.
-
Dominik Bartkiewicz authored
If PrologSlurmctld fails for pack job leader then kill all components of the job. bug 4379
-
Dominik Bartkiewicz authored
Add SLURM_PACK_JOB_NODELIST to PrologSlurmctld and EpilogSlurmctld environment. bug 4379
-
Morris Jette authored
bug 4370
-
- 15 Nov, 2017 10 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Prevent scheduling deadlock with multiple components of heterogeneous job in different partitions (i.e. one heterogeneous job component is higher priority in one partition and another component is lower priority in a different partition). bug 4370
-
Alejandro Sanchez authored
Issue could be reproduced by restarting slurmctld after a heterogeneous job finished, but before MinJobAge time passed. Since the pack_job_list job_record member wasn't saved/loaded to/from the job_state, the function _validate_pack_jobs() is responsible for rebuilding the pack_job_list. The issue was that the function was skipping the rebuild work for finished jobs, so other code, such as the thread responsible for purging old jobs, failed when iterating over a NULL pack_job_list that was never rebuilt. Bug 4383.
-
Felip Moll authored
If run from srun and the lua job submit plugin sets the environment, slurmctld crashes. Bug 4247
-
Alejandro Sanchez authored
From within slurm_job_submit(): job_desc.pack_job_offset. From within slurm_job_modify(): job_rec.pack_job_id, job_rec.pack_job_id_set, job_rec.pack_job_offset. Bug 4372.
-
Felip Moll authored
bug 4339
-
Felip Moll authored
Added some additional checks to prevent segfaults in some basic situations. Bug 4247
-
Felip Moll authored
bug 4368
-
Dominik Bartkiewicz authored
Add SLURM_PACK_JOB_ID and SLURM_PACK_JOB_OFFSET to PrologSlurmctld and EpilogSlurmctld environment bug 4379
-
- 14 Nov, 2017 1 commit
-
-
Morris Jette authored
-