- 18 Apr, 2019 4 commits
-
-
Dominik Bartkiewicz authored
Properly initialize structures throughout Slurm. Bug 6613
-
Danny Auble authored
Bug 6613
-
Dominik Bartkiewicz authored
Bug 6613
-
Tim Wickberg authored
Regression from aca37654. Bug 6826. Co-authored-by: Chad Vizino <chad@schedmd.com>
-
- 16 Apr, 2019 8 commits
-
-
Danny Auble authored
These conditions need to remain constant until something changes on the job that triggers a reevaluation. Bug 6625
-
Danny Auble authored
Previously the old limits were removed only when the requester was not an operator or higher. Now they are always removed. Bug 6625
-
Brian Christiansen authored
Bug 6625
-
Danny Auble authored
Previously we went up the tree to the next assoc_ptr. Since an association is validated on its id as well as the uid, the assoc_ptr would eventually become invalid. Setting it to NULL here avoids a number of problems later on. Bug 6625
-
Danny Auble authored
Bug 6625
-
Nathan Rini authored
Bug 6625.
-
Danny Auble authored
Don't abort when the job's association was removed before the job was able to make it to the database. Bug 6625
-
Brian Christiansen authored
Bug 6625
-
- 13 Apr, 2019 3 commits
-
-
Marshall Garey authored
The backfill scheduler keeps a local list of job pointers. Since the backfill scheduler yields locks, it's possible for pending jobs to be canceled and purged in these yield periods. The backfill scheduler then has pointers to now invalid memory, and dereferencing those pointers is undefined behavior and may result in a segfault. This commit prevents purging jobs while the backfill scheduler is running. Bug 6621
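The guard described above can be sketched as follows. This is a minimal single-threaded illustration, not the Slurm implementation: `job_t`, `backfill_begin()`, `backfill_end()`, `job_add()`, `purge_cancelled()` and `job_count()` are hypothetical names, and a real scheduler would have to read and write the flag under the same lock that protects the job list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a job record; not a Slurm type. */
typedef struct job {
	int id;
	bool cancelled;
	struct job *next;
} job_t;

/* True while the (hypothetical) backfill pass holds raw job pointers. */
static bool backfill_running = false;

void backfill_begin(void) { backfill_running = true; }
void backfill_end(void) { backfill_running = false; }

/* Allocate a job and push it onto the front of the list. */
job_t *job_add(job_t **head, int id)
{
	job_t *job;

	assert(head != NULL);
	job = calloc(1, sizeof(*job));
	if (!job)
		return NULL;
	job->id = id;
	job->next = *head;
	*head = job;
	return job;
}

/*
 * Free every cancelled job and return how many were purged.
 * While backfill is running nothing is freed, so pointers held
 * by the backfill pass stay valid until backfill_end().
 */
int purge_cancelled(job_t **head)
{
	int purged = 0;

	assert(head != NULL);
	if (backfill_running)
		return 0;

	while (*head) {
		job_t *job = *head;
		if (job->cancelled) {
			*head = job->next;
			free(job);
			purged++;
		} else {
			head = &job->next;
		}
	}
	return purged;
}

/* Count the jobs currently on the list. */
int job_count(const job_t *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

In this sketch a purge attempted between `backfill_begin()` and `backfill_end()` is simply skipped and has to be retried on a later pass.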
-
Danny Auble authored
Bug 6739
-
Paolo Margara authored
Bug 6785.
-
- 12 Apr, 2019 1 commit
-
-
Tim Wickberg authored
-
- 10 Apr, 2019 6 commits
-
-
Albert Gil authored
Bug 6608.
-
Dominik Bartkiewicz authored
Bug 6807.
-
Alejandro Sanchez authored
==8640== Thread 5 bckfl:
==8640== Syscall param openat(filename) points to unaddressable byte(s)
==8640==    at 0x4A81D0E: open (open64.c:48)
==8640==    by 0x5934ABB: _update_job_env (burst_buffer_cray.c:3338)
==8640==    by 0x5934ABB: bb_p_job_begin (burst_buffer_cray.c:3962)
...
==8640==  Address 0x6b96120 is 16 bytes inside a block of size 61 free'd
==8640==    at 0x48369AB: free (vg_replace_malloc.c:530)
==8640==    by 0x49D4873: slurm_xfree (xmalloc.c:244)
==8640==    by 0x490C317: free_command_argv (run_command.c:249)
==8640==    by 0x5934A5C: bb_p_job_begin (burst_buffer_cray.c:3947)
...
==8640==  Block was alloc'd at
==8640==    at 0x4837B65: calloc (vg_replace_malloc.c:752)
==8640==    by 0x49D4566: slurm_xmalloc (xmalloc.c:87)
==8640==    by 0x49D4B67: makespace (xstring.c:103)
==8640==    by 0x49D4C91: _xstrcat (xstring.c:134)
==8640==    by 0x49D4ECF: _xstrfmtcat (xstring.c:280)
==8640==    by 0x593497C: bb_p_job_begin (burst_buffer_cray.c:3936)
...
Bug 6807.
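The trace shows a string that is built, stored in a command argv, freed together with that argv, and then used again as a file name. One common way to avoid this class of bug is to take a private copy before the argv is released. The sketch below illustrates that pattern only; it is not the actual fix, and `free_argv()`, `build_argv()`, `take_path_copy()`, `dup_string()` and the example path are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Portable string duplicate (hypothetical helper). */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free a NULL-terminated argv together with every string it owns. */
void free_argv(char **argv)
{
	if (!argv)
		return;
	for (size_t i = 0; argv[i]; i++)
		free(argv[i]);
	free(argv);
}

/* Build {"tool", path, NULL}; the argv owns both strings. */
char **build_argv(const char *path)
{
	char **argv = calloc(3, sizeof(*argv));

	if (!argv)
		return NULL;
	argv[0] = dup_string("tool");
	argv[1] = dup_string(path);
	if (!argv[0] || !argv[1]) {
		free(argv[0]);
		free(argv[1]);
		free(argv);
		return NULL;
	}
	return argv;
}

/*
 * Return a copy of argv[1] owned by the caller. The copy stays
 * valid after free_argv(argv), unlike a pointer into the argv.
 */
char *take_path_copy(char **argv)
{
	assert(argv && argv[0] && argv[1]);
	return dup_string(argv[1]);
}
```

The caller is responsible for freeing the copy once the file has been opened.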
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
- 05 Apr, 2019 1 commit
-
-
Alejandro Sanchez authored
Bug 6791.
-
- 03 Apr, 2019 2 commits
-
-
Alejandro Sanchez authored
This prevents rebuilding a job's dependency string when it has at least one invalid (never satisfied) dependency, regardless of whether that dependency has already been purged (after MinJobAge). This can be useful to track down the culprit invalid dependencies even after they are gone from ctld's in-memory job list. The flag is cleared upon a successful job dependency update, or after another job in the dependency list has been satisfied if the list is composed with the '?' symbol (OR'ed). Bug 5851.
-
Alejandro Sanchez authored
Job dependencies separated by "?" (OR'ed) should make the dependent job independent as soon as any one of the dependencies is resolved as satisfied. Without this patch, if an invalid (non-satisfiable) dependency was resolved before a satisfiable one, the dependent job would never become independent, even after the satisfiable one was eventually resolved. Bug 5851.
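The OR semantics described above can be sketched as below. The enum and function names are hypothetical and do not come from Slurm. Treating an empty list as independent, and an all-invalid list as never runnable, are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical per-dependency states. */
typedef enum {
	DEP_PENDING,	/* not resolved yet */
	DEP_SATISFIED,	/* resolved and satisfied */
	DEP_INVALID	/* can never be satisfied */
} dep_state_t;

/* Hypothetical overall result for the dependent job. */
typedef enum {
	JOB_DEPENDENT,		/* still waiting on something */
	JOB_INDEPENDENT,	/* free to be scheduled */
	JOB_NEVER_RUNNABLE	/* every OR'ed dependency is invalid */
} job_dep_result_t;

/*
 * Evaluate an OR'ed dependency list. One satisfied entry is
 * enough, no matter how many invalid entries were resolved
 * before it. An invalid entry alone never decides the result
 * while another entry is still pending.
 */
job_dep_result_t eval_or_deps(const dep_state_t *deps, size_t n)
{
	bool any_pending = false;

	assert(deps != NULL || n == 0);
	if (n == 0)
		return JOB_INDEPENDENT;	/* assumption: nothing to wait for */

	for (size_t i = 0; i < n; i++) {
		if (deps[i] == DEP_SATISFIED)
			return JOB_INDEPENDENT;
		if (deps[i] == DEP_PENDING)
			any_pending = true;
	}
	return any_pending ? JOB_DEPENDENT : JOB_NEVER_RUNNABLE;
}
```

With the list `{DEP_INVALID, DEP_PENDING}` the job stays dependent, and it becomes independent once the second entry turns into `DEP_SATISFIED`.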
-
- 27 Mar, 2019 1 commit
-
-
Dominik Bartkiewicz authored
Bug 6750.
-
- 26 Mar, 2019 1 commit
-
-
Alejandro Sanchez authored
Bug 6710.
-
- 21 Mar, 2019 1 commit
-
-
Marshall Garey authored
Change to %pJ identifier while here and merge different partition priorities into a single log message line. Bug 6663.
-
- 20 Mar, 2019 5 commits
-
-
Alejandro Sanchez authored
Bug 6723
-
Albert Gil authored
Regression from enhancement 4506. Bug 6680
-
Danny Auble authored
Bug 6662
-
Alejandro Sanchez authored
Previously the state reason could remain as WAIT_NO_REASON even after backfill evaluation. This should improve the user's perception of system feedback and responsiveness. Bug 6594.
-
Brian Christiansen authored
Continuation of 69d78159. Bug 6500
-
- 19 Mar, 2019 2 commits
-
-
Danny Auble authored
Issue was from ade9101e. The problem was an oversimplified if statement which, when run multiple times, would set the start time incorrectly. Bug 6697
-
Alejandro Sanchez authored
Even if the main scheduler doesn't allocate resources for hetjobs, the queue list should be composed of all types of jobs. Otherwise, lower priority regular jobs could be allocated resources by the main scheduler while higher priority hetjobs are waiting for a backfill cycle. Bug 6593.
-
- 15 Mar, 2019 1 commit
-
-
Matt Ezell authored
Bug 6679
-
- 13 Mar, 2019 1 commit
-
-
Alejandro Sanchez authored
Instruct the backfill scheduler to attempt to start a heterogeneous job as soon as all of its components are determined to be able to start. Bug 5579.
-
- 12 Mar, 2019 1 commit
-
-
Dominik Bartkiewicz authored
There are at least two points where this can return a false positive error message, which can be confusing for users. Continuation of dc583bd1. Bug 6437
-
- 11 Mar, 2019 1 commit
-
-
Danny Auble authored
-
- 07 Mar, 2019 1 commit
-
-
Tim Wickberg authored
-