- 17 Apr, 2019 5 commits
-
-
Brian Christiansen authored
to run RebootProgram from ctld. Bug 6115
-
Nate Rini authored
Bug 6010
-
Brian Christiansen authored
Bug 6869
-
Broderick Gardner authored
to guarantee a minimum runtime before preemption. Bug 5519
-
Brian Christiansen authored
to send user signal at preemption time. Bug 5867
-
- 16 Apr, 2019 8 commits
-
-
Danny Auble authored
These are conditions that need to remain constant until something changes on the job that triggers a reevaluation. Bug 6625
-
Danny Auble authored
Previously, the old limits were removed only when the user was below operator level. Now they are always removed. Bug 6625
-
Brian Christiansen authored
Bug 6625
-
Danny Auble authored
Before, we went up the tree to the next assoc_ptr. Because we validate an association on the id as well as the uid, the assoc_ptr was eventually going to become invalid. Setting it to NULL here solves a number of issues later on. Bug 6625
-
Danny Auble authored
Bug 6625
-
Nathan Rini authored
Bug 6625.
-
Danny Auble authored
Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. Bug 6625
-
Brian Christiansen authored
Bug 6625
-
- 14 Apr, 2019 1 commit
-
-
Tim Wickberg authored
-
- 13 Apr, 2019 3 commits
-
-
Marshall Garey authored
The backfill scheduler keeps a local list of job pointers. Since the backfill scheduler yields locks, it's possible for pending jobs to be canceled and purged in these yield periods. The backfill scheduler then has pointers to now invalid memory, and dereferencing those pointers is undefined behavior and may result in a segfault. This commit prevents purging jobs while the backfill scheduler is running. Bug 6621
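As a rough illustration of the idea described above (not the actual slurmctld code, which is written in C), the following Python sketch defers purging while a backfill cycle is marked active. The class, attribute, and state names are hypothetical and chosen only for this example.

```python
import threading


class JobTable:
    """Toy job table. All names here are hypothetical, not Slurm's."""

    def __init__(self):
        self.lock = threading.Lock()
        self.jobs = {}                 # job_id -> state string
        self.backfill_active = False   # set while a backfill cycle runs

    def purge_cancelled(self):
        """Remove cancelled jobs unless a backfill cycle is in progress.

        Returns the number of jobs purged.
        """
        with self.lock:
            if self.backfill_active:
                # Backfill may still hold references to these records,
                # so leave them in place and try again later.
                return 0
            doomed = [jid for jid, state in self.jobs.items()
                      if state == "CANCELLED"]
            for jid in doomed:
                del self.jobs[jid]
            return len(doomed)
```

The purge is only postponed, never skipped: once the flag is cleared, the next call removes the cancelled records.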
-
Danny Auble authored
Bug 6739
-
Paolo Margara authored
Bug 6785.
-
- 12 Apr, 2019 2 commits
-
-
Alejandro Sanchez authored
Refactor how memory allocations are managed to accurately track memory allocations on each node when the --mem-per-cpu option is used and the CPU count per node varies. Also accounts for Memory Specialization and wraps much of the logging with a DebugFlag of SelectType. Bug 5562
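To see why per-node tracking matters, here is a minimal Python sketch of the arithmetic, assuming memory is simply the per-CPU request times the CPUs allocated on each node. It ignores Memory Specialization, and the function and node names are made up for illustration.

```python
def mem_per_node(mem_per_cpu_mb, cpus_per_node):
    """Return the memory (MB) to account on each node.

    cpus_per_node maps a node name to the CPUs allocated on that node.
    """
    return {node: mem_per_cpu_mb * cpus
            for node, cpus in cpus_per_node.items()}


# A job with --mem-per-cpu=2048 that gets 4 CPUs on one node and 2 on another
# needs a different amount of memory tracked on each node.
allocation = mem_per_node(2048, {"node1": 4, "node2": 2})
```

A single job-wide value would overcount on one node or undercount on the other, which is the case the refactor addresses.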
-
Tim Wickberg authored
-
- 11 Apr, 2019 1 commit
-
-
Doug Jacobsen authored
Bug 6787.
-
- 10 Apr, 2019 7 commits
-
-
Albert Gil authored
Bug 6608.
-
Dominik Bartkiewicz authored
Bug 6807.
-
Alejandro Sanchez authored
==8640== Thread 5 bckfl:
==8640== Syscall param openat(filename) points to unaddressable byte(s)
==8640==    at 0x4A81D0E: open (open64.c:48)
==8640==    by 0x5934ABB: _update_job_env (burst_buffer_cray.c:3338)
==8640==    by 0x5934ABB: bb_p_job_begin (burst_buffer_cray.c:3962)
...
==8640==  Address 0x6b96120 is 16 bytes inside a block of size 61 free'd
==8640==    at 0x48369AB: free (vg_replace_malloc.c:530)
==8640==    by 0x49D4873: slurm_xfree (xmalloc.c:244)
==8640==    by 0x490C317: free_command_argv (run_command.c:249)
==8640==    by 0x5934A5C: bb_p_job_begin (burst_buffer_cray.c:3947)
...
==8640==  Block was alloc'd at
==8640==    at 0x4837B65: calloc (vg_replace_malloc.c:752)
==8640==    by 0x49D4566: slurm_xmalloc (xmalloc.c:87)
==8640==    by 0x49D4B67: makespace (xstring.c:103)
==8640==    by 0x49D4C91: _xstrcat (xstring.c:134)
==8640==    by 0x49D4ECF: _xstrfmtcat (xstring.c:280)
==8640==    by 0x593497C: bb_p_job_begin (burst_buffer_cray.c:3936)
...
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Ben Roberts authored
Changed the behavior of "scontrol reboot" to require the user to specify the nodes to reboot rather than defaulting to ALL. Bug 6465
-
- 09 Apr, 2019 4 commits
-
-
Brian Christiansen authored
This allows jobs to be placed on booting nodes rather than being given a whole node even if it would have been better to wait for the node to boot. Bug 6782
-
Brian Christiansen authored
-
Brian Christiansen authored
to make nodes available after being suspended even if down, drain, failed. Bug 6212
-
Brian Christiansen authored
Bug 6333
-
- 05 Apr, 2019 2 commits
-
-
Alejandro Sanchez authored
Bug 6501.
-
Alejandro Sanchez authored
Bug 6791.
-
- 03 Apr, 2019 4 commits
-
-
Alejandro Sanchez authored
This prevents rebuilding a job's dependency string when it has at least one invalid (never satisfied) dependency, no matter if such invalid dependency has already been purged (after MinJobAge) or not. This can be useful to track down the culprit invalid dependencies even after they are gone from ctld's in-memory job list. The flag is cleared upon a successful job dependency update or after another job in the dependency list has been satisfied if such list is composed with the '?' symbol (OR'ed). Bug 5851.
-
Alejandro Sanchez authored
Job dependencies separated by "?" (OR'ed) should make the dependent job independent as soon as any of the dependencies is resolved as satisfied. Without this patch, if an invalid (non-satisfiable) dependency was resolved before a satisfiable one, then the dependent job would never become independent, even after the satisfiable one was eventually resolved. Bug 5851.
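The intended OR semantics can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the slurmctld implementation; the `Dep` enum and function name are hypothetical.

```python
from enum import Enum


class Dep(Enum):
    PENDING = 0     # not yet resolved
    SATISFIED = 1   # resolved and met
    INVALID = 2     # can never be satisfied


def or_dependency_state(deps):
    """Combine the states of '?'-separated (OR'ed) dependencies."""
    if not deps:
        return Dep.SATISFIED          # nothing to wait for
    if any(d is Dep.SATISFIED for d in deps):
        return Dep.SATISFIED          # one satisfied dependency is enough
    if all(d is Dep.INVALID for d in deps):
        return Dep.INVALID            # no dependency can ever be met
    return Dep.PENDING                # keep waiting on the unresolved ones
```

The key point is that an invalid dependency does not short-circuit the result: the list stays pending while any other dependency can still be satisfied.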
-
Felip Moll authored
The response to the XCC raw command is always 16 bytes. If we don't get an answer of this size, we log it and return. Bug 6743
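A length check of this kind might look like the following Python sketch. The constant and function names are hypothetical, and the real plugin is written in C; only the 16-byte size comes from the commit message.

```python
import logging

XCC_RESPONSE_LEN = 16  # expected size of the raw response, per the commit text


def check_xcc_response(raw):
    """Return the response if it has the expected size, else log and return None."""
    if len(raw) != XCC_RESPONSE_LEN:
        logging.error("unexpected XCC response size: %d bytes (expected %d)",
                      len(raw), XCC_RESPONSE_LEN)
        return None
    return raw
```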
-
Morris Jette authored
If GRES configuration data is unavailable from gres.conf, then use the node's "Gres=" information from slurm.conf. This will eliminate or minimize the gres.conf file in many situations. Bug 6761
-
- 02 Apr, 2019 2 commits
-
-
Felip Moll authored
In 0e149092, not setting the variable when the job was not requesting any GRES was considered a bug. The CUDA API will use all devices if the variable is not set. If it is set to some unknown or empty value, it will use no devices. This variable should be used only for testing purposes, and ConstrainDevices=yes in cgroup is recommended. Bug 6412
-
Felip Moll authored
GRES plugins will set up the environment for every GRES in the system even if the job has not requested it. Bug 6412
-
- 28 Mar, 2019 1 commit
-
-
Broderick Gardner authored
Removed linear search, replaced with direct record references and a hashmap. This is faster and avoids potential collisions between assoc IDs and user IDs. Bug 4811
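One way to picture the collision problem is a lookup keyed by the numeric ID alone, where association 1000 and user 1000 would share a slot. The Python sketch below keys a hash map on a (kind, id) pair instead. It is an assumption-laden illustration, not the data structure used in the commit, and all names are hypothetical.

```python
class RecordIndex:
    """Hash-map index keyed by (kind, id) so equal numeric IDs never collide."""

    def __init__(self):
        self._records = {}

    def put(self, kind, ident, record):
        # kind is a label such as "assoc" or "user" (hypothetical values)
        self._records[(kind, ident)] = record

    def get(self, kind, ident):
        # Average O(1) lookup instead of scanning a list
        return self._records.get((kind, ident))


index = RecordIndex()
index.put("assoc", 1000, "association record")
index.put("user", 1000, "user record")
```

Both entries remain retrievable even though they share the numeric ID 1000.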
-