- 13 Apr, 2019 12 commits
-
-
Marshall Garey authored
Timestamps were previously being updated before acquiring mutexes. Change it to update timestamps after mutexes have been acquired. Bug 6621
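The ordering described above can be sketched as follows. This is a minimal illustration only: `state_lock`, `last_update`, and `touch_state()` are hypothetical names, not Slurm's actual locks or fields.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state; these are not Slurm's real structures. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static time_t last_update;

/*
 * Take the timestamp only once the lock is held, so it reflects when the
 * protected data was touched rather than when the thread began waiting.
 */
static time_t touch_state(void)
{
	time_t stamp;

	pthread_mutex_lock(&state_lock);
	stamp = last_update = time(NULL);	/* read the clock after acquisition */
	/* ... modify the protected data here ... */
	pthread_mutex_unlock(&state_lock);

	return stamp;
}
```

If the clock were read before `pthread_mutex_lock()`, a thread that waited a long time for the lock would record a timestamp older than the change it describes.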
-
Marshall Garey authored
Bug 6621.
-
Marshall Garey authored
purge_old_job() doesn't ever read or write partition info, so remove the requirement for the partition read lock. Bug 6621
-
Danny Auble authored
-
Marshall Garey authored
The backfill scheduler keeps a local list of job pointers. Since the backfill scheduler yields locks, it's possible for pending jobs to be canceled and purged in these yield periods. The backfill scheduler then has pointers to now invalid memory, and dereferencing those pointers is undefined behavior and may result in a segfault. This commit prevents purging jobs while the backfill scheduler is running. Bug 6621
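A minimal sketch of the guard this commit describes, assuming a flag that is set while the backfill scheduler is running. The struct and function names (`sketch_diag_stats`, `sketch_job`, `purge_finished()`) are hypothetical stand-ins, and the real slurmctld code keeps jobs in a list rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the statistics struct holding bf_active. */
struct sketch_diag_stats {
	int bf_active;		/* non-zero while backfill is running */
};
static struct sketch_diag_stats diag_stats;

struct sketch_job {
	int id;
	bool finished;
};

/*
 * Remove finished jobs by compacting the array in place.
 * Returns the number of jobs purged. Nothing is purged while backfill is
 * active, so pointers held by the backfill scheduler stay valid.
 */
static size_t purge_finished(struct sketch_job *jobs, size_t *count)
{
	size_t kept = 0, purged;

	if (diag_stats.bf_active)
		return 0;

	for (size_t i = 0; i < *count; i++) {
		if (!jobs[i].finished)
			jobs[kept++] = jobs[i];
	}
	purged = *count - kept;
	*count = kept;

	return purged;
}
```

Deferring the purge trades a short delay in reclaiming memory for the guarantee that no job record disappears during a backfill yield period.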
-
Danny Auble authored
The next patch will use slurmctld_diag_stats.bf_active to determine if we can purge jobs or not. Bug 6621
-
Danny Auble authored
# Conflicts: # src/common/gres.c
-
Danny Auble authored
Bug 6739
-
Danny Auble authored
Bug 6803
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Paolo Margara authored
Bug 6785.
-
- 12 Apr, 2019 7 commits
-
-
Marshall Garey authored
Rather than pack cgroup.conf into a buffer every time we create a new slurmstepd, pack it once on init and re-use the buffer each time we create a new slurmstepd. Continuation of 11f70aa5. Bug 5667
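The pack-once pattern can be sketched like this. It does not use Slurm's own pack/unpack API: `sketch_conf`, `conf_pack_init()`, and `conf_packed()` are hypothetical, and a plain `memcpy` stands in for real serialization.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration; the real cgroup.conf has many more fields. */
struct sketch_conf {
	int constrain_cores;
	int constrain_ram;
};

static char *packed;		/* serialized copy, built once */
static size_t packed_len;
static int pack_calls;		/* counts how often serialization ran */

/* Serialize the configuration once at init. Returns 0 on success. */
static int conf_pack_init(const struct sketch_conf *conf)
{
	free(packed);
	packed = malloc(sizeof(*conf));
	if (!packed) {
		packed_len = 0;
		return -1;
	}
	memcpy(packed, conf, sizeof(*conf));
	packed_len = sizeof(*conf);
	pack_calls++;

	return 0;
}

/* Hand out the cached buffer; called each time a new child is launched. */
static const char *conf_packed(size_t *len)
{
	*len = packed_len;
	return packed;
}
```

Every launch after init then costs only a pointer read instead of a fresh serialization pass.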
-
Alejandro Sanchez authored
Refactor how memory allocations are managed to accurately track memory allocations on each node when the --mem-per-cpu option is used and the CPU count per node varies. Also accounts for Memory Specialization and wraps much of the logging with a DebugFlag of SelectType. Bug 5562
-
Morris Jette authored
-
Morris Jette authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well.
-
- 11 Apr, 2019 7 commits
-
-
Morris Jette authored
Coverity CID 197586
-
Morris Jette authored
-
Morris Jette authored
This bug was observed once running test1.103 and found to be reproducible under some circumstances. When a job is submitted with a --time-min value, the backfill scheduler sets the job's time limit to that value (see below), tries to start it, and if successful then sets the time limit to the largest value possible without delaying the expected start time of any higher priority jobs. If the job can't be started, the job's (maximum) time limit is supposed to be restored to its previous value. That was happening in some, but not all places in the code. This patch resets the time limit in the missing cases.

Note: The job's time limit is set to the --time-min value here:

diff --git a/src/plugins/sched/backfill/backfill.c b/src/plugins/sched/backfill/backfill.c
index 5600495e1d..e22a810394 100644
--- a/src/plugins/sched/backfill/backfill.c
+++ b/src/plugins/sched/backfill/backfill.c
@@ -1930,7 +1930,7 @@ next_task:
     slurm_get_preempt_mode())
         time_limit = job_ptr->time_limit = 1;
     else if (job_ptr->time_min && (job_ptr->time_min < time_limit))
-        time_limit = job_ptr->time_limit = job_ptr->time_min;
+        time_limit = job_ptr->time_limit = job_ptr->time_min; // SET HERE
     later_start = now;
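The save-and-restore behaviour described in this commit message can be sketched as below. This is an illustration under stated assumptions, not the backfill plugin's code: `sketch_job_rec`, `backfill_try()`, and the two stub callbacks are hypothetical, and only the field names `time_limit` and `time_min` come from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical job record with only the two fields this sketch needs. */
struct sketch_job_rec {
	uint32_t time_limit;	/* current (maximum) limit, minutes */
	uint32_t time_min;	/* --time-min value, 0 if unset */
};

/* Hypothetical stubs standing in for the real attempt to start a job. */
static bool never_start(const struct sketch_job_rec *job)
{
	(void) job;
	return false;
}

static bool always_start(const struct sketch_job_rec *job)
{
	(void) job;
	return true;
}

/*
 * Lower the limit to time_min for the start attempt, and put the original
 * limit back on every failure path.
 */
static bool backfill_try(struct sketch_job_rec *job,
			 bool (*try_start)(const struct sketch_job_rec *))
{
	uint32_t orig_limit = job->time_limit;

	if (job->time_min && (job->time_min < job->time_limit))
		job->time_limit = job->time_min;

	if (try_start(job))
		return true;	/* caller may now extend the limit */

	job->time_limit = orig_limit;	/* restore on failure */
	return false;
}
```

Saving the original value in one place and restoring it at a single failure exit avoids the situation the commit fixes, where some failure paths restored the limit and others did not.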
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Doug Jacobsen authored
Bug 6787.
-
Morris Jette authored
Without this sleep, the "JOBID" output from task 0 can arrive after the "HOST" output from task 1, which breaks the parsing just below this.
-
- 10 Apr, 2019 14 commits
-
-
Morris Jette authored
Make sure that jobs using gres/mps and gres/gpu on the same node either run at different times or use different GPUs. This restores part of test40.2 that was bad and removed in commit 5a6409bb. It also ensures that the logic added in commit ac1182cf works as desired.
-
Morris Jette authored
Without this change, if core binding was not specified (i.e. no cores/cpus in the gres.conf file for gpu), then gres/mps could be allocated to overlap with gres/gpu that were already allocated.
-
Tim Wickberg authored
Bug 4188.
-
Alejandro Sanchez authored
-
Albert Gil authored
Bug 6608.
-
Chad Vizino authored
-
Alejandro Sanchez authored
-
Dominik Bartkiewicz authored
Bug 6807.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
==8640== Thread 5 bckfl:
==8640== Syscall param openat(filename) points to unaddressable byte(s)
==8640==    at 0x4A81D0E: open (open64.c:48)
==8640==    by 0x5934ABB: _update_job_env (burst_buffer_cray.c:3338)
==8640==    by 0x5934ABB: bb_p_job_begin (burst_buffer_cray.c:3962)
...
==8640==  Address 0x6b96120 is 16 bytes inside a block of size 61 free'd
==8640==    at 0x48369AB: free (vg_replace_malloc.c:530)
==8640==    by 0x49D4873: slurm_xfree (xmalloc.c:244)
==8640==    by 0x490C317: free_command_argv (run_command.c:249)
==8640==    by 0x5934A5C: bb_p_job_begin (burst_buffer_cray.c:3947)
...
==8640==  Block was alloc'd at
==8640==    at 0x4837B65: calloc (vg_replace_malloc.c:752)
==8640==    by 0x49D4566: slurm_xmalloc (xmalloc.c:87)
==8640==    by 0x49D4B67: makespace (xstring.c:103)
==8640==    by 0x49D4C91: _xstrcat (xstring.c:134)
==8640==    by 0x49D4ECF: _xstrfmtcat (xstring.c:280)
==8640==    by 0x593497C: bb_p_job_begin (burst_buffer_cray.c:3936)
...

Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Ben Roberts authored
Changed the behavior of "scontrol reboot" to require the user to specify the nodes to reboot rather than defaulting to ALL. Bug 6465
-