- 12 Apr, 2019 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well.
-
- 11 Apr, 2019 7 commits
-
-
Morris Jette authored
Coverity CID 197586
-
Morris Jette authored
-
Morris Jette authored
This bug was observed once while running test1.103 and found to be reproducible under some circumstances. When a job is submitted with a --time-min value, the backfill scheduler sets the job's time limit to that value (see below) and tries to start it. If the start succeeds, the scheduler then sets the time limit to the largest value possible without delaying the expected start time of any higher priority jobs. If the job can't be started, the job's (maximum) time limit is supposed to be restored to its previous value. That was happening in some, but not all, places in the code. This patch resets the time limit in the missing cases.

Note: The job's time limit is set to the --time-min value here:

diff --git a/src/plugins/sched/backfill/backfill.c b/src/plugins/sched/backfill/backfill.c
index 5600495e1d..e22a810394 100644
--- a/src/plugins/sched/backfill/backfill.c
+++ b/src/plugins/sched/backfill/backfill.c
@@ -1930,7 +1930,7 @@ next_task:
 slurm_get_preempt_mode())
 time_limit = job_ptr->time_limit = 1;
 else if (job_ptr->time_min && (job_ptr->time_min < time_limit))
- time_limit = job_ptr->time_limit = job_ptr->time_min;
+ time_limit = job_ptr->time_limit = job_ptr->time_min; // SET HERE
 later_start = now;
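The set, try, restore sequence described in the commit message above can be sketched as follows. This is a minimal illustration, not the code in backfill.c: `struct sketch_job`, `sketch_backfill_try`, the `try_start` callback and the `max_no_delay` parameter are hypothetical names, and capping the grown limit at the job's original maximum is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a job record (not Slurm's struct). */
struct sketch_job {
	uint32_t time_limit;	/* maximum time limit, in minutes */
	uint32_t time_min;	/* --time-min value, in minutes; 0 if unset */
};

/* Hypothetical callback: true if the job can start with its current limit. */
typedef bool (*try_start_fn)(const struct sketch_job *job);

/*
 * Lower the limit to time_min, try to start the job, then either grow the
 * limit (success) or restore the saved value (failure).
 * max_no_delay is assumed to be the largest limit that would not delay any
 * higher priority job.
 */
static bool sketch_backfill_try(struct sketch_job *job, try_start_fn try_start,
				uint32_t max_no_delay)
{
	uint32_t orig_time_limit = job->time_limit;	/* save before changing */
	uint32_t grown;

	if (job->time_min && (job->time_min < job->time_limit))
		job->time_limit = job->time_min;

	if (!try_start(job)) {
		/* The step that was missing on some code paths. */
		job->time_limit = orig_time_limit;
		return false;
	}

	/* Assumption: never grow past the job's original maximum. */
	grown = (max_no_delay < orig_time_limit) ? max_no_delay : orig_time_limit;
	if (grown > job->time_limit)
		job->time_limit = grown;
	return true;
}

/* Two toy start policies used only to exercise the sketch. */
static bool sketch_never_starts(const struct sketch_job *job)
{
	(void) job;
	return false;
}

static bool sketch_starts_if_short(const struct sketch_job *job)
{
	return job->time_limit <= 30;
}
```

With a job whose limit is 120 and whose time_min is 30, a failed start leaves the limit at 120, while a successful start with `max_no_delay` of 60 leaves it at 60.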
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Doug Jacobsen authored
Bug 6787.
-
Morris Jette authored
What can happen without this sleep is the "JOBID" output from task 0 comes after the "HOST" output from task 1, which breaks the parsing just below this.
-
- 10 Apr, 2019 17 commits
-
-
Morris Jette authored
Make sure that jobs using gres/mps and gres/gpu on the same node either run at different times or use different GPUs. This restores part of test40.2 that was bad and removed in commit 5a6409bb. It also ensures that the logic added in commit ac1182cf works as desired.
-
Morris Jette authored
Without this change, if core binding was not specified (i.e. no cores/cpus in the gres.conf file for gpu), then gres/mps could be allocated to overlap with gres/gpu that were already allocated.
-
Tim Wickberg authored
Bug 4188.
-
Alejandro Sanchez authored
-
Albert Gil authored
Bug 6608.
-
Chad Vizino authored
-
Alejandro Sanchez authored
-
Dominik Bartkiewicz authored
Bug 6807.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
==8640== Thread 5 bckfl:
==8640== Syscall param openat(filename) points to unaddressable byte(s)
==8640== at 0x4A81D0E: open (open64.c:48)
==8640== by 0x5934ABB: _update_job_env (burst_buffer_cray.c:3338)
==8640== by 0x5934ABB: bb_p_job_begin (burst_buffer_cray.c:3962)
...
==8640== Address 0x6b96120 is 16 bytes inside a block of size 61 free'd
==8640== at 0x48369AB: free (vg_replace_malloc.c:530)
==8640== by 0x49D4873: slurm_xfree (xmalloc.c:244)
==8640== by 0x490C317: free_command_argv (run_command.c:249)
==8640== by 0x5934A5C: bb_p_job_begin (burst_buffer_cray.c:3947)
...
==8640== Block was alloc'd at
==8640== at 0x4837B65: calloc (vg_replace_malloc.c:752)
==8640== by 0x49D4566: slurm_xmalloc (xmalloc.c:87)
==8640== by 0x49D4B67: makespace (xstring.c:103)
==8640== by 0x49D4C91: _xstrcat (xstring.c:134)
==8640== by 0x49D4ECF: _xstrfmtcat (xstring.c:280)
==8640== by 0x593497C: bb_p_job_begin (burst_buffer_cray.c:3936)
...
Bug 6807.
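The trace above shows a string that is allocated, freed together with a command argv, and then passed to open(). One common way to avoid that ordering problem is to take a private copy of the string before the argv is released. The sketch below illustrates that pattern only; it is not the change made in burst_buffer_cray.c, and every name in it (`sketch_strdup`, `sketch_free_argv`, `sketch_run_and_keep_path`) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Portable duplicate of a C string; returns NULL on allocation failure. */
static char *sketch_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}

/* Free the first argc entries of argv and the array itself. */
static void sketch_free_argv(char **argv, int argc)
{
	for (int i = 0; i < argc; i++)
		free(argv[i]);	/* free(NULL) is a no-op */
	free(argv);
}

/*
 * Build a two-element argv that contains a path, then release the argv.
 * The path is copied BEFORE the argv is freed, so the caller never holds
 * a pointer into freed memory. The caller owns the returned string.
 */
static char *sketch_run_and_keep_path(const char *path)
{
	char **argv = calloc(3, sizeof(char *));
	char *kept;

	if (!argv)
		return NULL;
	argv[0] = sketch_strdup("cmd");	/* placeholder command name */
	argv[1] = sketch_strdup(path);

	/* ... the command would be run with argv here ... */

	kept = argv[1] ? sketch_strdup(argv[1]) : NULL;
	sketch_free_argv(argv, 2);	/* argv[1] is invalid from here on */
	return kept;
}
```

The alternative is to delay freeing the argv until after the last use of the string; copying is shown here because it keeps the two lifetimes independent.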
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Doug Jacobsen authored
Bug 6807.
-
Ben Roberts authored
Changed the behavior of "scontrol reboot" to require the user to specify the nodes to reboot rather than defaulting to ALL. Bug 6465
-
Morris Jette authored
This corrects the gres/mps test to ensure that CUDA_VISIBLE_DEVICES is always zero (it is dependent upon the devices under MPS control and not related to cgroup constrained devices). Also correct some logic related to how the percentage calculation works in the test.
-
Morris Jette authored
Permit any GPU to be used for gres/mps mode, but only one GPU can be used.
-
Morris Jette authored
A request for --gres=mps:1 (specifically with a count of one) was in some places being treated like a request for a full GPU.
-
- 09 Apr, 2019 11 commits
-
-
Morris Jette authored
The variable is relative to which GPUs are managed by MPS. Currently Slurm only allows one GPU to be managed by MPS at a time, so the env var should always be zero.
-
Morris Jette authored
Allow it to support use of any GPU in multi-GPU system
-
Brian Christiansen authored
-
Brian Christiansen authored
This allows jobs to be placed on booting nodes rather than being given a whole node even if it would have been better to wait for the node boot. Bug 6782
-
Brian Christiansen authored
-
Brian Christiansen authored
to make nodes available after being suspended even if down, drain, failed. Bug 6212
-
Brian Christiansen authored
Bug 6333
-
Brian Christiansen authored
Bug 6333
-
Brian Christiansen authored
Rely on POWERING_DOWN bit. This allows POWERING_DOWN nodes to be cleared after a restart -- since suspend_node_bitmap was local to power_save.c. Bug 6333
-
Brian Christiansen authored
Bug 6333
-
Brian Christiansen authored
Instead of just guessing the time, let's use the original time. Bug 6333
-