- 26 Apr, 2016 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
restart of the slurmctld.
-
Morris Jette authored
On some systems the char_to_val function was not being put into the plugin, resulting in the following error:
slurmstepd: [23.0]: symbol lookup error: /home/jette/SLURM/install_smd/lib/slurm/task_cgroup.so: undefined symbol: char_to_val
The problem was fixed by declaring the function "static". The function name was also updated with a leading "_" to indicate that the function is local to that module.
-
Danny Auble authored
-
René Genz authored
-
Tim Wickberg authored
-
Sam Gallop authored
Otherwise a miscalculated limit will lead to job cancellation even when the job is well inside the allocated amount. Bug 2660.
-
- 23 Apr, 2016 1 commit
-
-
Tim Wickberg authored
in the slurmdbd segfaulting. Bug 2656
-
- 20 Apr, 2016 2 commits
-
-
Morris Jette authored
burst_buffer/cray - Don't call Datawarp "paths" function if script includes only create or destroy of persistent burst buffer. Some versions of Datawarp software return an error for such scripts, causing the job to be held. bug 2624
-
Morris Jette authored
No change in any logic or definitions
-
- 15 Apr, 2016 1 commit
-
-
Morris Jette authored
-
- 14 Apr, 2016 1 commit
-
-
Morris Jette authored
If a job fails stage in, set its reason to BurstBufferOperation with a string describing what happened. Previously the reason was set to AdminHeld on stage-in failure.
-
- 13 Apr, 2016 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
that wasn't set up correctly.
-
- 12 Apr, 2016 3 commits
-
-
Morris Jette authored
power/cray - Fix bug introduced in 15.08.10 preventing operation in many cases. bug 2628
-
Morris Jette authored
-
Morris Jette authored
Was printing integer using %u format
-
- 11 Apr, 2016 7 commits
-
-
Morris Jette authored
burst_buffer/cray - Fix for script creating or deleting persistent buffer would fail "paths" operation and hold the job. bug 2624
-
Danny Auble authored
and it doesn't meet basic requirements.
-
Tim Wickberg authored
-
Morris Jette authored
The gprof tool shows that most time is consumed by the bit_test() function as called from the select plugin, which in turn was called by the backfill scheduler. These changes replace the for loop end-points. The previous logic tested all possible nodes. The new logic identifies the first and last bits set in the node bitmap and uses those end-points instead. Note that the logic to find the first and last bits set starts off with a word-based search (testing for a 64-bit zero value rather than testing each individual bit). The net result is a small performance improvement. bug 2588
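The end-point narrowing described above can be sketched roughly as follows. This is an illustrative Python model, not the Slurm bitstring code: the bitmap is assumed to be a list of 64-bit words, and the helper names `first_set`, `last_set`, and `set_bits_in_range` are hypothetical.

```python
WORD_BITS = 64  # assumed word size, matching the 64-bit test in the commit text


def first_set(words):
    """Return the index of the lowest set bit, or -1 if the bitmap is empty."""
    for w, word in enumerate(words):
        if word:  # word-based test: a zero word is skipped in one comparison
            lowest = word & -word  # isolate the lowest set bit of this word
            return w * WORD_BITS + lowest.bit_length() - 1
    return -1


def last_set(words):
    """Return the index of the highest set bit, or -1 if the bitmap is empty."""
    for w in range(len(words) - 1, -1, -1):
        if words[w]:
            return w * WORD_BITS + words[w].bit_length() - 1
    return -1


def bit_test(words, bit):
    """Per-bit test, the operation the profile showed as expensive."""
    return ((words[bit // WORD_BITS] >> (bit % WORD_BITS)) & 1) == 1


def set_bits_in_range(words):
    """Visit only the bits between the first and last set bit."""
    first = first_set(words)
    if first < 0:
        return []
    last = last_set(words)
    return [i for i in range(first, last + 1) if bit_test(words, i)]
```

With a sparse bitmap, the loop in `set_bits_in_range` covers only the span between the two end-points instead of every possible node index.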
-
Tim Wickberg authored
-
Morris Jette authored
burst_buffer/cray - Decrement job's prolog_running counter if pre_run fails. bug 2621
-
Morris Jette authored
If a job is no longer in configuring state, then clear the prolog_running counter on slurmctld restart or reconfigure. bug 2621
-
- 09 Apr, 2016 2 commits
-
-
Morris Jette authored
For case where job can't start and there are no running jobs to remove in order to establish estimated start time.
-
Morris Jette authored
When determining when a pending job will be able to start, rather than testing after removing each running job and trying to schedule the pending jobs, remove multiple jobs that all end about the same time before testing. This reduces the number of calls to the job placement logic, which is time consuming.
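A minimal sketch of the batching idea, assuming that "about the same time" means "within a fixed window of the first job in the group". The window size and the function name `group_by_end_time` are assumptions made for illustration; the commit text does not state the actual rule.

```python
def group_by_end_time(end_times, window):
    """Group job end times so each group starts within `window` seconds.

    Each group stands for a set of running jobs that would be removed
    together before a single call to the job placement logic.
    """
    groups = []
    for t in sorted(end_times):
        # Join the current group if this job ends close to the group's first job.
        if groups and t - groups[-1][0] <= window:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Five running jobs, but only two placement tests are needed.
groups = group_by_end_time([100, 105, 400, 110, 410], window=30)
placement_tests = len(groups)
```

Testing once per group rather than once per job trades some precision in the estimated start time for fewer calls to the placement logic.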
-
- 08 Apr, 2016 2 commits
-
-
Morris Jette authored
list_peek_next(), like list_next() but WITHOUT advancing the pointer
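The difference between the two calls can be illustrated with a small stand-in iterator. This is a Python sketch of the semantics only; the class `ListIterator` and its methods are hypothetical and do not reproduce Slurm's list API.

```python
class ListIterator:
    """Toy iterator with a peek operation that leaves the position unchanged."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def next(self):
        """Return the next item and advance, or None at the end."""
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item

    def peek_next(self):
        """Return the next item WITHOUT advancing, or None at the end."""
        if self._pos >= len(self._items):
            return None
        return self._items[self._pos]
```

Calling `peek_next()` repeatedly returns the same item until `next()` is called.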
-
Morris Jette authored
-
- 07 Apr, 2016 3 commits
-
-
Morris Jette authored
Document and log cases where max jobs per user or partition is equal to or greater than the max jobs test. In that case, a single user can easily stop all backfill scheduling.
-
Sami Ilvonen authored
-
Morris Jette authored
Fix for job "--contiguous" option that could cause job allocation/launch failure or slurmctld crash. bug 2573
-
- 06 Apr, 2016 8 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
This reverts commit f559a55c.
-
Danny Auble authored
constraints mattered in a job. Details include: A job doesn't request memory but the system is running with CR_*MEMORY with no default memory limit and the job requests nodes with features of different sizes. Previously the order of constraints mattered where the smaller memory node would need to be requested first or the job would fail. Bug 2608
-
Morris Jette authored
Previous logic would get an account and/or QOS time limit and use that value to overwrite the incoming RPC's NO_VAL value, which would change a job's time limit when changing an unrelated field (e.g. priority, QOS, etc.). bug 2610
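The intended behavior, where a field left at NO_VAL in the update request is not modified, might be modeled like this. It is a simplified Python sketch: the dictionary layout, the field names, and the function `apply_update` are assumptions, and `NO_VAL` is defined locally as a sentinel for the example.

```python
NO_VAL = 0xFFFFFFFE  # local sentinel meaning "field not specified in this request"


def apply_update(job, request):
    """Return a copy of `job` with only the explicitly requested fields changed."""
    updated = dict(job)
    for field, value in request.items():
        if value != NO_VAL:  # leave unspecified fields untouched
            updated[field] = value
    return updated


job = {"time_limit": 60, "priority": 100}
# Only the priority is being changed; time_limit is left unspecified.
result = apply_update(job, {"time_limit": NO_VAL, "priority": 500})
```

Here `result` keeps the original time limit, which is the outcome the fix restores when an unrelated field is updated.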
-
Danny Auble authored
-
Morris Jette authored
Prevent use of NULL pointer and SEGV when changing a job's QOS when the slurmdbd is not configured.
-
Tim Wickberg authored
-
- 05 Apr, 2016 1 commit
-
-
Morris Jette authored
Fix backfill scheduler race condition that could cause an invalid pointer in the select/cons_res plugin. Bug introduced in 15.08.9, commit: efd9d35e
The scenario is as follows:
1. Backfill scheduler is running, then releases locks
2. Main scheduling loop starts a job "A"
3. Backfill scheduler resumes, finds job "A" in its queue and resets its partition pointer.
4. Job "A" completes and tries to remove resource allocation record from select/cons_res data structure, but fails to find it because it is looking in the table for the wrong partition.
5. Job "A" record gets purged from slurmctld
6. Select/cons_res plugin attempts to operate on resource allocation data structure, finds pointer into the now purged data structure of job "A" and aborts or gets SEGV
Bug 2603
-