- 09 Nov, 2016 3 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
Set per-node HBM availability as a GRES based upon the KNL node's MCDRAM state. bug 3171
-
Alejandro Sanchez authored
Caused by race for local_energy which is dynamically allocated. Bail out of the update if that hasn't been allocated yet. Bug 3237.
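A minimal sketch of the bail-out described above, not the actual plugin code. The names `energy_t` and `update_energy` are hypothetical; only `local_energy` comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the plugin's energy record. */
typedef struct {
	uint64_t consumed_energy;
} energy_t;

/* Allocated dynamically during start-up; stays NULL until then. */
energy_t *local_energy = NULL;

/*
 * Apply an energy reading. Returns 0 on success, or -1 if the update
 * was skipped because local_energy has not been allocated yet.
 */
int update_energy(uint64_t joules)
{
	if (!local_energy)
		return -1;	/* bail out rather than dereference NULL */

	local_energy->consumed_energy += joules;
	return 0;
}
```

An update that arrives before allocation is dropped rather than queued, which matches the "bail out" wording of the message.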
-
- 08 Nov, 2016 5 commits
-
-
Morris Jette authored
bug 3213
-
Morris Jette authored
select/linear plugin modified to better support heterogeneous clusters when topology/none is also configured. Note that use of the select/cons_res plugin is strongly recommended for heterogeneous clusters. OverSubscribe=exclusive can be used if whole-node allocations are desired. bug 3212
-
Alejandro Sanchez authored
Bug 3224.
-
Morris Jette authored
-
Morris Jette authored
If a job is started by the main scheduling logic and requeued while the backfill scheduler has locks released, that can result in an invalid data structure in select/cons_res. Namely, the backfill scheduler's attempt to start the job would clear the job resources node_bitmap. That leaves a NULL pointer in the select/cons_res plugin, generating an abort. (That pointer is needed to clean up the job allocation records when the Epilog or Cray Node Health Check, NHC, is complete and the resources become available for another job.) bug 3230
-
- 07 Nov, 2016 1 commit
-
-
Morris Jette authored
Backup slurmctld will now:
1. Not abort due to a NULL pointer (this required moving code around on restart).
2. Recover KNL MCDRAM and NUMA modes from state save files if capmc and cnselect are not available.
bug 3241
-
- 05 Nov, 2016 1 commit
-
-
Morris Jette authored
cray/burst_buffer - Update "instance" parsing to match updated dw_wlm_cli output. bug 3222
-
- 04 Nov, 2016 7 commits
-
-
Morris Jette authored
Expand the dw_wlm_cli script to include persistent and job-specific burst buffers. This script is used by burst_buffer/cray.
-
Morris Jette authored
This is a new field and the fix only applies to an emulated burst buffer configuration (i.e. dw_wlm_cli script made to look like a real DataWarp system)
-
Morris Jette authored
Change error() to verbose(). New logic is needed to address this issue once we know how to determine the KNL MCDRAM size.
-
Morris Jette authored
-
Morris Jette authored
cray/burst_buffer - Preserve job ID and don't translate to job array ID after slurmctld restart. Prior logic would not set array_task_id to NO_VAL, so all job-buffer IDs would be reported in the form "JobID=0_0(123)" rather than "JobID=123"
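A sketch of why an unset `array_task_id` changes the reported ID. The function name `format_job_id` and its parameters are hypothetical; `NO_VAL` is Slurm's "not set" sentinel for 32-bit fields.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NO_VAL 0xfffffffe	/* "not set" sentinel for 32-bit fields */

/*
 * Format a job identifier into buf. A job that is not part of an array
 * must carry array_task_id == NO_VAL; a zeroed field is treated as a
 * real array task and produces the "0_0(123)" form.
 */
int format_job_id(char *buf, size_t len, uint32_t job_id,
		  uint32_t array_job_id, uint32_t array_task_id)
{
	if (array_task_id == NO_VAL)
		return snprintf(buf, len, "JobID=%" PRIu32, job_id);

	return snprintf(buf, len, "JobID=%" PRIu32 "_%" PRIu32 "(%" PRIu32 ")",
			array_job_id, array_task_id, job_id);
}
```

With job ID 123, passing `NO_VAL` as the task ID gives `JobID=123`, while a zero-initialized record gives `JobID=0_0(123)`.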
-
Morris Jette authored
cray/burst_buffer - Internally track both allocated and unusable space. The reported UsedSpace in a pool is now the allocated space (previously it was the unusable space). Base available space on whichever value leaves the least free space. bug 3222
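One way to read "whichever value leaves least free space" is to subtract the larger of the two counters from the pool total. The sketch below assumes that reading; `pool_t`, its field names, and `pool_free_space` are hypothetical and not taken from the plugin.

```c
#include <stdint.h>

/* Hypothetical pool record tracking both counters. */
typedef struct {
	uint64_t total_space;	 /* pool capacity */
	uint64_t used_space;	 /* space allocated to buffers (reported as UsedSpace) */
	uint64_t unusable_space; /* space reported as unusable */
} pool_t;

/* Free space based on the more pessimistic of the two counters. */
uint64_t pool_free_space(const pool_t *p)
{
	uint64_t consumed = (p->used_space > p->unusable_space) ?
			    p->used_space : p->unusable_space;

	if (consumed >= p->total_space)
		return 0;	/* avoid unsigned underflow */
	return p->total_space - consumed;
}
```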
-
Tim Wickberg authored
Previously disconnected from build system, and most code removed by commit 0b14a3a7 back on 15.08-pre1.
-
- 03 Nov, 2016 8 commits
-
-
Tim Wickberg authored
We don't build on Tru-64, and there are a lot more platform-dependent pieces of code within Slurm than we've indicated here.
-
Tim Wickberg authored
-
Tim Wickberg authored
Remove stray Bull logo as well.
-
Tim Wickberg authored
Remove stray Bull logo as well.
-
Tim Wickberg authored
They're full of dead links, and the plugins are deprecated (announced at SLUG16).
-
Tim Wickberg authored
-
Tim Wickberg authored
OSX is not currently supported, and the build is likely broken due to differences in dynamic library loading.
-
Tim Wickberg authored
-
- 01 Nov, 2016 4 commits
-
-
Danny Auble authored
and request --ntasks-per-core=1 and only 1 task on the node the slurmd would abort on an infinite loop fatal. Regression is from commit 5265420d. Without this fix you can get into an infinite loop in the task/affinity plugin. The loop is handled by producing a fatal. Bug 3118
-
Morris Jette authored
cray/burst_buffer - Fix for double counting of used_space at slurmctld startup. bug 3222
-
Joseph Mingrone authored
Add the POLLRDHUP hack used elsewhere to work around non-standard flag use. Bug 3227.
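POLLRDHUP is a Linux extension to `poll()` and is not defined on every platform. A common form of this kind of workaround, shown here as an assumption about what the commit does, is to fall back to POLLHUP when the flag is missing. The helper `peer_closed` is hypothetical.

```c
#include <poll.h>

/* POLLRDHUP is Linux-specific; fall back to POLLHUP where it is absent. */
#ifndef POLLRDHUP
#define POLLRDHUP POLLHUP
#endif

/* Return 1 if the poll() result flags indicate the peer hung up, else 0. */
int peer_closed(short revents)
{
	return (revents & (POLLRDHUP | POLLHUP)) != 0;
}
```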
-
Morris Jette authored
cray/burst_buffer - If total_space in a pool decreases, reset used_space rather than trying to account for buffer allocations in progress. bug 3222
-
- 28 Oct, 2016 1 commit
-
-
Danny Auble authored
more time than should be allowed would be accounted for. This only happened on jobs in the completing state when the slurmctld was shut down. This will also be enhanced in 17.02, as the job's end_time_exp, which is needed to determine whether the job has already been through the decay_thread at end of job, is not stored. Bug 3162
-
- 27 Oct, 2016 8 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
issue with gang scheduling. Bug 3211
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Morris Jette authored
This option specifies minimum characteristics of the compute nodes which should be considered for use, not the resource allocation size. bug 3118
-
Alejandro Sanchez authored
Create separate check_hosts_contiguous procedure in globals and use it for both test1.83 and test15.21. Bug 3006.
-
Morris Jette authored
-
- 26 Oct, 2016 2 commits
-
-
Morris Jette authored
Fix bug that was clearing MAINT mode on nodes scheduled for reboot (bug introduced in version 16.05.5 to address bug in overlapping reservations, commit 5eee1d28). Note that a node's MAINT flag is used for both a requested reboot and a maintenance reservation. What I'd like to do is add a new node state flag to differentiate between these two cases, but that involves some significant changes that could introduce instability, so it will be deferred to version 17.02. bug 3210
-
Morris Jette authored
Correct/expand description of NODE_STATE_FLAG
-