- 16 Sep, 2016 1 commit
-
-
Morris Jette authored
node_features/knl_cray: If a node is rebooted outside of Slurm's direction, update its active features with current MCDRAM and NUMA mode information. bug 3071
-
- 15 Sep, 2016 3 commits
-
-
Morris Jette authored
Fix race condition that could result in MCDRAM state information coming from capmc rather than cnselect (used state for next boot rather than latest boot). bug 3080
-
Tim Wickberg authored
-
Nicolas Joly authored
-
- 14 Sep, 2016 3 commits
-
-
Alejandro Sanchez authored
No functional change, just silencing the warning message in this instance. Bug 3079.
-
Tim Wickberg authored
Fix some whitespace on line endings while here.
-
Alejandro Sanchez authored
Bug 3073.
-
- 12 Sep, 2016 7 commits
-
-
-
Morris Jette authored
-
Morris Jette authored
bug 3065
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Add Slurm overview by Alex on first day, move KNL to second morning, shift roadmap to right after lunch.
-
Morris Jette authored
bug 3065
-
- 09 Sep, 2016 7 commits
-
-
Morris Jette authored
Previous cap was 2 sec (default TCP timeout) times the node count, divided by 1000. A 9000 node job would have the messages spread out over 18 seconds. This change caps the spread at 5 seconds and assumes the normal TCP logic can handle the rest. bug 3044
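The arithmetic in this message can be sketched as follows. This is an illustration only, not Slurm's code: the function names, and the assumption that the formula below the new cap is unchanged, are hypothetical.

```python
def old_spread_seconds(node_count, tcp_timeout=2.0):
    """Spread as described for the previous logic: timeout * nodes / 1000."""
    return tcp_timeout * node_count / 1000.0


def new_spread_seconds(node_count, tcp_timeout=2.0, max_spread=5.0):
    """Same formula, but never more than max_spread seconds (assumed shape)."""
    return min(old_spread_seconds(node_count, tcp_timeout), max_spread)


if __name__ == "__main__":
    for nodes in (1000, 2500, 9000):
        print(nodes, old_spread_seconds(nodes), new_spread_seconds(nodes))
```

Under these assumptions, only jobs above 2500 nodes are affected by the cap.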
-
Morris Jette authored
If the overhead of determining the hostlist for a given task list is too high, then report a hostlist of "Unknown" instead. Without this limit, srun can become unresponsive and communications will time out or fail. bug 3044
-
Morris Jette authored
-
Morris Jette authored
Modify srun task completion handling to only build the task/node string for logging purposes if it is needed. Modified for performance purposes. bug 3044
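The general technique here, building an expensive log string only when it will actually be logged, can be sketched in Python. This is not srun's implementation; `describe_tasks` and `log_task_exit` are hypothetical names used for illustration.

```python
import logging


def describe_tasks(task_ids, node_names):
    """Potentially expensive: format every task and node into one string."""
    tasks = ",".join(str(t) for t in task_ids)
    nodes = ",".join(node_names)
    return "tasks %s on nodes %s" % (tasks, nodes)


def log_task_exit(logger, task_ids, node_names, builder=describe_tasks):
    """Build the task/node string only if DEBUG output is enabled.

    Returns True if the string was built and logged, False if skipped.
    """
    if not logger.isEnabledFor(logging.DEBUG):
        return False
    logger.debug("task exit: %s", builder(task_ids, node_names))
    return True
```

With the logger set to WARNING, `builder` is never called, so a large task list adds no formatting cost.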
-
Morris Jette authored
Add get_log_level() function to return the highest LOG_LEVEL_* used for any logging mechanism.
-
Tim Wickberg authored
This reverts commit 1ec2a4ae.
-
Alejandro Sanchez authored
Bug 3063.
-
- 08 Sep, 2016 2 commits
-
-
Morris Jette authored
Restructure srun command locking for task_exit processing logic for improved parallelism. This change decreases the amount of time consumed by serial logic by 2 orders of magnitude. bug 3044
-
Morris Jette authored
-
- 07 Sep, 2016 3 commits
-
-
Morris Jette authored
Preserve node "RESERVATION" state when one of multiple overlapping reservations ends. Previous logic would clear the node's RESERVATION state flag when any one of the reservations on the node ended rather than keeping the node in RESERVATION state until the last reservation ended. bug 3057
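The corrected rule can be sketched as a small check: a node keeps its RESERVATION flag for as long as any other reservation still covers it. The data layout and the function name below are assumptions for illustration, not Slurm's data structures.

```python
def node_keeps_reservation_flag(node, reservations, ended):
    """Return True if `node` is still covered once reservation `ended` ends.

    `reservations` maps a reservation name to the set of node names it covers
    (a hypothetical layout chosen for this sketch).
    """
    return any(
        node in nodes
        for name, nodes in reservations.items()
        if name != ended
    )


if __name__ == "__main__":
    resv = {"maint": {"n1", "n2"}, "debug": {"n2"}}
    # n2 is in both reservations, so it stays reserved when "maint" ends.
    print(node_keeps_reservation_flag("n2", resv, "maint"))
    # n1 is only in "maint", so its flag can be cleared.
    print(node_keeps_reservation_flag("n1", resv, "maint"))
```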
-
Morris Jette authored
The logic is now heavier weight, so increase the interval between tests from 2 to 5 seconds.
-
Morris Jette authored
Handle the case where the slurmctld daemon restarts while a compute node reboot is in progress. Return the node to service rather than setting it DOWN. bug 3042
-
- 06 Sep, 2016 4 commits
-
-
Morris Jette authored
Add salloc_wait_nodes option to the SchedulerParameters parameter in the slurm.conf file controlling when the salloc command returns in relation to when nodes are ready for use (i.e. booted). bug 3043
-
E Kawashima authored
-
Gennaro Oliva authored
bug 3055
-
Gennaro Oliva authored
bug 3054
-
- 02 Sep, 2016 3 commits
-
-
Danny Auble authored
before all was moved into a common location in common.c.
-
Danny Auble authored
reservations.
-
Brian Christiansen authored
-
- 01 Sep, 2016 4 commits
-
-
Morris Jette authored
sched/backfill - Check that a user's QOS is allowed to use a partition before trying to schedule resources on that partition for the job. bug 3039
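A minimal sketch of the check described here: filter out partitions whose allowed QOS list excludes the job's QOS before attempting to schedule on them. The dictionary layout, the `allow_qos` key, and the treatment of a missing list as "no restriction" are assumptions made for this example, not the backfill plugin's actual structures.

```python
def partitions_to_try(job_qos, partitions):
    """Return names of partitions the job's QOS is allowed to use."""
    usable = []
    for part in partitions:
        allowed = part.get("allow_qos")  # None: no restriction (assumption)
        if allowed is None or job_qos in allowed:
            usable.append(part["name"])
    return usable


if __name__ == "__main__":
    parts = [
        {"name": "batch", "allow_qos": None},
        {"name": "gpu", "allow_qos": {"high"}},
        {"name": "debug", "allow_qos": {"normal", "high"}},
    ]
    print(partitions_to_try("normal", parts))
```

Doing this test before the resource search means the scheduler does not spend time planning a job onto a partition it could never run in.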
-
Morris Jette authored
-
Morris Jette authored
bug 3035 and 3009
-
David Gloe authored
This reverts commit 933d4fba.
-
- 30 Aug, 2016 3 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Conflicts: src/plugins/select/cray/select_cray.c
-
Tim Wickberg authored
Otherwise blade_cnt is potentially greater than bit_size(jobinfo->blade_map), which leads to an assertion failure. Bug 3033.
-