- 17 Feb, 2016 5 commits

Morris Jette authored
Previous logic failed if the feature name was not found.

Morris Jette authored
Add PID to both the slurmctld code and the Cray capmc_suspend/resume programs.

Morris Jette authored

Morris Jette authored
Previous logic caused slurmctld shutdown to wait for all power save (suspend and resume) programs to complete, bounded only by their time limits, which could be huge. The new logic waits up to 10 seconds, then orphans the processes.

Morris Jette authored

- 16 Feb, 2016 6 commits

Morris Jette authored
Parsing of the configuration parameter failed with a prefix of "node_features" due to vestigial logic for a plugin type of "knl".

Morris Jette authored
Was trying to set NUMA mode as MCNUMA and vice versa. Also change capmc to specify the mode before the nids, which seems more robust.

Tim Wickberg authored

Morris Jette authored
The "fputs" function was aborting while trying to write a NULL string pointer. Also found that the log message printed the configured and read names in the wrong order.

Tim Wickberg authored
Call abort() rather than continue if pthread_mutex_* calls fail. It is better to die early than to continue on and risk corruption. This mirrors the (now removed) macro definitions from cbuf/hostlist/list.

Morris Jette authored
If the job submit time falls exactly on a second boundary, the test could fail:
    TEST: 15.37
    spawn /home/jette/SLURM/install_smd/bin/salloc --begin now+60 --deadline now+600 --time-min 10 sleep 1
    salloc: error: Job submit/allocate failed: Requested time limit is invalid (missing or exceeds some limit)
    FAILURE: batch not submitted with a deadline too short
    test15.37 FAILURE
    [2016-02-12T15:53:53.005] _valid_job_part: job's min_time greater than deadline (10 > 2016-02-12T16:03:52)
    [2016-02-12T15:53:53.005] _slurm_rpc_allocate_resources: Requested time limit is invalid (missing or exceeds some limit)

- 13 Feb, 2016 4 commits

Morris Jette authored

Morris Jette authored

Morris Jette authored

Morris Jette authored
These are very unlikely to ever occur, but this helps harden the code.

- 12 Feb, 2016 10 commits

Brian Christiansen authored

Danny Auble authored

Brian Christiansen authored

Brian Christiansen authored

Brian Christiansen authored

Morris Jette authored
Add logic to read current KNL state information using Intel's syscfg command on the KNL node.

Tim Wickberg authored
Calling _init() on the plugins can have side effects and is not safe. For example, select/bluegene, switch/nrt, and */cray all make calls to external APIs that could cause unexpected problems.
This reverts commit 19fc9d94.
Conflicts: NEWS

Morris Jette authored
None were real errors, but these changes do harden the code.

Tim Wickberg authored

Tim Wickberg authored
Compare the saved clustername to slurmctld's; if there is a mismatch, prevent slurmctld from starting, to avoid corruption from separate clusters attempting to share a single state directory. Bug 2433.
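A minimal sketch of the cluster name check from Bug 2433, assuming the saved name is stored as a single line in a file inside the state directory. The helpers `read_saved_name()` and `cluster_name_ok()` and the one-line file layout are hypothetical; the commit only states that the saved name is compared with slurmctld's and that a mismatch blocks startup.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of path into buf with the
 * trailing newline stripped. Returns false if the file cannot be read,
 * which a caller could treat as a first start with no saved state.
 */
static bool read_saved_name(const char *path, char *buf, size_t len)
{
	FILE *fp = fopen(path, "r");
	if (!fp)
		return false;
	bool ok = fgets(buf, (int) len, fp) != NULL;
	fclose(fp);
	if (ok)
		buf[strcspn(buf, "\n")] = '\0';
	return ok;
}

/*
 * Hypothetical check: true if this controller may use the state
 * directory. saved_name is NULL or empty when nothing was saved yet;
 * conf_name is the configured ClusterName.
 */
static bool cluster_name_ok(const char *saved_name, const char *conf_name)
{
	if (saved_name == NULL || saved_name[0] == '\0')
		return true;	/* nothing to compare against */
	if (strcmp(saved_name, conf_name) != 0) {
		fprintf(stderr, "state directory belongs to cluster '%s', "
			"not '%s'; refusing to start\n", saved_name, conf_name);
		return false;
	}
	return true;
}
```

At startup, a caller would read the saved name, treat a missing file as a first start, and exit with an error whenever `cluster_name_ok()` returns false, so two clusters can never write into the same state directory.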

- 11 Feb, 2016 11 commits

Morris Jette authored

Morris Jette authored

Tim Wickberg authored
Include <sys/param.h>; don't use a local MAX_PATH_LEN macro.

Morris Jette authored

Morris Jette authored
3 in node_features work, 1 in new gid cache

Morris Jette authored

Tim Wickberg authored
Non-glibc C libraries are strict about when these functions are visible.

Tim Wickberg authored

Morris Jette authored

Morris Jette authored
Rather than keeping the program alive forever due to a bad count. Also eliminate freeing memory before it is done being used.

Morris Jette authored
Conflicts: src/slurmctld/read_config.c

- 10 Feb, 2016 4 commits

Morris Jette authored
Change from 120 to 60 seconds. If it does not end in time, log it and orphan the process. For Cray systems with KNL, process boot times are likely to be ~20 minutes.

Morris Jette authored

Morris Jette authored
Eliminate duplicate code (put into a function) and only power down the node when needed.

Morris Jette authored