- 24 Oct, 2016 1 commit
-
-
Dorian Krause authored
This commit fixes a bug in the multi-prog handling. When running

salloc -N 2 srun -O --multi-prog mp.conf

where mp.conf reads

0-192 true

srun crashes can be observed. valgrind reports:

==6857== Invalid read of size 4
==6857==    at 0x45938D: bit_realloc (bitstring.c:189)
==6857==    by 0x5977A9: _update_task_mask (multi_prog.c:335)
==6857==    by 0x597A5E: _validate_ranks (multi_prog.c:403)
==6857==    by 0x597D1E: verify_multi_name (multi_prog.c:469)
==6857==    by 0x6E7B4BE: launch_p_handle_multi_prog_verify (launch_slurm.c:453)
==6857==    by 0x58A25D: launch_g_handle_multi_prog_verify (launch.c:493)
==6857==    by 0x58E556: _opt_args (opt.c:1927)
==6857==    by 0x58A3B9: initialize_and_process_args (opt.c:270)
==6857==    by 0x591F82: init_srun (srun_job.c:459)
==6857==    by 0x427E70: srun (srun.c:193)
==6857==    by 0x428E23: main (srun.wrapper.c:17)
==6857==  Address 0x5ace440 is 16 bytes inside a block of size 28 free'd
==6857==    at 0x4C2BB4A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6857==    by 0x446886: slurm_xrealloc (xmalloc.c:139)
==6857==    by 0x45944C: bit_realloc (bitstring.c:191)
==6857==    by 0x5977A9: _update_task_mask (multi_prog.c:335)
==6857==    by 0x597A5E: _validate_ranks (multi_prog.c:403)
==6857==    by 0x597D1E: verify_multi_name (multi_prog.c:469)
==6857==    by 0x6E7B4BE: launch_p_handle_multi_prog_verify (launch_slurm.c:453)
==6857==    by 0x58A25D: launch_g_handle_multi_prog_verify (launch.c:493)
==6857==    by 0x58E556: _opt_args (opt.c:1927)
==6857==    by 0x58A3B9: initialize_and_process_args (opt.c:270)
==6857==    by 0x591F82: init_srun (srun_job.c:459)
==6857==    by 0x427E70: srun (srun.c:193)
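The trace shows a read inside a block that an earlier realloc had already released, which is the usual symptom of a caller holding on to a pointer after the buffer was grown. The following is a minimal sketch of a growth routine that avoids this by always storing the new address where every later user will read it. It is not the Slurm code: `task_mask_t`, `task_mask_grow` and `task_mask_set` are hypothetical names invented for illustration, and the commit message does not say how the actual fix was made.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bit mask; this is NOT Slurm's bitstr_t. */
typedef struct {
    unsigned char *bits;  /* heap storage, one bit per task rank */
    size_t nbits;         /* number of valid bits */
} task_mask_t;

/* Grow the mask so it can hold at least `nbits` bits.
 * Returns 0 on success, -1 on allocation failure. */
static int task_mask_grow(task_mask_t *m, size_t nbits)
{
    size_t old_bytes, new_bytes;
    unsigned char *p;

    assert(m != NULL);
    if (nbits <= m->nbits)
        return 0;
    old_bytes = (m->nbits + 7) / 8;
    new_bytes = (nbits + 7) / 8;
    /* realloc may move the block; the old address is invalid afterwards. */
    p = realloc(m->bits, new_bytes);
    if (p == NULL)
        return -1;
    memset(p + old_bytes, 0, new_bytes - old_bytes);
    m->bits = p;          /* publish the new address through the struct */
    m->nbits = nbits;
    return 0;
}

/* Mark rank `r` as used, growing the mask when needed.
 * Returns 1 if newly set, 0 if already set, -1 on error. */
static int task_mask_set(task_mask_t *m, size_t r)
{
    unsigned char bit = (unsigned char)(1u << (r % 8));

    if (r >= m->nbits && task_mask_grow(m, r + 1) != 0)
        return -1;
    if (m->bits[r / 8] & bit)
        return 0;
    m->bits[r / 8] |= bit;
    return 1;
}
```

Because callers only ever reach the storage through `m->bits`, no copy of the pre-realloc address survives a growth step, even when ranks such as 0-192 force the mask to be enlarged many times.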
-
- 20 Oct, 2016 4 commits
-
-
Tim Wickberg authored
_select_nodes_parts() was resetting state_reason to an admin hold without regard to admin vs user hold state. state_reason is the only place that user vs. admin is distinguished, so this prevented users from releasing these jobs. Bug introduced by commit fb46c84b in 16.05.5. Bug 3197.
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
This is an addition to commit cb7ed937
-
- 19 Oct, 2016 1 commit
-
-
Ole H Nielsen authored
bug 3191
-
- 18 Oct, 2016 2 commits
-
-
Dominik Bartkiewicz authored
Improve reported estimates of start and end times for pending jobs. bug 3184
-
Morris Jette authored
Cray: Prevent abort in backfill scheduling logic for requeued job that has been cancelled while NHC is running. bug 3185
-
- 17 Oct, 2016 4 commits
-
-
Morris Jette authored
Modify DataWarp example to use an environment variable rather than absolute path
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
new glibc 2.24+ that deprecates readdir_r.
-
- 15 Oct, 2016 1 commit
-
-
Morris Jette authored
-
- 14 Oct, 2016 4 commits
-
-
Tim Wickberg authored
This reverts commit 8bc8e7e6.
-
Morris Jette authored
The slurm_strcasestr() which existed in v16.05 was removed in v17.02.
-
Morris Jette authored
Found by Coverity
-
Morris Jette authored
Fix for possibly treating a negative number as a positive. Problem reported by Coverity.
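The commit message does not say which code path was affected. As a general illustration only, one common way a negative number ends up treated as a positive one in C is an unchecked conversion from a signed to an unsigned type. The sketch below uses a hypothetical helper, `parse_count`, which is an assumption made for this example and not a Slurm function; it checks the sign while the value is still signed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal count into a uint32_t.
 * Returns 0 on success; -1 if the text is not a number, is negative,
 * or does not fit. `*out` is left untouched on failure. */
static int parse_count(const char *text, uint32_t *out)
{
    char *end = NULL;
    long long v;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtoll(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;
    /* Test the sign before converting: storing -1 in a uint32_t
     * would silently yield 4294967295. */
    if (v < 0 || v > (long long)UINT32_MAX)
        return -1;
    *out = (uint32_t)v;
    return 0;
}
```

Static analyzers such as Coverity typically flag the unchecked variant, where a possibly negative signed value flows into an unsigned variable or a size argument.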
-
- 13 Oct, 2016 8 commits
-
-
Morris Jette authored
-
Morris Jette authored
Change the default syscfg path to /usr/bin/syscfg
Don't load the plugin if ResumeProgram is configured
Update documentation on web page
-
Morris Jette authored
-
Morris Jette authored
This applies only to knl_generic plugin. The original node active features should be the same as the features field at startup. The logic was assuming the original value was NULL.
-
Morris Jette authored
This was a new function added to support knl_generic and it was originally assigned the wrong name.
-
Morris Jette authored
Added node_features/knl_generic plugin for KNL support on non-Cray systems. NOTE: This plugin is still under development.
-
Morris Jette authored
Do not propagate SLURM_UMASK environment variable to batch script. bug 2609
-
Bjørn-Helge Mevik authored
Correct a bitmap test function (used only by the select/bluegene plugin). The effect of this bug is probably very limited as it will in almost all cases revert prematurely to a bit-by-bit test rather than using a full-word test. bug 3145
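To illustrate the difference between a full-word test and a bit-by-bit test, here is a minimal sketch of a range check that compares whole 64-bit words where the range is word-aligned and falls back to single bits only at the edges. The function name `bits_all_set`, the word size and the bit layout are assumptions made for this example; this is not the function corrected by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64

/* Return 1 if every bit in [first, last] is set, else 0.
 * Bit i is stored in words[i / 64] at position i % 64 (assumed layout). */
static int bits_all_set(const uint64_t *words, size_t first, size_t last)
{
    size_t i = first;

    assert(words != NULL && first <= last);

    /* Leading bits up to the next word boundary: one bit at a time. */
    while (i <= last && (i % WORD_BITS) != 0) {
        if (((words[i / WORD_BITS] >> (i % WORD_BITS)) & 1) == 0)
            return 0;
        i++;
    }
    /* Whole words inside the range: one comparison covers 64 bits. */
    while (i <= last && last - i >= WORD_BITS - 1) {
        if (words[i / WORD_BITS] != UINT64_MAX)
            return 0;
        i += WORD_BITS;
    }
    /* Trailing bits after the last whole word: one bit at a time. */
    while (i <= last) {
        if (((words[i / WORD_BITS] >> (i % WORD_BITS)) & 1) == 0)
            return 0;
        i++;
    }
    return 1;
}
```

If the condition guarding the middle loop is wrong, the function still returns correct results through the bit-by-bit loops but loses the speed-up, which matches the limited impact the commit message describes.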
-
- 12 Oct, 2016 11 commits
-
-
Tim Wickberg authored
Cannot use ClusterName without reading a config file that may not exist. Bug 3026.
-
Tim Wickberg authored
This introduced an inadvertent dependency on the config file, which does not exist when setting up a new cluster. Bug 3026. This reverts commit c39f9ac9.
-
Tim Wickberg authored
-
Morris Jette authored
task/affinity plugin: Honor a job's --ntasks-per-socket and --ntasks-per-core options in task binding. bug 3118
-
Pär Lindfors authored
-
Brian Christiansen authored
Changed in df70b651
-
Brian Gilmer authored
-
Morris Jette authored
Preserve non-KNL node features when updating the KNL node features for a multi-node job in which the non-KNL node features vary by node.
-
Morris Jette authored
node_features/knl_cray plugin: If the reconfiguration of nodes for an interactive job fails, kill the job (it can't be requeued like a batch job).
-
Morris Jette authored
Execute "capmc node_status" at more frequent intervals to handle nodes getting added or removed from the system using Cray tools (i.e. try to keep Slurm and Cray software better synchronized).
-
Morris Jette authored
node_features/knl_cray plugin: Add separate thread to interact with capmc in response to unexpected node reboots. bug 3153
-
- 11 Oct, 2016 4 commits
-
-
Alejandro Sanchez authored
bug 3091
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
bug 3155
-