- 24 Oct, 2017 2 commits
-
-
Brian Christiansen authored
Bug 4246
-
Alejandro Sanchez authored
Mark waking nodes DOWN right after ResumeTimeout has been reached if they are not responding. Otherwise we have to wait for ping_nodes() to handle this work, so SlurmdTimeout comes into play, giving the end user the impression that the nodes are stuck in ALLOCATED# and the job in CF state until ping_nodes() decides to mark them DOWN and requeue the job. Bug 4182
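The timeout check described in this commit can be sketched roughly as follows. This is an illustration only, not the slurmctld code: the `Node` class, its fields, and `down_unresponsive_waking_nodes()` are hypothetical names, and a trailing `#` on the state string is assumed to mark a node that is still powering up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical record; the real slurmctld node structure differs.
    name: str
    resume_started: float  # time the power-up request was issued (seconds)
    responding: bool       # whether the node daemon has answered yet
    state: str = "ALLOCATED#"


def down_unresponsive_waking_nodes(nodes, now, resume_timeout):
    """Mark waking nodes DOWN once resume_timeout has elapsed without a response.

    Returns the names of the nodes that were marked DOWN so the caller
    could requeue their jobs.
    """
    downed = []
    for node in nodes:
        waking = node.state.endswith("#")
        timed_out = now - node.resume_started >= resume_timeout
        if waking and not node.responding and timed_out:
            node.state = "DOWN"
            downed.append(node.name)
    return downed
```

Running this check as soon as the resume timeout expires avoids waiting for the separate, slower ping cycle to notice the dead node.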
-
- 19 Oct, 2017 5 commits
-
-
Morris Jette authored
Update to commit 859f6c82
-
Tim Wickberg authored
-
Dominik Bartkiewicz authored
E.g., gpu:tesla:2 would have parsed as quantity "tesla" of gpu. Change the index value to the end of the array (-1 index value). Bug 4250.
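A minimal sketch of the parsing idea, assuming a `name[:type][:count]` specification in which the count, when present, is always the last colon-separated field. `parse_gres()` is a hypothetical helper, not a Slurm function, and the default count of 1 is an assumption made for the example; unit suffixes on counts are ignored.

```python
def parse_gres(spec):
    """Split a gres spec such as "gpu:tesla:2" into (name, type, count).

    The count is read from the last field (index -1) rather than from a
    fixed position, so a type name is never mistaken for a quantity.
    """
    fields = spec.split(":")
    name = fields[0]
    gres_type = None
    count = 1  # assumed default when no count is given
    if len(fields) > 1:
        last = fields[-1]
        if last.isdigit():
            count = int(last)
            middle = fields[1:-1]
        else:
            middle = fields[1:]
        if middle:
            gres_type = ":".join(middle)
    return name, gres_type, count
```

With this approach `gpu:tesla:2` yields the type `tesla` and the count 2, while `gpu:2` yields no type and the count 2.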
-
Dominik Bartkiewicz authored
Rather than end up with "%.-1s" printed out in the output as snprintf refused to parse the format specifier. Bug 4164.
-
Felip Moll authored
bugzilla #4238 - Added a hint for the window manager to all popups and windows in order to get max, min, close buttons in Gnome, XFCE, and others.
-
- 18 Oct, 2017 1 commit
-
-
Danny Auble authored
Bug 4244
-
- 14 Oct, 2017 1 commit
-
-
Josh Samuelson authored
Pending job with administrator-extended TimeLimit beyond partition's MaxTime remains pending with reason PartitionTimeLimit. Bug 4262
-
- 13 Oct, 2017 5 commits
-
-
Morris Jette authored
-
Brian Christiansen authored
The controller will return node records with a NULL name for nodes that are hidden. This is so that you can map a partition_info's nodes, using its node_inx[], to a node record in the node_array returned from slurm_load_node(). Previously the perl api would leave an undefined object in the node_array if hidden nodes were found before a real node in the node_array, and any hidden nodes at the end of the array from the controller wouldn't be accounted for in the perl node_array. This patch adds empty hashes for hidden nodes and preserves the record_count and node_array from the slurmctld. Bug 4250
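The reason hidden nodes need placeholders can be sketched as follows. The example assumes node_inx[] holds inclusive start/end index pairs terminated by -1, and it models the node_array as a Python list of dicts in which an empty dict stands for a hidden node. Both function names are hypothetical, not part of the perl or C API.

```python
def expand_node_inx(node_inx):
    """Expand [start, end, start, end, ..., -1] pairs into a flat index list."""
    indices = []
    for i in range(0, len(node_inx), 2):
        start = node_inx[i]
        if start == -1:  # terminator
            break
        end = node_inx[i + 1]
        indices.extend(range(start, end + 1))
    return indices


def partition_node_names(node_inx, node_array):
    """Look up each partition node; hidden nodes (empty records) give None."""
    return [node_array[i].get("name") for i in expand_node_inx(node_inx)]


# Hidden nodes at positions 1 and 3 are kept as empty records so that
# the indices still line up with the controller's node table.
node_array = [{"name": "n0"}, {}, {"name": "n2"}, {}]
names = partition_node_names([0, 0, 2, 3, -1], node_array)
```

If the hidden entries were dropped instead, the array would shrink to two records and index 3 would no longer refer to any node, which is why the record count has to be preserved.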
-
Brian Christiansen authored
The perl api leaves undefined objects in the node_array returned by load_nodes() for any node that is hidden. Bug 4250
-
Morris Jette authored
Bug 4003
-
Morris Jette authored
as process exits Bug 4003
-
- 10 Oct, 2017 4 commits
-
-
Brian Christiansen authored
when using xstrcasecmp. Matching up with other xstrcmp() functions.
-
Brian Christiansen authored
was missing
-
Isaac Hartung authored
Bug 4226
-
Tim Wickberg authored
that there was no bit_fmt was out of scope on the xfree. Passing a function address to xfree() predictably does not work very well. Change the variable name to avoid confusion. Bug 4241
-
- 05 Oct, 2017 1 commit
-
-
Brian Christiansen authored
Before:
    $ sbatch --wrap="sleep 300"
    Submitted batch job 228
    $ squeue
    JOBID PARTITION NAME USER ST TIME CPUS NODELIST(REASON)
    228 debug wrap brian PD 0:00 1 (AssocMaxUnknownPerNode)

Fixed:
    $ squeue
    JOBID PARTITION NAME USER ST TIME CPUS NODELIST(REASON)
    229 debug wrap brian PD 0:00 1 (AssocMaxCpuPerNode)
    $ sacctmgr mod account stuff set maxtrespernode=cpu=-1,mem=1
    $ squeue
    JOBID PARTITION NAME USER ST TIME CPUS NODELIST(REASON)
    229 debug wrap brian PD 0:00 1 (AssocMaxMemPerNode)
    $ sbatch --wrap="sleep 300" --gres=blah:2 -pgpu
    Submitted batch job 235
    $ squeue
    JOBID PARTITION NAME USER ST TIME CPUS NODELIST(REASON)
    235 gpu wrap brian PD 0:00 1 (AssocMaxGRESPerNode)
-
- 04 Oct, 2017 1 commit
-
-
Morris Jette authored
burst_buffer/cray plugin modified to work with changes in Cray UP06 software release. Specific changes: Cray software now returns an error if a stage_in or stage_out script is processed that doesn't actually request a stage in or out (previously silently ignored). Also, the warning message about tearing down a buffer that is already gone has changed.
-
- 02 Oct, 2017 2 commits
-
-
Dominik Bartkiewicz authored
Move the check up a bit more where it'll do some good. Bug 4184.
-
Dominik Bartkiewicz authored
Bug 4146.
-
- 29 Sep, 2017 2 commits
-
-
Danny Auble authored
Bug 3467
-
Danny Auble authored
Bug 3567
-
- 27 Sep, 2017 2 commits
-
-
Danny Auble authored
gres listed in your slurm.conf but some in gres.conf. Bug 3974
-
Danny Auble authored
'type' but no file defined.
-
- 19 Sep, 2017 3 commits
-
-
Danny Auble authored
plugin when constraining devices.
-
Danny Auble authored
-
Danny Auble authored
correctly in sacct.
-
- 14 Sep, 2017 1 commit
-
-
Tim Wickberg authored
A second PMI2_Init() within the same step is invalid, and cannot succeed. Return an error code back to the client end, and close the fd to force the step to terminate immediately. Due to a bug in our libpmi code, just returning a cmd=response_to_init with an appropriate rc number will not tear down the connection properly, so send back something else that will trigger the error path. Bug 3520.
-
- 13 Sep, 2017 1 commit
-
-
Josh Samuelson authored
Bug 4154.
-
- 12 Sep, 2017 3 commits
-
-
Danny Auble authored
default path. This makes it so you don't always have to put AllowedDevicesFile in your cgroup.conf file if your etc dir is anything other than /etc/slurm.
-
Tim Wickberg authored
Adding a newline prevents this error: conftest.c:154:8: error: if statement has empty body [-Werror,-Wempty-body]
-
Alejandro Sanchez authored
remote cluster correctly determine the select type. Bug 2329
-
- 08 Sep, 2017 2 commits
-
-
Dominik Bartkiewicz authored
If /proc was inaccessible, proc_name would leak. Put an explicit length cap in sprintf to avoid a warning. The size is checked immediately before here, so this is just making the 10-char limit explicit. Bug 4062.
-
Dominik Bartkiewicz authored
Bug 4062.
-
- 07 Sep, 2017 2 commits
-
-
Dominik Bartkiewicz authored
bug 3824
-
Morris Jette authored
Do not run the Node Health Check on termination of the external step as this happens when the job allocation ends and the job NHC will be executed anyway. Bug 4074
-
- 01 Sep, 2017 2 commits
-
-
Danny Auble authored
checked on submit. This only mattered when submitting a job to multiple partitions. Bug 4066
-
Danny Auble authored
on node 0. Bug 4035
-