- 09 Jan, 2017 6 commits
Morris Jette authored
backfill scheduler: Stop trying to determine the expected start time for a job after 2 seconds of wall time. That determination can take this long when there are many running jobs and a pending job cannot be started soon. bug 3373
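The wall-time budget described above can be sketched in a few lines. This is a minimal Python illustration, not the backfill plugin's code. The candidate start times, the fits() test and the name estimate_start are hypothetical. Only the 2-second limit comes from this change.

```python
import time
from typing import Callable, Iterable, Optional


def estimate_start(
    candidate_times: Iterable[int],
    fits: Callable[[int], bool],
    budget_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Return the first candidate start time at which the job fits.

    Hypothetical helper: once budget_s seconds of wall time have elapsed
    it gives up and returns None instead of scanning every candidate.
    """
    deadline = clock() + budget_s
    for t in candidate_times:
        if clock() >= deadline:
            return None  # budget exhausted: leave the start time undetermined
        if fits(t):
            return t
    return None  # no candidate fits
```

Passing the clock in as a parameter lets the cutoff be exercised with a fake clock instead of real sleeps.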
-
Tim Wickberg authored
This reverts commit 17549a03.
-
Dominik Bartkiewicz authored
Bug 3364.
-
Tim Shaw authored
Configuring Slurm with munge manually installed in /usr/local, with the library in /usr/local/lib but an empty /usr/local/lib64 directory, causes the munge plugins to look for libmunge.so in the wrong place. The munge.spec file has historically provided libmunge.so as part of munge-devel, which Slurm already depends on.
-
Morris Jette authored
Add a SchedulerParameters configuration parameter of "default_gbytes", which treats numeric-only (no suffix) values for memory and tmp disk space as being in units of gigabytes. Mostly for compatibility with LSF.
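The effect of "default_gbytes" on a suffix-less value can be sketched as a small parser. This is a hypothetical Python illustration, not Slurm's parsing code. The function name and the supported suffixes (M, G, T only) are assumptions of the sketch. Values are normalized to megabytes, and the same rule would apply to tmp disk values.

```python
# Hypothetical sketch: megabyte multipliers for the suffixes this example accepts.
_MULTIPLIERS_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def parse_mem_mb(value: str, default_gbytes: bool = False) -> int:
    """Return a size in megabytes for strings such as "4", "512M" or "2G"."""
    text = value.strip().upper()
    if not text:
        raise ValueError("empty size specification")
    suffix = text[-1]
    if suffix.isdecimal():
        # No suffix: megabytes normally, gigabytes when default_gbytes is set.
        number, mult = text, (1024 if default_gbytes else 1)
    elif suffix in _MULTIPLIERS_MB:
        number, mult = text[:-1], _MULTIPLIERS_MB[suffix]
    else:
        raise ValueError(f"unknown suffix in {value!r}")
    if not number.isdecimal():
        raise ValueError(f"invalid number in {value!r}")
    return int(number) * mult
```

Only values without a suffix change meaning. An explicit "512M" stays 512 MB with or without the flag.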
-
Morris Jette authored
Move BatchScript to end of each job's information when using "scontrol -dd show job" to make it more readable.
-
- 06 Jan, 2017 1 commit
Tim Wickberg authored
Can cause random assertion failures and core dumps due to differences between the definitions in slurm.h and bitstring.h. Inadvertently introduced in 8967a4e7.
-
- 05 Jan, 2017 2 commits
Alejandro Sanchez authored
The 17.02 API has been changed so that the node Port parameter is now packed and unpacked in the REQUEST_NODE_INFO RPC. Some client requests, such as 'scontrol write config' and 'scontrol show node', will display the port if it differs from SlurmdPort. The Port parameter is now also available to 'sinfo' if explicitly requested through '-O port', and in the 'sview' full node info. Always send SlurmdPort in the RPC, even in multiple-slurmd mode. Bug 3240.
-
Doug Jacobsen authored
Bug 3376.
-
- 04 Jan, 2017 5 commits
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. (This commit is slightly different from the fix to the 15.08 branch.) CVE-2016-10030.
-
Tim Wickberg authored
-
Tim Wickberg authored
Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030.
-
- 03 Jan, 2017 3 commits
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
Prevent "stray" jobs from using resources when the srun/salloc will never launch the actual compute tasks. Bug 3344.
-
Dominik Bartkiewicz authored
PluginDir is allowed to be a PATH-style list of directories; remove incorrect test of the variable as if it were a single directory and comment that the check for that is elsewhere. Bug 3361.
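The bug above is that a single "is this a directory?" test is wrong when PluginDir holds several directories. A minimal Python sketch of treating the value as a list follows, assuming the colon separator used by PATH. The function names are hypothetical and this is not Slurm's own code. Skipping empty entries is also a choice made only for this sketch.

```python
import os


def plugin_dirs(plugin_dir: str) -> list[str]:
    """Split a colon-separated PluginDir value into its directories."""
    # Empty entries (from "a::b" or a trailing ":") are dropped in this sketch.
    return [d for d in plugin_dir.split(":") if d]


def missing_plugin_dirs(plugin_dir: str) -> list[str]:
    """Return the entries that are not existing directories."""
    return [d for d in plugin_dirs(plugin_dir) if not os.path.isdir(d)]
```

Checking each entry separately avoids calling os.path.isdir() on the whole string, which fails whenever more than one directory is listed.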
-
- 29 Dec, 2016 3 commits
Morris Jette authored
Add SchedulerParameters option of spec_cores_first to select specialized cores from the lowest-numbered rather than the highest-numbered cores and sockets. bug 3349
-
Dominik Bartkiewicz authored
Null terminate before strchr().
-
Morris Jette authored
This is a new message when "PrologFlags=contain" or "PrologFlags=alloc" is configured. bug 3351
-
- 28 Dec, 2016 1 commit
Alejandro Sanchez authored
Cancel an interactive job if its Prolog fails when "PrologFlags=contain" is configured. bug 3351
-
- 21 Dec, 2016 1 commit
Morris Jette authored
Do not allocate specialized cores to jobs using the --exclusive option. bug 3349
-
- 19 Dec, 2016 1 commit
Morris Jette authored
Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
-
- 16 Dec, 2016 4 commits
Danny Auble authored
The part_ptr is passed into the function, so there is no reason to look it up again. Coverity reported this.
-
Tim Wickberg authored
-
Tim Wickberg authored
Remove req_node_layout, which was only used with wiki/wiki2. This leads to removal of _get_cpu_cnt() as well. Remove SchedulerPort, only used for communication to Moab/Maui. Remove slurm_get_sched_port() from the API. Remove schedport from slurm_ctl_conf struct.
-
Morris Jette authored
bug 2161
-
- 15 Dec, 2016 5 commits
Morris Jette authored
burst_buffer/cray - Remove leading zeros from node ID lists passed to dw_wlm_cli program. bug 3008
-
Danny Auble authored
version is lower than the min version, set it to the min. Bug 3050
-
Morris Jette authored
bug 1752
-
Morris Jette authored
sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. bug 3346
-
Danny Auble authored
go into JobAdminHeld. Bug 3201
-
- 14 Dec, 2016 4 commits
Morris Jette authored
Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. bug 3329
-
Tim Wickberg authored
Bug 2992.
-
Morris Jette authored
Modify regression test1.89 to avoid leaving a vestigial job. Also reduce logging to lower the likelihood of an Expect buffer overflow. bug 3273
-
Morris Jette authored
node_features/knl_generic - Add the capability to detect Uncorrectable Memory Errors (UME) and, if one is detected, log the event in the stderr of all jobs and steps with a message of the form: error: *** STEP 1.2 ON tux1 UNCORRECTABLE MEMORY ERROR AT 2016-12-14T09:09:37 *** bug 3341
-
- 13 Dec, 2016 1 commit
Tim Wickberg authored
Reverts most of commit 84023f27. Searching the PATH in slurmd can fail due to root_squash'd NFS filesystems, leading to the "wrong" program being launched. If you'd like the performance benefit of avoiding this lookup during each separate task launch, set SLURM_TEST_EXEC=1 instead. This performs the lookup once within srun, which also ensures that it happens under the user's own environment rather than that of the slurmd. Bug 2992.
-
- 09 Dec, 2016 2 commits
Danny Auble authored
level.
-
Morris Jette authored
Provide limited support for the MemSpecLimit configuration parameter without the task/cgroup plugin.
-
- 08 Dec, 2016 1 commit
Danny Auble authored
-