- 08 Sep, 2017 3 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Bug 3921.
-
Dominik Bartkiewicz authored
Bug 4062.
-
- 07 Sep, 2017 2 commits
-
-
Dominik Bartkiewicz authored
bug 3824
-
Morris Jette authored
Do not run the Node Health Check on termination of the external step as this happens when the job allocation ends and the job NHC will be executed anyway. Bug 4074
-
- 06 Sep, 2017 9 commits
-
-
Brian Christiansen authored
-
Isaac Hartung authored
instead of waiting till end of script.
-
Isaac Hartung authored
-
Isaac Hartung authored
-
Nathan Yee authored
Bug 1286
-
Danny Auble authored
Bug 4066 Bug 4135
-
Danny Auble authored
This partially reverts commit a309f77c. It accidentally removed the menu function on mobile devices. Bug 4128
-
Marshall Garey authored
to match the parameter ordering.
-
Marshall Garey authored
Bug 4052
-
- 01 Sep, 2017 2 commits
-
-
Danny Auble authored
checked on submit. This only mattered when submitting a job to multiple partitions. Bug 4066
-
Danny Auble authored
on node 0. Bug 4035
-
- 31 Aug, 2017 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Will be removed in 17.11, and 'sgj_job' is a typo anyway.
-
Tim Wickberg authored
-
- 30 Aug, 2017 2 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
This reverts commit 0581585c. Do not change permissions on files the testsuite does not "own". Bug 4118.
-
- 29 Aug, 2017 1 commit
-
-
Danny Auble authored
Starting in MariaDB 10.2, many of the API calls started setting errno erroneously.
-
- 28 Aug, 2017 2 commits
-
-
Morris Jette authored
Test was sporadically failing on smd without sleep
-
Morris Jette authored
bug 4095
-
- 24 Aug, 2017 1 commit
-
-
Alejandro Sanchez authored
Calling bit_unfmt() with a zero bit_size() bitmap leads to a later call to bit_nclear() with start=0 and stop=-1, leading to the ABRT.

This scenario happened when cgroup.conf has ConstrainDevices=yes and task_cgroup_devices_create() tries to collect the GRES devices but gres_cpu_cnt=0, thus creating a zero-size p->cpus_bitmap = bit_alloc(gres_cpu_cnt); which is passed as an argument to bit_unfmt().

gres_cpu_cnt is 0 because we have defined a gres.conf like this:

    Name=gpu Type=tesla File=/tmp/gres/tesla0 CPUs=0,1
    Name=gpu Type=tesla File=/tmp/gres/tesla1 CPUs=0,1
    Name=gpu Type=kepler File=/tmp/gres/kepler0 CPUs=2,3
    Name=gpu Type=kepler File=/tmp/gres/kepler1 CPUs=2,3

but have no GresTypes nor GRES option in the slurm.conf / node config def.

Bug 3974
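The failure mode described in this commit message (clearing the range 0..size-1 on an empty bitmap, which yields stop=-1) can be illustrated with a small stand-alone sketch. Everything below is hypothetical: `toy_bitmap_t`, `toy_alloc()`, `toy_nclear()` and `toy_clear_all()` are stand-ins invented for illustration and are not Slurm's bitstring API, and the guard shown is only one plausible way to handle the empty case, not necessarily what the actual fix does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bitmap; NOT Slurm's bitstr_t. */
typedef struct {
    int64_t nbits;   /* number of valid bits, may be 0 */
    uint8_t *bytes;  /* backing storage */
} toy_bitmap_t;

/* Allocate a zeroed bitmap of nbits bits (nbits may be 0). */
toy_bitmap_t *toy_alloc(int64_t nbits)
{
    toy_bitmap_t *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->nbits = nbits;
    /* +1 so a zero-size bitmap still has valid storage */
    b->bytes = calloc((size_t)((nbits + 7) / 8) + 1, 1);
    if (!b->bytes) {
        free(b);
        return NULL;
    }
    return b;
}

void toy_free(toy_bitmap_t *b)
{
    if (b) {
        free(b->bytes);
        free(b);
    }
}

/* Clear bits start..stop inclusive.
 * Returns 0 on success, -1 if the range is invalid. */
int toy_nclear(toy_bitmap_t *b, int64_t start, int64_t stop)
{
    if (!b || start < 0 || stop < start || stop >= b->nbits)
        return -1;
    for (int64_t i = start; i <= stop; i++)
        b->bytes[i / 8] &= (uint8_t)~(1u << (i % 8));
    return 0;
}

/* Clear every bit. An empty bitmap is treated as a no-op, so the
 * range call with start=0, stop=-1 is never issued. */
int toy_clear_all(toy_bitmap_t *b)
{
    if (!b)
        return -1;
    if (b->nbits == 0)
        return 0;
    return toy_nclear(b, 0, b->nbits - 1);
}
```

With a zero-size bitmap, the unguarded call `toy_nclear(b, 0, b->nbits - 1)` is rejected as an invalid range, while `toy_clear_all(b)` returns success without touching storage.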
-
- 23 Aug, 2017 1 commit
-
-
Alejandro Sanchez authored
Running slurmctld under valgrind while operating with jobcomp/elasticsearch reported the following bytes definitely lost:

    ==27403== 658 bytes in 1 blocks are definitely lost in loss record 301 of 342
    ==27403== at 0x4C2FD4F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==27403== by 0x2281B3: slurm_xrealloc (xmalloc.c:137)
    ==27403== by 0x22856A: makespace (xstring.c:114)
    ==27403== by 0x2285D0: _xstrcat (xstring.c:132)
    ==27403== by 0x228CE0: _xstrfmtcat (xstring.c:291)
    ==27403== by 0x83C5BCD: ???
    ==27403== by 0x30A913: g_slurm_jobcomp_write (slurm_jobcomp.c:172)
    ==27403== by 0x18D8FC: job_completion_logger (job_mgr.c:13652)

It turns out the generated buffer in slurm_jobcomp_log_record was xstrdup'ed to the corresponding job_node->serialized_job, but the originally generated buffer wasn't freed afterwards. The fix consists of changing the transfer so that, instead of xstrdup'ing the char *, we just assign the pointer and NULL the buffer. The job_node->serialized_job was already xfree'd properly later when the job was indexed.

Discovered while working on Bug 4065.
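The pattern described here, handing a heap buffer over by pointer assignment instead of duplicating it, can be sketched in plain C. This is a minimal illustration under stated assumptions: `toy_job_node`, `toy_build_record()` and `toy_transfer()` are hypothetical names, and standard `malloc`/`free` stand in for Slurm's xmalloc family. Only the field name `serialized_job` comes from the commit message.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type; only the field name mirrors the commit text. */
struct toy_job_node {
    char *serialized_job;
};

/* Build a heap-allocated record string (hypothetical helper).
 * The caller owns the returned buffer. */
char *toy_build_record(const char *key, int value)
{
    int len = snprintf(NULL, 0, "{\"%s\":%d}", key, value);
    if (len < 0)
        return NULL;
    char *buf = malloc((size_t)len + 1);
    if (!buf)
        return NULL;
    snprintf(buf, (size_t)len + 1, "{\"%s\":%d}", key, value);
    return buf;
}

/* Transfer ownership: the node takes the pointer and the caller's
 * variable is set to NULL. Nothing is copied, so there is no second
 * allocation left behind to leak. */
void toy_transfer(struct toy_job_node *node, char **buffer)
{
    node->serialized_job = *buffer;
    *buffer = NULL;
}
```

After `toy_transfer(&node, &buf)`, `buf` is NULL and the node is the sole owner, so a single `free(node.serialized_job)` later releases the memory. Duplicating the string instead would require the caller to free the original as well, which is the step the commit message says was missing.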
-
- 22 Aug, 2017 7 commits
-
-
Alejandro Sanchez authored
Otherwise the resulting URL may be invalid. Update documentation while here as well. Bug 4065.
-
Tim Shaw authored
Otherwise a race between threads in _check_node_status leads to a crash. Bug 4093.
-
Tim Wickberg authored
Modification of commit c7e6d864. Bug 4095.
-
Philip Kovacs authored
bug 4095
-
Philip Kovacs authored
bug 4095
-
Morris Jette authored
-
Philip Kovacs authored
Bug 4094
-
- 21 Aug, 2017 2 commits
-
-
Morris Jette authored
bug 4056
-
Alejandro Sanchez authored
Given a configuration with TopologyParam including the Dragonfly option, if a job requested a --switches count, the count timeout specified by either the job request or the max_switch_wait SchedulerParameters option was not respected. This was because the leaf_switch_count variable was not incremented in the _eval_nodes_dfly() function when needed, as is done in _eval_nodes_topo(), the latter being an execution path that already waits correctly for the switch count timeout. Bug 4056
-
- 18 Aug, 2017 1 commit
-
-
Alejandro Sanchez authored
-
- 17 Aug, 2017 2 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
Coverity CID 44649 Bug 4085
-
- 16 Aug, 2017 1 commit
-
-
Danny Auble authored
instead of local. Bug 3546
-