- 30 Aug, 2017 3 commits
-
-
David Gloe authored
Statically linked Cray PMI applications still expect to use some file paths containing the old SLURM_ID_HASH format. Some Cray customers have certification requirements that make recompilation difficult. The attached patch defines a macro to convert the new SLURM_ID_HASH to the old format, and writes the files and symlinks necessary for statically linked Cray PMI applications to work. Bug 4114
-
Danny Auble authored
frequency on the batch step. Bug 4073 Also see Bug 3510
-
Tim Wickberg authored
-
- 29 Aug, 2017 5 commits
-
-
Brian Christiansen authored
Bug 4090
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
relies on the primary to do so. There is a potential race condition if the backup DBD tries to create/check the database at the same time as the primary. This patch removes this race by not allowing the backup to do the check/create. Bug 3827
-
- 24 Aug, 2017 1 commit
-
-
Alejandro Sanchez authored
Calling bit_unfmt() with a zero bit_size() bitmap leads to a later call to bit_nclear() with start=0 and stop=-1, leading to the ABRT. This scenario happened when cgroup.conf has ConstrainDevices=yes and task_cgroup_devices_create() tries to collect the GRES devices but gres_cpu_cnt=0, thus creating a p->cpus_bitmap = bit_alloc(gres_cpu_cnt); of zero size which is passed by argument to bit_unfmt(). gres_cpu_cnt is 0 because we have defined a gres.conf like this:
    Name=gpu Type=tesla File=/tmp/gres/tesla0 CPUs=0,1
    Name=gpu Type=tesla File=/tmp/gres/tesla1 CPUs=0,1
    Name=gpu Type=kepler File=/tmp/gres/kepler0 CPUs=2,3
    Name=gpu Type=kepler File=/tmp/gres/kepler1 CPUs=2,3
but have no GresTypes nor GRES option in the slurm.conf / node config def. Bug 3974
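The failure mode above (clearing bits 0..size-1 when size is zero, so stop becomes -1) can be sketched with a toy bitmap. This is only an illustration under stated assumptions: toy_bitmap_t, toy_alloc(), toy_nclear() and toy_clear_all() are hypothetical names, not Slurm's bitstring API, and the guard shown is one plausible way to avoid the bad range, not necessarily the fix that was committed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy bitmap for illustration only; NOT Slurm's bitstr_t. */
typedef struct {
    int64_t nbits;
    unsigned char *bytes;
} toy_bitmap_t;

/* Allocate a zeroed bitmap; nbits may be 0, like bit_alloc(0) in the report. */
toy_bitmap_t *toy_alloc(int64_t nbits)
{
    toy_bitmap_t *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->nbits = nbits;
    /* Always allocate at least one byte so the pointer is usable. */
    b->bytes = calloc((size_t)((nbits + 7) / 8) + 1, 1);
    if (!b->bytes) {
        free(b);
        return NULL;
    }
    return b;
}

void toy_free(toy_bitmap_t *b)
{
    if (b) {
        free(b->bytes);
        free(b);
    }
}

/* Clear bits start..stop (inclusive). Returns -1 on an invalid range
 * instead of aborting, so the bad call is visible to the caller. */
int toy_nclear(toy_bitmap_t *b, int64_t start, int64_t stop)
{
    assert(b != NULL);
    if (start < 0 || stop < start || stop >= b->nbits)
        return -1;
    for (int64_t i = start; i <= stop; i++)
        b->bytes[i / 8] &= (unsigned char)~(1u << (i % 8));
    return 0;
}

/* Clear every bit. The zero-size check avoids toy_nclear(b, 0, -1). */
int toy_clear_all(toy_bitmap_t *b)
{
    assert(b != NULL);
    if (b->nbits == 0)
        return 0; /* nothing to clear */
    return toy_nclear(b, 0, b->nbits - 1);
}
```

With a zero-size bitmap, calling toy_nclear(b, 0, b->nbits - 1) directly is rejected as an invalid range, while toy_clear_all(b) returns success without touching any bits.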
-
- 23 Aug, 2017 1 commit
-
-
Alejandro Sanchez authored
Running slurmctld under valgrind while operating with jobcomp/elasticsearch reported the following bytes definitely lost:
==27403== 658 bytes in 1 blocks are definitely lost in loss record 301 of 342
==27403==    at 0x4C2FD4F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27403==    by 0x2281B3: slurm_xrealloc (xmalloc.c:137)
==27403==    by 0x22856A: makespace (xstring.c:114)
==27403==    by 0x2285D0: _xstrcat (xstring.c:132)
==27403==    by 0x228CE0: _xstrfmtcat (xstring.c:291)
==27403==    by 0x83C5BCD: ???
==27403==    by 0x30A913: g_slurm_jobcomp_write (slurm_jobcomp.c:172)
==27403==    by 0x18D8FC: job_completion_logger (job_mgr.c:13652)
It turns out the generated buffer in slurm_jobcomp_log_record was xstrdup'ed to the corresponding job_node->serialized_job, but the originally generated buffer wasn't freed afterwards. The fix consists of changing the transfer so that instead of xstrdup'ing the char * we just assign the pointer and NULL the buffer. The job_node->serialized_job was already xfree'd properly later when the job was indexed. Discovered while working on Bug 4065.
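The ownership transfer described above (assign the pointer and NULL the source, rather than duplicate and forget to free) can be sketched in plain C. The struct and function names below are hypothetical stand-ins, and malloc/free are used in place of Slurm's xmalloc family; this is a sketch of the pattern, not the committed code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the plugin's per-job record. */
struct toy_job_node {
    char *serialized_job;
};

/* Move ownership of *buffer into the node.
 * After the call the node owns the allocation and the caller's pointer
 * is NULL, so there is exactly one owner and nothing is leaked or
 * freed twice. The leaky variant would copy the string into the node
 * and leave the original allocation unreleased. */
void toy_transfer_buffer(struct toy_job_node *node, char **buffer)
{
    assert(node != NULL && buffer != NULL);
    node->serialized_job = *buffer;
    *buffer = NULL;
}
```

The node's later cleanup path then frees serialized_job once, which matches the description that the field was already released when the job was indexed.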
-
- 22 Aug, 2017 2 commits
-
-
Alejandro Sanchez authored
Otherwise the resulting URL may be invalid. Update documentation while here as well. Bug 4065.
-
Philip Kovacs authored
bug 4095
-
- 21 Aug, 2017 2 commits
-
-
Isaac Hartung authored
Print numbers using exponential format if required to fit in allocated field width. The sacctmgr and sshare commands are impacted. bug 1749
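A minimal sketch of the idea, assuming a helper that tries plain notation first and falls back to exponential notation with decreasing precision until the text fits. The name toy_format_fit() and the precision-reduction strategy are assumptions for illustration; the source does not say how sacctmgr or sshare implement this.

```c
#include <assert.h>
#include <stdio.h>

/* Format `value` into `out`, aiming for at most `width` characters.
 * Returns the length that was (or would have been) written. If even
 * the shortest exponential form is wider than `width`, the result may
 * exceed `width`; the caller decides what to do then. */
int toy_format_fit(char *out, size_t out_size, double value, int width)
{
    assert(out != NULL && out_size > 0);

    /* First choice: plain notation with no fractional digits. */
    int len = snprintf(out, out_size, "%.0f", value);
    if (len >= 0 && len <= width && (size_t)len < out_size)
        return len;

    /* Fallback: exponential notation, dropping precision until it fits. */
    for (int prec = 6; prec >= 0; prec--) {
        len = snprintf(out, out_size, "%.*e", prec, value);
        if (len >= 0 && len <= width && (size_t)len < out_size)
            return len;
    }
    return len;
}
```

For a 10-character field, 12345 is printed as is, while 123456789012 is too wide in plain notation and is printed in exponential form instead.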
-
Alejandro Sanchez authored
Given a configuration with TopologyParam including the Dragonfly option, if a job requested a --switches count, the count timeout specified by either the job request or the max_switch_wait SchedulerParameters option was not respected. This was because the leaf_switch_count variable was not incremented in the _eval_nodes_dfly() function when needed, as is done in _eval_nodes_topo(), the latter being an execution path that already waits correctly for the switch count timeout. Bug 4056
-
- 18 Aug, 2017 2 commits
-
-
Brian Christiansen authored
-
Alejandro Sanchez authored
Add the following fields as environment variables: CLUSTER, DEPENDENCY, DERIVEDEC, EXITCODE, GROUPNAME, QOS, RESERVATION, USERNAME. LIMIT env variable value format (which means the TimeLimit of the job) has been modified to D-HH:MM:SS. Bug 3942
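The D-HH:MM:SS layout mentioned above can be sketched as follows. This assumes the limit is available as a count of seconds and that the day field is always printed; the helper name toy_format_dhms() and both assumptions are illustrative and not taken from the source.

```c
#include <assert.h>
#include <stdio.h>

/* Render a duration in seconds as D-HH:MM:SS.
 * Returns the snprintf() result (length of the formatted text). */
int toy_format_dhms(char *out, size_t out_size, unsigned long total_seconds)
{
    assert(out != NULL && out_size > 0);

    unsigned long days = total_seconds / 86400UL;
    unsigned long hours = (total_seconds % 86400UL) / 3600UL;
    unsigned long minutes = (total_seconds % 3600UL) / 60UL;
    unsigned long seconds = total_seconds % 60UL;

    return snprintf(out, out_size, "%lu-%02lu:%02lu:%02lu",
                    days, hours, minutes, seconds);
}
```

For example, 90061 seconds is one day, one hour, one minute and one second, so it is rendered as 1-01:01:01.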
-
- 17 Aug, 2017 1 commit
-
-
Morris Jette authored
Coverity CID 44649 Bug 4085
-
- 16 Aug, 2017 1 commit
-
-
Danny Auble authored
instead of local. Bug 3546
-
- 15 Aug, 2017 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
bug 3217
-
Morris Jette authored
-
Morris Jette authored
If srun lacks application specification for some component, the next one specified will be used for earlier components.
-
- 14 Aug, 2017 3 commits
-
-
Morris Jette authored
-
Danny Auble authored
This reverts commit 00a691b9.
-
Morris Jette authored
-
- 12 Aug, 2017 1 commit
-
-
Morris Jette authored
Modify scontrol job hold/release and update to operate with heterogeneous job id specification (e.g. "scontrol hold 123+4").
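A job id specification such as "123+4" can be split into a job id and a component offset with a small parser. The sketch below is an assumption about the shape of the input only (decimal id, optional "+" followed by a decimal offset); toy_parse_het_job_id() is a hypothetical helper and does not reflect how scontrol parses its arguments.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse "JOBID" or "JOBID+OFFSET".
 * On success returns 0 and fills *job_id; *offset is the component
 * offset, or -1 when no "+OFFSET" part was given.
 * Returns -1 on malformed input. */
int toy_parse_het_job_id(const char *spec, unsigned long *job_id, long *offset)
{
    char *end = NULL;

    assert(job_id != NULL && offset != NULL);
    if (!spec || !isdigit((unsigned char)spec[0]))
        return -1;

    errno = 0;
    *job_id = strtoul(spec, &end, 10);
    if (errno)
        return -1;

    *offset = -1;
    if (*end == '\0')
        return 0; /* plain job id, no component offset */

    /* Anything after the id must be '+' followed by at least one digit. */
    if (*end != '+' || !isdigit((unsigned char)end[1]))
        return -1;

    errno = 0;
    *offset = strtol(end + 1, &end, 10);
    if (errno || *end != '\0')
        return -1;
    return 0;
}
```

Parsing "123+4" yields job id 123 with offset 4, while "123" yields job id 123 with offset -1 to signal that no component was named.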
-
- 11 Aug, 2017 5 commits
-
-
Alejandro Sanchez authored
Fix sview to avoid messages to stderr when modifying a block, partition, or reservation. bug 3217
-
Danny Auble authored
This will allow Dell's custom syscfg to work correctly. NOTE: Dell calls flat memory just memory. Bug 4034
-
Morris Jette authored
Doing so would break the current scheduling logic.
-
Danny Auble authored
Bug 4059
-
Dominik Bartkiewicz authored
-
- 10 Aug, 2017 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
- 07 Aug, 2017 2 commits
-
-
Danny Auble authored
-
Dominik Bartkiewicz authored
Bug 4019
-
- 04 Aug, 2017 5 commits
-
-
Morris Jette authored
truncation of core specification and not reserving the specified cores. Fixes Coverity CID 45174 and 45175 Bug 4053
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
the tree. Bug 4050
-
Morris Jette authored
Modify launch/slurm plugin to signal all components of a pack job rather than just the one (modify to use a list of step context records).
-