- 17 May, 2019 2 commits
-
-
Tim Wickberg authored
This is select/cons_res, not select/cons_tres.
-
Morris Jette authored
Previous select/cons_res logic would allocate one CPU per task on the node. Bug 6981
-
- 16 May, 2019 5 commits
-
-
Morris Jette authored
Previous select/cons_tres logic would allocate one CPU per task on the node. Bug 6981
-
Morris Jette authored
Modify task layout with --overcommit option plus a heterogeneous job allocation so that a cyclic task distribution can start happening before all CPUs on all nodes are fully allocated. The number of tasks per node will be unchanged from the previous algorithm, but tasks will be distributed in a cyclic fashion first and then extra tasks placed on nodes with more CPUs. Previously all CPUs would be fully allocated in a cyclic fashion, then excess tasks distributed evenly across all allocated nodes. Bug 6981
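One reading of the layout change described above, as a minimal sketch: the per-node task counts are computed the old way (fill CPUs cyclically, then spread the excess evenly), and only the order in which tasks land on nodes becomes cyclic from the start. The function names and the example node sizes are hypothetical and are not taken from the Slurm source.

```python
def per_node_counts(cpus, ntasks):
    """Task count per node: fill CPUs cyclically, then spread the excess evenly."""
    if not cpus:
        raise ValueError("at least one node is required")
    counts = [0] * len(cpus)
    remaining = ntasks
    progressed = True
    # Phase 1: one task per free CPU, visiting the nodes round-robin.
    while remaining > 0 and progressed:
        progressed = False
        for i, ncpu in enumerate(cpus):
            if remaining == 0:
                break
            if counts[i] < ncpu:
                counts[i] += 1
                remaining -= 1
                progressed = True
    # Phase 2 (overcommit): every CPU is taken, spread the excess round-robin.
    i = 0
    while remaining > 0:
        counts[i % len(cpus)] += 1
        remaining -= 1
        i += 1
    return counts


def cyclic_layout(counts):
    """Node index for each task: cyclic while possible, leftovers on larger nodes."""
    left = list(counts)
    order = []
    while any(left):
        for i in range(len(left)):
            if left[i] > 0:
                order.append(i)
                left[i] -= 1
    return order


counts = per_node_counts([4, 2], 8)   # two nodes with 4 and 2 CPUs, 8 tasks
layout = cyclic_layout(counts)
```

With these inputs the counts stay at 5 and 3 tasks per node, while the extra tasks at the end of the layout fall on the node with more CPUs.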
-
Dominik Bartkiewicz authored
Add warning to slurm.h.in that no new reservation flags can be stored in slurmdbd in 19.05. (Although they could still be used by slurmctld without issue.) Note that the underlying RPC still uses uint32_t, but this will be changed before 20.02 on master, and changing the column to uint32_t in 19.05 just to change it again in 20.02 is best avoided. Bug 6969.
-
Nathan Rini authored
Free format_list, plugin_id_select_list, rpc_version_list in _free_cluster_cond_members(). Bug 7020.
-
Marshall Garey authored
Commit 3d61b6aa introduced a syntax error in the MySQL query that inserts event records into the event table. The syntax error was a semicolon in the middle of the query, for example:
insert into "voyager_event_table" (time_start, time_end, node_name, cluster_nodes, reason, reason_uid, state, tres) values ('1538669453', '1539298628', 'v1', '', 'cold-start', '1017', '0', '1=8,2=4000,5=8,1001=4,1002=1');, (<... another record>);, ...
Bug 7025.
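A minimal sketch of the intended query shape, assuming the fix is to join the value tuples with commas and end the statement with a single semicolon. The helper name, the reduced column list, and the second row are illustrative only, and the sketch does no escaping.

```python
def build_multi_row_insert(table, columns, rows):
    """Return one INSERT statement covering all rows.

    Value tuples are joined with ", " and a single ";" ends the statement.
    Values are assumed to be escaped already.
    """
    if not rows:
        raise ValueError("at least one row is required")
    col_list = ", ".join(columns)
    tuples = ", ".join(
        "(" + ", ".join("'%s'" % value for value in row) + ")" for row in rows
    )
    return 'insert into "%s" (%s) values %s;' % (table, col_list, tuples)


query = build_multi_row_insert(
    "voyager_event_table",
    ["time_start", "node_name"],
    [("1538669453", "v1"), ("1539298628", "v2")],  # second row is made up
)
```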
-
- 15 May, 2019 1 commit
-
-
Tim Wickberg authored
For a stray socket, this call would cause nss_slurm to deadlock: any calling path leads to slurm_conf_lock(), which calls getpwuid(), which re-enters the nss_slurm code and ends up back here with the slurm_conf_lock already held, at which point the process will never continue. For nss_slurm, this means a node rebooting with stale sockets will hang in the middle of the init process, which is a rather unpleasant experience. So only handle the stray socket cleanup within the slurmd process itself. Bug 7030
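The re-entrancy cycle described above can be sketched with a non-recursive lock. All names here are hypothetical stand-ins, not the Slurm functions, and the sketch uses a non-blocking acquire so that it reports the problem instead of hanging.

```python
import threading

conf_lock = threading.Lock()  # non-recursive, stand-in for the conf lock


def cleanup_stray_socket(guard):
    """Stand-in for the cleanup path that needs the conf lock."""
    if not conf_lock.acquire(blocking=False):
        # A blocking acquire here would never return: the lock is already
        # held further up the same call chain.
        return "would deadlock"
    try:
        return fake_getpwuid(guard)
    finally:
        conf_lock.release()


def fake_getpwuid(guard):
    """Stand-in for a passwd lookup that re-enters the NSS module."""
    return nss_lookup(is_slurmd=False, guard=guard)


def nss_lookup(is_slurmd, guard):
    """Stand-in for the NSS entry point; guard models the fix."""
    if guard and not is_slurmd:
        return "lookup done, cleanup skipped"
    return cleanup_stray_socket(guard)


before = nss_lookup(is_slurmd=False, guard=False)
after = nss_lookup(is_slurmd=False, guard=True)
```

Without the guard the second entry finds the lock already held; with the guard, a caller that is not the daemon never reaches the cleanup path.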
-
- 13 May, 2019 1 commit
-
-
Tim Wickberg authored
-
- 10 May, 2019 3 commits
-
-
Nate Rini authored
Bug 6952.
-
Marshall Garey authored
Trying to archive too many records at once can result in archive files that are too big to read or even too big to be written. Only archive 50k records at a time, like we only purge 50k records at a time. Bug 6033.
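A minimal sketch of the batching idea, assuming the records are already in memory as a list. The function name is hypothetical; only the 50k batch size comes from the commit message.

```python
def batches(records, size=50_000):
    """Yield consecutive slices of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# 120,000 dummy records end up in three batches.
sizes = [len(batch) for batch in batches(list(range(120_000)))]
```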
-
Marshall Garey authored
The time period of the archive file currently depends on submit or start time and whether the purge period is in hours, days, or months. Previously, if the archive file name already exists, we would overwrite the old archive file with the assumption that these are duplicate records being archived after an archive load. However, that could result in lost records in a couple of ways:
* If there were runaway jobs that were part of an old archive file's time period and are later fixed and then purged, the old file would be overwritten.
* If jobs or steps are purged but there are still jobs or steps in that time period that are pending or running, the pending or running jobs and steps won't be purged. When they finish and are purged, the old file would be overwritten.
Instead of overwriting the old file, we append a number to the file name to create a new file. This will also be important in an upcoming commit. Bug 6033.
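A minimal sketch of choosing a new file name instead of overwriting. The `.N` suffix format and the function name are assumptions; the commit message only says that a number is appended.

```python
import os


def next_free_name(path):
    """Return path if unused, else the first unused 'path.N' with N >= 1."""
    if not os.path.exists(path):
        return path
    n = 1
    while os.path.exists("%s.%d" % (path, n)):
        n += 1
    return "%s.%d" % (path, n)
```

Note that checking for existence and then creating the file are two separate steps, so a real implementation would also need to handle two writers picking the same name.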
-
- 08 May, 2019 1 commit
-
-
Bas Nijholt authored
-
- 07 May, 2019 3 commits
-
-
Alejandro Sanchez authored
Reported as conflicting thread load operations by valgrind --tool=drd. Bugs 6189 and 4159.
-
Alejandro Sanchez authored
This reverts commit f3d678d4.
-
Alejandro Sanchez authored
Reported as conflicting thread load operations by valgrind --tool=drd. Bugs 6189 and 4159.
-
- 06 May, 2019 1 commit
-
-
Felip Moll authored
When the tres_usage_in_max field is empty, it is recorded as '' in the database, which leads find_tres_count_in_string() to return INFINITE64. Seff treats INFINITE64 as a valid value. This patch fixes this issue. Bug 6817
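A minimal sketch of the sentinel check, written in Python for illustration only. The helper names are hypothetical, the constant is assumed to mirror Slurm's INFINITE64, and the `id=count` string format is taken from the tres value shown elsewhere in this log.

```python
INFINITE64 = 0xFFFFFFFFFFFFFFFF  # sentinel for "no value" (assumed to match Slurm)


def find_tres_count(tres_str, tres_id):
    """Look up tres_id in a string such as '1=8,2=4000'; INFINITE64 if absent."""
    for item in tres_str.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip() == str(tres_id):
            return int(value)
    return INFINITE64


def usable_count(tres_str, tres_id):
    """Return the count, or None when the lookup yields the sentinel."""
    count = find_tres_count(tres_str, tres_id)
    return None if count == INFINITE64 else count
```

A caller that reports usage can then skip the field when `usable_count` returns `None` instead of printing the sentinel as if it were a measurement.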
-
- 03 May, 2019 3 commits
-
-
Nate Rini authored
Bug 6880/6952.
-
Dominik Bartkiewicz authored
Bug 6959.
-
Nate Rini authored
Bug 6944.
-
- 02 May, 2019 6 commits
-
-
Danny Auble authored
-
Broderick Gardner authored
This is the same because xstrdup returns null on null. Bug 6812
-
Danny Auble authored
No real code change.
-
Tim Wickberg authored
It appears this is really what this was supposed to be anyway. Bug 5950
-
Broderick Gardner authored
On requeue, the origin cluster job record is copied to submit to sibling clusters. If the job was originally submitted to accept the cluster default account, partition, etc., those fields are now filled in on the origin. Here we add flags to indicate that those fields need to be cleared on resubmission to siblings. Bug 6064
-
Broderick Gardner authored
This is a holdover from when the fed job_info list was added. The cluster lock has to be cleared from both the job_ptr and the job_info. Bug 6064
-
- 01 May, 2019 2 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 30 Apr, 2019 4 commits
-
-
Matt Ezell authored
and other_cons_res. Continuation of previous commit. Bug 5680
-
Danny Auble authored
Blessed by Tim.
-
Jason Booth authored
Usagefactor matches the documentation and now multiplies TRES time limits and usage. Bug 5435
-
Dineshkumar RAJAGOPAL authored
This is very coarse-grained locking, but as the initial implementation did not anticipate concurrent access this is the safest approach for now. Bug 5638.
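As a generic illustration of coarse-grained locking (not the code touched by this commit), a single lock can guard every access to a shared structure. The class and function names are hypothetical.

```python
import threading


class GuardedTable:
    """Shared table where every operation takes the same single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        with self._lock:  # one lock covers the whole read-modify-write
            self._data[key] = self._data.get(key, 0) + amount

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def run_demo(nthreads=8, per_thread=1000):
    """Increment one counter from several threads and return the total."""
    table = GuardedTable()

    def worker():
        for _ in range(per_thread):
            table.add("x", 1)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.get("x")
```

The trade-off is the one the message names: all callers serialize on the one lock, which is simple to reason about but limits concurrency.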
-
- 29 Apr, 2019 8 commits
-
-
Tim Wickberg authored
Bug 6632.
-
Brian Christiansen authored
Bug 6513
-
Nate Rini authored
Bug 6895.
-
Brian Christiansen authored
Bug 6895
-
Brian Christiansen authored
Bug 6895
-
Boris Karasev authored
PMIX_VAL_SET will not be supported in PMIx v4 or later. This commit changes the use of the old (and non-standard) PMIX_VAL_SET macro to the standardized PMIX_INFO_LOAD (which is used within a new internal PMIXP_KVP_ADD macro). Bug 6624.
-
Boris Karasev authored
This commit changes the logic for selecting the type of collective. The tree-based algorithm will be selected for a fence with an empty data contribution, which improves fence performance. Bug 6637.
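The selection rule can be sketched as follows, assuming the only input is the size of the local data contribution. The function name and the "ring" placeholder for the otherwise configured collective are assumptions, not names from the plugin.

```python
def choose_collective(contribution, configured):
    """Pick 'tree' for an empty contribution, else keep the configured type."""
    if len(contribution) == 0:
        return "tree"
    return configured


empty_fence = choose_collective(b"", "ring")       # "ring" is a placeholder
data_fence = choose_collective(b"key=val", "ring")
```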
-