- 28 Jun, 2019 1 commit
-
-
Dominik Bartkiewicz authored
Flags are stored in a smallint, which can only hold the first 16 bits worth out of 32 bits of flags currently in use. MySQL's overflow rules will treat any value > 0xffff as 0xffff, rather than dropping the higher-order bits (flags), which means the stored value not only loses the higher-order bits but corrupts the lower-order as well. The 19.05 release extends the column to bigint (64 bit). Bug 6969.
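The difference between clamping and truncation can be sketched as follows. This is only an illustration: the flag values are made up and are not Slurm's actual flag bits, and the helper names are hypothetical. It assumes the 16-bit column tops out at 0xffff, as described above.

```python
SMALLINT_MAX = 0xFFFF  # largest value the 16-bit column holds (assumed, per the message above)


def store_clamped(flags: int) -> int:
    """Mimic out-of-range clamping: anything above the maximum becomes the maximum."""
    return min(flags, SMALLINT_MAX)


def store_truncated(flags: int) -> int:
    """What plain truncation would do instead: keep only the low 16 bits."""
    return flags & SMALLINT_MAX


# Hypothetical flags: one low-order bit plus one bit above bit 15.
flags = 0x0001 | 0x10000

print(hex(store_clamped(flags)))    # 0xffff: every low-order flag now looks set
print(hex(store_truncated(flags)))  # 0x1: the low-order flag survives unchanged
```

With clamping, a single high-order flag is enough to make all sixteen low-order flags read back as set, which is why widening the column is needed rather than masking on read.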
-
- 07 Jun, 2019 2 commits
-
-
Morris Jette authored
For heterogeneous jobs, do not count each component against the QOS or association job limit multiple times. Bug 7190.
-
Albert Gil authored
Bug 6847
-
- 27 May, 2019 1 commit
-
-
Ross Dickson authored
Bug 6466.
-
- 25 May, 2019 1 commit
-
-
Felip Moll authored
The name variable hasn't been set yet, so this is always NULL. Print the uid/gid instead. While here, treat uid/gid as uint32_t, and use strtoul() rather than atoi() to avoid issues with high-number uid/gid values. Fixes GCC 9 warning. Bug 7101.
-
- 24 May, 2019 2 commits
-
-
Nate Rini authored
Use RETRY_DELAY to mirror the job complete delay, but without a max retry count for the time being. Bug 6970.
-
Danny Auble authored
Signed-off-by: Brian Christiansen <brian@schedmd.com>
-
- 23 May, 2019 9 commits
-
-
Brian Christiansen authored
Bug 6964
-
Brian Christiansen authored
The reason was being set after the message was sent to the db. Also clear the drain and reboot states before the message is sent so that the event state will show DOWN. Bug 6964
-
Brian Christiansen authored
Bug 6964
-
Brian Christiansen authored
so that new jobs can't get on the node. Bug 6964
-
Dominik Bartkiewicz authored
for completing job. Bug 6927
-
Dominik Bartkiewicz authored
Bug 6926
-
Alejandro Sanchez authored
Continuation of 89b791bf. Bug 7045.
-
Alejandro Sanchez authored
To indicate that a job is dependent or has an invalid dependency. Not used for now, just added and removed according to its meaning. Bug 7045.
-
Albert Gil authored
Bug 7080
-
- 22 May, 2019 2 commits
-
-
Marshall Garey authored
Job steps that run on cloud nodes and use the alias_list - in other words, SlurmctldParameters=cloud_dns is not in slurm.conf - all talk directly back to the slurmctld. To make that happen, we set the parent rank of each stepd to -1. However, we also set the rank of each stepd to 0. This meant that when each stepd sent a REQUEST_STEP_COMPLETE RPC to the slurmctld, it would tell slurmctld to clean up node 0 in the step allocation. So, multi-node step allocations weren't cleaning up after the steps completed and would cause subsequent job steps to hang. The step allocations would only clean up properly at the end of the job. Ensure that each stepd uses the correct rank so that job steps are properly cleaned up after each step completes. Bug 6467.
-
Alejandro Sanchez authored
They were associated with these two commits: b4d7de48 and 6871185a. Bug 5562.
-
- 21 May, 2019 3 commits
-
-
Dominik Bartkiewicz authored
unlimited could get overwritten with default queue depth preventing the whole queue from being looked at -- especially in a high-throughput environment. Bug 6822. Co-authored-by: Morris Jette <jette@schedmd.com>
-
Alejandro Sanchez authored
Node memory overallocation wouldn't be properly detected since we would just be interpreting the available memory as RealMemory - MemSpecLimit, ignoring other jobs' memory usage. Bug 5562.
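A minimal sketch of the check this message describes, with memory already allocated to other jobs subtracted as well. The function names and the use of MB as the unit are assumptions for illustration; this is not the Slurm implementation.

```python
def node_available_memory(real_memory: int, mem_spec_limit: int, allocated: int) -> int:
    """Memory (MB, assumed) left for new jobs on a node.

    real_memory and mem_spec_limit stand in for RealMemory and MemSpecLimit;
    allocated is the memory already granted to other jobs on the node.
    """
    return max(real_memory - mem_spec_limit - allocated, 0)


def job_fits(job_mem: int, real_memory: int, mem_spec_limit: int, allocated: int) -> bool:
    """True if the job's memory request fits in what the node has left."""
    return job_mem <= node_available_memory(real_memory, mem_spec_limit, allocated)


# Ignoring other jobs (allocated=0), a 4000 MB request appears to fit ...
print(job_fits(4000, real_memory=8000, mem_spec_limit=1000, allocated=0))
# ... but with 5000 MB already in use, the same request would overallocate the node.
print(job_fits(4000, real_memory=8000, mem_spec_limit=1000, allocated=5000))
```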
-
Alejandro Sanchez authored
This compares a job's memory request against each selected node's available memory, interpreting the latter for now as RealMemory - MemSpecLimit. Bug 5562.
-
- 17 May, 2019 2 commits
-
-
Tim Wickberg authored
This is select/cons_res, not select/cons_tres.
-
Morris Jette authored
Previous select/cons_res logic would allocate one CPU per task on the node. Bug 6981
-
- 16 May, 2019 1 commit
-
-
Marshall Garey authored
There was a syntax error in the MySQL statement for inserting the event records into the event table, caused by commit 3d61b6aa. The syntax error was a semicolon in the middle of the query, for example: insert into "voyager_event_table" (time_start, time_end, node_name, cluster_nodes, reason, reason_uid, state, tres) values ('1538669453', '1539298628', 'v1', '', 'cold-start', '1017', '0', '1=8,2=4000,5=8,1001=4,1002=1');, (<... another record>);, ... Bug 7025.
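A multi-row insert separates the value tuples with commas and carries at most one terminating semicolon. The sketch below builds such a statement from made-up rows; the function name and table are hypothetical, and the string quoting is for readability only (real code should use parameterized queries or proper escaping).

```python
def build_multi_row_insert(table: str, columns: list[str], rows: list[tuple]) -> str:
    """Build one INSERT statement covering all rows.

    Tuples are joined with ", " and the single semicolon goes at the very end,
    never between records.
    """
    col_list = ", ".join(columns)
    tuples = ", ".join(
        "(" + ", ".join(f"'{value}'" for value in row) + ")" for row in rows
    )
    return f'insert into "{table}" ({col_list}) values {tuples};'


query = build_multi_row_insert(
    "example_event_table",  # hypothetical table name
    ["time_start", "node_name"],
    [(1538669453, "v1"), (1538669500, "v2")],
)
print(query)
```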
-
- 13 May, 2019 1 commit
-
-
Tim Wickberg authored
-
- 10 May, 2019 2 commits
-
-
Marshall Garey authored
Trying to archive too many records at once can result in archive files that are too big to read or even too big to be written. Only archive 50k records at a time, like we only purge 50k records at a time. Bug 6033.
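The batching idea can be sketched as below. The 50k figure comes from the message above; the function name and the in-memory list are assumptions for illustration, since the real code works against database tables.

```python
from typing import Iterator, Sequence

ARCHIVE_BATCH_SIZE = 50_000  # batch size named in the commit message


def batches(records: Sequence, size: int = ARCHIVE_BATCH_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Each slice would be written to its own archive file before the next is read.
for chunk in batches(list(range(120_000))):
    print(len(chunk))
```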
-
Marshall Garey authored
The time period of the archive file currently depends on submit or start time and whether the purge period is in hours, days, or months. Previously, if the archive file name already exists, we would overwrite the old archive file with the assumption that these are duplicate records being archived after an archive load. However, that could result in lost records in a couple of ways:
* If there were runaway jobs that were part of an old archive file's time period and are later fixed and then purged, the old file would be overwritten.
* If jobs or steps are purged but there are still jobs or steps in that time period that are pending or running, the pending or running jobs and steps won't be purged. When they finish and are purged, the old file would be overwritten.
Instead of overwriting the old file, we append a number to the file name to create a new file. This will also be important in an upcoming commit. Bug 6033.
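One way to avoid overwriting an existing archive is sketched below. The `.N` suffix scheme and the function name are assumptions; the message only says that a number is appended to the file name.

```python
import os


def next_archive_name(path: str) -> str:
    """Return `path` if it is unused, else the first free `path.N` (N = 1, 2, ...)."""
    if not os.path.exists(path):
        return path
    suffix = 1
    while os.path.exists(f"{path}.{suffix}"):
        suffix += 1
    return f"{path}.{suffix}"
```

A caller would open the returned name for writing, so records archived later for the same time period land in a new file next to the old one.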
-
- 06 May, 2019 1 commit
-
-
Felip Moll authored
When the tres_usage_in_max field is empty, it is recorded as '' in the database, which leads find_tres_count_in_string() to return INFINITE64. Seff treats INFINITE64 as a valid value. This patch fixes this issue. Bug 6817
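The failure mode can be sketched in a few lines. The lookup below only imitates the behavior described above (a sentinel is returned when the TRES id is absent); the sentinel's value, the helper names, and the choice to report 0 for a missing entry are assumptions, not the actual patch.

```python
INFINITE64 = 0xFFFFFFFFFFFFFFFF  # "not found" sentinel, assumed to be the max 64-bit value


def find_tres_count(tres_str: str, tres_id: int) -> int:
    """Look up `tres_id` in an 'id=count,id=count' string; INFINITE64 if absent."""
    for item in tres_str.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip().isdigit() and int(key) == tres_id:
            return int(value)
    return INFINITE64


def usage_or_zero(tres_str: str, tres_id: int) -> int:
    """Treat the sentinel as 'no data' instead of as a real usage figure."""
    count = find_tres_count(tres_str, tres_id)
    return 0 if count == INFINITE64 else count


print(usage_or_zero("1=8,2=4000", 2))  # a recorded value is passed through
print(usage_or_zero("", 2))            # an empty field no longer yields a huge number
```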
-
- 03 May, 2019 1 commit
-
-
Nate Rini authored
Bug 6880/6952.
-
- 02 May, 2019 2 commits
-
-
Broderick Gardner authored
On requeue, the origin cluster job record is copied to submit to sibling clusters. If the job was originally submitted to accept cluster default account, partition, etc, those fields are now filled in on the origin. Here we add flags to indicate that those fields need to be cleared on resubmission to siblings. Bug 6064
-
Broderick Gardner authored
This is a holdover from when the fed job_info list was added. The cluster lock has to be cleared from both the job_ptr and the job_info. Bug 6064
-
- 30 Apr, 2019 1 commit
-
-
Danny Auble authored
Blessed by Tim.
-
- 29 Apr, 2019 5 commits
-
-
Brian Christiansen authored
Bug 6513
-
Nate Rini authored
Bug 6895.
-
Brian Christiansen authored
Bug 6895
-
Brian Christiansen authored
Bug 6895
-
- 26 Apr, 2019 3 commits
-
-
Nate Rini authored
Otherwise, we could send communication packets bigger than max_allowed_packet. Bug 6832. Co-authored-by: Tim Wickberg <tim@schedmd.com>
-
Alejandro Sanchez authored
Regression introduced in 8d643e79. Bug 6832.
-