- 17 May, 2019 1 commit
Morris Jette authored
Previous select/cons_res logic would allocate one CPU per task on the node. Bug 6981.
-
- 16 May, 2019 2 commits
Marshall Garey authored
There was a syntax error in the MySQL query for inserting event records into the event table, caused by commit 3d61b6aa. The syntax error was a semicolon in the middle of the query, for example:

insert into "voyager_event_table" (time_start, time_end, node_name, cluster_nodes, reason, reason_uid, state, tres) values ('1538669453', '1539298628', 'v1', '', 'cold-start', '1017', '0', '1=8,2=4000,5=8,1001=4,1002=1');, (<... another record>);, ...

Bug 7025.
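A minimal sketch of building a multi-row INSERT so that the separator only appears between rows and no per-row semicolon is emitted. The helper name `build_multi_insert` and its signature are hypothetical, not Slurm's code; rows are assumed to arrive already formatted and escaped.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Slurm's code): join pre-formatted row tuples
 * such as "('1538669453', 'v1')" into a single multi-row INSERT.
 * The ", " separator goes only between rows and no ';' is added per
 * row, which avoids the ");, (" pattern shown in the commit message.
 * Callers must pass n >= 1; returns 0 on success, -1 if buf is too small.
 */
static int build_multi_insert(char *buf, size_t len, const char *table,
			      const char *cols, const char *rows[], size_t n)
{
	size_t used;
	int rc = snprintf(buf, len, "insert into \"%s\" (%s) values ",
			  table, cols);

	if (rc < 0 || (size_t) rc >= len)
		return -1;
	used = (size_t) rc;

	for (size_t i = 0; i < n; i++) {
		/* prefix every row except the first with ", " */
		rc = snprintf(buf + used, len - used, "%s%s",
			      (i > 0) ? ", " : "", rows[i]);
		if (rc < 0 || (size_t) rc >= len - used)
			return -1;
		used += (size_t) rc;
	}
	return 0;
}
```

The query is left without a trailing semicolon, since a single statement sent through the MySQL C API does not need one.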
-
Marshall Garey authored
This commit caused loading of usage table archive files to fail: the archive files for the wckey and assoc hourly/daily/monthly usage tables, and for the cluster usage tables, would all fail to load. Bug 7025.
-
- 15 May, 2019 2 commits
Alejandro Sanchez authored
It is better suited to checking whether a file exists, and it avoids an unnecessary struct stat variable, since we don't need the file information. Continuation of 1e234c3d. Bug 6033.
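The commit title is not shown here; assuming the replacement call is POSIX `access(2)` with `F_OK`, an existence check without a `struct stat` can be sketched as follows. The `file_exists` name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <unistd.h>

/*
 * Existence check without a throwaway struct stat, assuming the call
 * is POSIX access(2) with F_OK.  Permission checks on the path use the
 * real (not effective) UID and GID.
 */
static bool file_exists(const char *path)
{
	return access(path, F_OK) == 0;
}
```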
-
Marshall Garey authored
Replace strncpy with xstrdup, and snprintf with xstrfmtcat, in _make_archive_name. This also fixes Coverity error CID 198462. Continuation of 1e234c3d. Bug 6033.
-
- 13 May, 2019 1 commit
Tim Wickberg authored
-
- 10 May, 2019 7 commits
Marshall Garey authored
Bug 6033.
-
Marshall Garey authored
If _get_oldest_record() finds a record to archive/purge, then archiving should always archive at least one record. If for whatever reason it fails to archive any records (_archive_table() returns 0), we don't want to call continue; we want to return an error. Calling continue to go back to the beginning of the while loop would result in an infinite loop. Bug 6033.
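A toy model of the loop described above, showing why a pass that archives nothing must return an error rather than `continue`. The `toy_table` struct and both functions are hypothetical stand-ins for `_get_oldest_record()` and `_archive_table()`, not their real signatures.

```c
/*
 * Toy model, not Slurm code: `remaining` stands in for what
 * _get_oldest_record() keeps finding, and `per_call` for how many rows
 * one _archive_table() call manages to write (0 simulates the failure).
 */
struct toy_table {
	long remaining;
	long per_call;
};

/* Archive up to per_call rows; returns how many were archived. */
static long toy_archive(struct toy_table *t)
{
	long n = (t->per_call < t->remaining) ? t->per_call : t->remaining;

	t->remaining -= n;
	return n;
}

/* Returns 0 once nothing is left, -1 if a pass makes no progress. */
static int archive_until_done(struct toy_table *t)
{
	while (t->remaining > 0) {	/* a record to archive was found */
		if (toy_archive(t) <= 0)
			return -1;	/* `continue` here would loop forever */
	}
	return 0;
}
```

With `per_call` set to 0, the oldest record is found again on every iteration, so the error return is the only way out of the loop.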
-
Marshall Garey authored
Bug 6033.
-
Marshall Garey authored
Trying to archive too many records at once can result in archive files that are too big to be read, or even too big to be written. Only archive 50k records at a time, just as we only purge 50k records at a time. Bug 6033.
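A sketch of the 50k cap as a batching plan: the pending records are split into batches of at most 50,000, and each batch is archived (and purged) before the next one is selected. `MAX_ARCHIVE_RECORDS` and `plan_batches` are hypothetical names chosen for this illustration.

```c
#include <stddef.h>

#define MAX_ARCHIVE_RECORDS 50000	/* the 50k cap from the commit message */

/*
 * Split `total` pending records into batches of at most
 * MAX_ARCHIVE_RECORDS, writing each batch size into `sizes`.
 * Returns the number of batches, or -1 if `cap` slots are not enough.
 */
static long plan_batches(long total, long sizes[], size_t cap)
{
	size_t n = 0;

	while (total > 0) {
		long batch = (total < MAX_ARCHIVE_RECORDS) ?
			     total : MAX_ARCHIVE_RECORDS;

		if (n == cap)
			return -1;
		sizes[n++] = batch;
		total -= batch;
	}
	return (long) n;
}
```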
-
Marshall Garey authored
The time period of an archive file currently depends on submit or start time and on whether the purge period is in hours, days, or months. Previously, if the archive file name already existed, we would overwrite the old archive file, on the assumption that these were duplicate records being archived after an archive load. However, that could lose records in a couple of ways:
* If there were runaway jobs that were part of an old archive file's time period, and they were later fixed and then purged, the old file would be overwritten.
* If jobs or steps are purged while there are still pending or running jobs or steps in that time period, the pending or running ones won't be purged. When they finish and are purged, the old file would be overwritten.
Instead of overwriting the old file, we now append a number to the file name to create a new file. This will also be important in an upcoming commit. Bug 6033.
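A sketch of choosing a non-clobbering archive file name by appending a number. The `.N` suffix format and the `next_archive_name` helper are assumptions for illustration; the commit does not show the exact naming scheme.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write into `out` the first of "base", "base.1", "base.2", ... that
 * does not exist yet.  Returns 0 on success, -1 if the name does not
 * fit in `len` or no free suffix is found within `max_tries`.
 * The ".N" suffix is a hypothetical format.
 */
static int next_archive_name(char *out, size_t len, const char *base,
			     int max_tries)
{
	for (int i = 0; i <= max_tries; i++) {
		int rc = i ? snprintf(out, len, "%s.%d", base, i)
			   : snprintf(out, len, "%s", base);

		if (rc < 0 || (size_t) rc >= len)
			return -1;
		if (access(out, F_OK) != 0)
			return 0;	/* free name: never overwrite an old archive */
	}
	return -1;
}
```

Checking and then creating leaves a small window in which another writer could take the same name; creating the file with `open(2)` and `O_CREAT | O_EXCL` closes that gap.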
-
Marshall Garey authored
It was set but never read. Bug 6033.
-
Marshall Garey authored
Change a few variables in archiving to use the correct signed or unsigned type to avoid implicit casting. Bug 6033.
-
- 09 May, 2019 1 commit
Broderick Gardner authored
Bug 6799.
-
- 08 May, 2019 1 commit
Tim Wickberg authored
These conflict with JOB_MEM_SET/JOB_RESIZED in 19.05. Since 19.05rc1 has shipped, but no 18.08 maintenance release has shipped with these new flags, it is safer to renumber them here to avoid the merge conflict going into 19.05. Bug 6064.
-
- 06 May, 2019 1 commit
Felip Moll authored
When the tres_usage_in_max field is empty, it is recorded as '' in the database, which leads find_tres_count_in_string() to return INFINITE64. seff treated INFINITE64 as a valid value. This patch fixes the issue. Bug 6817.
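A sketch of treating the INFINITE64 sentinel as "not recorded" instead of as a usage figure. It is written in C for illustration and is not seff's code; `tres_value_valid` and `mem_efficiency` are hypothetical helpers, and INFINITE64 is redefined locally (all bits set, as in Slurm's headers) so the block compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for Slurm's INFINITE64 sentinel (all bits set). */
#ifndef INFINITE64
#define INFINITE64 UINT64_MAX
#endif

/* A lookup on an empty TRES string yields the sentinel, not real data. */
static bool tres_value_valid(uint64_t v)
{
	return v != INFINITE64;
}

/* Percent of requested memory used, or -1.0 when usage was not recorded. */
static double mem_efficiency(uint64_t used, uint64_t requested)
{
	if (!tres_value_valid(used) || requested == 0)
		return -1.0;
	return 100.0 * (double) used / (double) requested;
}
```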
-
- 03 May, 2019 1 commit
Nate Rini authored
Bug 6880/6952.
-
- 02 May, 2019 3 commits
Broderick Gardner authored
Bug 6064
-
Broderick Gardner authored
On requeue, the origin cluster's job record is copied and submitted to the sibling clusters. If the job was originally submitted relying on the cluster's default account, partition, etc., those fields have since been filled in on the origin. Here we add flags indicating that those fields need to be cleared on resubmission to siblings. Bug 6064
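A sketch of the flag idea: record at submission which fields came from cluster defaults, then clear exactly those fields before the copied record is sent to a sibling. The `DEF_*` bits and the `job_copy` struct are hypothetical; the commit does not show Slurm's actual flag names or values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits recording which fields the user left unset. */
#define DEF_ACCOUNT   (1u << 0)
#define DEF_PARTITION (1u << 1)
#define DEF_QOS       (1u << 2)

struct job_copy {			/* hypothetical, trimmed-down job record */
	const char *account;
	const char *partition;
	const char *qos;
	uint32_t defaults_used;		/* DEF_* bits set at original submission */
};

/*
 * Before resubmitting a copied record to a sibling, drop the values the
 * origin cluster filled in from its own defaults, so that each sibling
 * can apply its own defaults instead.
 */
static void clear_origin_defaults(struct job_copy *job)
{
	if (job->defaults_used & DEF_ACCOUNT)
		job->account = NULL;
	if (job->defaults_used & DEF_PARTITION)
		job->partition = NULL;
	if (job->defaults_used & DEF_QOS)
		job->qos = NULL;
}
```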
-
Broderick Gardner authored
This is a holdover from when the fed job_info list was added. The cluster lock has to be cleared from both the job_ptr and the job_info. Bug 6064
-
- 30 Apr, 2019 1 commit
Danny Auble authored
Blessed by Tim.
-
- 29 Apr, 2019 10 commits
Brian Christiansen authored
when one offset passes and the other fails. Bug 6892
-
Nate Rini authored
Bug 6513.
-
Brian Christiansen authored
Bug 6513
-
Brian Christiansen authored
Bug 6513

First offset is good but second is bad: it didn't request a task count.

$ cat etc/job_submit.lua
function slurm_job_submit(job_desc, part_list, submit_uid)
    slurm.log_user("submit1\nstuff")
    slurm.log_user("submit2")
    slurm.log_user("submit3")
    -- slurm.log_user("case 0")
    if job_desc.num_tasks == slurm.NO_VAL or job_desc.num_tasks == nil then
        slurm.log_user("Batch submit error: Must specify either number of nodes or number of tasks!")
        -- reject the job
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    slurm.log_user("modify1")
    slurm.log_user("modify2")
    slurm.log_user("modify3")
    return slurm.SUCCESS
end

slurm.log_user("initialized")
return slurm.SUCCESS

$ sbatch -Ablah2 -n1 --wrap="hostname" : -J asdfl
sbatch: error: 0: initialized
sbatch: error: 0: submit1
sbatch: error: 0: stuff
sbatch: error: 0: submit2
sbatch: error: 0: submit3
sbatch: error: submit1
sbatch: error: stuff
sbatch: error: submit2
sbatch: error: submit3
sbatch: error: Batch submit error: Must specify either number of nodes or number of tasks!
sbatch: error: Batch job submission failed: Unspecified error

$ sbatch -Ablah2 -n1 --wrap="hostname" : -J asdfl
sbatch: error: 0: initialized
sbatch: error: 0: submit1
sbatch: error: 0: stuff
sbatch: error: 0: submit2
sbatch: error: 0: submit3
sbatch: error: 1: submit1
sbatch: error: 1: stuff
sbatch: error: 1: submit2
sbatch: error: 1: submit3
sbatch: error: 1: Batch submit error: Must specify either number of nodes or number of tasks!
sbatch: error: Batch job submission failed: Unspecified error

srun already handles this.
-
Nate Rini authored
Was dumping this:

$ srun -A test7.21-account.1 --qos test7.21-qos.1 -n5 : -n3 : -n1 /bin/true
srun: error: 0: submit1
srun: error: submit2
srun: error: submit3
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified

Will now dump this:

$ srun -A test7.21-account.1 --qos test7.21-qos.1 -n5 : -n3 : -n1 /bin/true
srun: error: 0: initialized
srun: error: 0: submit1
srun: error: 0: submit2
srun: error: 0: submit3
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified

Bug 6513.
-
Nate Rini authored
Bug 6895.
-
Brian Christiansen authored
Bug 6895
-
Brian Christiansen authored
Bug 6895
-
Danny Auble authored
-
- 26 Apr, 2019 9 commits
Marshall Garey authored
Bug 6215
-
Marshall Garey authored
Change references to the "micro" release in rpc.html and troubleshoot.html as well; SchedMD refers to the last part of the version number as the "maintenance" release. Bug 6833.
-
Alejandro Sanchez authored
Bug 6832.
-
Nate Rini authored
Bug 6832.
-
Nate Rini authored
Bug 6832.
-
Nate Rini authored
No functional change. Bug 6832.
-
Nate Rini authored
Otherwise, we could send communication packets bigger than max_allowed_packet. Bug 6832. Co-authored-by: Tim Wickberg <tim@schedmd.com>
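The commit title is not shown, so how the fix splits its queries is unknown. One common way to respect a packet limit is to start a new multi-row query whenever the next row would push the current one over the limit. `count_queries` is a hypothetical planner that counts the queries this produces; it ignores protocol overhead beyond the query text.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical planner: each query starts with `prefix_len` bytes
 * (e.g. "insert into ... values "), and every row after the first also
 * costs a 2-byte ", " separator.  A new query is started whenever the
 * next row would push the open one past `max_packet` (think of MySQL's
 * max_allowed_packet).  Returns the number of queries needed, or -1 if
 * a single row can never fit.
 */
static long count_queries(const size_t row_len[], size_t n,
			  size_t prefix_len, size_t max_packet)
{
	long queries = 0;
	size_t cur = 0;			/* bytes in the currently open query */
	bool have_open = false;

	for (size_t i = 0; i < n; i++) {
		if (prefix_len + row_len[i] > max_packet)
			return -1;
		if (have_open && cur + 2 + row_len[i] <= max_packet) {
			cur += 2 + row_len[i];	/* append to the open query */
		} else {
			queries++;		/* send the open query, start anew */
			cur = prefix_len + row_len[i];
			have_open = true;
		}
	}
	return queries;
}
```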
-
Alejandro Sanchez authored
Regression introduced in 8d643e79. Bug 6832.
-