- 16 May, 2019 3 commits
-
-
Alejandro Sanchez authored
-
Marshall Garey authored
There was a syntax error in the MySQL query for inserting event records into the event table, caused by commit 3d61b6aa. The syntax error was a semicolon in the middle of the query, for example: insert into "voyager_event_table" (time_start, time_end, node_name, cluster_nodes, reason, reason_uid, state, tres) values ('1538669453', '1539298628', 'v1', '', 'cold-start', '1017', '0', '1=8,2=4000,5=8,1001=4,1002=1');, (<... another record>);, ... Bug 7025.
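The failure mode above comes from ending each value tuple of a multi-row insert with its own semicolon. Here is a minimal sketch in C of joining tuples so that only commas separate them and a single semicolon ends the statement. `build_multi_insert` and its parameters are hypothetical names for illustration, not Slurm's actual code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Slurm's implementation): build one multi-row
 * INSERT from pre-formatted value tuples such as "('a', 'b')".
 * Tuples are separated by ", " and the statement ends with one ';'.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int build_multi_insert(char *out, size_t outlen, const char *prefix,
                              const char *const *tuples, size_t ntuples)
{
    size_t used;
    int n = snprintf(out, outlen, "%s", prefix);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < ntuples; i++) {
        /* Separator goes between tuples only, never a ';' mid-query. */
        n = snprintf(out + used, outlen - used, "%s%s",
                     i ? ", " : " ", tuples[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }

    /* Single terminating semicolon at the very end. */
    n = snprintf(out + used, outlen - used, ";");
    if (n < 0 || (size_t)n >= outlen - used)
        return -1;
    return 0;
}
```

Checking the built string for the sequence `;,` before sending it would have caught the broken form shown in the example query.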
-
Marshall Garey authored
This commit caused loading usage table archive files to fail. Specifically, the archive files for the wckey and assoc hourly/daily/monthly usage tables and for the cluster usage tables would all fail to load. Bug 7025.
-
- 15 May, 2019 5 commits
-
-
Tim Wickberg authored
For a stray socket, this call would cause nss_slurm to deadlock: any calling path leads to slurm_conf_lock(), which calls getpwuid(), which re-enters the nss_slurm code, which ends up back here with the slurm_conf_lock already held, at which point the process will never continue. For nss_slurm, this means a node rebooting with stale sockets will hang in the middle of the init process, which is a rather unpleasant experience. So only handle the stray socket cleanup within the slurmd process itself. Bug 7030.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
It's more suitable for checking whether a file exists, and it avoids the unnecessary struct stat variable since we don't care about the file information. Continuation of 1e234c3d. Bug 6033.
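The commit title is not shown here, so the following is a sketch under the assumption that the replacement call is POSIX `access()` with `F_OK`, contrasted with a `stat()`-based check. The function names are illustrative only.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Existence check via stat(): needs a struct stat that is never read. */
static bool exists_with_stat(const char *path)
{
    struct stat buf;

    return stat(path, &buf) == 0;
}

/* Existence check via access(): F_OK tests for existence only. */
static bool exists_with_access(const char *path)
{
    return access(path, F_OK) == 0;
}
```

Both report the same answer for ordinary paths; the `access()` form simply has no unused output structure.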
-
Marshall Garey authored
Replace strncpy with xstrdup, and snprintf with xstrfmtcat, in _make_archive_name. This also fixes Coverity error CID 198462. Continuation of 1e234c3d. Bug 6033.
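To illustrate why a growing formatted-append helper is safer than writing into a fixed-size buffer, here is a minimal sketch in standard C. `append_fmt` is a hypothetical stand-in written for this example; it is not Slurm's xstrfmtcat and makes no claim about how that macro is implemented.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a formatted-append helper (NOT Slurm's
 * xstrfmtcat): grows *str as needed and appends the formatted text.
 * Returns 0 on success, -1 on failure (*str is left unchanged).
 */
static int append_fmt(char **str, const char *fmt, ...)
{
    va_list ap;
    size_t old_len = *str ? strlen(*str) : 0;
    int add_len;
    char *grown;

    /* First pass: measure how many bytes the formatted text needs. */
    va_start(ap, fmt);
    add_len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (add_len < 0)
        return -1;

    grown = realloc(*str, old_len + (size_t)add_len + 1);
    if (!grown)
        return -1;

    /* Second pass: write after the existing contents. */
    va_start(ap, fmt);
    vsnprintf(grown + old_len, (size_t)add_len + 1, fmt, ap);
    va_end(ap);

    *str = grown;
    return 0;
}
```

Because the buffer is sized from the measured length, there is no truncation and no need to guess a maximum name length up front. The caller owns the result and releases it with `free()`.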
-
Morris Jette authored
-
- 14 May, 2019 3 commits
-
-
Danny Auble authored
Continuation of 3beabdb1
-
Danny Auble authored
Continuation of 3beabdb1
-
Morris Jette authored
These test changes are designed to support gres/gpu configurations where only some sockets actually have GPUs. The tests will not work with all possible configurations, but this change will result in the tests working in more cases.
-
- 13 May, 2019 4 commits
-
-
Morris Jette authored
select/cray replaced by select/cray_aries in tests
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Chad Vizino authored
Bug 6902.
-
- 11 May, 2019 3 commits
-
-
Morris Jette authored
If they do not, then explicitly cancel them
-
Morris Jette authored
-
Morris Jette authored
Change to work on CPU-rich / memory-poor nodes.
-
- 10 May, 2019 12 commits
-
-
Nate Rini authored
Bug 6952.
-
Nate Rini authored
Fix leaks of cluster_list and db_jobs_list. Bug 6952.
-
Nate Rini authored
No functional change. Bug 6952.
-
Nate Rini authored
Call _purge_known_jobs() from _get_runaway_jobs() to purge known jobs (to slurmctld) from the list. Removed secondary list runaway_jobs as it was no longer needed. This also avoids leaking all the runaway_jobs. Bug 6952.
-
Alejandro Sanchez authored
-
Marshall Garey authored
Bug 6033.
-
Marshall Garey authored
If _get_oldest_record() finds a record to archive/purge, then archiving should always archive at least one record. If for whatever reason it fails to archive any records (_archive_table() returns 0), then we don't want to call continue; we want to return an error. Calling continue to go back to the beginning of the while loop would result in an infinite loop. Bug 6033.
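The loop shape described above can be sketched as follows. All names here (`archive_all`, `archive_fn`, the return codes, and the two example callbacks) are hypothetical and only model the control flow; they are not the functions in Slurm's archive code.

```c
#include <stddef.h>
#include <stdint.h>

#define ARCHIVE_OK     0
#define ARCHIVE_ERROR -1

/* Hypothetical callback: archives up to `limit` records, returns count. */
typedef uint32_t (*archive_fn)(uint32_t limit, void *state);

/*
 * Sketch of the loop (names are illustrative).
 * `remaining` counts records still eligible for archiving.
 */
static int archive_all(uint32_t *remaining, uint32_t limit,
                       archive_fn archive, void *state)
{
    while (*remaining > 0) {
        uint32_t done = archive(limit, state);

        /*
         * A record was found but nothing was archived. A `continue`
         * here would retry forever, so report an error instead.
         */
        if (done == 0)
            return ARCHIVE_ERROR;

        *remaining -= (done < *remaining) ? done : *remaining;
    }
    return ARCHIVE_OK;
}

/* Example callbacks used to exercise both paths. */
static uint32_t archive_ok(uint32_t limit, void *state)
{
    (void)state;
    return limit;   /* pretend a full batch was archived */
}

static uint32_t archive_stuck(uint32_t limit, void *state)
{
    (void)limit;
    (void)state;
    return 0;       /* pretend nothing could be archived */
}
```

With `archive_stuck`, the function returns `ARCHIVE_ERROR` on the first pass and leaves `remaining` untouched, rather than spinning.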
-
Marshall Garey authored
Bug 6033.
-
Marshall Garey authored
Trying to archive too many records at once can result in archive files that are too big to read or even too big to be written. Only archive 50k records at a time, like we only purge 50k records at a time. Bug 6033.
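As a rough illustration of the batching arithmetic, the sketch below caps each pass at the 50k figure mentioned above. The macro and function names are hypothetical, and the real code may track its limit differently.

```c
#include <stdint.h>

/* Assumed batch size, taken from the 50k figure in the text above. */
#define MAX_ARCHIVE_BATCH 50000u

/* Size of the next batch given how many records remain. */
static uint32_t next_batch_size(uint64_t remaining)
{
    return remaining < MAX_ARCHIVE_BATCH ? (uint32_t)remaining
                                         : MAX_ARCHIVE_BATCH;
}

/* Number of passes needed to cover `total` records. */
static uint64_t batch_count(uint64_t total)
{
    return (total + MAX_ARCHIVE_BATCH - 1) / MAX_ARCHIVE_BATCH;
}
```

For example, 120,000 records would take three passes: two full batches of 50,000 and a final batch of 20,000.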
-
Marshall Garey authored
The time period of the archive file currently depends on submit or start time and whether the purge period is in hours, days, or months. Previously, if the archive file name already exists, we would overwrite the old archive file with the assumption that these are duplicate records being archived after an archive load. However, that could result in lost records in a couple of ways:
* If there were runaway jobs that were part of an old archive file's time period and are later fixed and then purged, the old file would be overwritten.
* If jobs or steps are purged but there are still jobs or steps in that time period that are pending or running, the pending or running jobs and steps won't be purged. When they finish and are purged, the old file would be overwritten.
Instead of overwriting the old file, we append a number to the file name to create a new file. This will also be important in an upcoming commit. Bug 6033.
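A minimal sketch of picking a non-clashing file name instead of overwriting is shown here. `next_free_name` is a hypothetical helper, and the `.N` suffix format is an assumption made for this example; the source does not say what the appended number looks like.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: choose a file name that does not exist yet.
 * Tries `base` first, then `base.1`, `base.2`, ... The ".N" suffix
 * format is an assumption, not necessarily what Slurm writes.
 * Returns 0 and fills `out` on success, -1 on failure.
 */
static int next_free_name(const char *base, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s", base);

    if (n < 0 || (size_t)n >= outlen)
        return -1;

    for (unsigned int i = 1; access(out, F_OK) == 0; i++) {
        if (i > 9999)   /* arbitrary cap to keep the loop bounded */
            return -1;
        n = snprintf(out, outlen, "%s.%u", base, i);
        if (n < 0 || (size_t)n >= outlen)
            return -1;
    }
    return 0;
}
```

A check followed by a later create is racy if several writers share the directory; in that situation, opening with `O_CREAT | O_EXCL` and retrying on `EEXIST` would be the safer variant.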
-
Marshall Garey authored
It was set but never read. Bug 6033.
-
Marshall Garey authored
Change a few variables in archiving to use the correct signed or unsigned type to avoid implicit casting. Bug 6033.
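For context on why mixed signedness matters, the sketch below shows what happens when a signed value is compared with an unsigned one in C. It is a generic illustration with hypothetical function names and is not taken from the archiving code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * What the usual arithmetic conversions do when an int is compared with
 * an unsigned int: the signed operand is converted to unsigned first.
 * The cast is written out here to make that conversion visible.
 */
static bool less_after_conversion(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* Comparison that keeps the sign by widening both sides to int64_t. */
static bool less_sign_aware(int a, unsigned int b)
{
    return (int64_t)a < (int64_t)b;
}
```

With `a = -1` and `b = 1`, the first function reports false because -1 becomes a very large unsigned value, while the second reports true. Declaring variables with the type that matches their use avoids this class of surprise.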
-
- 09 May, 2019 6 commits
-
-
Morris Jette authored
Replace "hetjob" with "heterogeneous job" for better clarity.
-
Morris Jette authored
This just adds additional debugging information to an error message. Bug 6990.
-
Morris Jette authored
Otherwise, with CR_CPU and threads defined, slurmd will report the core count as the CPU count and mess up scheduling. Bug 6990.
-
Marcin Stolarek authored
Bug 6966.
-
Broderick Gardner authored
Bug 6799.
-
Chad Vizino authored
Bug 6854.
-
- 08 May, 2019 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
These conflict with JOB_MEM_SET/JOB_RESIZED in 19.05. Since 19.05rc1 has shipped - but no 18.08 maintenance releases have shipped with these new flags - it is safer to renumber them here to avoid the merge conflict going into 19.05. Bug 6064.
-
Morris Jette authored
-
Bas Nijholt authored
-