- 14 May, 2019 2 commits
-
Danny Auble authored
Continuation of 3beabdb1
-
Morris Jette authored
These test changes are designed to support gres/gpu configurations where only some sockets actually have GPUs. The tests will not work with all possible configurations, but this change will result in the tests working in more cases.
-
- 13 May, 2019 4 commits
-
Morris Jette authored
select/cray replaced by select/cray_aries in tests
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Chad Vizino authored
Bug 6902.
-
- 11 May, 2019 3 commits
-
Morris Jette authored
If they do not, then explicitly cancel them.
-
Morris Jette authored
-
Morris Jette authored
Change to work on CPU-rich / memory-poor nodes.
-
- 10 May, 2019 12 commits
-
Nate Rini authored
Bug 6952.
-
Nate Rini authored
Fix leaks of cluster_list and db_jobs_list. Bug 6952.
-
Nate Rini authored
No functional change. Bug 6952.
-
Nate Rini authored
Call _purge_known_jobs() from _get_runaway_jobs() to remove jobs still known to slurmctld from the list. The secondary runaway_jobs list was removed because it was no longer needed, which also avoids leaking all the runaway_jobs. Bug 6952.
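A minimal sketch of the single-list approach, in C. It assumes plain arrays of job IDs in place of Slurm's List type, and purge_known_jobs() and job_is_known() here are hypothetical helpers, not the actual _purge_known_jobs(). Filtering the one list in place leaves only runaway candidates and never allocates a second list that could leak.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if job_id is in the controller's list. */
static bool job_is_known(uint32_t job_id, const uint32_t *known,
			 size_t n_known)
{
	for (size_t i = 0; i < n_known; i++)
		if (known[i] == job_id)
			return true;
	return false;
}

/*
 * Remove, in place, every database job that slurmctld still knows about.
 * Whatever remains is a runaway candidate. Returns the new list length.
 */
static size_t purge_known_jobs(uint32_t *db_jobs, size_t n_db,
			       const uint32_t *known, size_t n_known)
{
	size_t kept = 0;

	for (size_t i = 0; i < n_db; i++)
		if (!job_is_known(db_jobs[i], known, n_known))
			db_jobs[kept++] = db_jobs[i];	/* keep unknown job */
	return kept;
}
```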
-
Alejandro Sanchez authored
-
Marshall Garey authored
Bug 6033.
-
Marshall Garey authored
If _get_oldest_record() finds a record to archive/purge, then the archive step should always archive at least one record. If for whatever reason it fails to archive any records (_archive_table() returns 0), we don't want to call continue; we want to return an error. Calling continue to go back to the beginning of the while loop would result in an infinite loop, since nothing was purged and the same record would be found again. Bug 6033.
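The control flow described above can be sketched as follows. This is an illustration only: struct fake_table, get_oldest_record() and archive_table() are hypothetical stand-ins for the real slurmdbd archive code, and SLURM_SUCCESS/SLURM_ERROR are defined locally with the values Slurm uses (0 and -1).

```c
#include <assert.h>
#include <stdio.h>

/* Local copies of Slurm's return-code convention. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/* Hypothetical in-memory stand-in for a table with purgeable records. */
struct fake_table {
	int pending;	/* records older than the purge cutoff */
	int archive_ok;	/* 0 simulates an archive step that writes nothing */
};

/* Returns 1 if there is at least one record to archive/purge. */
static int get_oldest_record(const struct fake_table *t)
{
	return t->pending > 0;
}

/* Returns the number of records archived (0 on failure). */
static int archive_table(struct fake_table *t)
{
	return t->archive_ok ? t->pending : 0;
}

static int archive_and_purge(struct fake_table *t)
{
	while (get_oldest_record(t)) {
		int archived = archive_table(t);

		if (archived == 0) {
			/* A record exists but nothing was archived. Using
			 * "continue" here would find the same record again
			 * forever, so bail out with an error instead. */
			fprintf(stderr, "archive failed, not purging\n");
			return SLURM_ERROR;
		}
		t->pending -= archived;	/* purge only what was archived */
	}
	return SLURM_SUCCESS;
}
```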
-
Marshall Garey authored
Bug 6033.
-
Marshall Garey authored
Trying to archive too many records at once can result in archive files that are too big to read or even too big to be written. Only archive 50k records at a time, like we only purge 50k records at a time. Bug 6033.
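Capping each archive pass at a fixed number of records can be sketched as a batch planner. The 50k limit comes from the commit message; MAX_ARCHIVE_RECORDS and plan_archive_batches() are hypothetical names, and real code would presumably select, archive and purge each batch from the database rather than just count it.

```c
#include <assert.h>

/* Per-pass cap taken from the commit message (50k records). */
#define MAX_ARCHIVE_RECORDS 50000L

/*
 * Split total eligible records into archive passes of at most batch_max
 * records each, storing each pass's size in sizes[]. Stops after
 * max_batches passes. Returns the number of passes planned.
 */
static int plan_archive_batches(long total, long batch_max,
				long *sizes, int max_batches)
{
	int n = 0;

	assert(batch_max > 0);
	while (total > 0 && n < max_batches) {
		long batch = (total < batch_max) ? total : batch_max;

		sizes[n++] = batch;	/* one archive file per pass */
		total -= batch;
	}
	return n;
}
```

For example, 120000 eligible records would be written as three files of 50000, 50000 and 20000 records, each small enough to write and read back.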
-
Marshall Garey authored
The time period of the archive file currently depends on submit or start time and whether the purge period is in hours, days, or months. Previously, if the archive file name already existed, we would overwrite the old archive file on the assumption that these were duplicate records being archived after an archive load. However, that could result in lost records in a couple of ways:
* If there were runaway jobs that were part of an old archive file's time period and are later fixed and then purged, the old file would be overwritten.
* If jobs or steps are purged but there are still jobs or steps in that time period that are pending or running, the pending or running jobs and steps won't be purged. When they finish and are purged, the old file would be overwritten.
Instead of overwriting the old file, we append a number to the file name to create a new file. This will also be important in an upcoming commit. Bug 6033.
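One way to append a number instead of overwriting is to try base, base.1, base.2, and so on, creating each candidate with O_CREAT | O_EXCL so that an existing file makes open() fail instead of being truncated. This is a sketch under assumptions: open_new_archive_file(), the dot-number suffix format and the limit of 1000 attempts are illustrative choices, not necessarily what slurmdbd does.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical cap on how many numbered names to try. */
#define MAX_NAME_ATTEMPTS 1000

/*
 * Create a new archive file named base, base.1, base.2, ... using the first
 * name that does not exist yet. The chosen name is written into out.
 * Returns an open file descriptor, or -1 on error.
 */
static int open_new_archive_file(const char *base, char *out, size_t out_len)
{
	for (int i = 0; i < MAX_NAME_ATTEMPTS; i++) {
		int len = (i == 0) ?
			snprintf(out, out_len, "%s", base) :
			snprintf(out, out_len, "%s.%d", base, i);

		if (len < 0 || (size_t) len >= out_len)
			return -1;	/* name would be truncated */

		/* O_EXCL makes open() fail with EEXIST if the file is
		 * already there, so an older archive is never overwritten. */
		int fd = open(out, O_WRONLY | O_CREAT | O_EXCL, 0600);
		if (fd >= 0)
			return fd;
		if (errno != EEXIST)
			return -1;	/* some other error, give up */
	}
	return -1;
}
```

Creating the file with O_EXCL, rather than checking for it with access() first, avoids a window in which another writer could create the same name between the check and the open.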
-
Marshall Garey authored
It was set but never read. Bug 6033.
-
Marshall Garey authored
Change a few variables in archiving to use the correct signed or unsigned type to avoid implicit casting. Bug 6033.
-
- 09 May, 2019 6 commits
-
Morris Jette authored
Replace "hetjob" with "heterogeneous job" for better clarity.
-
Morris Jette authored
This just adds additional debugging information to an error message. Bug 6990.
-
Morris Jette authored
Otherwise, with CR_CPU and threads defined, slurmd will report the core count as the CPU count and mess up scheduling. Bug 6990.
-
Marcin Stolarek authored
Bug 6966.
-
Broderick Gardner authored
Bug 6799.
-
Chad Vizino authored
Bug 6854.
-
- 08 May, 2019 5 commits
-
Tim Wickberg authored
-
Tim Wickberg authored
These conflict with JOB_MEM_SET/JOB_RESIZED in 19.05. Since 19.05rc1 has shipped - but no 18.08 maintenance releases have shipped with these new flags - it is safer to renumber them here to avoid the merge conflict going into 19.05. Bug 6064.
-
Morris Jette authored
-
Bas Nijholt authored
-
Morris Jette authored
Increase the sleep between job end and checking accounting record from 3 to 5 seconds so the test will work reliably.
-
- 07 May, 2019 5 commits
-
Morris Jette authored
test7.20 was always leaving vestigial batch output files and lacked job time limits (which could leave vestigial jobs on test failure). Bug 6973.
-
Alejandro Sanchez authored
Reported as conflicting thread load operations by valgrind --tool=drd. Bugs 6189 and 4159.
-
Alejandro Sanchez authored
This reverts commit f3d678d4.
-
Alejandro Sanchez authored
Reported as conflicting thread load operations by valgrind --tool=drd. Bugs 6189 and 4159.
-
Alejandro Sanchez authored
Bug 6783 comment 35.
-
- 06 May, 2019 1 commit
-
Felip Moll authored
When the tres_usage_in_max field is empty, it is recorded as '' in the database, which leads find_tres_count_in_string() to return INFINITE64. Seff treats INFINITE64 as a valid value. This patch fixes this issue. Bug 6817.
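The underlying pattern is a sentinel return value that callers must check before using the result. The sketch below assumes an "id=count,id=count" TRES string format and defines INFINITE64 locally as all bits set, which is assumed to match Slurm's definition. lookup_tres_count() is a hypothetical stand-in, not find_tres_count_in_string() itself, and tres_count_valid() shows the kind of check a consumer such as seff needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sentinel assumed to match Slurm's INFINITE64 (all bits set). */
#define INFINITE64 UINT64_C(0xffffffffffffffff)

/*
 * Hypothetical TRES lookup: parses "id=count,id=count" and returns the
 * count for id, or INFINITE64 if the string is empty or id is absent.
 */
static uint64_t lookup_tres_count(const char *tres_str, int id)
{
	const char *p = tres_str;

	while (p && *p) {
		char *end;
		long cur_id = strtol(p, &end, 10);

		if (end == p || *end != '=')
			break;			/* malformed entry */
		uint64_t count = strtoull(end + 1, &end, 10);
		if (cur_id == id)
			return count;
		p = strchr(end, ',');
		if (p)
			p++;			/* skip the comma */
	}
	return INFINITE64;
}

/* Consumer-side check: the sentinel means "no data", never a value. */
static bool tres_count_valid(uint64_t count)
{
	return count != INFINITE64;
}
```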
-
- 03 May, 2019 2 commits
-
Nate Rini authored
Bug 6880/6952.
-
Dominik Bartkiewicz authored
Bug 6959.
-