1. 21 May, 2019 2 commits
  2. 17 May, 2019 2 commits
  3. 16 May, 2019 1 commit
    • Fix archive loading events. · 0d0f9deb
      Marshall Garey authored
      Commit 3d61b6aa introduced a syntax error in the MySQL query that
      inserts event records into the event table. The error was a semicolon
      in the middle of the query, for example:
      
      insert into "voyager_event_table" (time_start, time_end, node_name,
      cluster_nodes, reason, reason_uid, state, tres) values ('1538669453',
      '1539298628', 'v1', '', 'cold-start', '1017', '0',
      '1=8,2=4000,5=8,1001=4,1002=1');, (<... another record>);, ...
      
      Bug 7025.
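
For illustration only, the well-formed shape of such a multi-row insert can be sketched as follows: value tuples are separated by commas and a single semicolon ends the statement. The helper name `build_multi_row_insert`, the reduced column list, and the second row are hypothetical and are not taken from the commit.

```python
def build_multi_row_insert(table, columns, rows):
    """Build one INSERT with comma-separated value tuples and one trailing ';'."""

    def quote(value):
        # Naive quoting for illustration only; real code should use bound parameters.
        return "'" + str(value).replace("'", "''") + "'"

    col_list = ", ".join(columns)
    # Each record becomes "(...)"; records are joined by ", " with no ';' in between.
    tuples = ", ".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return f'insert into "{table}" ({col_list}) values {tuples};'


query = build_multi_row_insert(
    "voyager_event_table",
    ["time_start", "time_end", "node_name"],
    [
        ("1538669453", "1539298628", "v1"),  # first row, from the example above
        ("1539298628", "1539300000", "v2"),  # second row is made up for the demo
    ],
)
print(query)
```

The broken query in the example above differs only in that each record was followed by `;` before the separating comma.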
  4. 13 May, 2019 1 commit
  5. 10 May, 2019 2 commits
    • Only archive 50k records at a time. · ddd49896
      Marshall Garey authored
      Archiving too many records at once can produce archive files that are
      too big to read, or even too big to write. Archive only 50k records at
      a time, just as we purge only 50k records at a time.
      
      Bug 6033.
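
A minimal sketch of the batching idea, assuming the records arrive as any iterable. The names `MAX_ARCHIVE_BATCH` and `iter_batches` are hypothetical; only the 50k limit comes from the commit message.

```python
from itertools import islice

MAX_ARCHIVE_BATCH = 50_000  # batch size named in the commit message


def iter_batches(records, batch_size=MAX_ARCHIVE_BATCH):
    """Yield successive lists of at most batch_size records."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# 120k stand-in records are split into batches of 50k, 50k, and 20k.
sizes = [len(batch) for batch in iter_batches(range(120_000))]
print(sizes)
```

Each batch would then be written to its own archive file, so no single file grows with the total number of records.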
    • Handle duplicate archive file names. · 1e234c3d
      Marshall Garey authored
      The time period of the archive file currently depends on submit or start
      time and whether the purge period is in hours, days, or months.
      Previously, if the archive file name already existed, we would overwrite
      the old archive file on the assumption that these were duplicate
      records being archived after an archive load. However, that could result
      in lost records in a couple of ways:
      
        * If there were runaway jobs that were part of an old archive file's
        time period and are later fixed and then purged, the old file would
        be overwritten.
        * If jobs or steps are purged but there are still jobs or steps in
        that time period that are pending or running, the pending or running
        jobs and steps won't be purged. When they finish and are purged, the
        old file would be overwritten.
      
      Instead of overwriting the old file, we append a number to the file name
      to create a new file. This will also be important in an upcoming commit.
      
      Bug 6033.
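
The append-a-number approach can be sketched as below. The `.N` suffix format and the helper `unique_archive_path` are assumptions made for this example; the commit message does not state the exact naming scheme.

```python
import os
import tempfile


def unique_archive_path(path):
    """Return path if it is unused, otherwise path with the first free '.N' suffix."""
    if not os.path.exists(path):
        return path
    n = 1
    while os.path.exists(f"{path}.{n}"):
        n += 1
    return f"{path}.{n}"


with tempfile.TemporaryDirectory() as tmp_dir:
    base = os.path.join(tmp_dir, "job_archive")
    first = unique_archive_path(base)   # base name is free
    open(first, "w").close()
    second = unique_archive_path(base)  # base exists, so a suffix is added
    open(second, "w").close()
    third = unique_archive_path(base)   # base and base.1 both exist
```

Checking for existence and then creating the file is not atomic; a caller that needs that guarantee could instead open each candidate name in exclusive-create mode.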
  6. 06 May, 2019 1 commit
    • Fix seff memory display overflow · bab13dfd
      Felip Moll authored
      When the tres_usage_in_max field is empty, it is recorded as '' in the
      database, which leads find_tres_count_in_string() to return INFINITE64.
      Seff treats INFINITE64 as a valid value. This patch fixes this issue.
      
      Bug 6817
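
The failure mode can be sketched as follows. This is not the patch itself: `find_tres_count` is a hypothetical stand-in for find_tres_count_in_string(), `format_count` is an invented display helper, and the sentinel is assumed here to be the all-ones 64-bit value.

```python
INFINITE64 = 0xFFFFFFFFFFFFFFFF  # assumed sentinel value for "not found"


def find_tres_count(tres_str, tres_id):
    """Return the count for tres_id in an 'id=count,...' string, else INFINITE64."""
    for pair in tres_str.split(","):
        if "=" not in pair:
            continue  # an empty string yields one empty item, which is skipped
        key, _, value = pair.partition("=")
        if key.strip() == str(tres_id):
            return int(value)
    return INFINITE64


def format_count(count):
    """Report the sentinel as unavailable instead of printing it as a number."""
    if count == INFINITE64:
        return "n/a"
    return str(count)


print(format_count(find_tres_count("1=8,2=4000", 2)))  # found
print(format_count(find_tres_count("", 2)))            # empty field
```

Without the sentinel check, the empty field would be displayed as a number close to 2^64, which is the kind of overflow the commit title describes.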
  7. 03 May, 2019 1 commit
  8. 02 May, 2019 2 commits
    • Fix resubmit to sibling default on fed requeue · 822fe77e
      Broderick Gardner authored
      On requeue, the origin cluster's job record is copied for submission
      to sibling clusters. If the job was originally submitted to accept
      the cluster's default account, partition, etc., those fields have
      since been filled in on the origin. Here we add flags to indicate
      that those fields need to be cleared on resubmission to siblings.
      Bug 6064
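
A rough sketch of the flag idea, with every name invented for the example (`DefaultedFields`, `JobRequest`, `copy_for_sibling`); the actual flag names and job record layout are not given in the commit message.

```python
from dataclasses import dataclass, replace
from enum import Flag
from typing import Optional


class DefaultedFields(Flag):
    """Marks which fields were filled in from cluster defaults (hypothetical)."""
    ACCOUNT = 1
    PARTITION = 2


@dataclass
class JobRequest:
    account: Optional[str]
    partition: Optional[str]
    defaulted: DefaultedFields = DefaultedFields(0)


def copy_for_sibling(job):
    """Copy the job, clearing fields that only hold the origin's defaults."""
    sibling = replace(job)
    if DefaultedFields.ACCOUNT in job.defaulted:
        sibling.account = None
    if DefaultedFields.PARTITION in job.defaulted:
        sibling.partition = None
    return sibling


# The user chose the account explicitly; the partition came from the origin's default.
origin = JobRequest(account="physics", partition="debug",
                    defaulted=DefaultedFields.PARTITION)
sibling = copy_for_sibling(origin)
```

With the defaulted fields cleared, each sibling cluster is free to apply its own defaults rather than inheriting the origin's.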
    • Fix clearing federation cluster lock on requeue · 47909f8e
      Broderick Gardner authored
      This is a holdover from when the fed job_info list was added.
      The cluster lock has to be cleared from both the job_ptr and
      the job_info.
      Bug 6064
  9. 30 Apr, 2019 1 commit
  10. 29 Apr, 2019 5 commits
  11. 26 Apr, 2019 3 commits
  12. 24 Apr, 2019 3 commits
  13. 23 Apr, 2019 2 commits
  14. 22 Apr, 2019 1 commit
  15. 18 Apr, 2019 4 commits
  16. 16 Apr, 2019 8 commits
  17. 13 Apr, 2019 1 commit
    • Don't purge jobs if backfill is running. · 426abc7f
      Marshall Garey authored
      The backfill scheduler keeps a local list of job pointers. Since the
      backfill scheduler yields locks, it's possible for pending jobs to be
      canceled and purged in these yield periods. The backfill scheduler then
      has pointers to now invalid memory, and dereferencing those pointers is
      undefined behavior and may result in a segfault.
      
      This commit prevents purging jobs while the backfill scheduler is
      running.
      
      Bug 6621
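
One possible shape for such a guard, sketched under the assumption that a simple "backfill running" flag is checked under the same lock as the purge. The class `JobTable` and its methods are hypothetical; the commit message does not describe the actual mechanism.

```python
import threading


class JobTable:
    """Toy job table whose purge is skipped while a backfill pass is active."""

    def __init__(self):
        self._lock = threading.Lock()
        self._backfill_running = False
        self.jobs = {}  # job id -> state string

    def backfill_begin(self):
        with self._lock:
            self._backfill_running = True

    def backfill_end(self):
        with self._lock:
            self._backfill_running = False

    def purge_cancelled(self):
        """Remove cancelled jobs unless backfill may still hold references to them."""
        with self._lock:
            if self._backfill_running:
                return 0  # defer the purge to a later call
            done = [jid for jid, state in self.jobs.items() if state == "CANCELLED"]
            for jid in done:
                del self.jobs[jid]
            return len(done)


table = JobTable()
table.jobs = {1: "PENDING", 2: "CANCELLED"}
table.backfill_begin()
purged_during = table.purge_cancelled()  # skipped: backfill is running
table.backfill_end()
purged_after = table.purge_cancelled()   # job 2 is removed now
```

Deferring the purge keeps every job record that backfill may have collected alive until the pass ends, at the cost of cancelled jobs lingering slightly longer.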