1. 10 May, 2019 2 commits
    • Only archive 50k records at a time. · ddd49896
      Marshall Garey authored
      Trying to archive too many records at once can result in archive files
      that are too big to read or even too big to be written. Only archive 50k
      records at a time, like we only purge 50k records at a time.
      
      Bug 6033.
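The batching described in this commit message can be illustrated with a minimal sketch. All names below (`SKETCH_MAX_ARCHIVE_ROWS`, `sketch_archive_in_batches`, `sketch_count_batch`) are hypothetical and are not taken from the Slurm source. The sketch only assumes that records can be addressed by an offset and a count.

```c
#include <stddef.h>

/* Hypothetical cap, mirroring the 50k figure in the commit message. */
#define SKETCH_MAX_ARCHIVE_ROWS 50000

/* Hypothetical callback: archive `count` records starting at `offset`.
 * Returns 0 on success, non-zero on failure. */
typedef int (*sketch_batch_fn)(size_t offset, size_t count, void *arg);

/* Example callback that only tallies how many records it was handed. */
int sketch_count_batch(size_t offset, size_t count, void *arg)
{
	(void)offset;
	*(size_t *)arg += count;
	return 0;
}

/*
 * Archive `total` records in batches of at most SKETCH_MAX_ARCHIVE_ROWS.
 * Returns the number of batches processed, or -1 if a batch failed.
 */
int sketch_archive_in_batches(size_t total, sketch_batch_fn fn, void *arg)
{
	size_t offset = 0;
	int batches = 0;

	while (offset < total) {
		size_t remaining = total - offset;
		size_t count = remaining < SKETCH_MAX_ARCHIVE_ROWS ?
			       remaining : SKETCH_MAX_ARCHIVE_ROWS;

		if (fn(offset, count, arg) != 0)
			return -1;

		offset += count;
		batches++;
	}
	return batches;
}
```

With this shape, each batch maps to one bounded archive file, so no single file grows with the total number of records being purged.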
    • Handle duplicate archive file names. · 1e234c3d
      Marshall Garey authored
      The time period of the archive file currently depends on submit or start
      time and whether the purge period is in hours, days, or months.
      Previously, if the archive file name already existed, we would overwrite
      the old archive file with the assumption that these are duplicate
      records being archived after an archive load. However, that could result
      in lost records in a couple of ways:
      
        * If there were runaway jobs that were part of an old archive file's
        time period and are later fixed and then purged, the old file would
        be overwritten.
        * If jobs or steps are purged but there are still jobs or steps in
        that time period that are pending or running, the pending or running
        jobs and steps won't be purged. When they finish and are purged, the
        old file would be overwritten.
      
      Instead of overwriting the old file, we append a number to the file name
      to create a new file. This will also be important in an upcoming commit.
      
      Bug 6033.
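The "append a number instead of overwriting" approach from this commit message might look roughly like the sketch below. The `.N` suffix format, the `SKETCH_MAX_SUFFIX` limit, and the function name are assumptions made for illustration; the commit message does not state the exact naming scheme. The sketch uses POSIX `open()` with `O_CREAT | O_EXCL`, so an existing file is never truncated.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical upper bound on how many suffixes to try. */
#define SKETCH_MAX_SUFFIX 1000

/*
 * Create a new archive file without overwriting an existing one.
 * Tries `base`, then `base.1`, `base.2`, ... and stores the name that
 * was actually created in `out`.
 * Returns an open file descriptor, or -1 on error.
 */
int sketch_open_unique_archive(const char *base, char *out, size_t out_len)
{
	int n;

	for (n = 0; n <= SKETCH_MAX_SUFFIX; n++) {
		int len, fd;

		if (n == 0)
			len = snprintf(out, out_len, "%s", base);
		else
			len = snprintf(out, out_len, "%s.%d", base, n);
		if (len < 0 || (size_t)len >= out_len)
			return -1;	/* name would not fit in `out` */

		/* O_EXCL makes the call fail if the file already exists. */
		fd = open(out, O_WRONLY | O_CREAT | O_EXCL, 0600);
		if (fd >= 0)
			return fd;
		if (errno != EEXIST)
			return -1;	/* unrelated error, give up */
	}
	return -1;
}
```

Creating the file with `O_EXCL` rather than checking for existence first avoids a race between the check and the write.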
  2. 06 May, 2019 1 commit
    • Fix seff memory display overflow · bab13dfd
      Felip Moll authored
      When the tres_usage_in_max field is empty, it is recorded as '' in the
      database, which leads find_tres_count_in_string() to return INFINITE64.
      Seff treats INFINITE64 as a valid value. This patch fixes this issue.
      
      Bug 6817
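The failure mode in this commit message, a "not found" sentinel being displayed as if it were a real memory value, can be sketched as follows. This is an illustrative stand-in written in C, not seff's code and not Slurm's `find_tres_count_in_string()`. The `id=count,id=count` string layout, the sentinel value, and the choice to fall back to 0 are all assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed sentinel for "no value found" (all bits set). */
#define SKETCH_INFINITE64 UINT64_MAX

/*
 * Hypothetical lookup: find the count for `id` in a string such as
 * "1=4,2=1024". Returns SKETCH_INFINITE64 if the id is absent or the
 * string is NULL, empty, or malformed.
 */
uint64_t sketch_find_tres_count(const char *tres_str, int id)
{
	const char *p = tres_str;

	while (p && *p) {
		char *end;
		long cur = strtol(p, &end, 10);
		uint64_t val;

		if (end == p || *end != '=')
			break;			/* malformed entry */
		val = strtoull(end + 1, &end, 10);
		if (cur == id)
			return val;
		if (*end != ',')
			break;			/* end of string */
		p = end + 1;
	}
	return SKETCH_INFINITE64;
}

/*
 * One possible guard on the display side: treat the sentinel as
 * "no data" and report 0 instead of a huge bogus number.
 */
uint64_t sketch_mem_for_display(const char *tres_usage_in_max, int mem_id)
{
	uint64_t v = sketch_find_tres_count(tres_usage_in_max, mem_id);

	return (v == SKETCH_INFINITE64) ? 0 : v;
}
```

The key point is that the caller checks for the sentinel before using the result in arithmetic or output.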
  3. 03 May, 2019 1 commit
  4. 02 May, 2019 2 commits
    • Fix resubmit to sibling default on fed requeue · 822fe77e
      Broderick Gardner authored
      On requeue, the origin cluster job record is copied to submit
      to sibling clusters. If the job was originally submitted
      to accept the cluster's default account, partition, etc., those fields
      are now filled in on the origin. Here we add flags to indicate
      that those fields need to be cleared on resubmission to siblings.
      Bug 6064
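The flag-based approach in this commit message can be sketched as below. The struct, the flag names, and the function are hypothetical and far simpler than Slurm's real job records. The sketch only assumes that a flag is set when a field was filled in from a cluster default rather than requested by the user.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags: set when the field came from a cluster default. */
#define SKETCH_USED_DEFAULT_ACCOUNT   (1u << 0)
#define SKETCH_USED_DEFAULT_PARTITION (1u << 1)

/* Hypothetical, heavily reduced job description. */
struct sketch_job_desc {
	const char *account;
	const char *partition;
	uint32_t flags;
};

/*
 * Build the description to send to a sibling cluster. Fields that the
 * origin filled in from its own defaults are cleared so the sibling
 * can apply its own defaults. The origin copy is left untouched
 * because the struct is passed by value.
 */
struct sketch_job_desc sketch_prepare_for_sibling(struct sketch_job_desc origin)
{
	if (origin.flags & SKETCH_USED_DEFAULT_ACCOUNT)
		origin.account = NULL;
	if (origin.flags & SKETCH_USED_DEFAULT_PARTITION)
		origin.partition = NULL;
	return origin;
}
```

Without such flags, a sibling would have no way to tell a user-requested partition from one the origin cluster merely defaulted to.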
    • Fix clearing federation cluster lock on requeue · 47909f8e
      Broderick Gardner authored
      This is a holdover from when the fed job_info list was added.
      The cluster lock has to be cleared from both the job_ptr and
      the job_info.
      Bug 6064
  5. 30 Apr, 2019 1 commit
  6. 29 Apr, 2019 5 commits
  7. 26 Apr, 2019 3 commits
  8. 24 Apr, 2019 3 commits
  9. 23 Apr, 2019 2 commits
  10. 22 Apr, 2019 1 commit
  11. 18 Apr, 2019 4 commits
  12. 16 Apr, 2019 8 commits
  13. 13 Apr, 2019 3 commits
  14. 12 Apr, 2019 1 commit
  15. 10 Apr, 2019 3 commits
    • Albert Gil's avatar
      20c2b615
    • Dominik Bartkiewicz's avatar
    • burst_buffer/cray - fix script_argv use-after-free. · 81b9d7bd
      Alejandro Sanchez authored
      ==8640== Thread 5 bckfl:
      ==8640== Syscall param openat(filename) points to unaddressable byte(s)
      ==8640==    at 0x4A81D0E: open (open64.c:48)
      ==8640==    by 0x5934ABB: _update_job_env (burst_buffer_cray.c:3338)
      ==8640==    by 0x5934ABB: bb_p_job_begin (burst_buffer_cray.c:3962)
      ...
      ==8640==  Address 0x6b96120 is 16 bytes inside a block of size 61 free'd
      ==8640==    at 0x48369AB: free (vg_replace_malloc.c:530)
      ==8640==    by 0x49D4873: slurm_xfree (xmalloc.c:244)
      ==8640==    by 0x490C317: free_command_argv (run_command.c:249)
      ==8640==    by 0x5934A5C: bb_p_job_begin (burst_buffer_cray.c:3947)
      ...
      ==8640==  Block was alloc'd at
      ==8640==    at 0x4837B65: calloc (vg_replace_malloc.c:752)
      ==8640==    by 0x49D4566: slurm_xmalloc (xmalloc.c:87)
      ==8640==    by 0x49D4B67: makespace (xstring.c:103)
      ==8640==    by 0x49D4C91: _xstrcat (xstring.c:134)
      ==8640==    by 0x49D4ECF: _xstrfmtcat (xstring.c:280)
      ==8640==    by 0x593497C: bb_p_job_begin (burst_buffer_cray.c:3936)
      ...
      
      Bug 6807.
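The Valgrind trace above shows a string that was allocated, freed together with an argument vector, and then passed to `open()`. A minimal sketch of the safe ordering, where the last use of an argv element happens before the vector is freed, is given below. The helper names and the two-element argv are hypothetical and do not reproduce the burst_buffer/cray code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a string on the heap (portable stand-in for strdup). */
static char *sketch_dup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free the first `n` elements of argv, then argv itself. */
static void sketch_free_argv(char **argv, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		free(argv[i]);
	free(argv);
}

/*
 * Hypothetical flow: build argv = {"tool", path}, use it, and free it
 * only after the last access to argv[1]. Copies the path that was
 * used into `used_path` and returns its length (0 on failure).
 */
size_t sketch_job_begin(const char *path, char *used_path, size_t len)
{
	char **argv = calloc(2, sizeof(*argv));
	int n;

	if (!argv)
		return 0;
	argv[0] = sketch_dup("tool");
	argv[1] = sketch_dup(path);
	if (!argv[0] || !argv[1]) {
		sketch_free_argv(argv, 2);
		return 0;
	}

	/* ... a command would be run with argv here ... */

	/* Last use of argv[1] happens BEFORE the free below. */
	n = snprintf(used_path, len, "%s", argv[1]);

	sketch_free_argv(argv, 2);	/* argv[1] must not be touched again */
	return (n < 0 || (size_t)n >= len) ? 0 : (size_t)n;
}
```

If the path is still needed after the vector is released, copying it first (as done into `used_path` here) keeps the later use valid.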