1. 15 May, 2019 1 commit
    • Avoid call to slurm_get_slurmd_user_id() in _step_connect() if not slurmd. · 0a4c5234
      Tim Wickberg authored
      For a stray socket, this call would cause nss_slurm to deadlock:
      any calling path that leads to slurm_conf_lock() will call
      getpwuid(), which will re-enter the nss_slurm code, which will end up
      back here with slurm_conf_lock already held, at which point
      the process will never continue.
      
      For nss_slurm, this means a node rebooting with stale sockets will hang
      in the middle of the init process, which is a rather unpleasant experience.
      
      So - only handle the stray socket cleanup within the slurmd process itself.
      
      Bug 7030
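The re-entrancy described in this commit can be sketched in Python. This is an illustration only, not Slurm's C code: `conf_lock`, `plugin_step_connect`, and `get_user_id` are hypothetical stand-ins, the sketch models only a caller that is not slurmd, and a non-blocking acquire stands in for the hang so that the example terminates.

```python
import threading

# Hypothetical stand-in for the configuration lock: a plain, non-reentrant lock.
conf_lock = threading.Lock()


def plugin_step_connect(skip_cleanup_outside_slurmd):
    """Stand-in for a non-slurmd caller connecting through a stray socket."""
    if skip_cleanup_outside_slurmd:
        # The guarded path: leave stray socket cleanup to slurmd itself.
        return "cleanup skipped"
    return get_user_id(skip_cleanup_outside_slurmd)


def get_user_id(skip_cleanup_outside_slurmd):
    """Stand-in for a call that takes the lock and then looks up a user."""
    # A real non-reentrant lock would block forever on re-entry; the
    # non-blocking acquire lets the sketch report the problem instead.
    if not conf_lock.acquire(blocking=False):
        return "would deadlock"
    try:
        # The user lookup is routed back through the plugin, which connects again.
        return plugin_step_connect(skip_cleanup_outside_slurmd)
    finally:
        conf_lock.release()


unguarded = plugin_step_connect(skip_cleanup_outside_slurmd=False)
guarded = plugin_step_connect(skip_cleanup_outside_slurmd=True)
```

In the unguarded case the second attempt to take the lock happens while the first is still held, which is the point where a blocking lock would never return.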
  2. 13 May, 2019 1 commit
  3. 10 May, 2019 3 commits
    • Prevent leak of cluster_str in sacctmgr_list_runaway_jobs(). · bb9d5e79
      Nate Rini authored
      Bug 6952.
    • Only archive 50k records at a time. · ddd49896
      Marshall Garey authored
      Trying to archive too many records at once can result in archive files
      that are too big to read or even too big to be written. Only archive 50k
      records at a time, like we only purge 50k records at a time.
      
      Bug 6033.
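Archiving in fixed-size batches can be sketched as follows. This is a minimal Python illustration rather than the actual slurmdbd code; `batches` is a hypothetical helper, and only the 50k batch size comes from the commit message.

```python
from itertools import islice

BATCH_SIZE = 50_000  # batch size named in the commit message


def batches(records, size=BATCH_SIZE):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 120,000 placeholder records are split into three archive batches.
sizes = [len(chunk) for chunk in batches(range(120_000))]
```

Each batch would then be written to its own archive file, which keeps any single file bounded in size.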
    • Handle duplicate archive file names. · 1e234c3d
      Marshall Garey authored
      The time period of the archive file currently depends on submit or start
      time and whether the purge period is in hours, days, or months.
      Previously, if the archive file name already existed, we would overwrite
      the old archive file with the assumption that these are duplicate
      records being archived after an archive load. However, that could result
      in lost records in a couple of ways:
      
        * If there were runaway jobs that were part of an old archive file's
        time period and are later fixed and then purged, the old file would
        be overwritten.
        * If jobs or steps are purged but there are still jobs or steps in
        that time period that are pending or running, the pending or running
        jobs and steps won't be purged. When they finish and are purged, the
        old file would be overwritten.
      
      Instead of overwriting the old file, we append a number to the file name
      to create a new file. This will also be important in an upcoming commit.
      
      Bug 6033.
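Creating a new file instead of overwriting an existing one can be sketched in Python. The `.N` suffix format and the `create_archive_file` helper are assumptions made for illustration; the commit message only says that a number is appended to the file name.

```python
import os
import tempfile


def create_archive_file(path):
    """Create `path` exclusively, or the first free `path.N`, never overwriting."""
    candidate, attempt = path, 0
    while True:
        try:
            # Mode "x" fails if the file exists, so nothing is overwritten.
            return open(candidate, "x")
        except FileExistsError:
            attempt += 1
            candidate = f"{path}.{attempt}"


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "job_archive")
    names = []
    for _ in range(3):
        with create_archive_file(base) as handle:
            names.append(os.path.basename(handle.name))
```

Opening in exclusive mode avoids the gap between checking whether a name is taken and creating the file.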
  4. 08 May, 2019 1 commit
  5. 07 May, 2019 3 commits
  6. 06 May, 2019 1 commit
    • Fix seff memory display overflow · bab13dfd
      Felip Moll authored
      When the tres_usage_in_max field is empty, it is recorded as '' in the database,
      which leads find_tres_count_in_string() to return INFINITE64. Seff treats
      INFINITE64 as a valid value. This patch fixes this issue.
      
      Bug 6817
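The sentinel problem can be sketched in Python. Everything here is illustrative: `find_count` and `display_memory` are hypothetical helpers, the sentinel value and the memory TRES id of 2 are assumptions, and reporting 0 for a missing value is one possible handling, not necessarily what the patch does.

```python
INFINITE64 = 0xFFFFFFFFFFFFFFFF  # assumed value of the "not found" sentinel


def find_count(tres_string, tres_id):
    """Parse 'id=count,id=count' and return INFINITE64 if the id is absent."""
    for item in filter(None, tres_string.split(",")):
        key, _, value = item.partition("=")
        if key == str(tres_id):
            return int(value)
    return INFINITE64


def display_memory(tres_string, mem_id=2):
    """Return the memory count, treating the sentinel as 'no data'."""
    count = find_count(tres_string, mem_id)
    # Without this check the sentinel would be shown as a huge memory value.
    return 0 if count == INFINITE64 else count


empty_result = display_memory("")
normal_result = display_memory("1=4,2=1048576")
```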
  7. 03 May, 2019 3 commits
  8. 02 May, 2019 6 commits
  9. 01 May, 2019 2 commits
  10. 30 Apr, 2019 4 commits
  11. 29 Apr, 2019 11 commits
  12. 26 Apr, 2019 3 commits
  13. 25 Apr, 2019 1 commit