1. 23 May, 2019 2 commits
  2. 22 May, 2019 5 commits
  3. 21 May, 2019 8 commits
  4. 17 May, 2019 2 commits
  5. 16 May, 2019 5 commits
    • Only allocate 1 CPU per node with the --overcommit option · dd7775ef
      Morris Jette authored
      Previously, the select/cons_tres logic would allocate one CPU per task on the node.
      
      Bug 6981
    • modify task layout with --overcommit · 42d7e312
      Morris Jette authored
      Modify the task layout for the --overcommit option combined with a
      heterogeneous job allocation so that a cyclic task distribution can start
      before all CPUs on all nodes are fully allocated. The number of tasks per
      node is unchanged from the previous algorithm, but tasks are now
      distributed in a cyclic fashion first, and then extra tasks are placed on
      nodes with more CPUs. Previously, all CPUs would be fully allocated in a
      cyclic fashion, then excess tasks distributed evenly across all allocated
      nodes.
      Bug 6981
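The ordering described in commit 42d7e312 can be illustrated with a small sketch. This is one possible reading of the commit message, not the select/cons_tres code: the function name `cyclic_layout` and the per-node task quotas are hypothetical. It keeps each node's task count fixed and only changes the order in which task IDs are handed out, round-robin first, with the remaining tasks landing on the nodes that have larger quotas.

```python
def cyclic_layout(tasks_per_node):
    """Assign task IDs to nodes round-robin, honoring each node's task quota.

    tasks_per_node is a hypothetical input: the number of tasks each node
    should end up with (unchanged by the ordering).
    """
    remaining = list(tasks_per_node)
    layout = [[] for _ in tasks_per_node]
    task_id = 0
    while any(remaining):
        # One cyclic pass: give one task to every node that still has room.
        for node, left in enumerate(remaining):
            if left > 0:
                layout[node].append(task_id)
                remaining[node] -= 1
                task_id += 1
    return layout


if __name__ == "__main__":
    # Two nodes with quotas of 2 and 4 tasks.
    print(cyclic_layout([2, 4]))
```

With quotas of 2 and 4, the first two passes alternate between the nodes and the last two tasks go to the second node only.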
    • Store reservation flags in slurmdbd in a uint64_t. · 46d55dd4
      Dominik Bartkiewicz authored
      Add warning to slurm.h.in that no new reservation flags can be
      stored in slurmdbd in 19.05. (Although they could still be used by
      slurmctld without issue.)
      
      Note that the underlying RPC still uses uint32_t, but this will be
      changed before 20.02 on master, and changing the column to uint32_t
      in 19.05 just to change it again in 20.02 is best avoided.
      
      Bug 6969.
    • Fix memory leaks due to incomplete slurmdb_cluster_cond_t destructor. · 2038469f
      Nathan Rini authored
      Free format_list, plugin_id_select_list, rpc_version_list in
       _free_cluster_cond_members().
      
      Bug 7020.
    • Fix archive loading events. · 0d0f9deb
      Marshall Garey authored
      There was a syntax error in the MySQL query that inserts the event
      records into the event table, introduced by commit 3d61b6aa. The error
      was a semicolon in the middle of the query, for example:
      
      insert into "voyager_event_table" (time_start, time_end, node_name,
      cluster_nodes, reason, reason_uid, state, tres) values ('1538669453',
      '1539298628', 'v1', '', 'cold-start', '1017', '0',
      '1=8,2=4000,5=8,1001=4,1002=1');, (<... another record>);, ...
      
      Bug 7025.
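The failure mode in commit 0d0f9deb, a terminator emitted after every value group, can be avoided by joining the groups first and appending a single semicolon at the end. The sketch below is illustrative only and is not the slurmdbd code: `build_multirow_insert` is a hypothetical helper, and its quoting is deliberately simplistic (real code should use the database driver's parameter binding).

```python
def build_multirow_insert(table, columns, rows):
    """Build one multi-row INSERT with exactly one terminating semicolon."""
    col_list = ", ".join(columns)
    value_groups = []
    for row in rows:
        # Naive quoting for illustration: double any embedded single quote.
        quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in row)
        value_groups.append("(" + quoted + ")")
    # Groups are separated by ", "; the semicolon appears only once, at the end.
    return 'insert into "{}" ({}) values {};'.format(
        table, col_list, ", ".join(value_groups))


if __name__ == "__main__":
    print(build_multirow_insert(
        "voyager_event_table",
        ["time_start", "node_name"],
        [(1538669453, "v1"), (1539298628, "v2")]))
```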
  6. 15 May, 2019 1 commit
    • Avoid call to slurm_get_slurmd_user_id() in _step_connect() if not slurmd. · 0a4c5234
      Tim Wickberg authored
      For a stray socket, this call would cause nss_slurm to deadlock: any
      calling path that reaches slurm_conf_lock() will call getpwuid(), which
      re-enters the nss_slurm code, which ends up back here with
      slurm_conf_lock already held, at which point the process never
      continues.
      
      For nss_slurm, this means a node rebooting with stale sockets will hang
      in the middle of the init process, which is a rather unpleasant experience.
      
      So - only handle the stray socket cleanup within the slurmd process itself.
      
      Bug 7030
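The re-entrancy problem described in commit 0a4c5234 is a general property of non-reentrant locks. The sketch below is an analogy in Python, not the Slurm code path: `conf_lock`, `lookup_user`, and `read_conf_then_lookup` are hypothetical stand-ins for the configuration lock, the name-service lookup, and the caller. A timeout is used so the example reports the would-be deadlock instead of hanging.

```python
import threading

# Non-reentrant lock, standing in for the configuration lock.
conf_lock = threading.Lock()


def lookup_user(timeout=0.1):
    """Stand-in for a lookup that itself needs conf_lock.

    Returns True if the lock could be taken, False if it timed out.
    """
    acquired = conf_lock.acquire(timeout=timeout)
    if acquired:
        conf_lock.release()
    return acquired


def read_conf_then_lookup():
    """Hold conf_lock, then call code that re-enters and wants the same lock."""
    with conf_lock:
        # Without the timeout, this call would block forever.
        return lookup_user()


if __name__ == "__main__":
    print("lookup alone succeeds:", lookup_user())
    print("lookup while holding the lock succeeds:", read_conf_then_lookup())
```

Skipping the re-entrant call in processes that do not need it, as the commit does, removes the cycle without changing the lock type.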
  7. 13 May, 2019 1 commit
  8. 10 May, 2019 3 commits
    • Prevent leak of cluster_str in sacctmgr_list_runaway_jobs(). · bb9d5e79
      Nate Rini authored
      Bug 6952.
    • Only archive 50k records at a time. · ddd49896
      Marshall Garey authored
      Trying to archive too many records at once can result in archive files
      that are too big to read or even too big to be written. Only archive 50k
      records at a time, like we only purge 50k records at a time.
      
      Bug 6033.
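The batching idea in commit ddd49896 can be sketched as a simple chunking loop. This is a minimal illustration, assuming the records are already in memory as a list; `batches` is a hypothetical helper, and the real archive code works against the database rather than a Python list.

```python
def batches(records, batch_size=50000):
    """Yield successive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


if __name__ == "__main__":
    records = list(range(120000))
    for i, batch in enumerate(batches(records)):
        # Each batch would be written to its own archive file.
        print("batch", i, "holds", len(batch), "records")
```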
    • Handle duplicate archive file names. · 1e234c3d
      Marshall Garey authored
      The time period of the archive file currently depends on submit or start
      time and whether the purge period is in hours, days, or months.
      Previously, if the archive file name already existed, we would overwrite
      the old archive file on the assumption that these were duplicate
      records being archived after an archive load. However, that could result
      in lost records in a couple of ways:
      
        * If there were runaway jobs that were part of an old archive file's
        time period and are later fixed and then purged, the old file would
        be overwritten.
        * If jobs or steps are purged but there are still jobs or steps in
        that time period that are pending or running, the pending or running
        jobs and steps won't be purged. When they finish and are purged, the
        old file would be overwritten.
      
      Instead of overwriting the old file, we append a number to the file name
      to create a new file. This will also be important in an upcoming commit.
      
      Bug 6033.
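The approach in commit 1e234c3d, appending a number rather than overwriting, might look like the sketch below. The suffix format (`.1`, `.2`, ...) and the helper name `unique_archive_path` are assumptions made for illustration; the commit message does not say how the number is appended. The `exists` parameter is also hypothetical and only makes the function easy to exercise without touching the file system.

```python
import os


def unique_archive_path(path, exists=os.path.exists):
    """Return path if unused, else path with the first free numeric suffix."""
    if not exists(path):
        return path
    n = 1
    # Probe path.1, path.2, ... until a name that is not taken is found.
    while exists("{}.{}".format(path, n)):
        n += 1
    return "{}.{}".format(path, n)


if __name__ == "__main__":
    taken = {"archive_file", "archive_file.1"}
    print(unique_archive_path("archive_file", exists=taken.__contains__))
```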
  9. 08 May, 2019 1 commit
  10. 07 May, 2019 3 commits
  11. 06 May, 2019 1 commit
    • Fix seff memory display overflow · bab13dfd
      Felip Moll authored
      When the tres_usage_in_max field is empty, it is recorded as '' in the
      database, which leads find_tres_count_in_string() to return INFINITE64.
      Seff treated INFINITE64 as a valid value. This patch fixes the issue.
      
      Bug 6817
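The class of bug fixed in commit bab13dfd is a sentinel value being formatted as if it were data. The sketch below shows the general guard in Python; it is not the seff patch (seff itself is not written in Python). The sentinel value assigned to `INFINITE64`, the helper names, and the choice to display "n/a" are all assumptions made for illustration.

```python
# Assumed sentinel: all 64 bits set. Treat this value as an assumption.
INFINITE64 = 0xFFFFFFFFFFFFFFFF


def tres_count_or_none(count):
    """Map the sentinel to None so callers cannot mistake it for a count."""
    return None if count == INFINITE64 else count


def format_mem(count_bytes):
    """Format a byte count in MB, or 'n/a' when no value was recorded."""
    value = tres_count_or_none(count_bytes)
    if value is None:
        return "n/a"
    return "{:.2f} MB".format(value / (1024 * 1024))


if __name__ == "__main__":
    print(format_mem(1048576))
    print(format_mem(INFINITE64))
```

Without the guard, the sentinel would be divided and printed as an enormous memory figure, which matches the "display overflow" named in the commit title.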
  12. 03 May, 2019 3 commits
  13. 02 May, 2019 5 commits