1. 11 Jan, 2018 2 commits
    • node_feature/knl_cray - Fix memory leak · deaacad2
      Morris Jette authored
      node_feature/knl_cray - Fix a memory leak that can occur during normal
          operation, whenever an update request is made for a specific node.
    • node_feature/knl_cray - Fix memory leaks · 32c93fce
      Morris Jette authored
      If CnselectPath and/or SyscfgPath were defined in the knl_cray.conf file
        and slurmctld was reconfigured, the original values of those parameters
        were overwritten and their memory leaked.
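A minimal C sketch of the pattern behind this fix, assuming the parameters are kept as plain heap-allocated strings: free the previous value before storing the one read on reconfigure. The names cnselect_path, syscfg_path, set_config_string() and load_knl_config() are hypothetical, and standard free()/strdup() stand in for Slurm's own allocation helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical globals holding the CnselectPath and SyscfgPath values;
 * the real plugin's variable names may differ. */
static char *cnselect_path = NULL;
static char *syscfg_path = NULL;

/* Replace *dest with a copy of value, freeing any previous value first.
 * Without the free(), every reconfigure would leak the old string. */
static void set_config_string(char **dest, const char *value)
{
    free(*dest);                 /* free(NULL) is a no-op */
    *dest = value ? strdup(value) : NULL;
}

/* Called at startup and again on every reconfigure. */
static void load_knl_config(const char *cnselect, const char *syscfg)
{
    set_config_string(&cnselect_path, cnselect);
    set_config_string(&syscfg_path, syscfg);
}
```

Calling load_knl_config() a second time, as a reconfigure would, releases the old strings instead of overwriting the pointers and leaking them.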
  2. 05 Jan, 2018 1 commit
    • Avoid node hang in COMPLETING state · c43df3a7
      Felip Moll authored
      Avoid leaving a node in the COMPLETING state indefinitely if the job that
      initiated the node reboot is cancelled while the reboot is in progress.
      Bug introduced in commit 7d246784.
      
      Bug 4536
  3. 03 Jan, 2018 3 commits
  4. 28 Dec, 2017 1 commit
  5. 15 Dec, 2017 1 commit
    • Add --bb support for "access_mode" · 43bd77e4
      Morris Jette authored
      This adds support for the --bb option "access_mode" in addition to
        "access" for better compatibility with Cray's DataWarp options.
      Related to bug 4528
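A hypothetical C sketch of accepting both keys when parsing a --bb option string, assuming space-separated key=value tokens. The function name get_bb_access_mode() and the token format are assumptions for illustration, not Slurm's actual burst buffer parser.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the access mode found in a
 * space-separated option string, accepting either "access=" or
 * "access_mode=" as the key. Returns NULL if neither key is present. */
static char *get_bb_access_mode(const char *bb_opt)
{
    char *copy, *tok, *save_ptr = NULL, *result = NULL;

    if (!bb_opt)
        return NULL;
    copy = strdup(bb_opt);
    if (!copy)
        return NULL;
    for (tok = strtok_r(copy, " ", &save_ptr); tok;
         tok = strtok_r(NULL, " ", &save_ptr)) {
        const char *val = NULL;

        if (!strncmp(tok, "access_mode=", 12))   /* 12 == strlen("access_mode=") */
            val = tok + 12;
        else if (!strncmp(tok, "access=", 7))    /* 7 == strlen("access=") */
            val = tok + 7;
        if (val) {
            free(result);                        /* last occurrence wins */
            result = strdup(val);
        }
    }
    free(copy);
    return result;
}
```

Both spellings map to the same result, so scripts written for either form behave identically; the caller owns and must free() the returned string.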
  6. 08 Dec, 2017 1 commit
    • Fix Slurm to work correctly with HDF5 1.10+. · 006d172a
      Danny Auble authored
      In HDF5 1.10+, hid_t changed from an int to a long int, which breaks
      things because the library uses the top 32 bits of the value right
      away. This fixes the problem by handling the number with an int32_t
      instead of an int.
      
      Bug 3795
  7. 05 Dec, 2017 1 commit
    • Fix to properly remove extern steps from the starting_steps list. · 99b3796b
      Alejandro Sanchez authored
      Since NO_VAL == SLURM_BATCH_SCRIPT, the else branch compared only the
      job_id and not the step_id, so when the batch step was removed, all the
      steps of that job were removed too. When the next iteration then tried
      to remove the extern step, it was already gone and we incorrectly
      returned an error.
      
      Bug 4458.
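A simplified C reconstruction of the matching bug described in commit 99b3796b. The struct, the function names, and the numeric sentinel values (NO_VAL and SLURM_BATCH_SCRIPT both 0xfffffffe, SLURM_EXTERN_CONT 0xffffffff) are assumptions for this sketch; the only property it relies on from the commit message is that NO_VAL equals SLURM_BATCH_SCRIPT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel values; what matters is that the first two are equal. */
#define NO_VAL             0xfffffffeU
#define SLURM_BATCH_SCRIPT NO_VAL
#define SLURM_EXTERN_CONT  0xffffffffU

/* Hypothetical record for a step that is still starting. */
typedef struct {
    uint32_t job_id;
    uint32_t step_id;
} starting_step_t;

/* Buggy matcher: step_id == NO_VAL is treated as "any step", so asking to
 * remove the batch step matches every step of the job. */
static bool match_buggy(const starting_step_t *s, uint32_t job_id,
                        uint32_t step_id)
{
    if (step_id != NO_VAL)
        return s->job_id == job_id && s->step_id == step_id;
    return s->job_id == job_id;
}

/* Fixed matcher: always compares both the job ID and the step ID. */
static bool match_fixed(const starting_step_t *s, uint32_t job_id,
                        uint32_t step_id)
{
    return s->job_id == job_id && s->step_id == step_id;
}

/* Remove matching entries in place; returns how many were removed. */
static int remove_steps(starting_step_t *list, int *count,
                        uint32_t job_id, uint32_t step_id,
                        bool (*match)(const starting_step_t *, uint32_t,
                                      uint32_t))
{
    int kept = 0, removed = 0;

    for (int i = 0; i < *count; i++) {
        if (match(&list[i], job_id, step_id))
            removed++;
        else
            list[kept++] = list[i];
    }
    *count = kept;
    return removed;
}
```

With match_buggy(), removing a job's batch step also drops its extern step, so the later removal of the extern step finds nothing and reports an error; match_fixed() removes only the batch step.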
  8. 30 Nov, 2017 1 commit
    • Fix memory leak · 0bb71ce2
      Alejandro Sanchez authored
      Fix memory leak of the MailDomain configuration string when the slurmctld
         daemon is reconfigured.
      bug 4272 (comment 35)
  9. 29 Nov, 2017 1 commit
    • Fix sbatch --wait to stop after job is gone · f9977ee5
      Brian Christiansen authored
      Prior to 17.11, slurm_load_job() returns the error code in errno and not
      in rc. With the addition of 47175901, if a job is removed from memory
      before sbatch checks for it again, sbatch could get stuck in a loop
      checking for the job. This only happens with a very small MinJobAge
      (<10), which is not recommended.
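A hypothetical C sketch of why looking for the error in the return code loops forever when the callee reports it through errno. load_job_old_style() and EXAMPLE_INVALID_JOB_ID are stand-ins invented for illustration; the real call is slurm_load_job(), and Slurm uses its own ESLURM_* error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical "job not found" code, used only in this sketch. */
#define EXAMPLE_INVALID_JOB_ID 2017

/* Stand-in for a pre-17.11 style loader: it returns -1 on failure and
 * reports the reason through errno, not through the return value. */
static int load_job_old_style(bool job_exists)
{
    if (!job_exists) {
        errno = EXAMPLE_INVALID_JOB_ID;
        return -1;
    }
    return 0;
}

/* Buggy check: looks for the error code in rc, where it never appears,
 * so a polling loop built on it never sees the job disappear. */
static bool job_is_gone_buggy(bool job_exists)
{
    int rc = load_job_old_style(job_exists);

    return rc == EXAMPLE_INVALID_JOB_ID;
}

/* Fixed check: on failure, read the reason from errno. Other errors are
 * treated as transient and keep the caller polling. */
static bool job_is_gone(bool job_exists)
{
    int rc = load_job_old_style(job_exists);

    if (rc == 0)
        return false;
    return errno == EXAMPLE_INVALID_JOB_ID;
}
```

A --wait style loop would poll job_is_gone() until it returns true; with job_is_gone_buggy() the exit condition is never met once the job has been purged.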
  10. 28 Nov, 2017 3 commits
  11. 10 Nov, 2017 1 commit
  12. 07 Nov, 2017 2 commits
    • Fix issue when resetting the partition pointers on nodes. · e6b2bd2d
      Alejandro Sanchez authored
      The issue could be triggered when updating a partition's nodes with
      nodes that were already in the partition. This incorrectly increased
      node_record->part_cnt (the number of associated partitions) and thus
      extended the node's array of partition pointers, leaving the array
      with repeated partition pointers.
      
      Bug 4318.
    • Cray module file - remove munge support. · 8b71b9fc
      Brian Gilmer authored
      On CLE 6.0 mungedir is /usr; a 'module unload' call then removes /usr/bin
      from PATH which is rather inconvenient.
      
      Bug 4334.
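For commit e6b2bd2d above, a minimal C sketch of the duplicate check that keeps part_cnt accurate. It assumes simplified node and partition records (only part_pptr and part_cnt are modeled) and a hypothetical helper node_add_partition(); it illustrates the idea rather than the actual Slurm change.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for Slurm's records; only the fields this sketch
 * needs are included. */
struct part_record {
    const char *name;
};

struct node_record {
    struct part_record **part_pptr;  /* partitions containing the node */
    int part_cnt;                    /* number of entries in part_pptr */
};

/* Associate a partition with a node only if it is not already present,
 * so re-adding a node to a partition it already belongs to neither bumps
 * part_cnt nor creates a duplicate pointer. Returns 0 on success. */
static int node_add_partition(struct node_record *node,
                              struct part_record *part)
{
    struct part_record **tmp;

    for (int i = 0; i < node->part_cnt; i++) {
        if (node->part_pptr[i] == part)
            return 0;                /* already associated */
    }
    tmp = realloc(node->part_pptr, sizeof(*tmp) * (node->part_cnt + 1));
    if (!tmp)
        return -1;
    node->part_pptr = tmp;
    node->part_pptr[node->part_cnt++] = part;
    return 0;
}
```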
  13. 03 Nov, 2017 1 commit
    • Fix updating of requested TRES memory · 994e4f5c
      Isaac Hartung authored
      The memory TRES was being set to the pn_min_memory value when the job
      was updated, but the TRES memory value is the total memory of the job
      (e.g., pn_min_memory * cpus or pn_min_memory * nodes).
      
      Bug 4177
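A small C sketch of the total-memory calculation described in commit 994e4f5c. For clarity it takes the per-CPU versus per-node choice as a separate boolean; Slurm marks the per-CPU case with a flag inside pn_min_memory itself, and job_tres_mem() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Total memory (MB) to record in the job's memory TRES.
 * pn_min_memory is per CPU when mem_per_cpu is true, else per node. */
static uint64_t job_tres_mem(uint64_t pn_min_memory, bool mem_per_cpu,
                             uint32_t cpus, uint32_t nodes)
{
    if (mem_per_cpu)
        return pn_min_memory * cpus;
    return pn_min_memory * nodes;
}
```

For example, a job requesting 2048 MB per CPU on 8 CPUs should be charged 16384 MB, not the bare 2048 MB that the bug recorded.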
  14. 01 Nov, 2017 3 commits
  15. 30 Oct, 2017 2 commits
  16. 28 Oct, 2017 1 commit
    • CRAY - Fix abort · b319d5b1
      Morris Jette authored
      If configured with NodeFeatures=knl_cray and there are non-KNL
      nodes with no features, slurmctld aborts without this patch when
      calling strtok_r(NULL).
      
      bug 4294
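A minimal C sketch of the kind of guard this fix needs: return early when a node's feature string is NULL instead of passing it on to strdup() and strtok_r(). The helper node_has_feature() and the comma-separated format are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return true if a comma-separated feature list contains want.
 * A node with no features may have a NULL list; without the early return,
 * that NULL would be handed to strdup()/strtok_r() and crash. */
static bool node_has_feature(const char *features, const char *want)
{
    char *copy, *tok, *save_ptr = NULL;
    bool found = false;

    if (!features || !features[0])
        return false;            /* e.g. a non-KNL node with no features */
    copy = strdup(features);
    if (!copy)
        return false;
    for (tok = strtok_r(copy, ",", &save_ptr); tok;
         tok = strtok_r(NULL, ",", &save_ptr)) {
        if (!strcmp(tok, want)) {
            found = true;
            break;
        }
    }
    free(copy);
    return found;
}
```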
  17. 27 Oct, 2017 1 commit
  18. 25 Oct, 2017 2 commits
    • Fix layouts code to only allow setting a boolean. · b9273782
      Danny Auble authored
      Previously it allowed all sorts of operations on booleans, such as
      adding/subtracting/multiplying/etc., which caused warnings such as:
      
      /home/bart/slurm-tmp5/src/common/layouts_mgr.c: In function ‘_layouts_load_automerge’:
      /home/bart/slurm-tmp5/src/common/layouts_mgr.c:363:21: error: ‘*’ in boolean context, suggest ‘&&’ instead [-Werror=int-in-bool-context]
         *lvalue = *lvalue * *rvalue;     \
                   ~~~~~~~~^~~~
      /home/bart/slurm-tmp5/src/common/layouts_mgr.c:1034:4: note: in expansion of macro ‘_entity_update_kv_helper’
          _entity_update_kv_helper(type_t, operator); \
          ^~~~~~~~~~~~~~~~~~~~~~~~
      /home/bart/slurm-tmp5/src/common/layouts_mgr.c:1086:4: note: in expansion of macro ‘_layouts_load_merge’
          _layouts_load_merge(bool, s_p_get_boolean);
      
      Bug 4062
    • Work around issue with sysmacros.h and gcc7 / glibc 2.25. · 8706f388
      Felip Moll authored
      Setting -Werror causes the test to fail on the following error:
      
      "error: In the GNU C Library, "major" is defined by <sys/sysmacros.h>.
       For historical compatibility, it is currently defined by <sys/types.h>
       as well, but we plan to remove this soon. To use "major", include
       <sys/sysmacros.h> directly. If you did not intend to use a system-defined
       macro "major", you should undefine it after including <sys/types.h>."
      
      Since the normal Slurm build uses -Werror, this warning about including
      both headers then causes the build itself to fail.
      
      Bug 3982.
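A short C example, assuming a glibc-based system, of including <sys/sysmacros.h> directly before using major() and minor(), which is what the glibc message quoted in commit 8706f388 asks for. split_dev() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/sysmacros.h>   /* major(), minor(), makedev() on glibc */

/* Split a device number into its major and minor parts. Including
 * <sys/sysmacros.h> explicitly avoids relying on <sys/types.h> to
 * provide major()/minor(), which glibc 2.25 deprecated. */
static void split_dev(dev_t dev, unsigned int *maj, unsigned int *min)
{
    *maj = major(dev);
    *min = minor(dev);
}
```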
  19. 24 Oct, 2017 6 commits
  20. 19 Oct, 2017 5 commits
  21. 18 Oct, 2017 1 commit