1. 01 Oct, 2019 1 commit
    • Increase MAX_ARRAY_LEN_LARGE · f7bed728
      Felip Moll authored
      Increase MAX_ARRAY_LEN_LARGE, the maximum length of large arrays that can be
      packed/unpacked, by one order of magnitude. The current value proved too small
      when an MPI program spawns a considerable number of tasks over a large set of nodes.
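
A minimal sketch of why such a limit matters, assuming a length-prefixed array format: the unpacker reads the declared count first and rejects it if it exceeds a sanity cap, so a legitimate but very large array fails until the cap is raised. The function names and the numeric limits below are hypothetical and are not taken from the Slurm source.

```python
import struct

# Hypothetical limit values for illustration only; not copied from Slurm.
OLD_MAX_ARRAY_LEN_LARGE = 1_000
NEW_MAX_ARRAY_LEN_LARGE = OLD_MAX_ARRAY_LEN_LARGE * 10  # one order of magnitude


def pack_u32_array(values):
    """Pack integers as a big-endian 32-bit count followed by the items."""
    return struct.pack(f">I{len(values)}I", len(values), *values)


def unpack_u32_array(buf, max_len):
    """Unpack an array, rejecting counts above max_len before reading items."""
    (count,) = struct.unpack_from(">I", buf, 0)
    if count > max_len:
        raise ValueError(f"array length {count} exceeds limit {max_len}")
    return list(struct.unpack_from(f">{count}I", buf, 4))
```

In this sketch an array of 5,000 entries is refused under the old cap and accepted under the raised one.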
      
      This limit was introduced in 627928f4.
      
      Bug 7495
  2. 30 Sep, 2019 4 commits
  3. 26 Sep, 2019 3 commits
  4. 25 Sep, 2019 1 commit
    • Fix scancel --full for proctrack/cgroups · 4dfb3ad6
      Albert Gil authored
      The signaling of the batch step and the handling of the flags are now done
      entirely in _kill_all_active_steps() in slurmd and _handle_signal_container()
      in stepd, to ensure that:
      - if KILL_JOB_BATCH then only batch container is signaled
      - if KILL_FULL_JOB then batch script and its children are also signaled
      - if both of the above then only the batch script and its children are signaled
      
      We no longer rely on proctrack_g_signal() to handle the batch step
      signaling, so it works the same for all proctrack plugins.
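
The three rules above can be sketched as a small decision function. This is an illustration only: the bit values are placeholders rather than the ones defined in the Slurm headers, the target names are hypothetical labels, and the behavior when neither flag is set is an assumption, since the commit message does not describe it.

```python
# Placeholder bit values for illustration; not copied from the Slurm headers.
KILL_JOB_BATCH = 0x1
KILL_FULL_JOB = 0x2


def signal_targets(flags):
    """Return the set of (hypothetically named) targets to signal."""
    batch = bool(flags & KILL_JOB_BATCH)
    full = bool(flags & KILL_FULL_JOB)
    if batch and full:
        # Both flags: only the batch script and its children.
        return {"batch_script_and_children"}
    if batch:
        # KILL_JOB_BATCH alone: only the batch container.
        return {"batch_container"}
    if full:
        # KILL_FULL_JOB alone: the batch script and its children are
        # signaled in addition to the other steps.
        return {"other_steps", "batch_script_and_children"}
    # Assumption: with neither flag, only the non-batch steps are signaled.
    return {"other_steps"}
```

Keeping the decision in one place, independent of the process-tracking backend, mirrors the stated goal that the behavior be the same for all proctrack plugins.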
      
      This commit also includes minor related fixes in other code that handles
      these signaling flags, and documentation improvements.
      
      Bug 7282
  5. 23 Sep, 2019 1 commit
  6. 20 Sep, 2019 2 commits
  7. 16 Sep, 2019 1 commit
  8. 12 Sep, 2019 3 commits
  9. 10 Sep, 2019 1 commit
  10. 06 Sep, 2019 2 commits
  11. 04 Sep, 2019 3 commits
  12. 03 Sep, 2019 4 commits
  13. 29 Aug, 2019 4 commits
  14. 28 Aug, 2019 1 commit
    • Don't update [min|max]_exit_code on job array task requeue. · 0e42eb87
      Alejandro Sanchez authored
      Only do so once the task actually finishes. Otherwise, a requeued task
      could set an incorrect max_exit_code even if it completes with exit code 0
      after re-running. This causes problems for, e.g., other jobs with an
      afterok dependency on the array, which rely on the incorrectly set
      max_exit_code.
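
A minimal sketch of the intended bookkeeping, assuming a per-array record that is updated whenever a task attempt ends. The class, method, and `afterok_satisfied` helper are hypothetical names for illustration and do not correspond to functions in slurmctld.

```python
class ArrayExitCodes:
    """Tracks the min/max exit code across the tasks of one job array."""

    def __init__(self):
        self.min_exit_code = None
        self.max_exit_code = None

    def task_ended(self, exit_code, requeued):
        # A requeued task has not really finished: skip the update so a
        # failed first attempt cannot poison max_exit_code.
        if requeued:
            return
        if self.min_exit_code is None or exit_code < self.min_exit_code:
            self.min_exit_code = exit_code
        if self.max_exit_code is None or exit_code > self.max_exit_code:
            self.max_exit_code = exit_code

    def afterok_satisfied(self):
        # Simplified stand-in for an afterok check: every task exited with 0.
        return self.max_exit_code == 0
```

With this rule, a task that fails with exit code 1, is requeued, and then completes with exit code 0 leaves `max_exit_code` at 0, so a dependent afterok job is not blocked.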
      
      Bug 7552.
  15. 23 Aug, 2019 1 commit
  16. 20 Aug, 2019 2 commits
    • Handle situation where a slurmctld tries to communicate with slurmdbd more than once at the same time. · af7b4531
      Danny Auble authored
      What can happen here is that the slurmdbd/slurmctld connection gets hung up
      somehow.  If the slurmctld is restarted, a new connection is made
      alongside the old connection.  When the old connection gets unwedged, it
      clears out the registration of the slurmctld, so no updates are sent to
      that slurmctld.
      
      This change checks for old connections when a registration message
      comes in.  If we find one, we print an error, set rem_port = 0, and
      remove it from the list.  That way, when it gets unwedged we just
      close the socket instead of removing the registration.
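
The fix described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the slurmdbd code: the `Connection` and `Registry` classes, their fields other than `rem_port`, and the error text are hypothetical, and matching old connections by cluster name is an assumption.

```python
class Connection:
    def __init__(self, cluster, rem_port):
        self.cluster = cluster
        self.rem_port = rem_port
        self.open = True


class Registry:
    def __init__(self):
        self.registered = []  # connections that currently receive updates
        self.errors = []      # stand-in for the error log

    def register(self, conn):
        # On registration, look for an older connection from the same cluster.
        for old in [c for c in self.registered if c.cluster == conn.cluster]:
            self.errors.append(f"stale connection for cluster {old.cluster}")
            old.rem_port = 0              # mark as superseded
            self.registered.remove(old)   # stop tracking it
        self.registered.append(conn)

    def close(self, conn):
        conn.open = False  # always close the socket
        if conn.rem_port == 0:
            return  # superseded: leave the newer registration untouched
        if conn in self.registered:
            self.registered.remove(conn)
```

In this model, when the wedged connection finally closes, the registration made by the restarted slurmctld survives and keeps receiving updates.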
      
      Bug 5213
    • Fix NEWS entry for the previous commit a04eea2e. · d0729247
      Alejandro Sanchez authored
      Bug 7360.
  17. 19 Aug, 2019 2 commits
  18. 16 Aug, 2019 1 commit
  19. 15 Aug, 2019 1 commit
  20. 14 Aug, 2019 2 commits