  1. 20 Feb, 2019 3 commits
    • Add SRUN_NET_FORWARD RPC. · 3aa84949
      Tim Wickberg authored
    • Change x11_get_display_port() to x11_get_display(). · a1c30bac
      Tim Wickberg authored
      If DISPLAY is a local UNIX socket, return 0 for the port number,
      and an xmalloc()'d string as the target option.
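The return convention described above can be illustrated with a minimal sketch. It is not the Slurm implementation: `parse_display` is a hypothetical helper, and the socket path `/tmp/.X11-unix/X<N>` and the TCP offset of 6000 are the usual X11 conventions, assumed here rather than taken from the commit.

```python
import re

# Standard X11 convention (assumed): display N listens on TCP port 6000 + N.
X11_TCP_PORT_OFFSET = 6000


def parse_display(display):
    """Return (target, port) for a DISPLAY value; port 0 marks a UNIX socket."""
    match = re.fullmatch(r"(?P<host>[^:]*):(?P<num>\d+)(?:\.\d+)?", display)
    if match is None:
        raise ValueError(f"unrecognized DISPLAY value: {display!r}")
    number = int(match.group("num"))
    host = match.group("host")
    if host == "":
        # Local display: report the conventional socket path and port 0.
        return f"/tmp/.X11-unix/X{number}", 0
    # Remote or TCP display: report the hostname and the derived TCP port.
    return host, X11_TCP_PORT_OFFSET + number


print(parse_display(":0"))
print(parse_display("localhost:10.0"))
```

Returning 0 as the port gives callers one value to test in order to distinguish the UNIX-socket case from the TCP case.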
    • Rework x11 forwarding RPC fields. · 13cecdad
      Tim Wickberg authored
      Rename x11_target_host to x11_alloc_host to better indicate what the value
      represents. Going forward, the x11_alloc_host and x11_alloc_port are the
      hostname and TCP port number to connect to get the tunnel established.
      
      The x11_target and x11_target_port fields indicate which X11 display to
      connect to. If x11_target_port is zero, this indicates that x11_target
      is a UNIX socket on x11_alloc_host. Otherwise, x11_target is the hostname
      associated with the TCP port in x11_target_port for the DISPLAY.
      
      Make careful changes to older protocol blocks to ensure the 17.11/18.08
      slurmd processes can receive sufficient details from 19.05 slurmctld to
      set up SSH-based forwarding.
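As a rough illustration of how the four fields fit together, the sketch below interprets them the way the message describes. The field names come from the commit. The function `tunnel_endpoints`, the dictionary layout and the sample values are hypothetical and only meant to show the zero-port rule.

```python
def tunnel_endpoints(x11_alloc_host, x11_alloc_port, x11_target, x11_target_port):
    """Return (where to connect for the tunnel, which X11 display it leads to)."""
    # The tunnel itself is always reached over TCP on the allocating host.
    connect_to = (x11_alloc_host, x11_alloc_port)
    if x11_target_port == 0:
        # Port zero: x11_target is a UNIX socket path on x11_alloc_host.
        display = {"kind": "unix", "host": x11_alloc_host, "path": x11_target}
    else:
        # Non-zero port: x11_target is the hostname for that TCP port.
        display = {"kind": "tcp", "host": x11_target, "port": x11_target_port}
    return connect_to, display


# Sample values are made up for the illustration.
print(tunnel_endpoints("login1", 60001, "/tmp/.X11-unix/X0", 0))
print(tunnel_endpoints("login1", 60001, "localhost", 6010))
```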
  2. 19 Feb, 2019 3 commits
    • Fix whitespace. · 1decd432
      Tim Wickberg authored
    • disable some tests with CR_ONE_TASK_PER_CORE · 4ff57840
      Morris Jette authored
      These tests previously assumed that one task could be launched
      per CPU, which is not necessarily the case
    • Available CPU count on node with CR_ONE_TASK_PER_CORE · 09f6421d
      Morris Jette authored
      If CR_ONE_TASK_PER_CORE is configured then the core count rather
      than the CPU count of a node is used to determine if a node can
      be used by a job. This can result in the rejection of a job that
      should be able to run. Sample configuration and job below:
      SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
      NodeName=psg-dgx2-01 NodeAddr=jette NodeHostName=jette RealMemory=1536000 Gres=gpu:16 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 State=UNKNOWN
      
      $ srun --gpus-per-task=1 -n1 --cpus-per-gpu=64 -J test39.7 -t1 ./test39.7.input
      srun: error: CPU count per node can not be satisfied
      srun: error: Unable to allocate resources: Requested node configuration is not available
      bug 6517
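The arithmetic behind the rejection in the sample above can be checked with a short sketch. It only reproduces the counts from the node definition and the srun request and is not Slurm's selection logic. `node_capacity` is a hypothetical helper.

```python
def node_capacity(sockets, cores_per_socket, threads_per_core):
    """Return (core count, CPU count) for a node definition."""
    cores = sockets * cores_per_socket
    cpus = cores * threads_per_core
    return cores, cpus


# Values from the NodeName line: Sockets=2 CoresPerSocket=24 ThreadsPerCore=2
cores, cpus = node_capacity(sockets=2, cores_per_socket=24, threads_per_core=2)

# Values from the srun line: -n1, --gpus-per-task=1, --cpus-per-gpu=64
requested_cpus = 1 * 1 * 64

print(f"cores={cores} cpus={cpus} requested={requested_cpus}")
print("fits when compared with the core count:", requested_cpus <= cores)
print("fits when compared with the CPU count:", requested_cpus <= cpus)
```

The request for 64 CPUs exceeds the 48 cores but fits within the 96 CPUs, which is why comparing against the core count rejects a job that the node could run.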
  3. 16 Feb, 2019 1 commit
  4. 15 Feb, 2019 3 commits
  5. 14 Feb, 2019 17 commits
  6. 13 Feb, 2019 10 commits
    • Prevent select/cons_tres abort · 788a124c
      Morris Jette authored
      Without this patch, test39.7 would cause _gen_combs() in
      src/plugins/select/cons_tres/dist_tasks.c to abort due to a NULL
      board_combs argument, which was due to ncomb_brd being zero. This
      problem was due to some other issue in cons_tres currently under
      investigation, but this at least prevents the abort.
      
      Relevant configuration information from slurm.conf:
      SelectType=select/cons_tres
      SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
      GresTypes=gpu
      NodeName=psg-dgx2-01 NodeAddr=jette NodeHostName=jette RealMemory=1536000 Gres=gpu:16 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 State=UNKNOWN
      PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
      
      gres.conf (CPUs parameters are recognized as bad here):
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty0 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty1 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty2 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty3 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty4 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty5 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty6 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty7 CPUs=0-23,48-71
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty8 CPUs=24-47,72-95
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty9 CPUs=24-47,72-95
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty10 CPUs=24-47,72-95
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty11 CPUs=24-47,72-95
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty12 CPUs=24-47,72-95
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty13 CPUs=24-47,72-95
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty14 CPUs=24-47,72-95
      NodeName=psg-dgx2-01 Name=gpu File=/dev/tty15 CPUs=24-47,72-95
    • Cosmetic changes · bd6d70b1
      Morris Jette authored
      Correct format of some comments
      Combine text of log message onto one line so it can be searched for
    • Remove deprecated -t option from slurmctld --help · 2ade372b
      Jason Booth authored
      Continuation of 37951110
      
      Bug 6496
    • Nathan Rini's avatar
    • Clarify GraceTime in docs · 517bea4f
      Michael Hinton authored
      Bug 6479
    • Fix "it's" typo. · 3e98edd4
      Michael Hinton authored
      Bug 6479
    • Fix some typos in documentation · 15af7fd4
      Ben Roberts authored
      Updated accounting.shtml, sched_config.shtml and topology.shtml,
      fixing typos found in those files.
      
      Bug 6482
    • Alejandro Sanchez's avatar
    • Fix typo. · adee0b6f
      Felip Moll authored
    • Change gres/gpu sorting · 9893b259
      Morris Jette authored
      Previous logic would sort by name using xstrcmp(). The new logic
      extracts the numeric suffix and sorts based upon that number. The
      difference is that the old algorithm would put "/dev/nvidia10" before
      "/dev/nvidia2". The new logic would put "/dev/nvidia10" after
      "/dev/nvidia2" and "/dev/nvidia9".
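The difference between the two orderings can be seen in a minimal sketch. Plain string sorting stands in for the xstrcmp()-based comparison, and `numeric_suffix` is a hypothetical helper. Treating a name without a numeric suffix as -1 is an assumption made for this illustration, not something the commit states.

```python
import re


def numeric_suffix(path):
    """Return the trailing number of a device path, or -1 if there is none."""
    match = re.search(r"(\d+)$", path)
    return int(match.group(1)) if match else -1


devices = ["/dev/nvidia10", "/dev/nvidia2", "/dev/nvidia9", "/dev/nvidia0"]

by_name = sorted(devices)                        # old behavior: compare as strings
by_suffix = sorted(devices, key=numeric_suffix)  # new behavior: compare the number

print(by_name)
print(by_suffix)
```

Sorting by name compares character by character, so "10" lands before "2". Sorting by the extracted integer restores the device order an administrator would expect.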
  7. 12 Feb, 2019 2 commits
  8. 11 Feb, 2019 1 commit