1. 03 Jun, 2013 5 commits
    • Change defaults for input/output for sh5util · c7317684
      Danny Auble authored
    • Fix typos in slurmdbd.conf man page. · e3219c0e
      David Bigagli authored
    • Add more tests · 1e676a46
      Nathan Yee authored
      test1.70   Validates that srun standard input and output work with binary files.
      test1.71   Validates that srun exit code matches that of a test program.
    • Merge branch 'slurm-2.5' · 2ed9363c
      Morris Jette authored
    • restore max_nodes of desc to NO_VAL when checkpointing job · f82e0fb8
      Hongjia Cao authored
      We're having some trouble getting our slurm jobs to successfully
      restart after a checkpoint.  For this test, I'm using sbatch and a
      simple, single-threaded executable.  Slurm is 2.5.4, blcr is 0.8.5.
      I'm submitting the job using sbatch:
      
      $ sbatch -n 1 -t 12:00:00 bin/bowtie-ex.sh
      
      I am able to create the checkpoint and vacate the node:
      
      $ scontrol checkpoint create 137
      .... time passes ....
      $ scontrol vacate 137
      
      At that point, I see the checkpoint file from blcr in the current
      directory and the checkpoint file from Slurm
      in /var/spool/slurm-llnl/checkpoint.  However, when I attempt to
      restart the job:
      
      $ scontrol checkpoint restart 137
      scontrol_checkpoint error: Node count specification invalid
      
      In slurmctld's log (at level 7) I see:
      
      [2013-05-29T12:41:08-07:00] debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=*****
      [2013-05-29T12:41:08-07:00] debug3: Version string in job_ckpt header is JOB_CKPT_002
      [2013-05-29T12:41:08-07:00] _job_create: max_nodes == 0
      [2013-05-29T12:41:08-07:00] _slurm_rpc_checkpoint restart 137: Node count specification invalid
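
      The failure in the log above can be modeled with a minimal Python
      sketch. This is not Slurm's code: JobDesc, build_checkpoint_desc and
      validate_node_count are hypothetical names, and the sketch assumes
      that an unset node limit is kept as 0 in the job record and was
      copied unchanged into the checkpointed descriptor. NO_VAL is Slurm's
      "value not set" sentinel.

```python
from dataclasses import dataclass

NO_VAL = 0xFFFFFFFE  # Slurm's "value not set" sentinel for 32-bit fields


@dataclass
class JobDesc:
    """Hypothetical stand-in for the job description saved in a checkpoint."""
    min_nodes: int = NO_VAL
    max_nodes: int = NO_VAL


def build_checkpoint_desc(record_max_nodes: int, fix_applied: bool) -> JobDesc:
    """Build the descriptor to checkpoint from a job record.

    Assumption: the job record stores "no upper limit" as 0.
    """
    desc = JobDesc(min_nodes=1, max_nodes=record_max_nodes)
    if fix_applied and desc.max_nodes == 0:
        # Model of the fix named in the commit title: restore the sentinel.
        desc.max_nodes = NO_VAL
    return desc


def validate_node_count(desc: JobDesc) -> str:
    """Simplified model of the restart check that logs 'max_nodes == 0'."""
    if desc.max_nodes == 0:
        return "Node count specification invalid"
    return "ok"


before = validate_node_count(build_checkpoint_desc(0, fix_applied=False))
after = validate_node_count(build_checkpoint_desc(0, fix_applied=True))
print(before)
print(after)
```

      In this model a restart only passes validation once the descriptor
      carries NO_VAL instead of 0, which is the behavior the commit title
      describes.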
  2. 31 May, 2013 11 commits
  3. 30 May, 2013 17 commits
  4. 29 May, 2013 5 commits
  5. 28 May, 2013 2 commits
    • Make testing of node names more strict · 71ad3ba0
      Morris Jette authored
      If node_name2bitmap() is called with best_effort=false, then do
      not attempt to match names with NodeHostName.
      
      Without this change, a partition that contains a NodeHostName rather
      than a NodeName would be configured with the first one found. On a
      front-end system, this would result in the partition's node_bitmap
      being out of sync with the actual node positions.
      
      To reproduce the problem, configure with --enable-multiple-slurmd.
      Then in slurm.conf, define something like this:
      NodeName=foo[1-8] NodeHostName=bar ...
      PartitionName=debug Nodes=bar,foo[1-8] ...
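
      The effect of the stricter lookup can be sketched in Python using the
      configuration above. This is a simplified model, not the
      node_name2bitmap() implementation: find_node, partition_bitmap and
      the match_hostname flag are hypothetical, and raising an error for an
      unknown name is an assumption about how a failed strict lookup is
      reported.

```python
from typing import List, Optional, Tuple

# (NodeName, NodeHostName) pairs for: NodeName=foo[1-8] NodeHostName=bar
NODES: List[Tuple[str, str]] = [(f"foo{i}", "bar") for i in range(1, 9)]


def find_node(name: str, match_hostname: bool) -> Optional[int]:
    """Return the index of the node called `name`, or None if not found."""
    for index, (node_name, _) in enumerate(NODES):
        if node_name == name:
            return index
    if match_hostname:
        # Loose mode: fall back to NodeHostName and take the first match.
        for index, (_, host_name) in enumerate(NODES):
            if host_name == name:
                return index
    return None


def partition_bitmap(names: List[str], match_hostname: bool) -> List[int]:
    """Build a 0/1 list marking which configured nodes the names select."""
    bitmap = [0] * len(NODES)
    for name in names:
        index = find_node(name, match_hostname)
        if index is None:
            raise ValueError(f"invalid node name: {name}")
        bitmap[index] = 1
    return bitmap


# PartitionName=debug Nodes=bar,foo[1-8]
partition_nodes = ["bar"] + [f"foo{i}" for i in range(1, 9)]

loose = partition_bitmap(partition_nodes, match_hostname=True)
print(loose)

strict_error = None
try:
    partition_bitmap(partition_nodes, match_hostname=False)
except ValueError as err:
    strict_error = str(err)
print(strict_error)
```

      In loose mode "bar" is silently resolved to the first node that shares
      that host name, while the strict lookup rejects it because it is not a
      NodeName.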
    • BLUEGENE - Fix for static systems · 536be448
      Danny Auble authored