1. 22 Mar, 2013 2 commits
  2. 21 Mar, 2013 3 commits
  3. 20 Mar, 2013 9 commits
  4. 19 Mar, 2013 11 commits
    • Morris Jette's avatar
      Merge branch 'slurm-2.5' · 4322b420
      Morris Jette authored
      Conflicts:
      	src/plugins/sched/backfill/backfill.c
      4322b420
    • Don Lipari's avatar
    • Morris Jette's avatar
    • Hongjia Cao's avatar
      change select() to poll() in waiting for a socket to be readable · 3175cf91
      Hongjia Cao authored
      select()/FD_ISSET() does not work for file descriptors larger than 1023,
      because an fd_set holds only FD_SETSIZE (typically 1024) descriptors.
      3175cf91
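The select()-to-poll() change above can be illustrated with a minimal sketch. This is not the code from the commit: `wait_fd_readable` and its return convention are hypothetical names chosen for illustration. It only shows that poll() takes the descriptor number directly in a `struct pollfd`, so it is not bounded by FD_SETSIZE the way an fd_set is.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Slurm's actual function): wait until fd is
 * readable or timeout_ms milliseconds elapse.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_fd_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int rc;

	pfd.fd = fd;            /* any valid descriptor, even one above 1023 */
	pfd.events = POLLIN;    /* interested in "data available to read" */
	pfd.revents = 0;

	/* Retry if a signal interrupts the call. Note that this simple
	 * loop restarts the full timeout on each retry. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return -1;      /* poll() itself failed */
	if (rc == 0)
		return 0;       /* timed out, nothing to read */
	/* POLLHUP is treated as readable: a read() will return EOF */
	if (pfd.revents & (POLLIN | POLLHUP))
		return 1;
	return -1;              /* POLLERR or POLLNVAL */
}
```

A caller would invoke `wait_fd_readable(sock, 5000)` before `read()`; with select() the same descriptor would first have to pass an `fd < FD_SETSIZE` check.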
    • Morris Jette's avatar
      Note nature of latest change · 8e038b5c
      Morris Jette authored
      8e038b5c
    • Hongjia Cao's avatar
      fix of idle nodes cannot be allocated · 4ea9850a
      Hongjia Cao authored
      Avoid adding/removing a job's node resources if the node was lost by a
      resize.

      I found another case in which an idle node cannot be allocated. It can be
      reproduced as follows:
      
      1. run a job with -k option:
      
          [root@mn0 ~]# srun -w cn[18-28] -k sleep 1000
          srun: error: Node failure on cn28
          srun: error: Node failure on cn28
          srun: error: cn28: task 10: Killed
          ^Csrun: interrupt (one more within 1 sec to abort)
          srun: tasks 0-9: running
          srun: task 10: exited abnormally
          ^Csrun: sending Ctrl-C to job 106120.0
          srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
      
      2. set a node down and then set it idle:
      
          [root@mn0 ~]# scontrol update nodename=cn28 state=down reason="hjcao test"
          [root@mn0 ~]# scontrol update nodename=cn28 state=idle
      
      3. restart slurmctld
      
          [root@mn0 ~]# service slurm restart
          stopping slurmctld:                                        [  OK  ]
          slurmctld is stopped
          starting slurmctld:                                        [  OK  ]
      
      4. cancel the job
      
      After that, the node that was set down remains unallocatable even though
      sinfo reports it as idle:
      
          [root@mn0 ~]# sinfo -n cn[18-28]
          PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
          work*        up   infinite     11   idle cn[18-28]
      
          [root@mn0 ~]# srun -w cn[18-28] hostname
          srun: job 106122 queued and waiting for resources
      
          [root@mn0 slurm]# grep cn28 slurmctld.log
          [2013-03-18T15:28:02+08:00] debug3: cons_res: _vns: node cn28 in exclusive use
          [2013-03-18T15:29:02+08:00] debug3: cons_res: _vns: node cn28 in exclusive use
      
      I attempted to fix this with the attached patch. Please review it.
      4ea9850a
    • Morris Jette's avatar
      Merge branch 'slurm-2.5' · 6dd90805
      Morris Jette authored
      6dd90805
    • Morris Jette's avatar
      Correction in logic issuing call to account for change in job time limit · 9f5a7a0e
      Morris Jette authored
      I don't believe save_time_limit was redundant.  At least in this case:
      
      if (qos_ptr && (qos_ptr->flags & QOS_FLAG_NO_RESERVE)){
          if (orig_time_limit == NO_VAL)
              orig_time_limit = comp_time_limit;
          job_ptr->time_limit = orig_time_limit;
      [...]
      
      So later, when updating the db,
      
          if (save_time_limit != job_ptr->time_limit)
              jobacct_storage_g_job_start(acct_db_conn,
                              job_ptr);

      will cause the db to be updated, while

          if (orig_time_limit != job_ptr->time_limit)
              jobacct_storage_g_job_start(acct_db_conn,
                              job_ptr);

      will not, because job_ptr->time_limit now equals orig_time_limit.
      9f5a7a0e
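The reasoning in the commit message above can be checked with a small self-contained sketch. It is not the backfill scheduler's code: `struct demo_job`, `struct demo_result`, `demo_no_reserve` and `DEMO_NO_VAL` are hypothetical stand-ins, and the assumption is that `save_time_limit` holds the job's time limit as it was before the quoted branch runs.

```c
#include <stdint.h>

#define DEMO_NO_VAL 0xfffffffeU  /* placeholder sentinel for "not set" */

struct demo_job {                /* hypothetical stand-in for the job record */
	uint32_t time_limit;
};

struct demo_result {
	int save_detects_change; /* 1 if comparing against save_time_limit differs */
	int orig_detects_change; /* 1 if comparing against orig_time_limit differs */
};

/* Mirrors the QOS_FLAG_NO_RESERVE branch quoted in the message and then
 * evaluates both candidate "did the limit change?" comparisons. */
static struct demo_result demo_no_reserve(struct demo_job *job_ptr,
					  uint32_t orig_time_limit,
					  uint32_t comp_time_limit)
{
	struct demo_result res;
	/* Assumption: snapshot taken before the limit is touched */
	uint32_t save_time_limit = job_ptr->time_limit;

	if (orig_time_limit == DEMO_NO_VAL)
		orig_time_limit = comp_time_limit;
	job_ptr->time_limit = orig_time_limit;

	res.save_detects_change = (save_time_limit != job_ptr->time_limit);
	/* Always 0 here: time_limit was just assigned from orig_time_limit */
	res.orig_detects_change = (orig_time_limit != job_ptr->time_limit);
	return res;
}
```

With a job whose limit starts at 60, `orig_time_limit` unset and `comp_time_limit` of 30, the limit becomes 30; only the comparison against the saved value notices, which is the case where the database update would otherwise be skipped.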
    • Morris Jette's avatar
      Merge branch 'slurm-2.5' · 3f24195a
      Morris Jette authored
      Conflicts:
      	src/db_api/cluster_report_functions.c
      	src/plugins/sched/backfill/backfill.c
      3f24195a
    • Morris Jette's avatar
    • Don Lipari's avatar
      Record updated job time limit if modified by backfill · 46348f91
      Don Lipari authored
      With this change, if the job's time limit is modified down
      toward --time-min by the backfill scheduler, the job's
      time limit is updated in the database.
      46348f91
  5. 18 Mar, 2013 1 commit
  6. 14 Mar, 2013 7 commits
  7. 13 Mar, 2013 7 commits