1. 08 Jan, 2013 1 commit
    • Added support for job arrays. · 2993b423
      Morris Jette authored
      Phase 1 of the effort. See the -a/--array option in "man sbatch" for
      details. Creates job records using sbatch. Reports job arrays using
      scontrol or squeue. More work coming soon...
  2. 03 Jan, 2013 5 commits
  3. 28 Dec, 2012 1 commit
  4. 22 Dec, 2012 1 commit
  5. 21 Dec, 2012 3 commits
    • Correct job time limit for sched/backfill when job has QOS with NO_RESERVE flag · 4652e982
      Morris Jette authored
      If sched/backfill starts a job with a QOS having NO_RESERVE and no job
      time limit, start it with the partition time limit (or one year if the
      partition has no time limit) rather than NO_VAL (a 140-year time limit).
      
      If a standby job, which in this case has the NO_RESERVE flag set, is
      submitted without a time limit and is backfilled, it will get an EndTime
      waaayyyy into the future.
      
      JobId=99 Name=cmdll
         UserId=eckert(1043) GroupId=eckert(1043)
         Priority=12083 Account=sa QOS=standby
         JobState=RUNNING Reason=None Dependency=(null)
         Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
         RunTime=00:00:14 TimeLimit=12:00:00 TimeMin=N/A
         SubmitTime=2012-12-20T11:49:36 EligibleTime=2012-12-20T11:49:36
         StartTime=2012-12-20T11:49:44 EndTime=2149-01-26T18:16:00
      
      So I looked at the code in /src/plugins/sched/backfill:
      
                      if (job_ptr->start_time <= now) {
                              int rc = _start_job(job_ptr, resv_bitmap);
                              if (qos_ptr && (qos_ptr->flags & QOS_FLAG_NO_RESERVE)){
                                      job_ptr->time_limit = orig_time_limit;
                                      job_ptr->end_time = job_ptr->start_time +
                                                          (orig_time_limit * 60);
      
      Using the debugger, I found that if the job does not have a specified
      time limit, job_ptr->time_limit is equal to NO_VAL when it reaches
      this code.
    • Danny Auble authored a commit (details not shown)
    • Added "HealthCheckNodeState" configuration parameter · b139f654
      Morris Jette authored
      Identify node states on which HealthCheckProgram should be executed.
  6. 20 Dec, 2012 4 commits
  7. 19 Dec, 2012 5 commits
  8. 18 Dec, 2012 1 commit
  9. 17 Dec, 2012 4 commits
  10. 14 Dec, 2012 4 commits
  11. 13 Dec, 2012 3 commits
  12. 12 Dec, 2012 1 commit
  13. 07 Dec, 2012 1 commit
    • Correction to hostlist sorting · c8f97453
      Morris Jette authored
      Correction to hostlist sorting for hostnames that contain two numeric
      components where the first numeric component varies in length (e.g.
      "rack9blade1" should come before "rack10blade1").
  14. 06 Dec, 2012 1 commit
  15. 05 Dec, 2012 3 commits
  16. 04 Dec, 2012 1 commit
  17. 30 Nov, 2012 1 commit