Commits · aa0e912795811e2e761b271ec3411a134e87debf · Manuel G. Marciani / ces_slurm_simulator

25 Sep, 2003 5 commits
- Add descriptive comment. No code changes. · aa0e9127
  Moe Jette authored Sep 25, 2003
  
- Reset priority of system held jobs (priority==1) when a non-responding · 3dd6dc15
  Moe Jette authored Sep 25, 2003
```
node begins to respond again.
```
- Don't use jobs with priority == 1 as a basis for computing the · 57050cab
  Moe Jette authored Sep 25, 2003
```
lowest active job priority (for working down as new jobs are
initiated). This bug only applies when slurmctld restarts and
there are jobs in system hold state (priority 1).
```
- Don't log errors to syslog if running in the foreground. · 6df35bf8
  Moe Jette authored Sep 25, 2003
  
- Permit DRAINING nodes to transition into DRAINED state even if not responding. · 0c54578e
  Moe Jette authored Sep 25, 2003
```
This is a problem on PVC, where nodes are regularly DRAINED.
```
24 Sep, 2003 3 commits
- Explain difference in redirected I/O path evaluation between interactive mode and batch mode · 0d07609c
  jwindley authored Sep 24, 2003
  
- Fix documentation for srun --chdir · 4ff4c504
  jwindley authored Sep 24, 2003
  
- o set "unused" host's state to SRUN_HOST_REPLIED to fix bug where · 498a1b5e
  Mark Grondona authored Sep 23, 2003
```
   srun erroneously expected replies from these hosts (gnats:291)
```
23 Sep, 2003 5 commits
- o update MPICH example to sort hosts based on taskid srun would assign · 05b479de
  Mark Grondona authored Sep 23, 2003
  
- Rename some variables for greater clarity. No changes in logic. · 31a79ced
  Moe Jette authored Sep 23, 2003
  
- Make comment more clear. No changes in logic. · 1f21ad73
  Moe Jette authored Sep 23, 2003
  
- Add timers for handling queued agent requests so as to support better · 5ad795fc
  Moe Jette authored Sep 23, 2003
```
scalability. An arbitrary number of requests may be queued and they
are processed one per second until the queue is empty or pending
requests were last attempted recently (configuration parameters set
to 60 seconds as a minimum retry interval).
```
- Define (and use) minimum and maximum job id to use for no_allocate jobs. · 673b6079
  Moe Jette authored Sep 23, 2003
```
These jobs are reported by slurmd on node registration. They are logged
but otherwise ignored by slurmctld. Several changes to slurmd logging
messages to report job id and step id using %u format rather than %d
format (which shows no-allocate job id values as negative numbers).
```
22 Sep, 2003 2 commits
- Update META file for 0.2.17 release. · ec0b2633
  Moe Jette authored Sep 21, 2003
  
- Describe how inconsistencies are handled with respect to the --relative · 2836c84f
  Moe Jette authored Sep 21, 2003
```
and --nodes options.
```
21 Sep, 2003 11 commits
- Print a warning message if the number of nodes remaining based upon the · 8d0bdacc
  Moe Jette authored Sep 21, 2003
```
--relative option is lower than the node count specified. The --relative
option takes precedence.
```
- Record updates for 0.2.17 release. · 939a03c0
  Moe Jette authored Sep 21, 2003
  
- Clean up error messages. · 38c62432
  Moe Jette authored Sep 21, 2003
  
- Clean up error message format. · 62ff4f15
  Moe Jette authored Sep 21, 2003
  
- Distinguish in logs between jobs that reach their job time limit and those · fccce86c
  Moe Jette authored Sep 21, 2003
```
that reach the slurm inactivity time limit.
```
- Parameterize the maximum time for the backup controller to relinquish · f3bce2df
  Moe Jette authored Sep 21, 2003
```
control (it needs to complete all pending RPCs and save state before
the primary reads state and takes over).
```
- Restructure code so state is restored only when taking primary controller · 3c4c1e00
  Moe Jette authored Sep 21, 2003
```
responsibilities (backup was routinely reading at startup).
```
- Minor code restructuring to improve responsiveness of slurmctld backup · ac67be19
  Moe Jette authored Sep 21, 2003
```
server to shutdown request.
```
- Create _disable_signal function for clarity and add disable of SIGIO, · 1c6fafc5
  Moe Jette authored Sep 21, 2003
```
SIGPWR, and SIGLOST.
```
- Fix bug that had non-responding DRAINING node go into state DRAINED · ce1d362d
  Moe Jette authored Sep 20, 2003
```
and when returned to service went improperly back into state DRAINING
(job counter was inconsistent).
```
- A non-responding node in DRAINING state will have its jobs killed and · b5cc2f0d
  Moe Jette authored Sep 20, 2003
```
transition to DRAINED state.
```
20 Sep, 2003 6 commits
- Don't bother to shutdown backup controller on reconfig request. Its · 987261ff
  Moe Jette authored Sep 20, 2003
```
data will not be used and the process is too slow anyway.
```
- Save all state when the last node associated with a job records its · 78fdfe9c
  Moe Jette authored Sep 20, 2003
```
EPILOG_COMPLETE_MESSAGE. At this time the job is COMPLETED and all
associated nodes available.
```
- o NEWS updates for 0.2.17 · 1ecbd59f
  Mark Grondona authored Sep 19, 2003
  
- Block SIGPIPE throughout slurmctld. · 9b1399cf
  Moe Jette authored Sep 19, 2003
  
- o add --relative, -r option to srun to allow running job steps · 74d08e4c
  Mark Grondona authored Sep 19, 2003
```
   on nodes relative to the current allocation.
 o srun no longer sends SIGKILL to job if one task is killed except
   if --no-allocate is used. (the job will otherwise be killed by
   the controller anyway)
```
- o reenable chkconfig on install · 99874548
  Mark Grondona authored Sep 19, 2003
  
19 Sep, 2003 8 commits
- o update to latest hostlist.[ch] from LSD-Tools. Includes hostlist_nth() · af4213fe
  Mark Grondona authored Sep 19, 2003
```
   function.
```
- o added code to clear any stale job step entries in shm if shared · 2f89ecd2
  Mark Grondona authored Sep 19, 2003
```
   memory appears to be full.
```
- o add ECONNRESET to the set of errnos that close the IO object on · da5571b9
  Mark Grondona authored Sep 19, 2003
```
   write() call.
```
- o fixes related to kill_job rpc: · 0dd2bfcd
  Mark Grondona authored Sep 19, 2003
```
  - instead of attempting to kill pending threads, immediately
    exit wait_for_procs if a thread is already waiting for job.
  - if wait_for_procs fails (thread already waiting), exit w/out
    sending epilog complete rpc.
```
- Change logging level of detailed communication errors from type info to · 019f229e
  Moe Jette authored Sep 19, 2003
```
type debug.
```
- Re-issue _slurm_close functions on EINTR to avoid leaving orphan file · 97bee5a0
  Moe Jette authored Sep 19, 2003
```
descriptors. This was needed in several grouped functions (e.g.
slurm_send_recv_rc_msg and slurm_send_only_node_msg, which combine
open, send, receive, and close functions for simplicity).
```
- Add a couple of xassert() calls. · e48ce3ac
  Moe Jette authored Sep 19, 2003
  
- Change return in main to exit(0). · 75205e1a
  Moe Jette authored Sep 19, 2003
  