Commits · 777bf478837f89b22f0e87b741f03043704f1ca6 · Manuel G. Marciani / ces_slurm_simulator

18 Jan, 2013 1 commit

Replace socket shutdown call with linger sockopt · 777bf478

jette authored Jan 17, 2013

The shutdown call was causing all pending I/O to be discarded.
Linger waits for pending I/O to complete before the close call returns.
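
As a rough illustration of the linger approach described above, the sketch below enables SO_LINGER on a socket so that a later close() blocks until queued data has been sent or the timeout expires. The helper name `set_linger` and the timeout argument are assumptions made for this example; they are not taken from the Slurm source.

```c
#include <sys/socket.h>

/* Hypothetical helper (not from the Slurm source): turn on SO_LINGER so
 * that close() waits up to `seconds` for pending output to be sent.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_linger(int fd, int seconds)
{
	struct linger lg;

	lg.l_onoff = 1;         /* enable lingering on close() */
	lg.l_linger = seconds;  /* upper bound on how long close() may block */
	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

A caller would invoke `set_linger(fd, 5)` once after the socket is created and then close the descriptor as usual; the 5 second value here is arbitrary.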

777bf478

17 Jan, 2013 6 commits

Merge branch 'slurm-2.5' · d44a7cbd
Morris Jette authored Jan 17, 2013
```
Conflicts:
	src/sacctmgr/sacctmgr.c
	src/sreport/sreport.c
```
d44a7cbd
Terminate sreport on EOF · 892b14aa
David Bigagli authored Jan 17, 2013

892b14aa
Fix typo in comment · 56a821a7
Morris Jette authored Jan 17, 2013

56a821a7
sacctmgr: terminate sacctmgr on EOF if readline function missing · 4277f365
David Bigagli authored Jan 17, 2013

4277f365

Use shutdown() rather than close() for slurmstepd/srun sockets · 30f31198

Morris Jette authored Jan 17, 2013

From Matthieu Hautreux:
However, after discussing the point with onsite Bull support team and looking
at the slurmstepd code concerning stdout/err/in redirection we would like to
recommend two things for future versions of SLURM:

- shutdown(...,SHUT_WR) should be performed when managing the TCP sockets: no
shutdown(...,SHUT_WR) is performed on the TCP socket in slurmstepd eio
management. Thus, the close() can not reliably inform the other end of the
socket that the transmission is done (no TCP_FIN transmitted). As the close is
followed by an exit(), the kernel is the only entity aware of the
fact that the close may not have been taken into account by the other side (which
might be our initial problem) and thus no retry can be performed, leaving the
server side of the socket (srun) in a position where it can wait for a read
until the end of time.

- TCP_KEEPALIVE addition. No TCP_KEEPALIVE seems to be configured in SLURM TCP
exchanges, thus leaving the system potentially deadlocked if a remote host
disappears and the local host is waiting on a read (the write would result in an
EPIPE or SIGPIPE depending on the masked signals). Adding keepalive with a
relatively large timeout value (5 minutes) could enhance the resilience of
SLURM to unexpected packet/connection loss without too much impact on the
scalability of the solution. The timeout could be configurable in case it is
found too aggressive for particular configurations.
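
The first recommendation above (signal end of transmission with shutdown(SHUT_WR) before closing) can be sketched as follows. This is a minimal example and not the slurmstepd code: the helper name `send_all_then_eof` is hypothetical. After the call, the peer's read() returns 0 once it has consumed the data, while the local side can still read replies.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the Slurm source): write the whole
 * buffer, then half-close the socket so the peer sees end-of-file.
 * Returns 0 on success, -1 on error. */
int send_all_then_eof(int fd, const char *data, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, data, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;       /* interrupted, retry */
			return -1;
		}
		data += n;
		len -= (size_t) n;
	}
	/* Close only the sending direction; reads remain possible. */
	return shutdown(fd, SHUT_WR);
}
```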

30f31198

Add support for configurable keep alive time for srun/slurmstep communications · 5752c6ce

Morris Jette authored Jan 16, 2013

Added "KeepAliveTime" configuration parameter

From Matthieu Hautreux:
TCP_KEEPALIVE addition. No TCP_KEEPALIVE seems to be configured in SLURM TCP
exchanges, thus leaving the system potentially deadlocked if a remote host
disappears and the local host is waiting on a read (the write would result in an
EPIPE or SIGPIPE depending on the masked signals). Adding keepalive with a
relatively large timeout value (5 minutes) could enhance the resilience of
SLURM to unexpected packet/connection loss without too much impact on the
scalability of the solution. The timeout could be configurable in case it is
found too aggressive for particular configurations.
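
For illustration, enabling TCP keepalive with a configurable idle time might look like the sketch below. It is an assumption-laden example, not the Slurm implementation: the helper name `enable_keepalive` is hypothetical, how "KeepAliveTime" is mapped to socket options is not stated in this log, and TCP_KEEPIDLE is a platform-specific option, so it is only set where the headers define it.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the Slurm source): enable keepalive
 * probes on a TCP socket. Where TCP_KEEPIDLE is available, also set the
 * idle time in seconds before the first probe is sent.
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle_seconds)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
#ifdef TCP_KEEPIDLE
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
		       &idle_seconds, sizeof(idle_seconds)) < 0)
		return -1;
#endif
	return 0;
}
```

Calling `enable_keepalive(fd, 300)` would correspond to the 5 minute timeout suggested in the message above.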

5752c6ce

16 Jan, 2013 18 commits
- Merge branch 'slurm-2.5' · 0367b663
  Morris Jette authored Jan 16, 2013
```
Conflicts:
	src/slurmctld/proc_req.c
```
  0367b663
- Clarify use of job_submit/lua plugin · e7a7d483
  Morris Jette authored Jan 16, 2013
  
  e7a7d483
- Terminate sacctmgr on stdin EOF · fe2f22c1
  David Bigagli authored Jan 16, 2013
  
  fe2f22c1
- Improve example of SallocDefaultCommand in slurm.conf man page · f624d8d0
  Morris Jette authored Jan 16, 2013
  
  f624d8d0
- Add links to LBL Node Health Check program in FAQ and Download web pages · dd7fd98a
  Morris Jette authored Jan 16, 2013
  
  dd7fd98a
- Fix for scheduling batch jobs in multiple partitions · 04fbf26a
  Morris Jette authored Jan 16, 2013
```
Without this change a high priority batch job may not start at submit
time. In addition, a pending job with multiple partitions may be cancelled
when the scheduler runs if any of its partitions can not be used by
the job.
```
  04fbf26a
- In sacctmgr, purge local qos list after update · a5310890
  David Bigagli authored Jan 16, 2013
  
  a5310890
- Remove commit 0ca48b70 · 111fd8b2
  Morris Jette authored Jan 16, 2013
```
The original work this was based upon has been replaced with new
logic.
```
  111fd8b2
- Fix for job request with multiple partitions and node features · d84ad203
  Morris Jette authored Jan 16, 2013
```
Without this patch, if the first listed partition lacks nodes
with required features the job would be rejected.
```
  d84ad203
- Revert commit 3d69c635 · 0b9c3c58
  Morris Jette authored Jan 16, 2013
```
While this will validate job at submit time, it results in redundant
looping when scheduling jobs. Working on alternate patch now.
```
  0b9c3c58
- Missed goto for last patch · 0ca48b70
  Danny Auble authored Jan 16, 2013
  
  0ca48b70
- Check all requested partitions to make sure at least one is valid on job submission · 3d69c635
  Danny Auble authored Jan 16, 2013
  
  3d69c635
- Merge branch 'slurm-2.5' · f2498abf
  Morris Jette authored Jan 15, 2013
  
  f2498abf
- Initialize core_map structure for select/linear, needed for GRES binding · 8880019b
  Morris Jette authored Jan 15, 2013
  
  8880019b
- Merge branch 'slurm-2.5' of https://github.com/SchedMD/slurm into slurm-2.5 · 0db4991d
  Morris Jette authored Jan 15, 2013
  
  0db4991d
- Correction to GRES element selection logic · d5d43fd3
  Morris Jette authored Jan 15, 2013
  
  d5d43fd3
- Correct gres logic to handle difference in core/cpu count · 903b5654
  Morris Jette authored Jan 15, 2013
```
The gres_plugin_job_test was returning a count of cores available
to a job, but the select plugins were treating this as a CPU count.
This change converts the core count into a CPU count as needed in
the select plugin and changes the comments related to the function
gres_plugin_job_test().
```
  903b5654
- Fix issue where sjstat and sjobexitmod were installed in 2 different rpms · 8b91fd63
  Danny Auble authored Jan 15, 2013
  
  8b91fd63
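
Commit 903b5654 above describes converting a core count into a CPU count in the select plugin. A minimal sketch of such a conversion follows, assuming a "CPU" here means a hardware thread and that every core has the same number of threads. The function name `cores_to_cpus` and its parameters are hypothetical and not taken from the Slurm source.

```c
#include <stdint.h>

/* Hypothetical helper (not from the Slurm source): convert a count of
 * available cores into a count of CPUs, assuming each core exposes
 * `threads_per_core` hardware threads. A value of 0 is treated as 1. */
uint32_t cores_to_cpus(uint32_t core_cnt, uint32_t threads_per_core)
{
	if (threads_per_core == 0)
		threads_per_core = 1;
	return core_cnt * threads_per_core;
}
```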
15 Jan, 2013 8 commits
- Fairshare logic: comment the computation of decay_factor · de26f0a3
  Matthieu Hautreux authored Jan 15, 2013
  
  de26f0a3
- Merge branch 'master' of https://github.com/SchedMD/slurm · d11d5b07
  jette authored Jan 15, 2013
  
  d11d5b07
- Remove priority/multifactor2 plugin. · ad3f015b
  jette authored Jan 15, 2013
```
Logic now in priority/multifactor plugin with PriorityFlags=TICKET_BASED.
```
  ad3f015b
- Cosmetic changes discovered while testing priority/multifactor2 · 5c87799b
  Morris Jette authored Jan 15, 2013
  
  5c87799b
- Merge branch 'slurm-2.5' · 7fcc3eb6
  Morris Jette authored Jan 15, 2013
```
Conflicts:
	src/slurmctld/acct_policy.c
```
  7fcc3eb6
- QoS limits enforcement: correct a bug with 0-valued per user used limits · 4136520d
  Matthieu Hautreux authored Jan 15, 2013
```
QoS limits enforcement on the controller side is based on a list of used_limits
per user. When a user is not yet added to the list, which is common when the
controller is restarted and the user has no running jobs, the current logic is
to not check some of the "per user limits" and let the submission succeed.
However, if one of these limits is a zero-valued limit, the check should
fail, as it means that no job should be submitted at all because it would
necessarily result in a crossing of the limit.

This patch ensures that even when a user is not yet present in the per user
used_limits list, the 0-valued limits are correctly treated.
```
  4136520d
- Merge priority/multifactor2 plugin into priority/multifactor · 6596f17e
  David Bigagli authored Jan 14, 2013
```
Add PriorityFlags value of "TICKET_BASED".
```
  6596f17e
- Merge branch 'slurm-2.5' · a68ebb39
  Morris Jette authored Jan 14, 2013
  
  a68ebb39
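
The QoS fix in commit 4136520d above can be illustrated with a small sketch. It assumes a single per-user limit on submitted jobs, a sentinel meaning "no limit", and a lookup that yields NULL when the user has no used_limits entry yet. The struct, the sentinel `NO_LIMIT`, and the function `may_submit` are all hypothetical and do not reflect the actual Slurm data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_LIMIT UINT32_MAX     /* hypothetical "limit not set" sentinel */

/* Hypothetical per-user usage record. */
struct used_limits {
	uint32_t uid;
	uint32_t submit_jobs;   /* jobs currently counted against the user */
};

/* Return 1 if one more submission is allowed, 0 otherwise. A missing
 * record (used == NULL) is treated as zero usage instead of skipping
 * the check, so a 0-valued limit still rejects the job. */
int may_submit(const struct used_limits *used, uint32_t max_submit_jobs)
{
	uint32_t in_use = used ? used->submit_jobs : 0;

	if (max_submit_jobs == NO_LIMIT)
		return 1;
	return in_use < max_submit_jobs;
}
```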
14 Jan, 2013 7 commits

Handle srun task launch failure without duplicate error messages or abort. · 6e83dc4c
jette authored Jan 14, 2013

6e83dc4c

Prevent srun abort on task launch failure · 163d9547

Hongjia Cao authored Jan 14, 2013

On job step launch failure, the function
"slurm_step_launch_wait_finish()" will be called twice in launch/slurm,
which causes srun to be aborted:

srun: error: Task launch for 22495.0 failed on node cn6: Job credential
expired
srun: error: Application launch failed: Job credential expired
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
cn5
cn4
cn7
srun: error: Timed out waiting for job step to complete
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
srun: bitstring.c:174: bit_test: Assertion `(b) != ((void *)0)' failed.
Aborted (core dumped)

The attached patch (version 2.5.1) fixes it. But the message of
"
Job step aborted: Waiting up to 2 seconds for job step to finish.
Timed out waiting for job step to complete
"
will still be printed twice.

163d9547

Added new SelectTypeParameter value of "CR_ALLOCATE_FULL_SOCKET". · cdf679d0
Morris Jette authored Jan 14, 2013

cdf679d0
Remove "Related Software" from Slurm web header, just use "Download" · 474ddcce
Morris Jette authored Jan 14, 2013

474ddcce
select/cons_res plugin: CPU allocation logic fix · 1ef41ac9
Morris Jette authored Jan 14, 2013
```
Correction to CPU allocation count logic for cores without hyperthreading.
```
1ef41ac9

Add SLURM_SRUN_REDUCE_TASK_EXIT_MSG environment variable · 96986199

Hongjia Cao authored Jan 14, 2013

When jobs launched directly with srun end abnormally, there will
be a step-killed-message (slurmd[cn123]: *** 1234.0 KILLED AT ... WITH
SIGNAL 9 ***) from each node. And/or there will be a
task-exit-message (srun: error: task[0-1]: Terminated) for each node. For
large scale jobs, these messages become tedious and the other error
messages will be buried. The attached two patches (for slurm-2.5.1)
introduce two environment variables to control the output of such
messages:

SLURM_STEP_KILLED_MSG_NODE_ID: if set, only the specified node will
print the step-killed-message;

SLURM_SRUN_REDUCE_TASK_EXIT_MSG: if set and non-zero, successive task
exit messages with the same exit code will be printed only once.
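
The behaviour described for SLURM_SRUN_REDUCE_TASK_EXIT_MSG could be sketched roughly as below. Only the environment variable name comes from the commit message; the filter struct and function names are hypothetical, and the actual srun logic may differ.

```c
#include <stdlib.h>

/* Hypothetical state: remembers the exit code of the last printed message. */
struct exit_msg_filter {
	int have_last;
	int last_code;
};

/* Return 1 if the variable is set to a non-zero integer value. */
int reduce_exit_msgs_enabled(void)
{
	const char *val = getenv("SLURM_SRUN_REDUCE_TASK_EXIT_MSG");

	return val != NULL && atoi(val) != 0;
}

/* Return 1 if a task exit message for `exit_code` should be printed.
 * With reduction enabled, successive messages carrying the same exit
 * code are suppressed after the first one. */
int should_print_exit_msg(struct exit_msg_filter *f, int exit_code)
{
	if (!reduce_exit_msgs_enabled())
		return 1;
	if (f->have_last && f->last_code == exit_code)
		return 0;
	f->have_last = 1;
	f->last_code = exit_code;
	return 1;
}
```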

96986199

Add SLURM_STEP_KILLED_MSG_NODE_ID environment variable · 232ab305

Hongjia Cao authored Jan 14, 2013

When jobs launched directly with srun end abnormally, there will
be a step-killed-message (slurmd[cn123]: *** 1234.0 KILLED AT ... WITH
SIGNAL 9 ***) from each node. And/or there will be a
task-exit-message (srun: error: task[0-1]: Terminated) for each node. For
large scale jobs, these messages become tedious and the other error
messages will be buried. The attached two patches (for slurm-2.5.1)
introduce two environment variables to control the output of such
messages:

SLURM_STEP_KILLED_MSG_NODE_ID: if set, only the specified node will
print the step-killed-message;

SLURM_SRUN_REDUCE_TASK_EXIT_MSG: if set and non-zero, successive task
exit messages with the same exit code will be printed only once.
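
The SLURM_STEP_KILLED_MSG_NODE_ID check could look something like the sketch below. The commit message does not say how the variable's value is interpreted, so this example assumes it holds a numeric node index within the step; the function name `should_print_step_killed_msg` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical check (not from the Slurm source): decide whether the
 * node with index `node_id` should print the step-killed message.
 * If the variable is unset every node prints; otherwise only the node
 * whose index matches the value does. Note that atoi() parses a
 * non-numeric value as 0. */
int should_print_step_killed_msg(int node_id)
{
	const char *val = getenv("SLURM_STEP_KILLED_MSG_NODE_ID");

	if (val == NULL)
		return 1;
	return atoi(val) == node_id;
}
```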

232ab305