- 16 Oct, 2015 1 commit
  - Deric Sullivan authored
- 15 Oct, 2015 1 commit
  - David Bigagli authored
- 12 Oct, 2015 1 commit
  - David Bigagli authored
- 09 Oct, 2015 10 commits
  - Morris Jette authored: Move the kvs_comm_set structure from src/api/slurm_pmi.h to src/common/slurm_protocol_defs.h so that the free function can also be moved into src/common for cleaner logic. Bug 1670.
  - Morris Jette authored: Bug 1670.
  - Nathan Yee authored: Remove the individual message free calls scattered throughout the Slurm code and use the slurm_free_msg_data() function instead. Bug 1670.
  - Nathan Yee authored: The previous slurm_free_msg_data() function supported only a subset of RPC types. Bug 1670.
  - Morris Jette authored
  - Morris Jette authored
  - Morris Jette authored: If a job allocation returns invalid contents, the pointer to the job structure may be NULL. This change preserves the error message and avoids a segfault.
  - David Bigagli authored
  - Morris Jette authored
  - Morris Jette authored: A bad (test) system left a number of long-running jobs behind. This should clean them up in a timely fashion.
- 08 Oct, 2015 6 commits
  - Morris Jette authored
  - Morris Jette authored
  - Brian Christiansen authored: If the backup dbd happened to be doing rollup when the primary resumed, both the primary and the backup would be doing rollups, causing contention on the database tables. The backup would wait for the rollup handler to finish before giving up control. The fix is to cancel the rollup handler and let the backup begin to shut down, so that it closes any existing connections and then re-execs itself. The re-exec matters because the rollup handler spawns a thread for each cluster being rolled up, and cancelling the rollup handler alone does not cancel those spawned threads. This cleans up the dbd and its locks. The re-exec happens in the backup only if the primary resumed while a rollup was in progress. Bug 1988.
  - Brian Christiansen authored: Fix a case where the backup slurmdbd would be killed if it had existing connections when giving up control. In that case it tried to signal the existing threads with pthread_kill(), sending SIGKILL to each thread. The problem is that SIGKILL does not go to the individual thread but to the whole process, so the backup dbd itself was killed.
  - Danny Auble authored: when a cold-start (-c) happens to the slurmctld.
  - Morris Jette authored: This was intended as a step toward managing jobs across multiple clusters, but we will be pursuing a very different design.
- 07 Oct, 2015 21 commits
  - Danny Auble authored: Conflicts: src/sacct/options.c
  - Danny Auble authored
  - Danny Auble authored
  - Danny Auble authored: from a user. This would cause the slurmctld to cache the old default, which wasn't valid, and force the user to always request the association explicitly.
  - Danny Auble authored: Conflicts: NEWS src/plugins/accounting_storage/mysql/as_mysql_job.c
  - Morris Jette authored
  - Morris Jette authored: Bug 2009.
  - Morris Jette authored
  - Morris Jette authored: Each node could have fewer tasks allocated than the plane size, which broke the test. The plane size needs to be treated as a maximum consecutive rank value.
  - Thomas Cadeau authored
  - Morris Jette authored
  - Morris Jette authored: Bug 2013.
  - David Bigagli authored
  - Hongjia Cao authored
  - David Bigagli authored
  - Hongjia Cao authored
  - David Bigagli authored
  - David Bigagli authored
  - David Bigagli authored
  - Artem Polyakov authored
Danny Auble authored
database but the start record hadn't made it yet.
-