- 16 Oct, 2015 1 commit
  - Deric Sullivan authored
- 15 Oct, 2015 1 commit
  - David Bigagli authored
- 12 Oct, 2015 1 commit
  - David Bigagli authored
- 09 Oct, 2015 10 commits
  - Morris Jette authored: Move the kvs_comm_set structure from src/api/slurm_pmi.h to src/common/slurm_protocol_defs.h so that the free function can also be moved into src/common for cleaner logic. Bug 1670.
  - Morris Jette authored: Bug 1670.
  - Nathan Yee authored: Remove the individual message free calls scattered throughout the Slurm code and use the slurm_free_msg_data() function instead. Bug 1670.
  - Nathan Yee authored: The previous slurm_free_msg_data() function supported only a subset of RPC types. Bug 1670.
  - Morris Jette authored
  - Morris Jette authored
  - Morris Jette authored: If a job allocation returns invalid contents, the pointer to the job structure may be NULL. This change preserves the error message and avoids a segfault.
  - David Bigagli authored
  - Morris Jette authored
  - Morris Jette authored: A bad (test) system left a number of long-running jobs behind. This should clean them up in a timely fashion.
- 08 Oct, 2015 6 commits
  - Morris Jette authored
  - Morris Jette authored
  - Brian Christiansen authored: If the backup dbd happened to be doing rollup when the primary resumed, both the primary and the backup would be doing rollups, causing contention on the database tables. The backup would wait for the rollup handler to finish before giving up control. The fix is to cancel the rollup handler and let the backup begin to shut down, so that it closes any existing connections and then re-execs itself. The re-exec matters because the rollup handler spawns a thread for each cluster being rolled up, and cancelling the rollup handler alone does not cancel those spawned threads. This cleans up the dbd and its locks. The re-exec happens in the backup only if the primary resumed while a rollup was in progress. Bug 1988.
  - Brian Christiansen authored: Fix a case where the backup slurmdbd would be killed if it had existing connections when giving up control. In that case it tried to signal the existing threads with pthread_kill(), sending SIGKILL to each thread. The problem is that SIGKILL does not go to the individual thread but to the whole process, so the backup dbd itself was killed.
  - Danny Auble authored: when a cold-start (-c) happens to the slurmctld.
  - Morris Jette authored: This was intended as a step toward managing jobs across multiple clusters, but we will be pursuing a very different design.
- 07 Oct, 2015 21 commits
  - Danny Auble authored: Conflicts: src/sacct/options.c
  - Danny Auble authored
  - Danny Auble authored
  - Danny Auble authored: from a user. This would cause the slurmctld to cache the old default, which wasn't valid, and force the user to always request the association explicitly.
  - Danny Auble authored: Conflicts: NEWS src/plugins/accounting_storage/mysql/as_mysql_job.c
  - Morris Jette authored
  - Morris Jette authored: Bug 2009.
  - Morris Jette authored
  - Morris Jette authored: Each node could have fewer tasks allocated than the plane size, which broke the test. The plane size needs to be treated as a maximum consecutive rank value.
  - Thomas Cadeau authored
  - Morris Jette authored
  - Morris Jette authored: Bug 2013.
  - David Bigagli authored
  - Hongjia Cao authored
  - David Bigagli authored
  - Hongjia Cao authored
  - David Bigagli authored
  - David Bigagli authored
  - David Bigagli authored
  - Artem Polyakov authored
Danny Auble authored
database but the start record hadn't made it yet.
-