- 04 Jun, 2014 10 commits
-
-
Jim Nordby authored
This reverts commit f3a66dd8.
-
Danny Auble authored
This reverts commit 421185f4.
-
Danny Auble authored
plugin, which call corresponding alpscomm functions. This enables job steps to give up some of their network resources on suspend, increasing the amount of resources which can be given to each job.
-
Marlys Kohnke authored
-
Morris Jette authored
Recover a list of running jobs when slurmd restarts. This job list is used to determine when a job suspend can take place. This patch also adds job suspend/retry logic, since a job suspended immediately after launch can (briefly) return an error that it is not ready yet.
-
Morris Jette authored
If the job to be suspended cannot be found, then proceed with the suspend RPC after 3 seconds. Previous logic would hang indefinitely. The job would not be found if it was launched and the slurmd then restarted. The record keeping for launched jobs after slurmd restarts is fixed in a separate patch.
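The bounded wait described in this commit message can be illustrated with a minimal sketch. This is not the slurmd code (which is written in C); the function name `wait_for_job`, the `is_job_known` callback, and the polling interval are hypothetical names chosen for illustration. Only the 3-second limit comes from the commit message.

```python
import time


def wait_for_job(is_job_known, timeout=3.0, poll_interval=0.1,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until the job is known or the timeout expires.

    Returns True if the job was found, False if the caller should
    proceed anyway because the timeout was reached.
    All names here are hypothetical; only the 3-second default
    reflects the commit message.
    """
    deadline = clock() + timeout
    while True:
        if is_job_known():
            return True
        if clock() >= deadline:
            # Give up waiting instead of hanging indefinitely.
            return False
        sleep(poll_interval)
```

Either return value lets the caller continue with the suspend request; the point of the sketch is only that the loop always terminates, in contrast to the earlier behavior of waiting without a limit.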
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
This reverts commit bf83bcf7.
-
Danny Auble authored
This reverts commit 4294c561.
-
- 03 Jun, 2014 13 commits
-
-
David Bigagli authored
requeue, requeuehold and release operations.
-
Danny Auble authored
added is a user process or not.
-
Danny Auble authored
-
Danny Auble authored
cont_id contained in it.
-
Danny Auble authored
need to do.
-
David Bigagli authored
-
Danny Auble authored
-
Morris Jette authored
Do not purge the script and environment files for completed jobs on slurmctld reconfiguration or restart (they might be later requeued). Purge the files only when the job record is purged. bug 834
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
If a job --mem-per-cpu limit exceeds the partition or system limit, then scale the job's memory limit and CPUs per task to satisfy the limit. bug 848
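One way to picture this scaling is the sketch below. It assumes the adjustment keeps the total memory per task at least as large as requested by raising the CPU count while lowering the per-CPU memory; that rule, the function name `scale_mem_per_cpu`, and its parameters are assumptions for illustration, not the slurmctld implementation.

```python
def scale_mem_per_cpu(mem_per_cpu, cpus_per_task, max_mem_per_cpu):
    """Return (mem_per_cpu, cpus_per_task) adjusted to fit the limit.

    Hypothetical helper: assumes memory is in MB and that the total
    memory per task should not shrink when the request is rescaled.
    """
    if mem_per_cpu <= max_mem_per_cpu:
        return mem_per_cpu, cpus_per_task
    # Smallest integer factor that brings per-CPU memory under the limit
    # (integer ceiling division, avoiding floating point).
    factor = -(-mem_per_cpu // max_mem_per_cpu)
    new_cpus = cpus_per_task * factor
    # Round up so that total memory per task is not reduced.
    new_mem = -(-mem_per_cpu // factor)
    return new_mem, new_cpus
```

For example, under these assumptions a request of 4096 MB per CPU with one CPU per task and a 2048 MB limit becomes 2048 MB per CPU with two CPUs per task.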
-
David Bigagli authored
not finished yet; otherwise, if requeued, the job may enter an invalid COMPLETING state.
-
- 31 May, 2014 6 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
David Gloe authored
messages, moving some messages to higher debug levels, and consolidating some needlessly verbose messages.
-
David Gloe authored
applications from the minimum (1%) to the more appropriate 25%.
-
David Gloe authored
the Aries network.
-
David Gloe authored
network systems - Resolves a bug involving nested batch steps and aeld.
-
- 30 May, 2014 9 commits
-
-
David Gloe authored
(mostly useful for debugging, it isn't used for much). Sets task_is_app, which tells the kernel that this is an application process (for core specialization).
-
David Gloe authored
limits to 4 concurrent jobs per node for our network resources.
-
David Gloe authored
-
David Gloe authored
-
David Gloe authored
generic resources and removing unused warmswap code.
-
David Gloe authored
@bindir@ would resolve to ${prefix}/bin. This patch fixes it, based on http://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Installation-Directory-Variables.html. It also changes opt_modulefiles_slurm to opt_modulefiles_slurm.in, but I couldn't figure out how to get git diff to show that.
-
Morris Jette authored
If a job allocates whole nodes (with the --core-spec or --exclusive option), launches a step within that allocation all in a single command (the srun creates the allocation and step at the same time), and requests a specific CPU count (e.g. "--ntasks=# --cpus-per-task=#"), then allocate the job step only the requested CPU count, which may be less than the job's allocation. Bug 843
-
Morris Jette authored
-
Morris Jette authored
If shutdown of the slurmctld daemon is in progress, then stop trying to schedule jobs or process reconfigure requests. These are the only operations that take a significant amount of time, and they only serve to slow down the shutdown process. We want the daemon to stop processing incoming RPCs and save state as soon as possible.
-
- 29 May, 2014 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
-