- 14 Oct, 2014 6 commits
-
-
Danny Auble authored
-
Morris Jette authored
Note that PlugStackConfig defaults to plugstack.conf in the same directory as slurm.conf. The added logic tests whether the file actually exists (using stat) and, if it is not found, does not fork/exec slurmstepd to invoke the spank prolog/epilog. This saves about 14 msec on startup and 14 msec on shutdown if no spank plugins are configured. It also eliminates some possible failures (e.g. if fork() fails, or the slurmstepd process cannot exec()). This logic also caches the PlugStackConfig value and reads it again on reconfigure, but avoids reading the value for each job. Bug 982
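The check described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the names `plugstack_path_set()` and `spank_helper_needed()`, the buffer size, and the choice to run the helper when no path has been cached yet are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

#define PATH_BUF_SIZE 4096	/* hypothetical limit for this sketch */

/* Cached copy of the PlugStackConfig path (hypothetical variable). */
static char plugstack_path[PATH_BUF_SIZE] = "";

/*
 * Store the configured path. Intended to be called once at startup and
 * again on each reconfigure, so the configuration is not re-read per job.
 */
void plugstack_path_set(const char *path)
{
	if (path == NULL)
		plugstack_path[0] = '\0';
	else
		snprintf(plugstack_path, sizeof(plugstack_path), "%s", path);
}

/*
 * Per-job check: return true if the helper process should be started to
 * run the spank prolog/epilog, false if that work can be skipped because
 * the plugin stack file does not exist.
 */
bool spank_helper_needed(void)
{
	struct stat st;

	/* Assumption: with no cached path, stay conservative and run it. */
	if (plugstack_path[0] == '\0')
		return true;

	return stat(plugstack_path, &st) == 0;
}
```

A caller would invoke `plugstack_path_set()` when the configuration is loaded and test `spank_helper_needed()` before each fork/exec, so the only per-job cost is a single `stat()` call.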
-
Morris Jette authored
Add "void" argument to a function and rename a local function to have a prefix of "_"
-
Nicolas Joly authored
-
Danny Auble authored
9b00f12c
-
Nicolas Joly authored
Signed-off-by: Danny Auble <da@schedmd.com>
-
- 11 Oct, 2014 3 commits
-
-
Morris Jette authored
If a node is down, then permit setting its state to power down, which causes the SuspendProgram to run and set the node state back to cloud.
-
Morris Jette authored
If a node is powered down, then do not power it up on slurmctld restart.
-
Morris Jette authored
The power up/down request only takes effect after the ResumeTimeout or SuspendTimeout is reached in order to avoid a race condition.
-
- 10 Oct, 2014 12 commits
-
-
Danny Auble authored
-
Brian Christiansen authored
Bug #1143
-
Danny Auble authored
-
Dorian Krause authored
This commit fixes a bug we observed when combining select/linear with gres. If an allocation was requested with a --gres argument, an srun execution within that allocation would stall indefinitely:

    -bash-4.1$ salloc -N 1 --gres=gpfs:100
    salloc: Granted job allocation 384049
    bash-4.1$ srun -w j3c017 -n 1 hostname
    srun: Job step creation temporarily disabled, retrying

The slurmctld log showed:

    debug3: StepDesc: user_id=10034 job_id=384049 node_count=1-1 cpu_count=1
    debug3: cpu_freq=4294967294 num_tasks=1 relative=65534 task_dist=1 node_list=j3c017
    debug3: host=j3l02 port=33608 name=hostname network=(null) exclusive=0
    debug3: checkpoint-dir=/home/user checkpoint_int=0
    debug3: mem_per_node=62720 resv_port_cnt=65534 immediate=0 no_kill=0
    debug3: overcommit=0 time_limit=0 gres=(null) constraints=(null)
    debug: Configuration for job 384049 complete
    _pick_step_nodes: some requested nodes j3c017 still have memory used by other steps
    _slurm_rpc_job_step_create for job 384049: Requested nodes are busy

If srun --exclusive had been used instead, everything would have worked fine. The reason is that in exclusive mode the code properly checks whether memory is a reserved resource in the _pick_step_nodes() function. This commit modifies the alternate code path to do the same.
-
Morris Jette authored
-
Brian Christiansen authored
-
Danny Auble authored
(i.e. ArchiveJobs PurgeJobs). This is only a cosmetic change.
-
Nicolas Joly authored
on slurmdbd startup.
-
Danny Auble authored
-
Danny Auble authored
lots of jobs.
-
Danny Auble authored
-
Danny Auble authored
-
- 09 Oct, 2014 3 commits
-
-
Danny Auble authored
did the ALPS reservation. Bug 1115
-
Morris Jette authored
-
Morris Jette authored
Take more job options into consideration when estimating a job's node count.
-
- 08 Oct, 2014 13 commits
-
-
Danny Auble authored
-
inodb authored
At work in Sweden we often fika (coffee + buns and what have you) at 3 PM. I sometimes accidentally give a start time of 'teatime', so when I return from 'fika' I see my job's just getting started. This fix should make life even easier for the Swedes.
-
Nicolas Joly authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Nicolas Joly authored
-
Morris Jette authored
-
Morris Jette authored
-
- 07 Oct, 2014 3 commits
-
-
Morris Jette authored
This is a minor change to commit 4b8cdd4c from yesterday and is needed to work with launch/poe (which was broken by the commit).
-
Danny Auble authored
a reservation.
-
Danny Auble authored
which they have access to (rather than preventing them from seeing ANY reservation). Backport from 14.11 commit 77c2bd25.
-