- 19 Apr, 2013 1 commit
-
-
Danny Auble authored
deny the job instead of holding it.
-
- 17 Apr, 2013 3 commits
-
-
Morris Jette authored
Fix for bug 268
-
Danny Auble authored
to implicitly create full system block.
-
Danny Auble authored
cpu count would be reflected correctly.
-
- 16 Apr, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 12 Apr, 2013 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
plugins. Developers who want to use this should follow the model set forth in the acct_gather_energy_ipmi plugin.
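For context, a sketch of how a site would select the IPMI implementation of this plugin interface (the sampling frequency shown is an assumed value, not taken from this commit):
----------------
# slurm.conf: pick the IPMI energy-gathering plugin
AcctGatherEnergyType=acct_gather_energy/ipmi
# acct_gather.conf: poll the BMC every 10 seconds (assumed value)
EnergyIPMIFrequency=10
----------------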
-
Morris Jette authored
We're in the process of setting up a few GPU nodes in our cluster, and want to use Gres to control access to them. Currently, we have activated one node with 2 GPUs. The gres.conf file on that node reads:
----------------
Name=gpu Count=2 File=/dev/nvidia[0-1]
Name=localtmp Count=1800
----------------
(the localtmp is just counting access to local tmp disk.) Nodes without GPUs have gres.conf files like this:
----------------
Name=gpu Count=0
Name=localtmp Count=90
----------------
slurm.conf contains the following:
----------------
GresTypes=gpu,localtmp
Nodename=DEFAULT Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=62976 Gres=localtmp:90 State=unknown
[...]
Nodename=c19-[1-16] NodeHostname=compute-19-[1-16] Weight=15848 CoresPerSocket=4 Gres=localtmp:1800,gpu:2 Feature=rack19,intel,ib
----------------
Submitting a job with sbatch --gres=gpu:1 ... sets CUDA_VISIBLE_DEVICES for the job. However, the values seem a bit strange:
- If we submit one job with --gres=gpu:1, CUDA_VISIBLE_DEVICES gets the value 0.
- If we submit two jobs with --gres=gpu:1 at the same time, CUDA_VISIBLE_DEVICES gets the value 0 for one job, and 1633906540 for the other.
- If we submit one job with --gres=gpu:2, CUDA_VISIBLE_DEVICES gets the value 0,1633906540.
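For reference, a minimal sketch of the expected behavior under standard sbatch syntax (the job script name test_gpu.sh is hypothetical):
----------------
$ sbatch --gres=gpu:1 test_gpu.sh   # request one GPU
$ sbatch --gres=gpu:2 test_gpu.sh   # request both GPUs
# Inside the job, CUDA_VISIBLE_DEVICES should hold device indices
# such as "0" or "0,1" -- never a garbage value like 1633906540.
----------------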
-
- 11 Apr, 2013 3 commits
-
-
Danny Auble authored
APRUN_DEFAULT_MEMORY env var for aprun. In this scenario the option is not displayed when used with --launch-cmd.
-
Danny Auble authored
per cpu.
-
Danny Auble authored
per cpu.
-
- 10 Apr, 2013 3 commits
-
-
Morris Jette authored
If a task count is specified but no tasks-per-node, then set tasks-per-node in the BASIL reservation request.
-
Danny Auble authored
as the hosts given.
-
Danny Auble authored
-
- 09 Apr, 2013 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
isn't a sub-block job.
-
Danny Auble authored
the XML.
-
Danny Auble authored
-
Morris Jette authored
Fix for bug 258
-
- 06 Apr, 2013 1 commit
-
-
Morris Jette authored
Fix sched/backfill logic to initiate jobs whose maximum time limit exceeds the partition limit but whose minimum time limit permits them to start. Related to bug 251
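A hedged illustration of the case this fixes, using sbatch's --time and --time-min options (the script name and the limits shown are hypothetical):
----------------
# Partition MaxTime is 4 hours. The job requests up to 8 hours but
# will accept as little as 2, so backfill may now start it.
$ sbatch --time=08:00:00 --time-min=02:00:00 job.sh
----------------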
-
- 02 Apr, 2013 9 commits
-
-
Danny Auble authored
never looked at to determine the eligibility of a backfillable job.
-
Morris Jette authored
-
Morris Jette authored
A fix for this problem will require more study. This one causes an xassert failure when an attempt to start a job results in it not being started by sched/backfill due to the partition time limit.
-
Danny Auble authored
and when reading in state from DB2 we find a block that can't be created. You can now do a clean start to get rid of the bad block.
-
Danny Auble authored
the slurmctld there were software errors on some nodes.
-
Danny Auble authored
without it still existing there. This is extremely rare.
-
Danny Auble authored
a pending job on it we don't kill the job.
-
Danny Auble authored
while it was free, cnodes would go into software error and kill the job.
-
Morris Jette authored
Fix sched/backfill logic to initiate jobs whose maximum time limit exceeds the partition limit but whose minimum time limit permits them to start. Related to bug 251
-
- 01 Apr, 2013 1 commit
-
-
Morris Jette authored
Fix for bug 224
-
- 29 Mar, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 27 Mar, 2013 3 commits
-
-
Jason Bacon authored
-
Morris Jette authored
Without this patch, when the slurmd cold starts or slurmstepd terminates abnormally, the job script file can be left around. bug 243
-
Morris Jette authored
Previously such a job submitted to a DOWN partition would be queued. bug 187
-
- 26 Mar, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
a reservation when it has the "Ignore_Jobs" flag set. Since jobs could run outside the reservation on its nodes, without this the time could be counted twice.
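For context, a sketch of creating a reservation with this flag (the name, nodes, and times shown are hypothetical):
----------------
$ scontrol create reservation ReservationName=maint \
    StartTime=2013-04-01T00:00:00 Duration=120 \
    Nodes=c19-[1-4] Users=root Flags=IGNORE_JOBS
----------------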
-
- 25 Mar, 2013 2 commits
-
-
Morris Jette authored
This is not applicable with launch/aprun.
-
Morris Jette authored
-