- 03 Sep, 2014 4 commits
-
-
Nathan Yee authored
I just ran the test suite for slurm 14.04.7, and have a few suggestions and bugfixes:

Test 1.35 fails on our system (probably because we limit memory with cgroups). Changing job_mem_opt from "--mem-per-cpu=64" to "--mem-per-cpu=192" in line 61 fixes the problem for us.

Test 1.84 fails to recognise node names like "something1-2", ending up with node names "something1" instead. Changing NodeName=(\w+) to NodeName=([^\s]+) fixes the problem.

Test 1.97 reports FAILURE when it discovers that SelectTypeParameters is not CR_PACK_NODES. Having "exit 0" instead of "exit 1" in line 50 is perhaps preferable.

Test 2.18 fails because the variable $partition never gets set, so no idle nodes are found in line 215. Setting $partition in globals.local helps, but should not be needed, IMO. There is a function "default_partition" in globals that could perhaps be used. The same applies to test 2.19.

Test 12.2 fails on our system because the jobs get killed due to memory limit. Increasing the "slack" in job_mem_limit from 4 to 10 in line 269 fixes the problem for us.

Tests 21.30, 21.31 and 21.32 fail when run as a non-privileged user. Perhaps they should test for it and exit with a warning instead, like many other tests.

Test 22.1 fails on our system because the time zone is different from where the test was written. The problem is that set midnight 1201766400 is only correct in one time zone (and unfortunately for us, not in ours :). Perhaps one could use the GNU date command to get the correct seconds-since-epoch regardless of time zone. Something like date +%s --date=2008-01-31 should do it. Unfortunately, I don't know enough Expect (tcl?) to suggest how to implement that.

-- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of Oslo
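The time-zone point raised for test 22.1 can be illustrated with a small sketch. It is written in Python rather than Expect and is not the fix applied to the test suite; the function name local_midnight_epoch is hypothetical. It computes seconds-since-epoch for local midnight on a given date, which is the value that date +%s --date=2008-01-31 reports.

```python
from datetime import datetime


def local_midnight_epoch(year, month, day):
    """Return seconds since the epoch for local midnight on the given date.

    Hypothetical helper for illustration only; it is not part of the
    Slurm test suite.
    """
    # A naive datetime is interpreted in the local time zone by timestamp(),
    # so the result follows the time zone of the machine running the test.
    return int(datetime(year, month, day).timestamp())


if __name__ == "__main__":
    print(local_midnight_epoch(2008, 1, 31))
```

On a machine eight hours behind UTC this yields 1201766400, the value hard-coded in the test; in other time zones the result shifts by the corresponding offset, which is why a fixed constant only works in one zone.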
-
Danny Auble authored
Which in this case is what has happened since the launch failed.
-
Danny Auble authored
since the slurmd will send the same message.
-
Danny Auble authored
correctly.
-
- 30 Aug, 2014 1 commit
-
-
Danny Auble authored
run inside the allocation can read the environment correctly.
-
- 29 Aug, 2014 1 commit
-
-
Morris Jette authored
Wait up to 20 seconds for gres.conf file before exiting.
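The bounded wait described above can be sketched as a polling loop. This is an illustration in Python under stated assumptions, not the code in the commit: the function name wait_for_file, the polling interval, and the return convention are all hypothetical.

```python
import os
import time


def wait_for_file(path, timeout=20.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the file appeared in time, False otherwise.
    Hypothetical helper; the name and the polling interval are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly so the loop does not spin while waiting.
        time.sleep(interval)
```

A caller would treat a False result as the point at which to give up and exit, for example `if not wait_for_file("gres.conf"): raise SystemExit(1)`.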
-
- 28 Aug, 2014 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
This fixes some problems creating advanced reservations on heterogeneous systems, especially when core counts are specified in the reservation. bug 1068
-
Morris Jette authored
-
Hongjia Cao authored
-
Morris Jette authored
Make "srun --gres=none ..." work when executed without a job allocation (i.e. srun creates the allocation plus the step). Previous logic would try to create the job with a gres value of "none".
-
Morris Jette authored
Fix for possible error if job has GRES, but the step explicitly requests a GRES count of zero.
-
- 27 Aug, 2014 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
BlueGene/Q's runjob command requires a fully qualified pathname, so only for that machine type, resolve the name in srun. This partially reverts commit 57efc873 but only for BGQ systems.
-
- 26 Aug, 2014 5 commits
-
-
Morris Jette authored
-
Bjørn-Helge Mevik authored
-
Danny Auble authored
-
Danny Auble authored
and only caused confusion. Since the cpu_bind options mostly refer to a step, we opted to only allow srun to set them in future versions.
-
Morris Jette authored
Defer job step initiation if required GRES are in use by other steps rather than immediately returning an error. bug 1056
-
- 25 Aug, 2014 4 commits
-
-
Danny Auble authored
had --network= specified.
-
Danny Auble authored
ProfileHDF5Dir directory as well as all its sub-directories and files.
-
Danny Auble authored
This isn't the case in the current code, so this isn't as big of a deal. This logic will also be removed in 14.11, so it becomes less of a deal as well, but just to be safe we cover the base.
-
Morris Jette authored
-
- 23 Aug, 2014 2 commits
-
-
Kilian Cavalotti authored
be used for AcctGatherFilesystemType.
-
Kilian Cavalotti authored
-
- 22 Aug, 2014 1 commit
-
-
Danny Auble authored
look at any other day for -D jobs)
-
- 21 Aug, 2014 4 commits
-
-
Morris Jette authored
srun properly interprets a leading "." in the executable name based upon the working directory of the compute node rather than the submit host.
-
Danny Auble authored
script that will use it.
-
Danny Auble authored
-
Danny Auble authored
states.
-
- 20 Aug, 2014 5 commits
-
-
Danny Auble authored
has finished)
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Nathan Yee authored
-
- 19 Aug, 2014 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Arguments in wrong order
-
Danny Auble authored
-