- 16 Sep, 2014 5 commits
-
-
David Bigagli authored
and abort the job.
-
Danny Auble authored
a midplane.
-
Danny Auble authored
-
Danny Auble authored
MaxNode limit.
-
Danny Auble authored
only needs to be called once.
-
- 15 Sep, 2014 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
reference.
-
David Bigagli authored
-
- 13 Sep, 2014 1 commit
-
-
Danny Auble authored
s_p_options_t struct.
-
- 12 Sep, 2014 1 commit
-
-
Morris Jette authored
-
- 11 Sep, 2014 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
The CPU specification enforcement is strict rather than advisory.
-
Danny Auble authored
warning.
-
Danny Auble authored
of CPUs in the job_resources_t structure so that, as nodes finish, the correct CPU count is displayed in the user tools.
-
- 10 Sep, 2014 4 commits
-
-
Danny Auble authored
-
Nathan Yee authored
-
Morris Jette authored
Previous logic would only make available the CPUs associated with the first N GRES, where N is the number of requested GRES. CPUs which might be made available by using different GRES were not considered available. bug 1092
-
Danny Auble authored
-
- 09 Sep, 2014 4 commits
-
-
Morris Jette authored
Eliminate race condition in enforcement of MaxJobCount limit for job arrays. The job count limit was checked for a job array before setting the slurmctld job locks. If new jobs were submitted between the test and the job array creation such that the job array creation would result in MaxJobCount being exceeded, then a fatal error would result. bug 1091
-
Danny Auble authored
message back. On slow systems with many associations, this can improve sacctmgr responsiveness after adding associations.
-
Morris Jette authored
-
J.T. Conklin authored
-
- 08 Sep, 2014 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
- 05 Sep, 2014 3 commits
-
-
Morris Jette authored
Describe how jobs could be lost on slurmctld crash.
Backport MPI performance FAQ.
Fix bad href tag.
-
Danny Auble authored
-
Danny Auble authored
-
- 04 Sep, 2014 8 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Pierre Lindenbaum authored
Renamed the patch for make 3.81 and added a patch for GNU make 4.0.
-
Nathan Yee authored
The previous test could fail for node names with a numeric suffix of more than one digit. This change captures all digits and also checks for names without a numeric suffix.
-
Morris Jette authored
Introduced in commit dc7a1fca
-
David Bigagli authored
Fix error handling for job array creation failure due to inability to copy job files (script and environment). See bug 1077.
-
- 03 Sep, 2014 5 commits
-
-
David Bigagli authored
are hard links to the first element's specification files. If the controller fails to make the links, the files are copied instead.
-
Danny Auble authored
reserved for higher priority jobs.
-
Danny Auble authored
-
Andrew Elwell authored
-
Nathan Yee authored
I just ran the test suite for Slurm 14.04.7 and have a few suggestions and bug fixes:
Test 1.35 fails on our system (probably because we limit memory with cgroups). Changing job_mem_opt from "--mem-per-cpu=64" to "--mem-per-cpu=192" in line 61 fixes the problem for us.
Test 1.84 fails to recognise node names like "something1-2", ending up with node names "something1" instead. Changing NodeName=(\w+) to NodeName=([^\s]+) fixes the problem.
Test 1.97 reports FAILURE when it discovers that SelectTypeParameters is not CR_PACK_NODES. Having "exit 0" instead of "exit 1" in line 50 is perhaps preferable.
Test 2.18 fails because the variable $partition never gets set, so no idle nodes are found in line 215. Setting $partition in globals.local helps, but should not be needed, IMO. There is a function "default_partition" in globals that could perhaps be used. The same applies to test 2.19.
Test 12.2 fails on our system because the jobs get killed due to the memory limit. Increasing the "slack" in job_mem_limit from 4 to 10 in line 269 fixes the problem for us.
Tests 21.30, 21.31 and 21.32 fail when run as a non-privileged user. Perhaps they should test for this and exit with a warning instead, like many other tests.
Test 22.1 fails on our system because our time zone differs from the one where the test was written. The problem is that "set midnight 1201766400" is only correct in one time zone (and, unfortunately for us, not in ours :). Perhaps one could use the GNU date command to get the correct seconds-since-epoch regardless of time zone. Something like "date +%s --date=2008-01-31" should do it. Unfortunately, I don't know enough Expect (Tcl?) to suggest how to implement that.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
-