- 14 Aug, 2013 7 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
jette authored
-
David Bigagli authored
-
Morris Jette authored
Problem reported by BYU. slurm.conf included a file one byte in length. The logic created a buffer one byte long and used fgets() to read the file. fgets() reads at most one byte fewer than the buffer size, leaving room for the trailing '\0', so it failed to read the file.
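The off-by-one described above can be reproduced outside Slurm. The following is a minimal sketch, not the Slurm code: fgets_chars_read() is a hypothetical helper that reports how many characters fgets() actually stores for a given buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Slurm code): read the first line of `path` with
 * fgets() into a buffer of `buf_size` bytes and return how many characters
 * were actually stored, or 0 if nothing could be read. */
size_t fgets_chars_read(const char *path, size_t buf_size)
{
    assert(path != NULL && buf_size > 0);

    char *buf = calloc(buf_size, 1);   /* zero-filled, so always terminated */
    if (buf == NULL)
        return 0;

    size_t n = 0;
    FILE *fp = fopen(path, "r");
    if (fp != NULL) {
        /* fgets() stores at most buf_size - 1 characters plus '\0'. */
        if (fgets(buf, (int)buf_size, fp) != NULL)
            n = strlen(buf);
        fclose(fp);
    }
    free(buf);
    return n;
}
```

For a file holding a single byte, a buffer of size 1 leaves no room for data, while a buffer of size 2 (file length plus one) holds the byte and the terminator.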
-
Danny Auble authored
The system size has to be set up before the priority/multifactor plugin is called. If a job was finishing while slurmctld was starting and the system size had not been set up yet, the plugin's init would hit a fatal error.
-
Danny Auble authored
-
- 13 Aug, 2013 12 commits
-
-
Morris Jette authored
-
Morris Jette authored
core reservations and reservation prolog/epilog
-
John Thiltges authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Michael Gutteridge authored
I'm running Slurm 2.6.0 and MWM 7.2.4 in our test cluster at the moment. I happened to notice that node load reporting wasn't consistent: periodically you'd see a "sane" load reported in Moab, but most of the time the reported load was zero despite an accurate CPULoad value reported by "scontrol show node".

Finally got to digging into this. It appears that the only time load was being reported properly was in the Moab scheduling cycle directly after slurmctld did a node ping. In subsequent scheduling cycles the load (again, as reported by Moab) was back to zero. The node ping is significant because that is the only time the node is updated. Since the wiki2 interface only reports records that change, and the load record isn't changed, it isn't reported in the queries after the node ping. Judging from this behavior, I'm guessing that Moab does not store the load value: every time it queries resources in Slurm it sets the node's load back to zero.

I've altered src/plugins/sched/wiki2/get_nodes.c slightly, basically moving the section that reports CPULOAD above the check for updated info (update_time > last_node_update). I don't know if this is the appropriate way to fix it. The wiki specification that Adaptive has published doesn't seem to indicate how this should function. Either MWM should assume the last value reported is still accurate or Slurm needs to report it for every wiki GETNODES command.

Anyway, the patch is attached, it seems to be working for me, and I've rolled it into our debian build directory. YMMV.

Michael
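The reordering described in this message can be illustrated with a simplified sketch. This is not the code in get_nodes.c: struct node_info, format_node() and the output layout are hypothetical stand-ins, and the real wiki2 record format differs. The point is only that CPULOAD is emitted before, and independently of, the "has this record changed" check.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical, simplified node record; not Slurm's node structure. */
struct node_info {
    const char *name;
    const char *state;
    double cpu_load;
    time_t last_update;   /* when the record last changed */
};

/* Format one node for a query. CPULOAD is always reported; STATE is only
 * reported if the record changed after `last_query`. Returns the length of
 * the formatted text, as snprintf() does. */
int format_node(const struct node_info *n, time_t last_query,
                char *out, size_t out_size)
{
    assert(n != NULL && out != NULL && out_size > 0);

    /* Load is written first, outside the update check. */
    int len = snprintf(out, out_size, "%s:CPULOAD=%.2f;", n->name, n->cpu_load);
    if (len < 0 || (size_t)len >= out_size)
        return len;

    /* Remaining fields are still sent only when the record has changed. */
    if (n->last_update > last_query)
        len += snprintf(out + len, out_size - (size_t)len,
                        "STATE=%s;", n->state);
    return len;
}
```

With this ordering, a query made after the node's last update still carries the load value, so a client that resets load to zero on every query keeps seeing the current figure.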
-
jette authored
I don't see how this could happen, but it might explain something reported by Harvard University. In any case, this could prevent an infinite loop if the task distribution function is passed a job allocation with zero nodes.
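A minimal sketch of why a zero-node allocation can spin forever in a cyclic distribution loop, and of the kind of guard that avoids it. distribute_tasks() is a hypothetical function written for illustration; it is not Slurm's task distribution code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cyclic distribution: spread `ntasks` tasks over `node_cnt`
 * nodes, writing the per-node count into tasks[]. Returns 0 on success and
 * -1 on an invalid request. */
int distribute_tasks(unsigned ntasks, size_t node_cnt, unsigned *tasks)
{
    /* Without this guard, the while loop below would never make progress
     * when node_cnt is zero and ntasks is non-zero. */
    if (node_cnt == 0 || tasks == NULL)
        return -1;

    for (size_t i = 0; i < node_cnt; i++)
        tasks[i] = 0;

    unsigned assigned = 0;
    while (assigned < ntasks) {
        for (size_t i = 0; i < node_cnt && assigned < ntasks; i++) {
            tasks[i]++;
            assigned++;
        }
    }

    assert(assigned == ntasks);
    return 0;
}
```

Rejecting the request up front keeps the loop body unchanged and turns a hang into an error the caller can report.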
-
Morris Jette authored
-
jette authored
This problem was reported by Harvard University and could be reproduced with a command line of "srun -N1 --tasks-per-node=2 -O id". With other job types, the error message could be logged many times for each job. This change logs the error once per job and only if the job request does not include the -O/--overcommit option.
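The "once per job, and not with -O/--overcommit" rule can be sketched as below. The struct, the flag names and the message text are placeholders chosen for illustration; the actual Slurm fields and the actual error message are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical job record; field names are assumptions. */
struct job {
    unsigned id;
    bool overcommit;   /* job was submitted with -O/--overcommit */
    bool warned;       /* the error has already been logged for this job */
};

/* Log the error at most once per job, and never for overcommit jobs.
 * Returns true if a message was written. */
bool log_task_error_once(struct job *j)
{
    assert(j != NULL);

    if (j->overcommit || j->warned)
        return false;

    /* Placeholder text, not the real Slurm message. */
    fprintf(stderr, "job %u: task layout error\n", j->id);
    j->warned = true;
    return true;
}
```

Keeping the flag on the job record means repeated calls for the same job stay silent, while other jobs still get their own single message.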
-
Morris Jette authored
-
Danny Auble authored
was down (slurmctld not running) during that time period.
-
- 09 Aug, 2013 2 commits
-
-
Danny Auble authored
version of Slurm.
-
Danny Auble authored
-
- 08 Aug, 2013 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
sacct.
-
Danny Auble authored
-
- 07 Aug, 2013 3 commits
-
-
Morris Jette authored
Remove documentation about "*cpu" portion of gres specification in the man pages for salloc, sbatch, and srun. Support for this specification was never implemented nor does the GRES data structure include a field for it.
-
Morris Jette authored
-
Danny Auble authored
-
- 06 Aug, 2013 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
as the job completes.
-
Danny Auble authored
of at multifactor poll.
-
Danny Auble authored
-
Morris Jette authored
Need higher memory limits due to pmdv12 size. pmdv12 fails to recognize immediate application exit and hangs with a defunct process.
-
Morris Jette authored
-
- 05 Aug, 2013 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
- 01 Aug, 2013 2 commits
-
-
Rod Schultz authored
-
Danny Auble authored
-