- 07 Dec, 2013 1 commit
-
-
Philip D. Eckert authored
-
- 06 Dec, 2013 5 commits
-
-
Jason Bacon authored
Using CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2392.04-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0xf27 Family = f Model = 2 Stepping = 7 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>

It's also using an older version of hwloc (1.3.1), and I have not yet tested it with a newer one. However, since 0 and -1 are legitimate return values for hwloc_get_nbobjs_by_type(), I think they should be handled in any case. From the hwloc_get_nbobjs_by_type() man page:

static inline int hwloc_get_nbobjs_by_type (hwloc_topology_t topology, hwloc_obj_type_t type) [static]
Returns the width of level type type. If no object for that type exists, 0 is returned. If there are several levels with objects of that type, -1 is returned.

I'm attaching a smarter patch that handles both 0 and -1 return values for both CORE and SOCKET. It logs a warning if it has to fudge a 0 return code, and it bails out with a helpful error message for -1, which I have no idea how to handle. At least this way people won't have to waste time tracking down the problem.

Happy Friday,
Jason
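The handling described above can be sketched as a small filter applied to the raw count. This is a hypothetical helper, not the attached patch: the name `sanitize_nbobjs` and the `fallback` parameter are assumptions. Only the meaning of 0 and -1 comes from the hwloc documentation quoted above.

```c
#include <stdio.h>

/* Hypothetical helper, not the actual patch: sanitize a count returned by
 * hwloc_get_nbobjs_by_type(). Per the hwloc documentation, 0 means no
 * object of that type exists and -1 means objects of that type appear on
 * several topology levels. */
static int sanitize_nbobjs(int nobj, const char *type_name, int fallback)
{
	if (nobj == 0) {
		/* No object of this type: warn and fall back to a safe guess. */
		fprintf(stderr, "warning: hwloc reports no %s objects, assuming %d\n",
			type_name, fallback);
		return fallback;
	}
	if (nobj == -1) {
		/* Ambiguous topology: report it and let the caller bail out. */
		fprintf(stderr, "error: hwloc reports %s objects on several "
			"topology levels; cannot determine a count\n", type_name);
		return -1;
	}
	return nobj;
}
```

A caller would wrap each query, for example `sanitize_nbobjs(hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE), "core", 1)`, and treat a -1 result as fatal.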
-
Trofinoff Stephen authored
This adds a mechanism to kill a hung apbasil command
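A generic way to stop a hung external command such as apbasil is to poll the child with a deadline and kill it when the deadline passes. The sketch below is an assumption about the general technique, not SLURM's implementation. The name `run_with_timeout` and the 100 ms polling interval are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch, not SLURM's implementation: run argv[0] with its
 * arguments and kill it if it has not exited within timeout_sec seconds.
 * Returns the child's exit status, or -1 on failure or timeout. */
static int run_with_timeout(char *const argv[], int timeout_sec)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);		/* exec failed */
	}

	struct timespec tick = { 0, 100 * 1000 * 1000 };	/* 100 ms */
	long waited_ms = 0;
	int status;
	for (;;) {
		pid_t rc = waitpid(pid, &status, WNOHANG);
		if (rc == pid)
			return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
		if (rc < 0)
			return -1;
		if (waited_ms >= timeout_sec * 1000L) {
			/* Command appears hung: kill it and reap the zombie. */
			kill(pid, SIGKILL);
			waitpid(pid, &status, 0);
			return -1;
		}
		nanosleep(&tick, NULL);
		waited_ms += 100;
	}
}
```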
-
Morris Jette authored
error introduced in commit ec4df3bf
-
Jason Bacon authored
-
Morris Jette authored
An abort has been reported if the node's GRES count differs from its bitmap. This has been induced by changing the count manually (e.g. "scontrol update nodename=tux123 gres=gpu:4"). I have not been able to reproduce this problem, but this change resizes the bitmap in order to avoid the assert failure.
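Resizing the bitmap to match the count, instead of asserting that they agree, can be sketched with a minimal stand-in for SLURM's bitmap type. The `bitmap_t` struct and the `bitmap_resize` and `sync_gres_bitmap` helpers are hypothetical. The sketch keeps existing bits and clears any bits that are added or cut off.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal bitmap, standing in for SLURM's own bitmap type. */
typedef struct {
	size_t nbits;
	uint8_t *bits;
} bitmap_t;

/* Resize bm to nbits, preserving existing bits and clearing any new ones.
 * Returns 0 on success, -1 on allocation failure (bm left unchanged). */
static int bitmap_resize(bitmap_t *bm, size_t nbits)
{
	size_t old_bytes = (bm->nbits + 7) / 8;
	size_t new_bytes = (nbits + 7) / 8;
	uint8_t *p = realloc(bm->bits, new_bytes ? new_bytes : 1);
	if (!p)
		return -1;
	if (new_bytes > old_bytes)
		memset(p + old_bytes, 0, new_bytes - old_bytes);
	/* Clear stale bits beyond nbits in the last byte. */
	if (nbits % 8)
		p[new_bytes - 1] &= (uint8_t)((1u << (nbits % 8)) - 1);
	bm->bits = p;
	bm->nbits = nbits;
	return 0;
}

/* Before using the bitmap, make its size agree with the GRES count. */
static int sync_gres_bitmap(bitmap_t *bm, size_t gres_cnt)
{
	if (bm->nbits != gres_cnt)
		return bitmap_resize(bm, gres_cnt);
	return 0;
}
```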
-
- 05 Dec, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
news.html.
-
- 04 Dec, 2013 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
Previous logic never reopened the file, preventing proper functioning of logrotate.
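logrotate typically renames the active log file. A daemon that keeps its old descriptor then continues writing into the renamed file. The usual remedy is to reopen the log by name. The helper `reopen_log` below is a hypothetical sketch of that step, not SLURM's logging code.

```c
#include <stdio.h>

/* Hypothetical sketch: close and reopen a log file by name so that, after
 * logrotate renames the old file, new messages go to a fresh file at the
 * original path instead of the renamed one. Returns 0 on success, -1 on
 * failure (in which case logging continues to the old stream). */
static int reopen_log(FILE **log_fp, const char *path)
{
	FILE *fp = fopen(path, "a");
	if (!fp)
		return -1;
	if (*log_fp)
		fclose(*log_fp);	/* flushes buffered output to the rotated file */
	*log_fp = fp;
	return 0;
}
```

In a real daemon, the signal that logrotate's postrotate script sends would normally only set a flag, and the main loop would call the reopen helper. This is because fopen() is not async-signal-safe.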
-
- 03 Dec, 2013 4 commits
-
-
Morris Jette authored
Use hash function to locate job records for improved performance.
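Locating a job record by hash means walking one short bucket chain rather than the whole job list. The sketch below uses chained hashing keyed on job ID modulo the table size. The names loosely echo slurmctld's, but the struct layout, `JOB_HASH_SIZE`, and the helpers are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define JOB_HASH_SIZE 1024	/* hypothetical; a real table might be sized from config */

struct job_record {
	uint32_t job_id;
	struct job_record *job_next;	/* next record in the same hash bucket */
};

static struct job_record *job_hash[JOB_HASH_SIZE];

static size_t job_hash_inx(uint32_t job_id)
{
	return job_id % JOB_HASH_SIZE;
}

/* Insert a record at the head of its bucket's chain. */
static void add_job_hash(struct job_record *job)
{
	size_t inx = job_hash_inx(job->job_id);
	job->job_next = job_hash[inx];
	job_hash[inx] = job;
}

/* Walk only one bucket instead of the whole job list. */
static struct job_record *find_job_record(uint32_t job_id)
{
	struct job_record *job = job_hash[job_hash_inx(job_id)];
	while (job && job->job_id != job_id)
		job = job->job_next;
	return job;
}
```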
-
Morris Jette authored
Change partition write lock to a read lock as we use a different mechanism for hidden partitions in getting individual jobs.
-
Morris Jette authored
-
Morris Jette authored
Correct logic returning remaining job dependencies in job information reported by scontrol and squeue. Eliminates vestigial descriptors with no job ID values (e.g. "afterany"). As dependencies were removed, the job ID values were removed from the strings, but the descriptors were not. This eliminates both. It also checks the full job ID to make sure we do not remove "afterany:1234" when job "123" completes.
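The two rules above can be sketched in a small helper. First, match whole job IDs numerically rather than as substrings. Second, drop a descriptor once its last job ID is gone. The function `remove_dependency` is hypothetical and assumes plain numeric job IDs in the "type:id:id,type:id" form. It is not SLURM's parser.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not SLURM's code: remove job_id from a dependency
 * string such as "afterany:123:1234,afterok:55" and write the result to
 * out. Job IDs are compared as whole numbers, so completing job 123 does
 * not touch 1234, and a descriptor left with no job IDs is dropped rather
 * than printed as a bare "afterany". Descriptors that never had job IDs
 * (for example "singleton") are kept as-is. */
static void remove_dependency(const char *in, unsigned long job_id,
			      char *out, size_t out_len)
{
	char *copy = strdup(in);
	char *desc = copy;
	size_t used = 0;

	out[0] = '\0';
	if (!copy)
		return;
	while (desc && *desc) {
		char *next = strchr(desc, ',');
		if (next)
			*next++ = '\0';

		char kept[256] = "";
		char *colon = strchr(desc, ':');
		if (colon) {
			*colon = '\0';	/* desc now holds only the type */
			for (char *id = colon + 1; id && *id; ) {
				char *id_next = strchr(id, ':');
				if (id_next)
					*id_next++ = '\0';
				if (strtoul(id, NULL, 10) != job_id) {
					strncat(kept, ":", sizeof(kept) - strlen(kept) - 1);
					strncat(kept, id, sizeof(kept) - strlen(kept) - 1);
				}
				id = id_next;
			}
		}
		/* Keep the descriptor only if it still has job IDs (or never had any). */
		if (!colon || kept[0]) {
			int n = snprintf(out + used, out_len - used, "%s%s%s",
					 used ? "," : "", desc, kept);
			if (n < 0 || (size_t)n >= out_len - used)
				break;	/* out of space: result is truncated */
			used += (size_t)n;
		}
		desc = next;
	}
	free(copy);
}
```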
-
- 02 Dec, 2013 3 commits
-
-
Morris Jette authored
Fix race condition on batch job termination that could result in a job exit code of 0xfffffffe if the slurmd on node zero registers its active jobs at the same time that slurmstepd is recording the job's exit code. Bug 535
-
Morris Jette authored
-
David Bigagli authored
-
- 29 Nov, 2013 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
There was already cgroup locking in the version 14.03 code base using different variable names and slightly different logic from that in commit 3f6d9e36. This commit is a variant of that commit in order to make the logic in version 2.6 match that of our next release (logic which is already pretty well tested). bug 447
-
Morris Jette authored
proctrack/cgroup - Add locking to prevent a race condition where one job step is ending for a user or job at the same time another job step is starting, and the user or job container is deleted from under the starting job step. bug 447
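Because job steps run in separate processes, the lock has to work across processes. The sketch below uses a POSIX fcntl() record lock on a lock file, with plain directories standing in for cgroups. Like a cgroup, the container directory cannot be removed while a step directory remains inside it. The paths and helper names are hypothetical, and this is not the code from the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch, not SLURM's code: serialize creation and deletion
 * of the shared job container with an fcntl() write lock on a lock file. */
static int lock_container(const char *lock_path)
{
	int fd = open(lock_path, O_RDWR | O_CREAT, 0600);
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	if (fd < 0)
		return -1;
	if (fcntl(fd, F_SETLKW, &fl) < 0) {	/* blocks until we own the lock */
		close(fd);
		return -1;
	}
	return fd;
}

static void unlock_container(int fd)
{
	close(fd);	/* closing the descriptor releases the fcntl lock */
}

/* Create the job container if needed, then the step's own directory. */
static int step_start(const char *lock_path, const char *job_dir,
		      const char *step_dir)
{
	int fd = lock_container(lock_path);
	if (fd < 0)
		return -1;
	(void)mkdir(job_dir, 0700);	/* EEXIST is fine: another step made it */
	int rc = mkdir(step_dir, 0700);
	unlock_container(fd);
	return rc;
}

/* Remove the step's directory, then the job container if it is now empty. */
static void step_end(const char *lock_path, const char *job_dir,
		     const char *step_dir)
{
	int fd = lock_container(lock_path);
	if (fd < 0)
		return;
	(void)rmdir(step_dir);
	(void)rmdir(job_dir);		/* fails harmlessly while other steps remain */
	unlock_container(fd);
}
```

Without the lock, an ending step's rmdir() of the container could run between a starting step's check for the container and its mkdir() of the step directory. That is the window described above.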
-
Morris Jette authored
This eliminates some now-redundant arrays and variable copying introduced in commit 74d1a4b4. Bug 525
-
David Bigagli authored
Substantial performance improvement for systems with Shared=YES or FORCE and large numbers of running jobs (replace bubble sort with quick sort). Bug 525
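Replacing a bubble sort with the C library's qsort() reduces sorting cost from O(n^2) to O(n log n) on average. Note that the C standard does not require qsort() to be a quicksort. The record type and the end-time sort key below are assumptions for illustration.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical record; the field used as the sort key is an assumption. */
struct run_job {
	unsigned int job_id;
	time_t end_time;
};

/* qsort() comparator: earliest end time first. Compares rather than
 * subtracts to avoid overflow. */
static int cmp_end_time(const void *a, const void *b)
{
	const struct run_job *x = a, *y = b;
	return (x->end_time > y->end_time) - (x->end_time < y->end_time);
}

/* O(n log n) on average versus O(n^2) for a bubble sort. */
static void sort_by_end_time(struct run_job *jobs, size_t n)
{
	qsort(jobs, n, sizeof(*jobs), cmp_end_time);
}
```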
-
David Bigagli authored
Remove trailing spaces. No changes in logic.
-
- 27 Nov, 2013 5 commits
-
-
Morris Jette authored
Original code worked only for Cray systems. For other systems it set gres_alloc to the total number of each GRES allocated on each node to any job
-
Morris Jette authored
-
Morris Jette authored
-
Jason Bacon authored
-
Morris Jette authored
-
- 26 Nov, 2013 5 commits
-
-
Chris Scheller authored
-
Morris Jette authored
-
Morris Jette authored
Logs errors related to apbasil use
-
Morris Jette authored
No change in logic; just move the logic that resets a batch job's accounting information into its own function.
-
Morris Jette authored
-
- 25 Nov, 2013 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
jette authored
No change in underlying logic
-
jette authored
This fixes a problem where a job contains a license that is removed in a slurmctld reconfiguration. Without this change, the job would be left with a non-zero license_list pointer referencing memory that had been freed. Bug 527
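A general way to avoid this kind of dangling pointer is to re-resolve every job's license reference against the new table after a reconfiguration. A job whose license no longer exists then holds NULL rather than stale memory. The sketch below illustrates that pattern only. The structs and `relink_job_licenses` are hypothetical and are not the fix in this commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structures, not SLURM's: a configured license and a job's
 * reference to one license, resolved to a pointer into the license table. */
struct license {
	char name[32];
	unsigned total;
};

struct job {
	char lic_name[32];		/* what the user requested */
	struct license *license;	/* points into the current table, or NULL */
};

/* After a reconfiguration replaces the license table, re-resolve every
 * job's pointer against the new table. A license that no longer exists
 * leaves the job with NULL instead of a pointer into freed memory. */
static void relink_job_licenses(struct job *jobs, size_t njobs,
				struct license *table, size_t nlic)
{
	for (size_t i = 0; i < njobs; i++) {
		jobs[i].license = NULL;
		for (size_t k = 0; k < nlic; k++) {
			if (strcmp(jobs[i].lic_name, table[k].name) == 0) {
				jobs[i].license = &table[k];
				break;
			}
		}
	}
}
```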
-
jette authored
Increase the range of possible reservation time values to allow for a really long RPC delay (possibly due to slurmctld failover from the primary to the backup controller). Also change to a #define value for clarity. Bug 527
-
- 24 Nov, 2013 2 commits