task/cgroup: Fix task layout logic when resources are disabled.
Specifically, add the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag when loading the configuration from the HWLOC library. The previous logic in task/cgroup did not do this, which differed from how slurmd gets its configuration information.

Here's the HWLOC documentation:

  HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM
    Detect the whole system, ignore reservations and offline settings. Gather all resources, even if some were disabled by the administrator. For instance, ignore Linux Cpusets and gather all processors and memory nodes, and ignore the fact that some resources may be offline.

Without this flag, I occasionally observed a bad core count, which caused the layout logic to lay out tasks incorrectly and generate this error:

  task/cgroup: task[0] infinite loop broken while trying to provision compute elements using cyclic

bug 2502