switch/nrt - CAU and RMDA tracking correction
Switch/nrt - Properly track usage of CAU and RDMA resources with multiple tasks per compute node. Previous logic would allocate resources once per task and then deallocate once per node, leaking CMA and RDMA resources and preventing their use by future jobs.
Please register or sign in to comment