node_features/knl_cray: add UME monitoring
Add logic to monitor Uncorrectable Memory Errors (UME) and notify active jobs, in case they continue to run for a while after an error. This copies the logic from knl_generic to knl_cray. Cray systems may get a different UME monitoring system in the future. The original knl_generic development is in commit 56ff27da, bug 3341.