Commit 86f60616 authored by Martin Perry, committed by Morris Jette

Energy use collection logic

Attached is the energy accounting patch that Martin and Yiannis have been working on. The framework is there, but the functionality is currently not working. They are both on vacation this week and will be back a week before the conference. I thought it would be better to send it now, so that the framework and the structures are in place for an official 2.5.0, instead of waiting. If you disagree, just let us know and we can send it again when the low-level functionality is working. Here is a short summary of our test results.

1. jobacct_gather/none + energy_accounting/none

Looks OK.  Did not find any errors.

2. jobacct_gather/linux or cgroup + energy_accounting/none

Looks OK.  Did not find any errors.

3. jobacct_gather/linux or cgroup + energy_accounting/rapl

Slurmd aborts when you run a job that uses a node that does not support RAPL. This appears to be caused by the error()/pexit() calls at lines 150/151 of energy_accounting_rapl.c. We need to change this code to just issue a debug message and return. For now, energy_accounting must not be configured if the cluster includes any nodes that do not support RAPL.
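
The proposed change (log a debug message and return instead of exiting) could look roughly like the sketch below. This is not the code from energy_accounting_rapl.c: the names open_msr_or_skip() and debug_msg() are hypothetical, debug_msg() only stands in for Slurm's debug logging, and the assumption that the failing step is opening a per-CPU MSR device file is ours.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical stand-in for a debug-level log call; prints to stderr. */
static void debug_msg(const char *msg, const char *path)
{
	fprintf(stderr, "debug: %s: %s\n", msg, path);
}

/*
 * Hypothetical helper: try to open the device file used to read RAPL
 * counters. Returns a file descriptor on success, or -1 if the node
 * has no usable RAPL interface. It never terminates the daemon, so a
 * node without RAPL simply reports no energy data.
 */
static int open_msr_or_skip(const char *msr_path)
{
	int fd;

	assert(msr_path != NULL);

	fd = open(msr_path, O_RDONLY);
	if (fd < 0) {
		/* Previously the equivalent of error() followed by exit. */
		debug_msg("RAPL not available, skipping energy data", msr_path);
		return -1;
	}
	return fd;
}
```

A caller would check for the -1 return and skip energy collection on that node, which would let clusters with a mix of RAPL and non-RAPL nodes keep energy_accounting configured.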

The CPU frequency values reported by jobacct_gather are not correct.

Again, there are obviously some problems, so if it would be better to wait for full functionality, just let us know. It may be three weeks before they are able to spend time fixing the problems, which is why I thought you might prefer to have something with the correct data structures in sooner rather than later.
parent 4dd74934