Handle situation where a slurmctld tries to communicate with slurmdbd more...
Handle the case where a slurmctld ends up with more than one connection to slurmdbd at the same time.

This can happen when the slurmdbd/slurmctld connection hangs. If the slurmctld is restarted, a new connection is made alongside the old one. When the old connection becomes unwedged, it clears the slurmctld's registration, so no further updates are sent to that slurmctld.

To fix this, check for old connections when a registration message comes in. If one is found, print an error, set its rem_port = 0, and remove it from the list. When the old connection later becomes unwedged, we then just close the socket instead of removing the registration.

Bug 5213
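The logic described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual slurmdbd code: `conn_t`, `registered`, `handle_registration`, `handle_close`, and `is_registered` are hypothetical names, the socket close is simulated by setting `fd` to -1, and the registration is modeled as membership in a simple linked list keyed by cluster name. Only `rem_port` is taken from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a slurmdbd-side connection record. */
typedef struct conn {
    char cluster[32];   /* cluster name the slurmctld registered with */
    int fd;             /* pretend socket descriptor, -1 once closed */
    int rem_port;       /* 0 marks a superseded (stale) connection */
    struct conn *next;
} conn_t;

/* Hypothetical list of registered connections. */
static conn_t *registered = NULL;

/* Unlink every entry for `cluster` except `keep`; optionally mark each
 * unlinked entry as stale. Returns the number of entries unlinked. */
static int unlink_cluster(const char *cluster, const conn_t *keep, int mark_stale)
{
    int n = 0;
    conn_t **pp = &registered;
    while (*pp) {
        conn_t *c = *pp;
        if (c != keep && strcmp(c->cluster, cluster) == 0) {
            if (mark_stale) {
                fprintf(stderr, "error: %s already registered on fd %d, "
                        "dropping old connection\n", c->cluster, c->fd);
                c->rem_port = 0;
            }
            *pp = c->next;
            c->next = NULL;
            n++;
        } else {
            pp = &c->next;
        }
    }
    return n;
}

static int is_listed(const conn_t *conn)
{
    for (const conn_t *c = registered; c; c = c->next)
        if (c == conn)
            return 1;
    return 0;
}

/* Returns 1 if some connection is registered for `cluster`, else 0. */
int is_registered(const char *cluster)
{
    for (const conn_t *c = registered; c; c = c->next)
        if (strcmp(c->cluster, cluster) == 0)
            return 1;
    return 0;
}

/* A registration message arrived on new_conn: supersede any old
 * connection for the same cluster, then register the new one.
 * Returns the number of stale connections found. */
int handle_registration(conn_t *new_conn)
{
    assert(new_conn != NULL);
    int stale = unlink_cluster(new_conn->cluster, new_conn, 1);
    if (!is_listed(new_conn)) {
        new_conn->next = registered;
        registered = new_conn;
    }
    return stale;
}

/* A connection ended. A stale connection (rem_port == 0) only has its
 * socket closed; a live one also clears the cluster's registration.
 * Returns 1 if the registration was cleared, 0 otherwise. */
int handle_close(conn_t *c)
{
    int cleared = 0;
    assert(c != NULL);
    if (c->rem_port != 0) {
        unlink_cluster(c->cluster, NULL, 0);
        cleared = 1;
    }
    c->fd = -1; /* stands in for closing the socket */
    return cleared;
}
```

In this sketch, registering a second connection for the same cluster marks the first one stale, so a later `handle_close` on the first connection leaves the second connection's registration in place. Without the `rem_port` check, that same close would clear the registration for the cluster and the restarted slurmctld would stop receiving updates.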