We recently added two more agents to our setup, made from old workspaces.
Because of that, they are still on a separate network from the server and the other agents.
We sometimes get a disconnect during a build:
Stage error:
The agent 'wrp-buildcews02' which was executing stage 'Build IBIS project' has gone offline. Agent status is Online, Authorized, Licensed, PropertiesCollected. Agent was last active at 13:56:25. Status checked at 13:56:47. Agent communication test failed.
Server Agent controller event:
An error occurred while checking if the agent is alive: The open operation did not complete within the allotted timeout of 00:00:10. The time allotted to this operation may have been a portion of a longer timeout. The socket transfer timed out after 00:00:01.4368187. You have exceeded the timeout set on your binding. The time allotted to this operation may have been a portion of a longer timeout. The read operation failed, see inner exception. The socket transfer timed out after 00:00:01.4368187. You have exceeded the timeout set on your binding. The time allotted to this operation may have been a portion of a longer timeout. A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. Number of retries = 0. Elapsed time = 3m 42s 503ms
Agent controller event:
An exception occurred while getting agent properties from cached collectors for agent 'wrp-buildcews02'. Details: 'Exception: TimeoutException
Message: This request operation sent to net.tcp://10.80.58.146:9002/IAgentService did not receive a reply within the configured timeout (00:02:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.
In reality the agent is alive. I suspect the root cause is that the agent is heavily loaded with work while the network between these agents and the server is slower, so the server's alive check times out.
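One way to check the slow-network suspicion is to time plain TCP connections from the server machine to the agent port while a build is running. The following is only a minimal sketch: the function names `probe_tcp_connect` and `summarize` are made up for illustration, and it measures raw TCP connect time only, not the agent service call itself, so it can confirm or rule out the network but not agent load.

```python
import socket
import time


def probe_tcp_connect(host, port, attempts=5, timeout=10.0, pause=0.0):
    """Try to open a TCP connection `attempts` times.

    Returns a list with one entry per attempt: the connect time in
    seconds, or None if the attempt failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError (including timeouts) on failure
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            results.append(None)
        if pause:
            time.sleep(pause)
    return results


def summarize(results):
    """Count failures and report the slowest successful connect."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "max_seconds": max(ok) if ok else None,
    }
```

For example, `summarize(probe_tcp_connect("10.80.58.146", 9002, attempts=20, pause=1.0))` uses the address and port from the log above and the same 10 second limit as the open timeout in the server message. Failures or connect times approaching that limit during a build would point at the network rather than the agent.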
My question: are there any configuration parameters that can extend this timeout or increase the retry count?
P.S.
It would help if the logged events stated which agent the problem was reported for (this is not always visible).