Repositories in error state because of connection errors to GitHub.com

Hi, we have been experiencing an issue with Continua for some time. Sometimes the repository error log fills up with errors from trying to communicate with GitHub.com. However, we don’t have any problems connecting to GitHub from our own machines, which are on the same network, and when we try to replicate the connection errors manually from the build machine, we can’t reproduce them.

http://d.pr/i/55cd

These issues seem to put the repositories in an “error state” (our theory), and if we don’t clear the error log manually, builds don’t get triggered when a new commit is made, etc.

How can we correct this issue? Or is it really a problem with our internet connection? (We have tried a lot of things to improve the performance of the network connection on the build machine.)

Thanks in advance

Many things can cause this type of problem: a bad NIC on your build server, bad network hardware, failures on the GitHub side, an intrusion prevention system (IPS), and others.

A couple of suggestions:

Clone the repository to a local machine and point your build server to that. That is probably not the right approach for production, but if it works fine, it might be safe to say the problem is with the connection between your build server and GitHub.

Install Wireshark on the build server (or use some other means of capturing packets). Set up the capture filter to filter on the Git port being used and start capturing packets to a file. Leave that running until the problem occurs. Then comes the harder part: finding someone to review the captures. Hopefully you have network folks who can do that. It may not pinpoint the problem, but it should help narrow down the search.

How many developers do you have hitting GitHub from your location? If there are a lot, and all using a shared IP address… it could look like an attack on their network and may cause prevention measures to kick in and start dropping connections.
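Since the errors can't be reproduced manually, a small probe left running on the build server alongside the packet capture may help line up failures with timestamps. The following is only a minimal sketch: it assumes the repositories are reached over HTTPS (port 443), and the function names and the log file name `github_probe.log` are made up for illustration.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host="github.com", port=443, timeout=10.0, connect=socket.create_connection):
    """Attempt one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        sock = connect((host, port), timeout=timeout)
    except OSError as exc:  # covers timeouts, DNS failures, refused connections
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"
    sock.close()
    return True, time.monotonic() - start, ""


def run(log_path="github_probe.log", interval=30.0, attempts=None):
    """Probe repeatedly, appending one timestamped line per attempt to log_path.

    With attempts=None the loop runs until the process is stopped.
    """
    count = 0
    while attempts is None or count < attempts:
        ok, elapsed, error = probe()
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{stamp} ok={ok} elapsed={elapsed:.2f}s {error}\n")
        count += 1
        if attempts is None or count < attempts:
            time.sleep(interval)
```

Calling `run()` on the build server and later comparing the failed entries with the times in the repository error log should show whether the failures are visible at the plain TCP level or only inside Git itself.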

Hi Luis,

GitHub has had several problems with DDoS attacks recently, although nothing is in the status history for the date of your error messages. It is possible that these timeouts are due to preventative measures put in place to mitigate the DDoS attacks.

In addition to the useful suggestions by b.walker, another thing to check is the number of Git processes running concurrently in the task manager. If you have a lot of GitHub repositories, it may be worth limiting the number of concurrent repository checkers. There is a server property named Server.RepoMonitor.MaxCheckers for this purpose (see Administration -> Properties). This is set to 5 by default. Try decreasing this number to see if it reduces the number of timeout errors. Note, however, that this may also cause builds to take longer, because they wait for other repositories to finish checking before checking for new changesets.
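To illustrate the trade-off behind a cap on concurrent checkers, here is a rough sketch using a bounded thread pool. It is not how Continua implements Server.RepoMonitor.MaxCheckers; the names `check_all` and `demo` are hypothetical, and the "check" is just a short sleep standing in for a remote Git call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def check_all(repositories, check, max_checkers=5):
    """Run check(repo) for every repository, at most max_checkers at a time."""
    with ThreadPoolExecutor(max_workers=max_checkers) as pool:
        return list(pool.map(check, repositories))


def demo(max_checkers, repo_count=10, check_seconds=0.05):
    """Return (peak concurrent checks, total wall-clock seconds) for a fake workload."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_check(repo):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(check_seconds)  # stands in for a remote Git call
        with lock:
            running -= 1
        return repo

    start = time.monotonic()
    check_all(range(repo_count), fake_check, max_checkers)
    return peak, time.monotonic() - start
```

Comparing `demo(5)` with `demo(1)` shows the same effect described above: a lower cap means fewer simultaneous connections to the remote host, but the whole round of checks takes longer to complete.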

The repository should resolve an error state automatically once the connection is restored. It attempts to do this every 2 minutes, or every 30 seconds if a build is waiting for this repository. If this is not happening, we would be interested in seeing a debug log.
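As a rough worked example of those intervals, the sketch below estimates how long a repository would stay in the error state after the connection comes back. It assumes that rechecks fire at fixed multiples of the interval, counted from the moment the error state began, and that the "build waiting" condition does not change in between; both are simplifying assumptions, and the function name is made up for illustration.

```python
import math

NORMAL_INTERVAL = 120   # seconds between rechecks (2 minutes)
WAITING_INTERVAL = 30   # seconds between rechecks when a build is waiting


def seconds_until_recovery(restored_after, build_waiting):
    """Return the time of the first recheck at or after `restored_after` seconds.

    Assumes rechecks happen at fixed multiples of the interval, measured from
    the moment the repository entered the error state.
    """
    interval = WAITING_INTERVAL if build_waiting else NORMAL_INTERVAL
    checks_needed = max(1, math.ceil(restored_after / interval))
    return checks_needed * interval
```

Under these assumptions, a connection restored 45 seconds after the error would be picked up at 60 seconds if a build is waiting and at 120 seconds otherwise. A repository that stays in the error state far longer than that is the case where a debug log would be worth collecting.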