The only possible cause I can think of is that the database has become corrupted, although the error message would generally be different.
I recommend that you shutdown the Continua CI PostgreSQL service and take a full copy of the C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB folder before doing anything.
Next see if there are any recent backup files. If you have upgraded recently, then there should be automatic back ups in the C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB folder. If the backups are recent enough, then try restoring over the database.
Otherwise there is a special option: zero_damaged_pages=on that you can use on postgresql.conf, see this documentation, this will allow you to run pg_dump without stopping on errors and may allow you to get a backup, although you will loose any data that cannot be read.
we’re up and running our buildserver again. However, we still face issues.
When starting Windows we do need to start the PostgreSQL service after starting the Continua.Service to avoid a permant cpu usage of 100%.
Still Continua.Service triggers a cpu usage of 100% approximately every 30-40 seconds for about 34 seconds. Involved are Continua Agent Service and two instances of the WMI Provider Host. At this time no builds are running. Disk usage is down at 0-3 percent.
Our buildserver is running on Windows 10 Pro 1903 64-Bit. It’s a virtual machine on a Xeon 4 core processor (3.30 GHz) with 32 GB of RAM, of which the vm can use 6 GB.
i’ve just checked the event log and found entries such as:
Die Beschreibung für die Ereignis-ID “0” aus der Quelle “PostgreSQL” wurde nicht gefunden. Entweder ist die Komponente, die dieses Ereignis auslöst, nicht auf dem lokalen Computer installiert, oder die Installation ist beschädigt. Sie können die Komponente auf dem lokalen Computer installieren oder reparieren.
Falls das Ereignis auf einem anderen Computer aufgetreten ist, mussten die Anzeigeinformationen mit dem Ereignis gespeichert werden.
Die folgenden Informationen wurden mit dem Ereignis gespeichert:
FATAL: Verbindung wird abgebrochen aufgrund von Anweisung des Administrators
Last Friday we encountered an issue which we worked around by manually starting the postgresql service after the continua service to get things running. I need to enable debug logging and restart Continua to provide more details. However, i need to find a timeframe to do so first.
About the cpu usage we were able to chase things down to the way Continua refreshes (git) repositories.
The default setting for “Server.RepoMonitor.MaxCheckers” is ‘5’, which caused the high load. Reducing it to a value of ‘2’ immediately helped. In the end we configurared a value of ‘1’. This results in a permanent and steady load of 35-40% cpu usage. As Vincent suggested we need to check server requirements. We may upgrade to a server with more power, especially more cpu cores. Interesting is that each “git fetch”, issued to gather information, causes a high load. To some extend the problems are caused by git. To further reduce cpu load we also reduced repo polling downto 5 minutes, rather than the default of 60 seconds. We also configured some repos to poll “manually”.
How many git repositories do you have? Are they large repositories? I’m just checking the git processes and they are hardly getting past 0 CPU.
The error points to an issue with your database. What’s the size of the database, and the number of records in the larger tables e.g. build_builds and builds_unittestv2. It may be that your database indexes have become bloated, so it may be worth running a REINDEX DATABASE command.
Die Beschreibung für die Ereignis-ID “0” aus der Quelle “PostgreSQL” wurde nicht gefunden. Entweder ist die Komponente, die dieses Ereignis auslöst, nicht auf dem lokalen Computer installiert, oder die Installation ist beschädigt. Sie können die Komponente auf dem lokalen Computer installieren oder reparieren.
Falls das Ereignis auf einem anderen Computer aufgetreten ist, mussten die Anzeigeinformationen mit dem Ereignis gespeichert werden.
Die folgenden Informationen wurden mit dem Ereignis gespeichert:
FEHLER: Relation »builds_unittest« existiert nicht bei Zeichen 16
ANWEISUNG: SELECT id FROM builds_unittest limit 1
As for the socket error we encounter a reindex operation did not fix it. Using pgAdmin i am not able to query data. However when working with Continua we don’t encounter any issues so far.
Manual polling, using post commit hooks. This reduces CPU usage on the server, by only polling for repository changes when requested and has the added benefit of reducing the load on your version control server. This does take some setting up, and depends very much on the capabilities of your version control system. I’ll take a look at post commit hooks in a future blog post. Continuous Integration Server performance
Is this something that has been researched yet? We’re using TFS 2017, but installing (Azure DevOps Server 2019 is no problem.
Thank you for sending the debug log. We can find no errors, timeouts or slow operations in the 20 minutes logged, so there’s not really anything there for us to investigate. The git fetch commands are fast, but there are no new changesets. A longer debug log which is enabled while the above errors are occurring may be more fruitful.
It is possible to add a service hook to TFS/DevOps server to call a URL. Note however that your Continua CI server website will need to be accessible to the TFS/DevOps server.
Under the TFS project settings, on the Service Hooks tab, you can create an new subscription to integrate with a Web Hooks service. Select the Code pushed trigger event and your repository, then enter the repository Manual Poll URL.
Of course, when we re-tested this today it did not start a repository check. TFS sends an id in its request which overrides the id in the url. We’ll fix this for the next version.
We’re more concerned about you not being able to query the database using pgAdmin4 and the FATAL: Verbindung wird abgebrochen aufgrund von Anweisung des Administrators error message. If you backup the database using pg_dump.exe, is it successful or are there any errors?
thanks for checking the log. For the time being no further action necessary. As for the TFS Hook i will give that a try, when the Continua fix is available. As for the database errors i can execute pg_dump with no errors.
I configured a Web Hook Action “Post via HTTP”. For the URL i configured the root of our ContinuaCI Server. TFS triggers the event, when i push commits to the repository. However, the action fails with “Not Found (404)”.
Did you copy the Manual Poll URL correctly from the repository settings? Is the URL accessible from the TFS server? I assume this is an on-premises TFS server, can you log on to it and browse to the URL?
Just need to enter the correct Manual Poll URL. As often the problem resides in front of the computer. However, there is an issue with ContinuaCI’s repository dialog. After you change the Polling Frequence to “Never (Manual)” the URL field will not be displayed! You must save and close the dialog first. After reopening the dialog the URL will be visible.