Message: Begin failed with SQL exception

Sparky · August 30, 2019, 9:11am

The only possible cause I can think of is that the database has become corrupted, although the error message would generally be different.

I recommend that you shutdown the Continua CI PostgreSQL service and take a full copy of the C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB folder before doing anything.

Then start up the service again and attempt to backup the database.

Next see if there are any recent backup files. If you have upgraded recently, then there should be automatic back ups in the C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB folder. If the backups are recent enough, then try restoring over the database.

Otherwise there is a special option: zero_damaged_pages=on that you can use on postgresql.conf, see this documentation, this will allow you to run pg_dump without stopping on errors and may allow you to get a backup, although you will loose any data that cannot be read.

sa_thomaslohrum · September 2, 2019, 9:35am

Hi Dave,

we’re up and running our buildserver again. However, we still face issues.

When starting Windows we do need to start the PostgreSQL service after starting the Continua.Service to avoid a permant cpu usage of 100%.
Still Continua.Service triggers a cpu usage of 100% approximately every 30-40 seconds for about 34 seconds. Involved are Continua Agent Service and two instances of the WMI Provider Host. At this time no builds are running. Disk usage is down at 0-3 percent.

Our buildserver is running on Windows 10 Pro 1903 64-Bit. It’s a virtual machine on a Xeon 4 core processor (3.30 GHz) with 32 GB of RAM, of which the vm can use 6 GB.

Vincent · September 2, 2019, 12:23pm

Continua CI doesn’t do anything with WMI directly. The fact that the vm was moved to new hardware suggests an OS or VM configuration issue.

The above post shows how to diagnose which process is causing the high cpu usage on the wmi provider host.

Also, what hypervisor is the VM running on?

Vincent · September 2, 2019, 12:31pm

BTW, if you haven’t already looked at this blog post

https://www.finalbuilder.com/resources/blogs/continuous-integration-server-performance

I would highly recommend taking a look, in particular the part about antivirus software.

Also enable debug logging https://wiki.finalbuilder.com/display/continua/Debug+Logging and wait till you sees the cpu spike, the stop the service (and turn of debug logging) and send us the log file.

sa_thomaslohrum · September 2, 2019, 1:06pm

Hi Vincent,

thanks for your reply. I will check and let you know the results.

Sparky · September 5, 2019, 6:28am

Hi Thomas,

Are you still seeing database errors and high CPU issues?

sa_thomaslohrum · September 5, 2019, 7:34am

Hi Dave,

i’ve just checked the event log and found entries such as:

Die Beschreibung für die Ereignis-ID “0” aus der Quelle “PostgreSQL” wurde nicht gefunden. Entweder ist die Komponente, die dieses Ereignis auslöst, nicht auf dem lokalen Computer installiert, oder die Installation ist beschädigt. Sie können die Komponente auf dem lokalen Computer installieren oder reparieren.

Falls das Ereignis auf einem anderen Computer aufgetreten ist, mussten die Anzeigeinformationen mit dem Ereignis gespeichert werden.

Die folgenden Informationen wurden mit dem Ereignis gespeichert:

FATAL: Verbindung wird abgebrochen aufgrund von Anweisung des Administrators

Last Friday we encountered an issue which we worked around by manually starting the postgresql service after the continua service to get things running. I need to enable debug logging and restart Continua to provide more details. However, i need to find a timeframe to do so first.

About the cpu usage we were able to chase things down to the way Continua refreshes (git) repositories.
The default setting for “Server.RepoMonitor.MaxCheckers” is ‘5’, which caused the high load. Reducing it to a value of ‘2’ immediately helped. In the end we configurared a value of ‘1’. This results in a permanent and steady load of 35-40% cpu usage. As Vincent suggested we need to check server requirements. We may upgrade to a server with more power, especially more cpu cores. Interesting is that each “git fetch”, issued to gather information, causes a high load. To some extend the problems are caused by git. To further reduce cpu load we also reduced repo polling downto 5 minutes, rather than the default of 60 seconds. We also configured some repos to poll “manually”.

Sparky · September 5, 2019, 7:51am

How many git repositories do you have? Are they large repositories? I’m just checking the git processes and they are hardly getting past 0 CPU.

The error points to an issue with your database. What’s the size of the database, and the number of records in the larger tables e.g. build_builds and builds_unittestv2. It may be that your database indexes have become bloated, so it may be worth running a REINDEX DATABASE command.

sa_thomaslohrum · September 5, 2019, 8:05am

It’s 32 active repos only. One of them is around 5 GB. The others are much smaller, that is less than 100 MB.

sa_thomaslohrum · September 5, 2019, 8:07am

I found another entry in the event log:

Die Beschreibung für die Ereignis-ID “0” aus der Quelle “PostgreSQL” wurde nicht gefunden. Entweder ist die Komponente, die dieses Ereignis auslöst, nicht auf dem lokalen Computer installiert, oder die Installation ist beschädigt. Sie können die Komponente auf dem lokalen Computer installieren oder reparieren.

Falls das Ereignis auf einem anderen Computer aufgetreten ist, mussten die Anzeigeinformationen mit dem Ereignis gespeichert werden.

Die folgenden Informationen wurden mit dem Ereignis gespeichert:

FEHLER: Relation »builds_unittest« existiert nicht bei Zeichen 16
ANWEISUNG: SELECT id FROM builds_unittest limit 1

Sparky · September 5, 2019, 8:27am

That’s normal. builds_unittest has been replaced with builds_unittestv2 but we run a test to see if it exists for backward compatibility.

sa_thomaslohrum · September 5, 2019, 2:50pm

Sent debug log regarding cpu usage by private mail.

sa_thomaslohrum · September 5, 2019, 2:51pm

As for the socket error we encounter a reindex operation did not fix it. Using pgAdmin i am not able to query data. However when working with Continua we don’t encounter any issues so far.

sa_thomaslohrum · September 5, 2019, 2:54pm

Vincent’s blog notes:

Manual polling, using post commit hooks. This reduces CPU usage on the server, by only polling for repository changes when requested and has the added benefit of reducing the load on your version control server. This does take some setting up, and depends very much on the capabilities of your version control system. I’ll take a look at post commit hooks in a future blog post.
Continuous Integration Server performance

Is this something that has been researched yet? We’re using TFS 2017, but installing (Azure DevOps Server 2019 is no problem.

Sparky · September 6, 2019, 4:52am

Hi Thomas,

Thank you for sending the debug log. We can find no errors, timeouts or slow operations in the 20 minutes logged, so there’s not really anything there for us to investigate. The git fetch commands are fast, but there are no new changesets. A longer debug log which is enabled while the above errors are occurring may be more fruitful.

It is possible to add a service hook to TFS/DevOps server to call a URL. Note however that your Continua CI server website will need to be accessible to the TFS/DevOps server.

Under the TFS project settings, on the Service Hooks tab, you can create an new subscription to integrate with a Web Hooks service. Select the Code pushed trigger event and your repository, then enter the repository Manual Poll URL.

Of course, when we re-tested this today it did not start a repository check. TFS sends an id in its request which overrides the id in the url. We’ll fix this for the next version.

We’re more concerned about you not being able to query the database using pgAdmin4 and the FATAL: Verbindung wird abgebrochen aufgrund von Anweisung des Administrators error message. If you backup the database using pg_dump.exe, is it successful or are there any errors?

sa_thomaslohrum · September 9, 2019, 7:06am

Hi Dave,

thanks for checking the log. For the time being no further action necessary. As for the TFS Hook i will give that a try, when the Continua fix is available. As for the database errors i can execute pg_dump with no errors.

Command used: pg_dump -U username -W -F t ContinuaCI > c:\BAK\ContinuaCI.tar

Sparky · September 9, 2019, 7:40am

OK, let us know if you have any further issues with the database.

Note that version 1.9.1.336 includes the update to the repository polling endpoint to support triggering from a TFS web hook

sa_thomaslohrum · September 9, 2019, 2:54pm

I configured a Web Hook Action “Post via HTTP”. For the URL i configured the root of our ContinuaCI Server. TFS triggers the event, when i push commits to the repository. However, the action fails with “Not Found (404)”.

Any ideas?

Sparky · September 9, 2019, 11:41pm

Works for us .

Did you copy the Manual Poll URL correctly from the repository settings? Is the URL accessible from the TFS server? I assume this is an on-premises TFS server, can you log on to it and browse to the URL?

sa_thomaslohrum · September 10, 2019, 6:12am

Works for me too

Just need to enter the correct Manual Poll URL. As often the problem resides in front of the computer. However, there is an issue with ContinuaCI’s repository dialog. After you change the Polling Frequence to “Never (Manual)” the URL field will not be displayed! You must save and close the dialog first. After reopening the dialog the URL will be visible.

Thanks for your help!

Message: Begin failed with SQL exception

Products

Support

Resources

Company