Repository updates are stalled after error

uraabe · September 16, 2024, 11:39am

I noticed ContinuaCI refuses to update repositories after the error shown below appears in the logs (the messages come from a German-language system). Stopping and restarting the CI server usually gets it running again, but that requires logging into the VM and doing it manually each time. Is there something I can adjust to avoid this error?

## Private Message
|Subject:|A new 'NHibernateDatabase.SaveChanges' system event was logged|
| --- | --- |
|From:|*System*|
|Sent:|13.09.2024 07:53:30|
|A new event has been added to the event log:
NHibernateDatabase.SaveChanges

Error: Could not commit transaction: Exception: TransactionException

Message: Commit failed with SQL exception

Stack Trace: at NHibernate.Transaction.AdoTransaction.Commit()
at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()

Exception: NpgsqlException

Message: Exception while reading from stream

Stack Trace: at Npgsql.NpgsqlReadBuffer.<<Ensure>g__EnsureLong|40_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Npgsql.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|201_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlConnector.<ExecuteInternalCommand>d__257.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlTransaction.<Commit>d__17.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlTransaction.Commit()
at NHibernate.Transaction.AdoTransaction.Commit()

Exception: TimeoutException

Message: Timeout during reading attempt

Stack Trace:

, Pre-commit call stack: at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()
at Continua.Modules.Builds.Agents.ServerAgentManager.Register(String address, String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
at Continua.Modules.Builds.Services.AgentRegistrationService.RegisterV3(String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
at SyncInvokeRegisterV3(Object , Object[] , Object[] )
at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage11(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)
at System.ServiceModel.Dispatcher.ChannelHandler.DispatchAndReleasePump(RequestContext request, Boolean cleanThread, OperationContext currentOperationContext)
at System.ServiceModel.Dispatcher.ChannelHandler.HandleRequest(RequestContext request, OperationContext currentOperationContext)
at System.ServiceModel.Dispatcher.ChannelHandler.AsyncMessagePump(IAsyncResult result)
at System.Runtime.IOThreadScheduler.ScheduledOverlapped.IOCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped)
at System.Runtime.Fx.IOCompletionThunk.UnhandledExceptionFrame(UInt32 error, UInt32 bytesRead, NativeOverlapped* nativeOverlapped)
at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)
, Commit time 180131ms|

Sparky · September 16, 2024, 1:04pm

Hi Uwe,

This error is due to a long timeout while committing changes to the database. This generally happens when the database grows too large or needs some maintenance.

Using pgAdmin 4, check the size of your database tables and indexes. If the indexes are large, then running VACUUM (FULL, ANALYZE) or REINDEX may improve performance. If the table data is large, then it may be worth adjusting your build clean-up rules to reduce the amount of build information stored. Alternatively, it may be necessary to give the VM more CPU cores and/or RAM.
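
For anyone who prefers a script to the pgAdmin UI, the size check described above could be sketched roughly as follows. This is only an illustration, not something taken from the Continua documentation: it assumes the third-party `psycopg2` driver is installed, the connection string is a placeholder you have to supply yourself, and `fetch_sizes` and `describe` are hypothetical helper names.

```python
# Sketch: list the largest tables together with their table and index sizes.
# Assumptions: psycopg2 is installed; the DSN passed to fetch_sizes is yours.

SIZE_QUERY = """
SELECT relname,
       pg_table_size(relid)   AS table_bytes,
       pg_indexes_size(relid) AS index_bytes
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
"""


def fetch_sizes(dsn):
    """Run SIZE_QUERY and return rows of (table, table_bytes, index_bytes)."""
    import psycopg2  # imported lazily so describe() works without the driver

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(SIZE_QUERY)
            return cur.fetchall()
    finally:
        conn.close()


def describe(rows):
    """Format each row and mark tables whose indexes outweigh their data."""
    lines = []
    for name, table_bytes, index_bytes in rows:
        note = "  <- indexes larger than table data" if index_bytes > table_bytes else ""
        lines.append(
            f"{name}: table {table_bytes / 1024**2:.1f} MiB, "
            f"indexes {index_bytes / 1024**2:.1f} MiB{note}"
        )
    return lines
```

Calling `describe(fetch_sizes(dsn))` with a valid connection string returns one line per table. The "indexes larger than table data" marker is just a rough heuristic for where REINDEX might be worth trying, not an official threshold.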

Meanwhile, we’ll do some testing with errors in this area of the code in the next day or two and see if there is anything that can be improved.

It’s not clear from the stack trace exactly which statement is causing this. It is however likely to be happening while saving agent properties to the database. We’ve not encountered errors in this area before, but we’d also expect to see other errors listed on the event log page. It’s also odd that it is affecting repositories. Are there any other error messages logged around the same time?

uraabe · September 17, 2024, 10:46am

Hello Dave,

The DB folder is merely 130 MB in size. That doesn’t look very large IMHO. If you can give me some detailed hints about which numbers I should inspect with PGAdmin, I will be happy to provide those, too.

The VM is hosted on ESX and is configured with 4 cores and 8GB RAM. I could give it more RAM, but it already has all the cores the host contains (and what the free ESX server allows). If that is not enough for my (relatively small, I thought) usage, then I will probably stick with manually restarting CI once every couple of days.

I am planning to switch to Proxmox in the future, but that needs quite a bit of planning and care.

Regarding these errors, I will keep an eye on other errors appearing in the same time frame whenever it happens again.

Sparky · September 17, 2024, 12:36pm

Hi Uwe,

That’s certainly not a large database size. We know of users with databases tens of GB in size.

If you can select the Schemas - public - Tables node for the ContinuaCI database, then choose the Statistics Tab, you’ll see a report with rows for each table. Sort by live tuples and let us know the numbers for the largest four tables.
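
As an alternative to the Statistics tab, the `pg_stat_user_tables` view exposes the same kind of counters (`n_live_tup`, `n_dead_tup`) and can be queried directly, for example from pgAdmin's Query Tool. The sketch below is a hedged illustration: `top_tables` and `dead_ratio` are hypothetical helpers, and the rows are assumed to come from running the query with whatever client you use.

```python
# Sketch: rank tables by live tuples, similar to sorting the Statistics tab.
# LIVE_TUPLE_QUERY can be run as-is in a SQL client connected to the database.

LIVE_TUPLE_QUERY = """
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
"""


def top_tables(rows, count=4):
    """Return the `count` rows with the most live tuples.

    Each row is a tuple of (table_name, live_tuples, dead_tuples).
    """
    return sorted(rows, key=lambda row: row[1], reverse=True)[:count]


def dead_ratio(live, dead):
    """Fraction of dead tuples; a high value hints that VACUUM may be overdue."""
    total = live + dead
    return dead / total if total else 0.0
```

The dead-tuple ratio is not something the thread asks for, but it is a cheap extra signal when deciding whether maintenance is worth running.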

Can you also send us a diagnostics report? You can generate and download it from the Event Log page in the Administration section of Continua. Send it via email to support at finalbuilder.com.

Are you running the latest version of Continua?

uraabe · September 27, 2024, 1:05pm

The error appeared again about 7 hours ago.

Here are the current pgadmin values:

Live Tuples:
builds_changesetfile = 35441
builds_buildstatus = 13466
builds_stagemetric = 9112
builds_buildmetric = 6970

I have sent the report to support.

I’m currently on Continua 1.9.2.1388

Vincent · September 28, 2024, 1:34am

Hi Uwe

Thanks for the diagnostics report. The error is not providing us with much information, so we have uploaded a build which includes more detail in the error message.

https://downloads.finalbuilder.com/downloads/continua/1.9.2/ContinuaCI.Server.Setup_x64_1.9.2.1389.exe

Vincent · September 28, 2024, 1:56am

Hi Uwe

Some more thoughts about this. The fact that this error is not occurring all the time makes us wonder: is there something running periodically on the machine that might be interfering? The most common issue is antivirus software scanning the database folder. Make sure you have an exclusion for that folder (see Known Issues).

Other candidates we have seen are backup software (rare though) and Windows indexing.

Since your DB is relatively small, the RAM assigned to the VM should be OK. Our production server has 4 cores and 8GB RAM and is only using 3.8GB (6.5GB DB), so your setup is fine.

I see you mentioned moving to Proxmox. That is what we use and I can highly recommend it; it’s been flawless for us over the last 11 months. We moved from Hyper-V 2019 to XCP-ng, which was one disaster after another (too many issues to list here), then migrated 30 VMs to Proxmox in a day. It took a lot of work exporting them to our TrueNAS storage (I did that the night before the migration), but Proxmox is pretty easy to install and importing VMs is pretty easy (easier from ESXi than from XCP-ng).

We now have two identical Proxmox machines (dual EPYC 32-core CPUs, 512GB RAM). Managing them is easy, and migrating VMs between the servers is quick and easy (dedicated 25Gb connections between the machines for replication/migration).

uraabe · September 28, 2024, 11:09am

I installed the new version and changed the antivirus and indexer settings following your suggestions. Now we need to wait a couple of days to see if the problem persists.

There actually is an Active Backup for Business job running on a Synology NAS for that VM, but it is scheduled for 3:00 am and lasts no longer than 30 minutes. That doesn’t match the error appearing at 8:03 am.
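
The comparison above (does an error timestamp fall inside a scheduled job's window?) is easy to automate when there are many events to check. A minimal sketch, with `in_window` as a hypothetical helper; the calendar date used here is a placeholder, and only the times of day come from the post. It does not handle windows that cross midnight.

```python
from datetime import datetime, time, timedelta


def in_window(event, start, duration):
    """True if `event` falls inside the daily window beginning at `start`."""
    window_start = datetime.combine(event.date(), start)
    return window_start <= event <= window_start + duration


backup_start = time(3, 0)               # backup scheduled for 3:00 am
backup_length = timedelta(minutes=30)   # lasts no longer than 30 minutes

# Placeholder date; the 8:03 am time of day is taken from the post.
error_time = datetime(2024, 1, 1, 8, 3)
print(in_window(error_time, backup_start, backup_length))  # prints False
```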

uraabe · October 22, 2024, 11:54am

I really thought the problem had been fixed, but 4 days ago a similar error happened. If I can provide additional info, please tell me. This is the error description (the messages come from a German-language system):

		4 days ago
Friday, 18 October 2024 06:23:10		NHibernateDatabase.SaveChanges

Error: Could not commit transaction: Exception: TransactionException

Message: Commit failed with SQL exception

Stack Trace: at NHibernate.Transaction.AdoTransaction.Commit()
at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()

Exception: NpgsqlException

Message: Exception while reading from stream

Stack Trace: at Npgsql.NpgsqlReadBuffer.<<Ensure>g__EnsureLong|40_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Npgsql.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|201_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlConnector.<ExecuteInternalCommand>d__257.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlTransaction.<Commit>d__17.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlTransaction.Commit()
at NHibernate.Transaction.AdoTransaction.Commit()

Exception: TimeoutException

Message: Timeout during reading attempt

Stack Trace:

, Pre-commit call stack: at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()
at Continua.Modules.Builds.Agents.ServerAgentManager.Register(String address, String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
at Continua.Modules.Builds.Services.AgentRegistrationService.RegisterV3(String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
at SyncInvokeRegisterV3(Object , Object[] , Object[] )
at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage11(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)
at System.ServiceModel.Dispatcher.ChannelHandler.DispatchAndReleasePump(RequestContext request, Boolean cleanThread, OperationContext currentOperationContext)
at System.ServiceModel.Dispatcher.ChannelHandler.HandleRequest(RequestContext request, OperationContext currentOperationContext)
at System.ServiceModel.Dispatcher.ChannelHandler.AsyncMessagePump(IAsyncResult result)
at System.ServiceModel.Dispatcher.ChannelHandler.OnAsyncReceiveComplete(IAsyncResult result)
at System.Runtime.Fx.AsyncThunk.UnhandledExceptionFrame(IAsyncResult result)
at System.Runtime.AsyncResult.Complete(Boolean completedSynchronously)
at System.ServiceModel.Channels.TransportDuplexSessionChannel.TryReceiveAsyncResult.OnReceive(IAsyncResult result)
at System.Runtime.Fx.AsyncThunk.UnhandledExceptionFrame(IAsyncResult result)
at System.Runtime.AsyncResult.Complete(Boolean completedSynchronously)
at System.ServiceModel.Channels.SynchronizedMessageSource.ReceiveAsyncResult.OnReceiveComplete(Object state)
at System.ServiceModel.Channels.SessionConnectionReader.OnAsyncReadComplete(Object state)
at System.ServiceModel.Channels.PipeConnection.OnAsyncReadComplete(Boolean haveResult, Int32 error, Int32 numBytes)
at System.ServiceModel.Channels.OverlappedContext.CompleteCallback(UInt32 error, UInt32 numBytes, NativeOverlapped* nativeOverlapped)
at System.Runtime.Fx.IOCompletionThunk.UnhandledExceptionFrame(UInt32 error, UInt32 bytesRead, NativeOverlapped* nativeOverlapped)
at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)
, Commit time 32171ms

Sparky · October 23, 2024, 1:11am

Hi Uwe,

Unfortunately we still don’t have any information that points us to the cause of your issue. We would normally expect a further, more detailed error message to be logged after this one. You mentioned in your first post that the repositories stop updating after this error occurs. Does this still happen? Are there any additional errors logged?

As we are unable to reproduce this on our servers, enabling debug logging is the best way forward. Could you enable debug logging on the server and restart the service to apply the changes? Once the error occurs again, please send us the relevant debug log file (dated the same time as the error).

Enabling debug logging will affect server performance but not significantly enough to outweigh its use in diagnosing this issue. It will help us to understand what happens on the server before and after the error occurs.

Since the error is related to a PostgreSQL timeout, it would also be useful to check for any database service issues. You can enable PostgreSQL logging by modifying the postgresql.conf file, usually located at C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB. Be sure to back up the file first, then scroll to the REPORTING AND LOGGING section and adjust the following lines:

logging_collector = on
log_min_messages = info
log_min_duration_statement = 250ms

For more details on these settings, see the PostgreSQL docs.
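
Once `log_min_duration_statement` is set, PostgreSQL writes a `duration: ... ms` entry for each statement that exceeds the threshold, so the resulting log can be filtered for the slowest ones. The following is a rough sketch only: `slow_entries` is a hypothetical helper, and the text before `duration:` on each line depends on your `log_line_prefix`, which is why the pattern matches only the duration fragment.

```python
import re

# Matches the "duration: <ms> ms" fragment that PostgreSQL logs for statements
# slower than log_min_duration_statement.
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms")


def slow_entries(lines, threshold_ms=1000.0):
    """Yield (duration_ms, line) for log lines at or above threshold_ms."""
    for line in lines:
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold_ms:
            yield float(match.group(1)), line.rstrip("\n")
```

To use it, open one of the log files written by the logging collector and pass the file object to `slow_entries`; the file name and location depend on your `log_directory` and `log_filename` settings.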

uraabe · January 17, 2025, 2:41pm

Hi Dave!

The problem appeared again this morning with the current Continua 1.9.2.1432. The log file is about 6 MB as a zip. How shall I send it to you?

Unfortunately I forgot to enable logging for PostgreSQL. I will do that in a moment.

Vincent · January 17, 2025, 10:17pm

Hi Uwe

Please do email it to support @ finalbuilder .com

uraabe · February 17, 2025, 12:30pm

Just an update: After 4 weeks without any problems I want to share the most likely cause. The event log mentioned errors about hibernating the hard disk, which is questionable on a VM in general, happening in the same time frame as the ContinuaCI errors. After disabling the corresponding power setting the errors vanished.

k3tchup · February 18, 2025, 9:07am

@uraabe
Which setting was it?

uraabe · February 18, 2025, 10:59am

IIRC, it was changing the Attributes value in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\PowerSettings\0012ee47-9041-4b5d-9b77-535fba8b1442\0b2d69d7-a2a1-449c-9680-f91c70521c60
to 1.

Source: https://www.tenforums.com/tutorials/72971-add-ahci-link-power-management-power-options-windows.html
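
To check what the `Attributes` value currently is without opening regedit, a small read-only script along these lines should work on Windows. It is a sketch under assumptions: `setting_key_path` and `read_attributes` are hypothetical helper names, the GUIDs are copied from the path above, and nothing is written to the registry.

```python
# Sketch: read the current "Attributes" value of the power setting named above.
# Read-only; the winreg module is part of the standard library on Windows only.

POWER_SETTINGS = r"SYSTEM\CurrentControlSet\Control\Power\PowerSettings"
SUBGROUP = "0012ee47-9041-4b5d-9b77-535fba8b1442"
SETTING = "0b2d69d7-a2a1-449c-9680-f91c70521c60"


def setting_key_path(subgroup=SUBGROUP, setting=SETTING):
    """Build the registry path relative to HKEY_LOCAL_MACHINE."""
    return "\\".join([POWER_SETTINGS, subgroup, setting])


def read_attributes():
    """Return the data stored in the key's "Attributes" value."""
    import winreg  # imported lazily so the path helper works on any platform

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, setting_key_path()) as key:
        value, _value_type = winreg.QueryValueEx(key, "Attributes")
        return value
```

Running `print(read_attributes())` on the CI server shows the current value, which can then be compared against the one mentioned in the post.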

k3tchup · February 18, 2025, 1:42pm

OK, I see I already have it set to 1.
