Server shows 0 agents - broken communication

This morning I saw that our CI server shows 0 agents on the dashboard. When I click on it, the page takes a long time to load and then finishes with an error:

There are 3 builds “running”, but they are unresponsive and “frozen”.
It seems that communication with the server broke and stopped at some point:

The Continua Server logs are set to quiet, so there is nothing interesting there, only a Git problem from over 2 hours ago.

Edit: A server reboot helped.

Hi Michal,

Since the server reboot resolved the immediate problem, we can only suggest that you check the Windows event log for errors related to Continua, PostgreSQL, ASP.NET or IIS. Also, check available server resources (disk space, CPU, memory). Otherwise, if this issue starts to occur frequently, enable debug logging and send us a log file the next time it occurs.
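
As a rough illustration of the disk space part of that check, here is a minimal Python sketch using only the standard library. The function names and the 10% threshold are made up for this example and are not part of Continua; CPU and memory are not covered because they need a third-party package or the usual Windows tools.

```python
import shutil


def disk_report(path):
    """Return total and free space for the volume containing path, in GiB."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "free_gib": usage.free / gib,
        "free_fraction": usage.free / usage.total if usage.total else 0.0,
    }


def is_low_on_space(report, min_free_fraction=0.10):
    """True when the free fraction is below the (hypothetical) threshold."""
    return report["free_fraction"] < min_free_fraction


if __name__ == "__main__":
    # Point this at the drive that holds the server and database data.
    print(disk_report("."))
```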

Server resources were healthy.
In the event log I see that the local agent service failed to register with the server for some time.
There are also a couple of errors reported:

Error: Could not commit transaction: Exception: TransactionException

Message: Commit failed with SQL exception

Stack Trace:    at NHibernate.Transaction.AdoTransaction.Commit()
   at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()

Exception: NpgsqlException

Message: Exception while reading from stream

Stack Trace:    at Npgsql.NpgsqlReadBuffer.<<Ensure>g__EnsureLong|40_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|201_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlConnector.<ExecuteInternalCommand>d__257.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlTransaction.<Commit>d__17.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlTransaction.Commit()
   at NHibernate.Transaction.AdoTransaction.Commit()

Exception: TimeoutException

Message: Timeout during reading attempt

Stack Trace: 

, Pre-commit call stack:    at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()
   at Continua.Modules.Builds.Agents.ServerAgentManager.Register(String address, String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
   at Continua.Modules.Builds.Services.AgentRegistrationService.RegisterV3(String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
   at SyncInvokeRegisterV3(Object , Object[] , Object[] )
   at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
   at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
   at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)
   at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage11(MessageRpc& rpc)
   at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)
   at System.ServiceModel.Dispatcher.ChannelHandler.DispatchAndReleasePump(RequestContext request, Boolean cleanThread, OperationContext currentOperationContext)
   at System.ServiceModel.Dispatcher.ChannelHandler.HandleRequest(RequestContext request, OperationContext currentOperationContext)
   at System.ServiceModel.Dispatcher.ChannelHandler.AsyncMessagePump(IAsyncResult result)
   at System.Runtime.IOThreadScheduler.ScheduledOverlapped.IOCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped)
   at System.Runtime.Fx.IOCompletionThunk.UnhandledExceptionFrame(UInt32 error, UInt32 bytesRead, NativeOverlapped* nativeOverlapped)
   at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)
, Commit time 37323ms
An error occurred while registering the agent <redacted>. Details: Exception: DatabaseException

Message: A database error occurred while registering agent with hostname '<redacted>', port '<redacted>' and version '<redacted>' at address '<redacted>'. Details: Exception: TransactionException

Message: Transaction not successfully started

Stack Trace:    at NHibernate.Transaction.AdoTransaction.CheckBegun()
   at NHibernate.Transaction.AdoTransaction.Rollback()
   at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()
   at Continua.Modules.Builds.Agents.ServerAgentManager.Register(String address, String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)



Stack Trace:    at Continua.Modules.Builds.Agents.ServerAgentManager.Register(String address, String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
   at Continua.Modules.Builds.Services.AgentRegistrationService.RegisterV3(String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)

Exception: TransactionException

Message: Transaction not successfully started

Stack Trace:    at NHibernate.Transaction.AdoTransaction.CheckBegun()
   at NHibernate.Transaction.AdoTransaction.Rollback()
   at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()
   at Continua.Modules.Builds.Agents.ServerAgentManager.Register(String address, String hostname, Int32 port, String version, Boolean propertiesCollected, String collectorsHash)
Error: Could not commit transaction: Exception: GenericADOException

Message: could not insert: [Continua.Modules.Builds.BuildStatus#89442d35-836f-45c3-a82d-b280005d3cc4][SQL: INSERT INTO builds_buildstatus (State, Date, Message, buildid, Id) VALUES (?, ?, ?, ?, ?)]

Stack Trace:    at NHibernate.Persister.Entity.AbstractEntityPersister.Insert(Object id, Object[] fields, Boolean[] notNull, Int32 j, SqlCommandInfo sql, Object obj, ISessionImplementor session)
   at NHibernate.Persister.Entity.AbstractEntityPersister.Insert(Object id, Object[] fields, Object obj, ISessionImplementor session)
   at NHibernate.Action.EntityInsertAction.Execute()
   at NHibernate.Engine.ActionQueue.InnerExecute(IExecutable executable)
   at NHibernate.Engine.ActionQueue.ExecuteActions[T](List`1 list)
   at NHibernate.Engine.ActionQueue.ExecuteActions()
   at NHibernate.Event.Default.AbstractFlushingEventListener.PerformExecutions(IEventSource session)
   at NHibernate.Event.Default.DefaultFlushEventListener.OnFlush(FlushEvent event)
   at NHibernate.Impl.SessionImpl.Flush()
   at NHibernate.Impl.SessionImpl.BeforeTransactionCompletion(ITransaction tx)
   at NHibernate.Transaction.AdoTransaction.Commit()
   at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()

Exception: NpgsqlException

Message: Exception while reading from stream

Stack Trace:    at Npgsql.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|201_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlDataReader.<NextResult>d__44.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.<ExecuteReader>d__100.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Npgsql.NpgsqlCommand.<ExecuteReader>d__100.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlCommand.<ExecuteNonQuery>d__88.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery()
   at NHibernate.AdoNet.AbstractBatcher.ExecuteNonQuery(DbCommand cmd)
   at NHibernate.AdoNet.NonBatchingBatcher.AddToBatch(IExpectation expectation)
   at NHibernate.Persister.Entity.AbstractEntityPersister.Insert(Object id, Object[] fields, Boolean[] notNull, Int32 j, SqlCommandInfo sql, Object obj, ISessionImplementor session)

Exception: TimeoutException

Message: Timeout during reading attempt

Stack Trace: 

, Pre-commit call stack:    at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()
   at Continua.Modules.Builds.BuildManager.<>c__DisplayClass88_0.<UpdateStatus>b__0(Int32 x)
   at Continua.Shared.Utils.LockList`2.WithLock(TId id, Action`1 action)
   at Continua.Modules.Builds.BuildManager.UpdateStatus(Build build, BuildState newState, Boolean forceStateChange, Boolean updateStatusHistory, String message)
   at Continua.Modules.Builds.BuildManager.UpdateStatus(Build build, BuildState newState, String message)
   at Continua.Modules.Builds.BuildRunner.SkipToNextStage(ISession session, Build build, Stage stage)
   at Continua.Modules.Builds.BuildRunner.OnGetNextStage(Transition`1 inState)
   at Continua.StateMachine.StateMachine`1.Execute(Transition`1 begin)
   at Continua.Modules.Builds.BuildRunner.StartStage(Guid stageId)
   at Continua.Modules.Builds.BuildController.OnTaskExecute(Object state)
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at Continua.Shared.Utils.Threading.LimitedConcurrencyLevelTaskScheduler.<NotifyThreadPoolOfPendingWork>b__6_0(Object _)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
, Commit time 47909ms
An error occurred while saving new state StageCompleted and message 'Stage 'Configure' is disabled. Moving on to next stage.' for build 294787 with build number 7631 and previous state StageNotRun: Exception: GenericADOException

Message: could not insert: [Continua.Modules.Builds.BuildStatus#89442d35-836f-45c3-a82d-b280005d3cc4][SQL: INSERT INTO builds_buildstatus (State, Date, Message, buildid, Id) VALUES (?, ?, ?, ?, ?)]

Stack Trace:    at NHibernate.Persister.Entity.AbstractEntityPersister.Insert(Object id, Object[] fields, Boolean[] notNull, Int32 j, SqlCommandInfo sql, Object obj, ISessionImplementor session)
   at NHibernate.Persister.Entity.AbstractEntityPersister.Insert(Object id, Object[] fields, Object obj, ISessionImplementor session)
   at NHibernate.Action.EntityInsertAction.Execute()
   at NHibernate.Engine.ActionQueue.InnerExecute(IExecutable executable)
   at NHibernate.Engine.ActionQueue.ExecuteActions[T](List`1 list)
   at NHibernate.Engine.ActionQueue.ExecuteActions()
   at NHibernate.Event.Default.AbstractFlushingEventListener.PerformExecutions(IEventSource session)
   at NHibernate.Event.Default.DefaultFlushEventListener.OnFlush(FlushEvent event)
   at NHibernate.Impl.SessionImpl.Flush()
   at NHibernate.Impl.SessionImpl.BeforeTransactionCompletion(ITransaction tx)
   at NHibernate.Transaction.AdoTransaction.Commit()
   at Continua.Shared.Data.Hibernate.NHibernateDatabase.SaveChanges()
   at Continua.Modules.Builds.BuildManager.<>c__DisplayClass88_0.<UpdateStatus>b__0(Int32 x)

Exception: NpgsqlException

Message: Exception while reading from stream

Stack Trace:    at Npgsql.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|201_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlDataReader.<NextResult>d__44.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.<ExecuteReader>d__100.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Npgsql.NpgsqlCommand.<ExecuteReader>d__100.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlCommand.<ExecuteNonQuery>d__88.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery()
   at NHibernate.AdoNet.AbstractBatcher.ExecuteNonQuery(DbCommand cmd)
   at NHibernate.AdoNet.NonBatchingBatcher.AddToBatch(IExpectation expectation)
   at NHibernate.Persister.Entity.AbstractEntityPersister.Insert(Object id, Object[] fields, Boolean[] notNull, Int32 j, SqlCommandInfo sql, Object obj, ISessionImplementor session)

Exception: TimeoutException

Message: Timeout during reading attempt

Stack Trace: 

The stack trace is empty.

An error occurred while getting next stage to run for build with id '294787': Exception: LazyInitializationException

Message: Initializing[Continua.Modules.Builds.StageDefinition#bda9ebf1-ebf8-463d-893f-b27b00915e30]-failed to lazily initialize a collection of role: Continua.Modules.Builds.StageDefinition.SkipConditions, no session or session was closed

Stack Trace:    at NHibernate.Collection.AbstractPersistentCollection.ThrowLazyInitializationException(String message)
   at NHibernate.Collection.AbstractPersistentCollection.ThrowLazyInitializationExceptionIfNotConnected()
   at NHibernate.Collection.AbstractPersistentCollection.ReadSize()
   at NHibernate.Collection.Generic.PersistentGenericBag`1.get_Count()
   at Continua.Modules.Builds.BuildRunner.CheckConditions(IList`1 conditionsList, ConditionExpressionLogic conditionsLogic, Stage stage, Boolean isPromoteConditions)
   at Continua.Modules.Builds.BuildRunner.SkipToNextStage(ISession session, Build build, Stage stage)
   at Continua.Modules.Builds.BuildRunner.OnGetNextStage(Transition`1 inState)

Hi Michal,

These errors are all due to a database timeout. The PostgreSQL service is not responding for some reason. It would be interesting to see what was happening at the database level.
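
To get a feel for how long the database was stalling, the commit times that appear in the event log entries (for example "Commit time 37323ms") can be pulled out of an exported copy of the log. This is only a sketch: it assumes the log text is available as a string, and the function names and the 30-second cut-off are illustrative.

```python
import re

# Matches the ", Commit time 37323ms" suffix seen in the event log entries.
COMMIT_TIME_RE = re.compile(r"Commit time (\d+)ms")


def commit_times_ms(log_text):
    """Return every reported commit time in milliseconds, in order of appearance."""
    return [int(m.group(1)) for m in COMMIT_TIME_RE.finditer(log_text)]


def slow_commits(log_text, threshold_ms=30000):
    """Return the commit times at or above threshold_ms (an arbitrary cut-off)."""
    return [t for t in commit_times_ms(log_text) if t >= threshold_ms]


if __name__ == "__main__":
    sample = ", Commit time 37323ms\n, Commit time 47909ms\n"
    print(commit_times_ms(sample))
```

If the times cluster around the same period, that would point to a single stall on the database side rather than many unrelated failures.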

Assuming you are using the local bundled PostgreSQL service, can you enable logging by modifying the postgresql.conf file? It is usually located at C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB. Be sure to back up the file first, then scroll to the REPORTING AND LOGGING section and adjust the following lines:

logging_collector = on
log_min_messages = info
log_min_duration_statement = 250ms

For more details on these settings, see the PostgreSQL docs.
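
After editing, it can be worth confirming that the three values are the ones PostgreSQL will actually read, since a commented-out line has no effect and a later duplicate line overrides an earlier one. Below is a minimal read-only Python sketch under those assumptions. The helper names are hypothetical, and the parsing is simplified (it does not handle a "#" inside a quoted value).

```python
import re

# Settings suggested in the post above.
WANTED = {
    "logging_collector": "on",
    "log_min_messages": "info",
    "log_min_duration_statement": "250ms",
}

# name [=] value [# comment]; lines that start with "#" do not match.
SETTING_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*(?:#.*)?$")


def effective_settings(conf_text):
    """Map each uncommented setting to its value; a later line overrides an earlier one."""
    settings = {}
    for line in conf_text.splitlines():
        match = SETTING_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip("'")
    return settings


def not_yet_applied(conf_text, wanted=WANTED):
    """Return the wanted settings whose effective value is missing or different."""
    current = effective_settings(conf_text)
    return {name: value for name, value in wanted.items() if current.get(name) != value}
```

Reading the file's contents and passing them to not_yet_applied should return an empty dict once all three lines are in place. Note that a change to logging_collector only takes effect after the PostgreSQL service is restarted.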

If you are using an external PostgreSQL service on a separate server, then perhaps this could be due to network connectivity issues?
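
For the external-server case, one rough way to look for connectivity problems is to time a plain TCP connection from the Continua server to the database host. The Python sketch below only assumes that the database listens on a TCP port; 5432 is the PostgreSQL default and may not match your setup, and the function name is made up for this example. It tests reachability only, not whether queries run quickly.

```python
import socket
import time


def connect_time_ms(host, port=5432, timeout=5.0):
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
```

Calling this every few seconds and logging the result alongside a timestamp would show whether slow or failed connections line up with the times the errors occur.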