FinalBuilder is written in a combination of Delphi and C#. This is a blog post about a Delphi RTL threading issue that I recently ran into and had to work around.
We recently had some bug reports of FinalBuilder deadlocking at shutdown under certain circumstances. I could reproduce the problem, but I was baffled for a long time because - according to the call stacks - the program should have been fine.
The issue I'd run into has been around for years, and is summed up nicely in this Borland QC note: Synchronize and WaitFor methods of TThread can lead to a DeadLock in logically correct applications.[Dead link removed]
The Problem
The simple summary is: If you ever have a call stack which includes both Synchronize and WaitFor, and the thread you are waiting on also calls Synchronize at some point, you will occasionally get a deadlock.
Explaining why is kind of tricky, but if you read the QC case you'll get the gist of it. The problematic sequence of events looks like this:
- Thread A calls Synchronize(MethodA)
- Thread B calls Synchronize(MethodB)
Then, inside the context of the Main Thread:
- Main thread calls CheckSynchronize() while processing messages
- CheckSynchronize is implemented to batch-process all waiting calls(*). So it picks up the queue of waiting calls (containing MethodA and MethodB) and loops through them one by one.
- MethodA executes in the main thread's context. Assume MethodA calls ThreadB.WaitFor
- WaitFor calls CheckSynchronize to process any waiting calls to Synchronize
- In theory, this should then process ThreadB's Synchronize(MethodB), allowing Thread B to complete.
- However, MethodB is already a possession of the first CheckSynchronize call, so it never gets called.
DEADLOCK!
The Solution
The ideal solution is to refactor your program so it will never call CheckSynchronize or WaitFor from inside an already-synchronized method call. For FinalBuilder, this was not a good option - all it takes is one Synchronized method call to call ProcessMessages and you're letting in dozens of potential deadlocks.
The solution I came up with is to write a "SafeSynchronize" method that only allows one thread at a time to get on the Synchronize method queue. This comes with a performance hit, but prevents the deadlock.
Here's a new class which implements the SafeSynchronize method:
TSafeSyncThread = class(TThread)
private
procedure SafeSynchronizeInner;
protected
procedure SafeSynchronize(const AThreadMethod: TThreadMethod);
end;
implementation
{ TSafeSyncThread }
var
FSingleSynchronize : TCriticalSection;
FSingleSynchronizeTarget : TThreadMethod;
procedure TSafeSyncThread.SafeSynchronize(const AThreadMethod: TThreadMethod);
begin
FSingleSynchronize.Acquire;
FSingleSynchronizeTarget := AThreadMethod;
Synchronize(SafeSynchronizeInner);
end;
procedure TSafeSyncThread.SafeSynchronizeInner;
var
targetMethod : TThreadMethod;
begin
targetMethod := FSingleSynchronizeTarget;
FSingleSynchronize.Release; // Now we're inside a CheckSynchronize call, it's safe to let another thread join the queue
targetMethod;
end;
Basically, this serializes access to the Synchronize queue. The performance hit is potentially quite large if you have a lot of concurrent SafeSynchronize calls and a busy main thread. FinalBuilder is designed in a way which minimizes this, so it's less of an issue for us.
I put together a quick test project which demonstrates the issue (at least on Delphi 2007.) You can get it here if you want to have a look. It also contains SafeSyncThread.pas if you want to incorporate the workaround into your own projects. This code comes with no guarantees and no warranty, obviously.
If anyone has a better (preferably faster) solution to this problem, which doesn't require rewriting the RTL, please let us know. :).
PS The test project example contains some fairly awful cut-and-paste code reproduction, which - for the record - I normally avoid like the plague. I used it here because the methods are very simple, and I couldn't see an easy way to rearchitect it.
(*) FWIW, I actually quite like the way that CheckSynchronize is implemented, using an InterlockedExchange to swap the old queue object with a brand new one. That way you can process the entire old queue at once, without granular locking. Check the source code if you want to see what I mean. It's just a shame about the deadlock. (back)