My experience with Execute Program timeout handling has been negative.
I run MS signtool using Execute Program, and in the last few weeks the external trusted time source has sometimes responded very slowly. Signtool would simply wait for several minutes and then hang. Execute Program raises an exception, but does not kill the hung signtool process. I cannot kill it myself, as the process ID is not provided.
In FinalBuilder, there is a WMI Run Process action where the process ID can be obtained, but it does not wait for successful completion and does not allow checking the exit code. Do you have any suggestions for implementing this using FinalBuilder tools?
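If a wrapper script can be called from Execute Program, the Python sketch below shows one way to get a timeout that kills the process, reports its PID, and still returns the exit code. It is not a FinalBuilder feature. `run_with_timeout` and its parameters are hypothetical names, and the sketch assumes Python is available on the build machine. Note that `Popen.kill()` terminates only the process itself, not any child processes it started.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, kill_wait_s=10.0):
    """Run cmd and wait up to timeout_s seconds for it to exit.

    Returns (pid, exit_code, timed_out). On timeout the process is killed,
    and we then wait (with a bound) until the OS confirms it has exited.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.pid, proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        # Report the PID so a leftover process can be found later.
        print(f"process {proc.pid} exceeded {timeout_s}s, killing it",
              file=sys.stderr)
        proc.kill()  # TerminateProcess on Windows, SIGKILL on POSIX
        try:
            # Confirm termination instead of assuming the kill succeeded.
            return proc.pid, proc.wait(timeout=kill_wait_s), True
        except subprocess.TimeoutExpired:
            raise RuntimeError(
                f"process {proc.pid} still running {kill_wait_s}s after kill"
            ) from None
```

Called with the signtool command line as a list and a 300-second timeout, it returns a `(pid, exit_code, timed_out)` tuple. The bounded second `wait()` is what turns a kill that silently fails into an explicit error.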
I am reporting this as a bug, as the 2015 discussion expected that the Execute Program process timeout would kill the process.
Thank you.
Thank you for your suggestion. I also found this solution: "Problem with Timestamping SaaS - Apply RFC 3161 Timestamps | GlobalSign". It works better with rfc3161.ai.moda.
Anyhow, before implementing it, I saw signtool processes still running after FB stopped the build on a 5-minute timeout. It is difficult to reproduce since it happens very irregularly, but I'll try.
Done Adding Additional Store
Number of files successfully Signed: 0
Number of warnings: 0
Number of errors: 1
SignTool Error: The file is being used by another process.
SignTool Error: An error occurred while attempting to sign: D:\_Temp\_Temp\PMAgentA0\AllModes\PEMT.exe
Program returned code : 1
Retrying action (2 of 3)…
Executing external process: C:\Program Files (x86)\Windows Kits\10\App Certification Kit\signtool.Exe
etc.
The FB runtime shut down with the build failed, but two instances of signtool with command lines definitely related to this build were still present in the process list, including the one mentioned above: "sign /sha1 HEXHEXHEX /tr http://rfc3161.ai.moda /v /td SHA256 /fd SHA256 /d "description" /du https://url.com D:\_Temp\_Temp\PMAgentA0\AllModes\PEMT.exe".
We use FB 8.0.0.3065 and have no way to update, but this could be important for you and your customers to know. I am ready to cooperate.
Some additional details on the issue: the FB project uses async included projects to sign code, so signtool is started by a nested included project (3 layers deep). As you can see, the failure occurs in the retry action, which attempts to open a file still held open by the first run, even though that run should already have been killed. Perhaps the file is released too slowly, so we have to add an extra pause before retrying?
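The extra pause proposed here could look like the minimal sketch below: a retry loop that waits between attempts so that a killed process has time to release its file handles. `retry_with_pause` is a hypothetical helper. `action` is assumed to run the tool once and return its exit code, and the 5-second default is an arbitrary placeholder, not a measured value.

```python
import time


def retry_with_pause(action, attempts=3, pause_s=5.0, sleep=time.sleep):
    """Call action() until it returns exit code 0 or attempts run out.

    Sleeps pause_s seconds between attempts (not after the last one) so a
    previous, killed run has time to release the file being signed.
    Returns the list of exit codes, one per attempt.
    """
    codes = []
    for attempt in range(1, attempts + 1):
        code = action()
        codes.append(code)
        if code == 0:
            break
        if attempt < attempts:
            sleep(pause_s)
    return codes
```

The `sleep` parameter only exists so the pause can be replaced in tests. A fixed pause is the simplest option; polling until the file can be opened exclusively would adapt to how long the release actually takes.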
Done Adding Additional Store
Execution timed out after 9 minutes.
Retrying action (1 of 3)…
Executing external process: C:\Program Files (x86)\Windows Kits\8.1\bin\x86\signtool.Exe
Starting Directory: C:\Program Files (x86)\Windows Kits\8.1\bin\x86
Parameters: sign /sha1 HEXHEXHEX /tr http://rfc3161.ai.moda /v /td SHA256 /fd SHA256 /d "description" /du https://url.org/ D:\_Temp\_Temp\A0APIA0\CmBxLocalNet\A0Service.exe
Output from C:\Program Files (x86)\Windows Kits\8.1\bin\x86\signtool.Exe
The following certificate was selected:
Issued to: LLC ***
Issued by: GlobalSign GCC R45 CodeSigning CA 2020
Expires: Fri Jul 31 16:05:53 2026
SHA1 hash: HEXHEXHEX
Done Adding Additional Store
Successfully signed: D:\_Temp\_Temp\A0APIA0\CmBxLocalNet\A0Service.exe
The signtool process started by the first attempt was not successfully killed by FB, and it blocked the later deletion of the "D:\_Temp\_Temp\A0APIA0\CmBxLocalNet\" directory. It persisted after the build terminated. The process was definitely in an abnormal state: it crashed when I attempted to get its wait chain. At the same time, this clearly proves that FB was unable to kill it and cannot handle this situation properly.
It looks like the problem depends on the response time of external resources, but it should be possible to handle.
Not sure why signtool is not terminating. Our timeout code uses the Toolhelp Windows API to first find any child processes and terminate them, and then terminate the parent process.
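A sketch of a children-first termination order, computed from a process snapshot, is shown below. Toolhelp's `PROCESSENTRY32` records carry `th32ProcessID` and `th32ParentProcessID`. Here the snapshot is assumed to have already been read into `(pid, parent_pid)` pairs, and `kill_order` is a hypothetical name. Unlike the description in the reply, it walks all descendants rather than only direct children. It is an illustration, not FinalBuilder's actual code.

```python
from collections import defaultdict


def kill_order(entries, root_pid):
    """Return PIDs to terminate, deepest descendants first, root_pid last.

    entries: iterable of (pid, parent_pid) pairs taken from a process
    snapshot. The visited set prevents infinite loops if stale parent IDs
    form a cycle.
    """
    children = defaultdict(list)
    for pid, parent in entries:
        if pid != parent:  # skip self-parented entries such as PID 0
            children[parent].append(pid)

    order, visited = [], set()

    def visit(pid):
        if pid in visited:
            return
        visited.add(pid)
        for child in children[pid]:
            visit(child)
        order.append(pid)  # post-order: children before their parent

    visit(root_pid)
    return order
```

Because Windows reuses PIDs, a snapshot's parent IDs can be stale, so real code would also compare process creation times before treating an entry as a child. The visited set here only protects against cycles.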
One possibility is that antivirus software may see the changes to the exe during signing and take a lock on the folder while it checks the files. Other than that, I am out of ideas.
Thank you for the details, Vincent. The antivirus hypothesis does not explain it, since we have disabled all antivirus software on this particular server. There are hundreds of signtool runs during the build process, and problems occur only occasionally, once or twice. What is strange to me is that FB does not report any failure to terminate the process. The problem frequency rises for periods of 1-3 days and then drops to zero. I have now replaced signtool with the Windows 8.1 SDK version, replaced the timestamp server with a load-balancing one, and enabled signtool debug output. One 3-hour build succeeded, but that does not yet confirm the problem is solved. I am collecting statistics and logs.
My guess: perhaps the code in FB that kills the signtool process does not wait for confirmation that it has terminated, because of asynchronicity? There should also be timeout handling in that case.
FB can only report what it gets from the windows api. If the api call returns success, we can only assume that it did in fact succeed. The code we use to kill processes on timeout has been tested extensively and includes multiple checks to see if the process is still running.
Without being able to reproduce the problem here, there isn’t anything we can do differently without possibly causing other issues/failures.
There is an example project here that shows how to do code signing and timestamping separately, and allows using a list of timestamp servers, so that if one fails it will try the next.
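Splitting the two steps might look roughly like the sketch below: sign without `/tr`, then run `signtool timestamp` against each RFC 3161 server in turn until one succeeds. The server URLs, the helper names, and the `run` callable (anything that executes a command list and returns its exit code, ideally with a timeout) are assumptions for illustration. This is not the linked example project.

```python
def sign_command(sha1, file_path, description, url):
    # Sign only: no /tr, so this step never contacts a timestamp server.
    return ["signtool", "sign", "/sha1", sha1, "/fd", "SHA256",
            "/d", description, "/du", url, file_path]


def timestamp_command(server, file_path):
    # RFC 3161 timestamp (/tr) with a SHA256 timestamp digest (/td).
    return ["signtool", "timestamp", "/tr", server, "/td", "SHA256", file_path]


def timestamp_with_fallback(run, servers, file_path):
    """Try each timestamp server in order.

    Returns the server that succeeded, or None if all of them failed.
    """
    for server in servers:
        if run(timestamp_command(server, file_path)) == 0:
            return server
    return None
```

Because the timestamp step is separate, a slow or failing server only affects a command that is safe to repeat, rather than the whole signing operation.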
Thank you very much for the well-structured code, Vincent. I'm refactoring ~10-year-old FB code, so your style is a breath of fresh air. I'll report on my progress later. Perhaps splitting signing and timestamping will help make it stable.
Hello, Vincent. Currently I suspect the issue is rooted in hardware. Your sample code uses .pfx certificate storage, but this summer we got a USB token with a non-extractable certificate. In fact, it is an external signing processor, not just storage. Signing with a .pfx can be multithreaded, since all processing is handled by the main CPU, but the token can handle only one signature at a time, so I would expect some kind of queue and lock/unlock management inside the token driver. Perhaps a mutex could help.
I have a simple question. FB has specific signtool actions and a generic Execute Program action. Are there any internal differences in timeout handling, output capture, or exit code handling? We use Execute Program, but is the signtool-specific action "better"? I see it has no way to use the /debug option, but that is not the issue.
The Execute Program action and the Signtool actions both use the same code (a wrapper over CreateProcess).
You could definitely use a mutex if all the signing is done from one FinalBuilder project.
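Below is a minimal sketch of that locking, assuming all signing is started from threads of one process. A single lock wraps the signing call, so the token only ever receives one request at a time. `sign_serialized` and `_token_lock` are hypothetical names. If signing were started from separate processes, `threading.Lock` would not be enough, and an OS-level named mutex would be needed instead.

```python
import threading

# One signing operation on the hardware token at a time (assumption:
# every caller lives in this one process).
_token_lock = threading.Lock()


def sign_serialized(sign_fn, *args):
    """Run sign_fn(*args) while holding the token lock; return its result."""
    with _token_lock:
        return sign_fn(*args)
```

Concurrent callers simply queue on the lock, which mirrors the one-signature-at-a-time behavior described for the token.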
We're working on a code signing server product and found issues with concurrent signing: even though the tokens claim to support multiple concurrent sessions, we ended up having to put locking around the signing calls. Since that happens on the server, it's pretty quick anyway. We are still hoping to be able to remove the locking; more research is needed.