Very occasionally, our Automise batch processes will hang up and just not complete. When I go to see what action it gets stuck on, it is something that should not be an issue. I would expect that when it does happen, it would be because it is reaching out to an external system or database. However, last night it got stuck on the second switch statement while going through a CSV Iterator loop. It evaluated several rows of the CSV file without a problem, and just got hung up on this one step that it should not have any problem with.
Is there any way to find out what was going on, and what caused it to hang on that particular statement? As I said, occasionally this happens and there does not seem to be a way to get diagnostic information about why it could not continue. It does not fail the action…it just gets stuck.
When you say it gets stuck, is the IDE still responsive, i.e. can you click on stop or use the menus etc.? If you close Automise and reopen it, reopen the project and look at the log, did the run complete? I’m wondering if the UI got stuck but the project kept running in the background thread.
One possible cause is an out of memory issue, especially if you are running the project in the IDE with Live Logging enabled. For long-running projects that do lots of logging we highly recommend using ATCMD.exe to run the projects - ATCMD has a much lower memory overhead since it doesn’t have a UI to keep up to date.
Next time it happens, run this tool. It will pop up a window asking you to select a process (it should show automise in the list) - once you select the process, click OK - it might take a few seconds but it should pop up a window with a trace report. Save the report (there is a save trace button) and then email it (zipped) to support @ finalbuilder.com
The trace will tell us how much memory is being used and also where each thread is at.
I apologize - I should have mentioned that this is running through the command line. So there is no live logging or anything. And it is a batch job that is not running out of memory, as far as I can tell with Task Manager on the machine. When I kill the process, and then open it in Automise, I see the step that it was stuck on. The process does not always get stuck (it happens rarely), and when it DOES get stuck it is never on the same step twice. Given that it is a command-line job that is being run through Task Scheduler, is there anything I can do to capture the trace?
Task Manager is not that useful for diagnosing out-of-memory issues - it’s a 32-bit application so the max it can use is 3GB, but even when it’s using less than that it can run out of memory when its memory is fragmented: an allocation needs one contiguous free block, and there may not be one large enough even though the total free memory looks sufficient.
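To illustrate the fragmentation point, here is a toy Python sketch. It is not how the Automise memory manager works; the 100-slot "address space" and the helper name largest_free_block are made up for the example. It only shows that total free memory and the largest contiguous free block can be very different numbers.

```python
def largest_free_block(address_space):
    """Return the length of the longest run of free (False) slots."""
    best = run = 0
    for used in address_space:
        run = 0 if used else run + 1
        best = max(best, run)
    return best

# Toy address space of 100 slots: True = in use, False = free.
# Every other slot is in use, so half the space is free but fragmented.
space = [i % 2 == 0 for i in range(100)]

print(space.count(False))         # 50 slots are free in total
print(largest_free_block(space))  # 1: a request for 2 contiguous slots would fail
```

Task Manager reports something like the first number, while an allocation succeeds or fails based on the second.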
I’m not sure if the madtrace tool will be able to connect to the process if it’s running under another user, I will have to test that.
Another thing to check is if automise wrote a bugreport file - have a look in the %TEMP% folder for the user the task runs under, for a file named “Automise bugreport.txt” - that is written out when the application experiences an unhandled error.
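If it is easier than browsing by hand, a small script can do the search. This is only a sketch in Python: the function name find_bugreports is made up, it matches any file with "bugreport" in the name rather than the exact file name, and it reads TEMP for whichever user runs the script, so it would need to be run as the account the scheduled task uses.

```python
import os

def find_bugreports(temp_dir):
    """Return paths under temp_dir whose file name contains 'bugreport'."""
    matches = []
    # os.walk silently skips folders it cannot read and yields nothing
    # if temp_dir does not exist.
    for folder, _dirs, files in os.walk(temp_dir):
        for name in files:
            if "bugreport" in name.lower():
                matches.append(os.path.join(folder, name))
    return sorted(matches)

if __name__ == "__main__":
    # TEMP is resolved for the current user, not the task's user.
    temp = os.environ.get("TEMP") or os.environ.get("TMPDIR") or "/tmp"
    for path in find_bugreports(temp):
        print(path)
```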
If you are able to, please send the project file to support @ finalbuilder.com so we can take a look and see if we can find a way to reproduce this issue.
Unfortunately, sending the project file will have to be a last resort because I will have to get authorization to send it to you. It is our entire nightly batch process, and it contains data that I am not allowed to share, potentially. Also, it does not only happen with this process, and it happens very rarely anyway.
I did not find a file with the title “bugreport” anywhere on the local drive. I COULD log in as the user that is running the batch process, if that would help with running the madtrace tool when/if this happens again.
To clarify…the app does not error out. It just stays on a particular (usually minor) step and will not go to the next one. When I kill the process in Task Manager and open Automise to see what happened on the last run, I see a start date/time for the step it is on, but no end date/time. I will try to attach a screenshot of what I mean.
The log from that run - note the end date/time and the duration of the last step reached. This is what happens when it gets stuck and I end the process in Task Manager:
It turns out that our morning batch on Tuesday had the same thing happen. This time it was on a Send Email step. Just stayed on that step for two days until I stopped the process from the Task Scheduler. I still have no idea how to troubleshoot it.
I did not. For future reference…does it need to be run while the application is stuck, or can it be run after the process has been ended? I figure the former, but the problem is that the top priority at that point is to kill the process so we can re-run it if needed. In case I DO get to run it…I need to log in as the user who runs the process, and then run the tool while Automise seems stuck?
It needs to be run while the application is still running, while it is hung. This tool will give us a stack trace for each thread, as well as information about memory usage etc. That might show us where it is getting stuck.
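As an analogy for what a per-thread stack trace shows, here is a Python sketch. It has nothing to do with madExcept or Automise internals; dump_thread_stacks and blocked_step are hypothetical names. The point is that the snapshot has to be taken while the thread is still blocked, and that it names the function each thread is sitting in.

```python
import sys
import threading
import traceback

def dump_thread_stacks():
    """Return {thread name: formatted stack} for every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    report = {}
    for ident, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        report[names.get(ident, str(ident))] = stack
    return report

if __name__ == "__main__":
    release = threading.Event()

    def blocked_step():
        release.wait()  # stands in for a step that never finishes

    worker = threading.Thread(target=blocked_step, name="stuck-worker")
    worker.start()
    # Snapshot while the worker is still blocked, then let it finish.
    for name, stack in dump_thread_stacks().items():
        print(f"--- {name} ---\n{stack}")
    release.set()
    worker.join()
```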
I haven’t had a chance to test this yet but I would expect you need to log in as the same user, or run madtraceprocess as administrator if running under another account.
So I finally remembered to run the tool when the Automise process got stuck…but whether I run as myself or the running user, I get the same message when trying to run the tool: No 32bit process found, which uses madExcept.
The last two times it has gotten stuck, it was on a MS SQL Server Execute action, where I had told it to ignore failure. I have it set to 3 retries, and it shows all 3 tries, but instead of moving on, or moving to a failure state, it just gets stuck.
I tested the madTraceProcess here on a server and found that when automise is running from the task scheduler, you have to run madTraceProcess as Administrator for it to be able to see the automise process. Are you able to do that (do you have administrator access)?
The fact that it is getting stuck in different places each time is rather odd. We have not had any other reports of this sort of issue with either Automise or FinalBuilder (thousands of users). I am wondering if it is a log file corruption issue. Try renaming the project’s log file (projectname.log5) - Automise will create a new log file next time it opens the project file.
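Before renaming anything it may help to see how big the log files are. The following Python sketch assumes the .log5 files sit together in one folder; the function names list_log_files and set_aside and the ".old" suffix are made up for illustration, and the rename should presumably be done while the project is not open or running.

```python
import os

def list_log_files(folder, suffix=".log5"):
    """Return (size_in_bytes, path) pairs for log files in folder, largest first."""
    results = []
    for entry in os.scandir(folder):
        if entry.is_file() and entry.name.lower().endswith(suffix):
            results.append((entry.stat().st_size, entry.path))
    return sorted(results, reverse=True)

def set_aside(path):
    """Rename a log file by appending '.old' and return the new path."""
    new_path = path + ".old"
    os.rename(path, new_path)
    return new_path
```

For example, list_log_files on the project folder gives the largest logs first, and set_aside can then be called on a path from that list.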
It happened again this past weekend…the Thursday night batch never completed, and so the Friday, Saturday, Sunday and Monday batches did not run. I saw this reply, and I have just renamed the log files from some of the processes. The largest log file is about 56 GB!
Open Automise on the server and go to the Tools menu, Options - search for logging, and on the Log History tab, set the Log History count to a low value (2-4). If you are exporting the logs at the end of the runs then you don’t need to keep a lot of history… which would just cause the log files to get huge (depending on project size, amount logged etc).
Unfortunately, I need to keep at least 5 days of logging for when we have a long weekend and I need to see what happened. I usually keep at least 7 days. If I export the log, how do I view the exported log? As an HTML file?
Actually, the logging issue is that we have some processes that run every 3 hours, and if they fail all weekend, I need to see what is going on…so I had the logging set to keep the last 25 runs available. Is there a way to set the logging limit for each project, instead of for every project? Or to do it for a time, rather than a number of runs in the log history?
I stopped the process - there was nothing showing in the log - and then restarted it. Again, it hung up, and again, it seems like the same issue is occurring. Second log file: madTraceProcess2.mbr.txt (36.1 KB)
Another thought - I just deleted the old logs that I had renamed. So, instead of the drive having 55 GB free and 200+ GB used, it now has 54 GB used and 200+ GB free. Could this be an issue - the percentage of free space on the drive - even if we have well over 50 GB free? Just putting it out there.
The madtraceprocess logs are useful… it looks to me like a deadlock.
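For anyone unfamiliar with the term, a deadlock is two or more threads each waiting for something the other holds, so neither can ever continue. The Python sketch below is a generic illustration only, not a model of the Automise stepping engine; demo_deadlock and its timeout are invented so the demo can end instead of hanging the way a real deadlock does.

```python
import threading

def demo_deadlock(timeout=0.5):
    """Two threads take two locks in opposite order; report whether each got its second lock."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()  # each thread now holds its first lock
            # A real deadlock waits here forever; the timeout lets the demo end.
            got = second.acquire(timeout=timeout)
            results[name] = got
            if got:
                second.release()
            both_tried.wait()  # keep holding 'first' until both have tried

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results

if __name__ == "__main__":
    print(demo_deadlock())  # {'t1': False, 't2': False} (key order may vary)
```

Neither thread gets its second lock. Without the timeout both would sit there indefinitely with no error raised, which matches the symptom of a step that has a start time but never an end time.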
Given that we have had no other reports of this issue (in Automise or FinalBuilder) it’s going to be very difficult to reproduce.
Are you able to send us (email support @ finalbuilder.com) your project file (not the logfile)? - there may be something in your project structure that is the trigger we need to reproduce it. Feel free to de-identify as needed, we won’t be able to run it anyway but will use it to create the same structure.
In the meantime I’m looking into the threading library we use in the stepping engine to see if there are any changes that might be relevant.
It’s a huge project - lots of action lists - but I can email it to you. The project seems to be hung up again - I will send you the madtraceprocess file from this one as well.
EDIT: I have just emailed the file. I will check back in an hour or so to see if there is any response. Our batch processes are dead in the water until we figure this out.