At work I develop an application that is employed in production plants all around the world. It can host a lot of different code modules from different developers, so the code it executes is pretty diverse. To be able to quickly react to any problems I implemented a crash handler that makes a memory image of the whole process and sends me an email. This has been working fine for many years (13 to be exact, according to git), but when compiled for 64-bit it stupidly refused to work.
This puzzled me for a long time: the handler set through the SetUnhandledExceptionFilter API was simply never called. I eventually found a workaround by using the AddVectoredContinueHandler function, but not knowing what is going on always bugs me to no end. And it’s also just half the story, as we’ll see.
Eventually I found the problem and it turned out to be a subtle bug in the CVI C compiler when compiling for a 64-bit target. I think the clue was the behavior of a low level debugger when stepping through such a defective program. Here, at the start of main(), the stack trace looks fine:
But once I step over the LEA instruction, things get messy:
Why is that? The 64-bit Windows ABI (application binary interface) requires the start (prolog) and end (epilog) of functions to be written in a certain way. It also requires that every function that is not a leaf function (a function that doesn’t call other functions) has accompanying static data that tells Windows how to clean up the function. The debugger uses this information to walk the stack as well, so something must be fishy there. This so-called “unwind” data can be dumped using the “dumpbin” tool from the Visual Studio Tools:
> dumpbin /unwindinfo seh_bug.exe
Microsoft (R) COFF/PE Dumper Version 14.29.30152.0
Copyright (C) Microsoft Corporation. All rights reserved.

Dump of file seh_bug.exe

File Type: EXECUTABLE IMAGE

Function Table (77)

           Begin    End      Info      Function Name
[...]
  0000000C 00001040 0000108E 00004010  main
    Unwind version: 1
    Unwind flags: EHANDLER UHANDLER
    Size of prologue: 0x0A
    Count of codes: 3
    Frame register: rbp
    Frame offset: 0x20
    Unwind codes:
      0A: SET_FPREG, register=rbp, offset=0x20
      05: ALLOC_SMALL, size=0x48
      01: PUSH_NONVOL, register=rbp
    Handler: 00002634 __cvi_exception_handler
Spotted the problem? Look again, I’ll wait…
The instruction “lea rbp,[rsp+28h]” uses an offset of 28h, but the unwind information for this piece of code only specifies 20h!? A compiler bug that we can simply patch after the fact, right? Unfortunately not. The problem is that the offset in the unwind information is stored as a multiple of 10h, so a value of “28h” cannot be represented at all!
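To make the encoding limit concrete, here is a minimal sketch of how the frame register and frame offset share one byte in the documented UNWIND_INFO header: the register number sits in the low four bits, the offset in the high four bits, in units of 16 bytes. The function names are made up for illustration, and the byte value 0x25 is hand-assembled from the dump above (rbp is register 5, offset 20h), not read from the actual executable.

```python
def decode_frame_byte(byte):
    """Split the UNWIND_INFO frame byte into (register number, offset in bytes)."""
    frame_register = byte & 0x0F          # low nibble: register number (5 = rbp)
    frame_offset = (byte >> 4) * 16       # high nibble: offset, scaled by 16
    return frame_register, frame_offset


def encode_frame_offset(offset):
    """Return the 4-bit field for a frame offset, or fail if it does not fit."""
    if offset % 16 != 0 or not 0 <= offset <= 240:
        raise ValueError(f"frame offset {offset:#x} is not representable")
    return offset // 16


# Hand-assembled example byte: register 5 (rbp), field value 2 (= 20h).
print(decode_frame_byte(0x25))

# 20h fits, 28h does not: there is no field value between 2 and 3.
print(encode_frame_offset(0x20))
try:
    encode_frame_offset(0x28)
except ValueError as error:
    print(error)
```

The only offsets that fit are 0, 10h, 20h and so on up to F0h, which is why the compiler's 28h had nowhere to go.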
Now, I have opened a bug report with NI, the vendor of the compiler, but seeing how they have been treating it the last few years I’m not even sure they still have the people to fix this. In any case, I need a fix now, so what to do? CVI has had support for external compilers for a long time, maybe just switch to an external one for release builds?
This has a certain charm and CVI already comes with clang v3 (yeah, really) for this purpose. But the problem is that some of the libraries CVI comes with, and that I use, are already pre-compiled with the buggy compiler and there is no source code for them. So this fixes only half the problem.
Next idea is to patch the binary in a post-build process. Question remains, how? Patching the code itself to have an even offset is quite dangerous as it’s difficult to guarantee that the patched code is right in every situation (like functions with multiple epilogs or whatnot). So we should patch the unwind info somehow and leave the code alone, but as already established, it cannot represent the value “28h” and we also cannot add any additional instructions to it as enlarging data structures comes with its own perils.
Simply removing the SET_FPREG instruction from the unwind information is an option and it works, but only as long as there are no dynamic stack allocations, like with alloca() or C99’s variable-length arrays. Most functions don’t use those, but maybe we can do better.
The solution I chose in the end is to adjust the static allocation (ALLOC_SMALL in the example) by 8 bytes. This cancels out the error introduced by the SET_FPREG instruction and works fine in all cases except between the “sub rsp, 30h” and “lea rbp, [rsp+28h]”, but that is a negligible problem. The important thing is that the rest of the time stack walking works again, both in the debugger and when exceptions are handled. By the way, I expected about half of the functions to be affected, but in my application it’s more like 70%:
Fixing unwind information... Fixed 6199 out of 8849 functions
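To see why shifting the static allocation cancels the error, here is a toy model of the arithmetic the unwinder does with the three unwind codes. It is only a sketch: the function names are invented, the sizes come from the instructions quoted in the text (“sub rsp, 30h”, “lea rbp,[rsp+28h]”), and the direction of the fix (making ALLOC_SMALL 8 bytes smaller) is what this model suggests, not something taken from the actual patch tool.

```python
PROLOG_END = 0x0A  # offset just past "lea rbp,[rsp+28h]"


def unwind_rsp(codes, ip_offset, rsp, rbp):
    """Undo the prolog as described by `codes`; return the recovered entry RSP.

    codes: (prolog offset, operation, value) tuples, highest offset first.
    ip_offset: how far into the function execution has progressed.
    """
    for code_offset, op, value in codes:
        if code_offset > ip_offset:
            continue  # that prolog instruction has not run yet
        if op == "SET_FPREG":
            rsp = rbp - value      # re-establish RSP from the frame pointer
        elif op == "ALLOC_SMALL":
            rsp += value           # undo "sub rsp, size"
        elif op == "PUSH_NONVOL":
            rsp += 8               # undo "push rbp"
    return rsp


def body_registers(entry_rsp, alloca_bytes=0):
    """RSP and RBP inside the function body, as the real code sets them up."""
    rsp = entry_rsp - 8            # push rbp
    rsp -= 0x30                    # sub rsp,30h
    rbp = rsp + 0x28               # lea rbp,[rsp+28h]  (what the code does)
    rsp -= alloca_bytes            # optional dynamic allocation
    return rsp, rbp


PUSH = (0x01, "PUSH_NONVOL", 8)
ORIGINAL = [(0x0A, "SET_FPREG", 0x20), (0x05, "ALLOC_SMALL", 0x30), PUSH]
PATCHED = [(0x0A, "SET_FPREG", 0x20), (0x05, "ALLOC_SMALL", 0x28), PUSH]
STRIPPED = [(0x05, "ALLOC_SMALL", 0x30), PUSH]  # SET_FPREG removed

ENTRY = 0x1000  # arbitrary RSP at function entry
rsp, rbp = body_registers(ENTRY, alloca_bytes=0x40)
for name, codes in (("original", ORIGINAL), ("patched", PATCHED), ("stripped", STRIPPED)):
    error = unwind_rsp(codes, PROLOG_END, rsp, rbp) - ENTRY
    print(f"{name}: unwound RSP is off by {error} bytes")
```

In this model the original data ends up 8 bytes too high, the patched data lands exactly on the entry RSP even with a dynamic allocation, and the variant without SET_FPREG is off by the size of the dynamic allocation. Running the patched codes with ip_offset set to 0x05 (after the “sub”, before the “lea”) gives a result that is 8 bytes too low, which is the small window mentioned above.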
TL;DR: The SetUnhandledExceptionFilter API can only do its work if all the exception information in the executable is correct, as it is invoked in the BaseThread function in kernel32. If the stack walking does not reach that function, nothing will happen and the process will just be killed. Check the exception unwind information for problems.