Monthly Archives: March 2024

SetUnhandledExceptionFilter not working in 64-bit CVI code

At work I develop an application that is employed in production plants all around the world. It can host a lot of different code modules from different developers, so the code it executes is pretty diverse. To be able to react quickly to any problems, I implemented a crash handler that makes a memory image of the whole process and sends me an email. This has been working fine for many years (13, to be exact, according to git), but when compiled for 64-bit it stubbornly refused to work.

This puzzled me for a long time: the handler set through the SetUnhandledExceptionFilter API was simply never called. I eventually found a workaround by using the AddVectoredContinueHandler function, but not knowing what is going on always bugs me to no end. And it’s also just half the story, as we’ll see.

Eventually I found the problem and it turned out to be a subtle bug in the CVI C compiler when compiling for a 64-bit target. I think the clue was the behavior of a low level debugger when stepping through such a defective program. Here, at the start of main(), the stack trace looks fine:

But once I step over the LEA instruction, things get messy:

Why is that? The 64-bit Windows ABI (application binary interface) requires the start (prolog) and end (epilog) of functions to be written in a certain way. It also requires every function that is not a leaf function (a function that doesn’t call other functions) to have accompanying static data that tells Windows how to clean up the function. The debugger uses this information to walk the stack as well, so something must be fishy there. This so-called “unwind” data can be dumped using the “dumpbin” tool from the Visual Studio Tools:

> dumpbin /unwindinfo seh_bug.exe 
Microsoft (R) COFF/PE Dumper Version 14.29.30152.0 
Copyright (C) Microsoft Corporation.  All rights reserved. 
Dump of file seh_bug.exe 
Function Table (77) 
           Begin    End      Info      Function Name 
0000000C 00001040 0000108E 00004010  main 
    Unwind version: 1 
    Unwind flags: EHANDLER UHANDLER 
    Size of prologue: 0x0A 
    Count of codes: 3 
    Frame register: rbp 
    Frame offset: 0x20 
    Unwind codes: 
      0A: SET_FPREG, register=rbp, offset=0x20 
      05: ALLOC_SMALL, size=0x48 
      01: PUSH_NONVOL, register=rbp 
    Handler: 00002634 __cvi_exception_handler

Spotted the problem? Look again, I’ll wait…

The instruction “lea rbp,[rsp+28h]” uses an offset of 28h, but the unwind information for this piece of code only specifies 20h!? A compiler bug that we can simply patch after the fact, right? Unfortunately not: the offset in the unwind information is stored as a multiple of 10h, so a value of “28h” cannot be represented at all!
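To make the encoding limit concrete, here is a minimal sketch in C of how the frame register and frame offset are packed into one byte of the unwind header, as far as I understand the documented x64 layout (low nibble: register number, high nibble: offset divided by 10h). The function names are made up for illustration, and 0x25 is the byte the dump above should correspond to (rbp is register 5, offset 20h).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the fourth UNWIND_INFO header byte:
   bits 0..3 = frame register, bits 4..7 = frame offset / 16. */

/* Hypothetical helper: register number stored in the low nibble. */
uint8_t decode_frame_register(uint8_t frame_byte)
{
    return (uint8_t)(frame_byte & 0x0F);
}

/* Hypothetical helper: frame offset in bytes (high nibble scaled by 16). */
uint32_t decode_frame_offset(uint8_t frame_byte)
{
    return (uint32_t)(frame_byte >> 4) * 16u;
}

/* An offset fits only if it is a multiple of 16 and at most 15 * 16 = F0h. */
bool frame_offset_representable(uint32_t offset)
{
    return (offset % 16u == 0u) && (offset / 16u <= 15u);
}
```

With these helpers, frame_offset_representable(0x20) is true while frame_offset_representable(0x28) is false, which is exactly why the unwind data cannot simply be corrected to match the LEA instruction.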

Now, I have opened a bug report with NI, the vendor of the compiler, but seeing how they have been treating CVI the last few years I’m not even sure they still have the people to fix this. In any case, I need a fix now, so what to do? CVI has had support for external compilers for a long time, so maybe just switch to an external one for release builds?

This has a certain charm, and CVI already comes with clang v3 (yeah, really) for this purpose. The problem is that some of the libraries that CVI comes with, and that I use, are pre-compiled with the buggy compiler, and there is no source code for them. So this fixes only half the problem.

Next idea is to patch the binary in a post-build process. Question remains, how? Patching the code itself to have an even offset is quite dangerous as it’s difficult to guarantee that the patched code is right in every situation (like functions with multiple epilogs or whatnot). So we should patch the unwind info somehow and leave the code alone, but as already established, it cannot represent the value “28h” and we also cannot add any additional instructions to it as enlarging data structures comes with its own perils.

Simply removing the SET_FPREG instruction from the unwind information is an option and it works, but only as long as there are no dynamic stack allocations, like with alloca() or C99’s variable-length arrays. Often there are none, but maybe we can do better.

The solution I chose in the end is to adjust the static allocation (ALLOC_SMALL in the example) by 8 bytes. This cancels out the error introduced by the SET_FPREG instruction and works fine in all cases except between the “sub rsp, 30h” and “lea rbp, [rsp+28h]” instructions, but that is a negligible problem. The important thing is that the rest of the time, stack walking works again, both in the debugger and when exceptions are handled. By the way, I expected that about half the functions would be affected, but in my application it’s more like 70%:

Fixing unwind information...
Fixed 6199 out of 8849 functions
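What such a patch could look like for a single function is sketched below in C. This is only an illustration, not the actual post-build tool: the function names are hypothetical, finding the unwind data inside the executable is left out, and it assumes the documented unwind code layout (2-byte slots, second byte holding the operation in the low nibble and the operation info in the high nibble, ALLOC_SMALL size = info * 8 + 8). It also assumes the adjustment means shrinking the allocation by 8 bytes, which is my reading of the example where the real LEA offset (28h) is 8 bytes larger than the recorded one (20h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unwind operation codes as listed in the x64 exception handling docs. */
enum {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9
};

/* Size in bytes encoded by an ALLOC_SMALL operation info nibble. */
uint32_t alloc_small_size(uint8_t info)
{
    return (uint32_t)info * 8u + 8u;
}

/* Number of 2-byte slots that one unwind code occupies. */
size_t unwind_code_slots(uint8_t op, uint8_t info)
{
    switch (op) {
    case UWOP_ALLOC_LARGE:     return info == 0 ? 2 : 3;
    case UWOP_SAVE_NONVOL:
    case UWOP_SAVE_XMM128:     return 2;
    case UWOP_SAVE_NONVOL_FAR:
    case UWOP_SAVE_XMM128_FAR: return 3;
    default:                   return 1;
    }
}

/* Hypothetical patch: if the function sets a frame pointer, shrink the
   first ALLOC_SMALL that follows SET_FPREG in the array by 8 bytes.
   The array is stored in reverse prolog order, so SET_FPREG comes first.
   Returns 1 if a code was patched, 0 otherwise. */
int shrink_alloc_small_by_8(uint8_t *codes, size_t count)
{
    int has_fpreg = 0;
    size_t i = 0;

    assert(codes != NULL);
    while (i < count) {
        uint8_t op = (uint8_t)(codes[2 * i + 1] & 0x0F);
        uint8_t info = (uint8_t)(codes[2 * i + 1] >> 4);

        if (op == UWOP_SET_FPREG) {
            has_fpreg = 1;
        } else if (op == UWOP_ALLOC_SMALL && has_fpreg && info > 0) {
            /* One step of the info nibble equals 8 bytes. */
            codes[2 * i + 1] = (uint8_t)(((info - 1) << 4) | op);
            return 1;
        }
        i += unwind_code_slots(op, info);
    }
    return 0;
}
```

For the main() from the dump, the three codes would be the bytes 0A 03, 05 82, 01 50, and the patch turns 82 into 72, so ALLOC_SMALL shrinks from 48h to 40h. The unwinder then computes rsp = rbp - 20h, which is 8 bytes above the real rsp (rbp - 28h), and adding 40h instead of 48h lands on the saved rbp again.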

TL;DR: The SetUnhandledExceptionFilter API can only do its work if all the exception information in the executable is correct, because the filter is invoked from the BaseThread function in kernel32. If the stack walking does not reach that function, nothing will happen; the process will just be killed. Check the exception unwind information for problems.