Yearly Archives: 2024

NI CVI/LabVIEW and monitors with different scaling

There is a known bug in NI’s CVI (and apparently LabVIEW) products that makes the user interface misbehave when the PC is connected to multiple monitors that have different scale settings: https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z0000004AZ9SAM

The “solution” mentioned in the article is to use the same scaling factor for all monitors or to disable scaling for the specific applications. That might have been (almost) reasonable in 2016, when this issue first came up and different scaling was probably just a misconfiguration. But nowadays, when an external monitor might be high-DPI (and thus needs a high zoom factor) and the other one is e.g. a normal laptop display, this is not possible. Furthermore, if it only affected the development computers, OK, but telling all my users(!) to “change your Windows settings for my app to work correctly” is downright embarrassing and reflects badly on me as a developer.

Thus, I filed a bug report with NI, and after apparently much deliberation it was returned as WONTFIX, as there are the workarounds mentioned above. Let me repeat: embarrassing.

So I once again got out my trusty disassembler and investigated this issue myself. Spoiler alert: it’s easy to fix, even without patching the runtime code.

The basic problem is that the CVI runtime employs a window, hidden away at coordinate 25000×25000, for some of its tasks. This has often been a source of trouble because its creation happens before my main() function is invoked, which can have many undesired side effects I won’t go into at this point. Anyway, this hidden window seems to adopt the scaling factor of the “nearest” monitor. As long as the app is on that monitor, everything is fine, but if it’s on a different one, hilarity ensues.

My home-office monitor configuration with one monitor deliberately set to 125% zoom

The problem now is that, when a menu is created, the hidden window is given as its parent instead of the window the menu actually belongs to 🤷‍♂️! The menu then adopts the scaling of the hidden window, no matter which monitor it is actually on, and things start to break. This should be trivial to fix in the runtime, but as I don’t feel like binary patching and distributing a hacked runtime, maybe we can do better.

I found out that, when I hide the window (by actually marking it “not visible”, not just by moving it out of the way), Windows doesn’t adopt the scale of the hidden window but takes the scale of the monitor the new window actually appears on! Bingo, that’s what we want! The only caveat: CVI uses the hidden window for the taskbar button, so we lose that. Uh-oh, not good! But fear not, there is an easy way to make the actual (visible) main window get a button instead, so the solution in the end is fairly easy with (so far) no negative side effects:

#include <windows.h>
[...]

// Workaround for multi-monitor DPI scaling issues with CVI (mainly menus not appearing where they should be):
// CVI apps have a hidden main window at position 25000x25000, so always out of view and it uses this window
// as the parent for any menu that is shown. Unfortunately, the hidden window adopts the DPI scaling of the
// nearest monitor and passes it on to the new window. That's fine if the new window is on that monitor, but if
// it is shown on a monitor with different DPI scaling then nothing fits anymore, it's drawn too big/small and
// at the wrong location.
//
// By just removing the WS_VISIBLE style of the (anyway hidden) window Windows changes its behaviour
// completely and the new window adopts the DPI scaling of the monitor where it is actually positioned.
// 
// Caveat: the hidden window is used for the taskbar button. If we hide that, we get the proper DPI behaviour, 
// but no taskbar button! So, as a further workaround, we enable the WS_EX_APPWINDOW style on the ACTUAL main
// panel below, which will then get its own taskbar button, with an even better working preview!
SetSystemAttribute(ATTR_TASKBAR_BUTTON_VISIBLE, 0);

[...]

if ((panelHandle = LoadPanel(0, "demo.uir", PANEL)) < 0)
	return -1;
	
// Second part of the DPI fix above, let's give our main window a taskbar button after all
HWND hwnd;
GetPanelAttribute(panelHandle, ATTR_SYSTEM_WINDOW_HANDLE, (intptr_t*)&hwnd);
SetWindowLongPtr(hwnd, GWL_EXSTYLE, GetWindowLongPtr(hwnd, GWL_EXSTYLE) | WS_EX_APPWINDOW);

As you can see, the workaround consists of only 3 lines of code. After including those, everything works as expected: menus draw consistently at the size and position they are supposed to, no matter the monitor configuration or which monitor they are on.

All in all it took me less than a day to figure all this out; it was almost less work than filing and seeing through the bug report mentioned above (my support contact was very dedicated and determined, but in the end as helpless as I was when R&D said they didn’t want to fix it). NI, on the other hand, has listed this as a known issue for 8 years, and they have the source code! But I’ve heard these are very tumultuous times for them after the acquisition. I wish everybody the best and hope that someday they can correct course and maybe return to their old form. Long live CVI 😉

SetUnhandledExceptionFilter not working in 64-bit CVI code

At work I develop an application that is employed in production plants all around the world. It can host a lot of different code modules from different developers, so the code it executes is pretty diverse. To be able to react quickly to any problems, I implemented a crash handler that makes a memory image of the whole process and sends me an email. This has been working fine for many years (13 to be exact, according to git), but when compiled for 64-bit it stubbornly refused to work.

This puzzled me for a long time: the handler set through the SetUnhandledExceptionFilter API was simply never called. I eventually found a workaround by using the AddVectoredContinueHandler function, but not knowing what was going on always bugs me to no end. And it’s also just half the story, as we’ll see.

Eventually I found the problem and it turned out to be a subtle bug in the CVI C compiler when compiling for a 64-bit target. I think the clue was the behavior of a low level debugger when stepping through such a defective program. Here, at the start of main(), the stack trace looks fine:

But once I step over the LEA instruction, things get messy:

Why is that? The 64-bit Windows ABI (application binary interface) requires the start (prolog) and end (epilog) of functions to be written in a certain way. It also requires that every function that is not a leaf function (a function that doesn’t call other functions) has accompanying static data that tells Windows how to clean up the function. The debugger also uses this information to walk the stack, so something must be fishy there. This so-called “unwind” data can be dumped using the “dumpbin” tool from the Visual Studio Tools:

> dumpbin /unwindinfo seh_bug.exe 
Microsoft (R) COFF/PE Dumper Version 14.29.30152.0 
Copyright (C) Microsoft Corporation.  All rights reserved. 
 
 
Dump of file seh_bug.exe 
 
File Type: EXECUTABLE IMAGE 
 
Function Table (77) 
 
           Begin    End      Info      Function Name 
[...]
0000000C 00001040 0000108E 00004010  main 
    Unwind version: 1 
    Unwind flags: EHANDLER UHANDLER 
    Size of prologue: 0x0A 
    Count of codes: 3 
    Frame register: rbp 
    Frame offset: 0x20 
    Unwind codes: 
      0A: SET_FPREG, register=rbp, offset=0x20 
      05: ALLOC_SMALL, size=0x48 
      01: PUSH_NONVOL, register=rbp 
    Handler: 00002634 __cvi_exception_handler

Spotted the problem? Look again, I’ll wait…

The instruction “lea rbp,[rsp+28h]” uses an offset of 28h, the unwind information for this piece of code however only specifies 20h!? A compiler bug that we can simply patch after the fact, right? Unfortunately not, the problem is that the offset in the unwind information is stored as a multiple of 10h, so a value of “28h” cannot be represented at all!

Now, I have opened a bug report with NI, the vendor of the compiler, but seeing how they have been treating it the last few years I’m not even sure they still have the people to fix this. In any case, I need a fix now, so what to do? CVI has had support for external compilers for a long time, maybe just switch to an external one for release builds?

This has a certain charm and CVI already comes with clang v3 (yeah, really) for this purpose. But the problem is that some of the libraries CVI comes with and that I use are already pre-compiled with the buggy compiler and there is no source code for them. So this fixes only half the problem.

Next idea is to patch the binary in a post-build process. Question remains, how? Patching the code itself to have an even offset is quite dangerous as it’s difficult to guarantee that the patched code is right in every situation (like functions with multiple epilogs or whatnot). So we should patch the unwind info somehow and leave the code alone, but as already established, it cannot represent the value “28h” and we also cannot add any additional instructions to it as enlarging data structures comes with its own perils.

Simply removing the SET_FPREG instruction from the unwind information is an option and it works, but only as long as there are no dynamic stack allocations, like with alloca() or C99’s variable-length arrays. Often there are none, but maybe we can do better.

The solution I have chosen in the end is to adjust the static allocation (ALLOC_SMALL in the example) by 8 bytes. This cancels out the error introduced by the SET_FPREG instruction and works fine in all cases except between the “sub rsp, 30h” and “lea rbp, [rsp+28h]”, but that is a negligible problem. The important thing is that the rest of the time, stack walking works again, both in the debugger and when exceptions are handled. By the way, I expected that about half the functions would be affected, but in my application it’s more like 70%:

Fixing unwind information...
Fixed 6199 out of 8849 functions

TL;DR: The SetUnhandledExceptionFilter API can only do its work if all the exception information in the executable is correct, as it is invoked in the BaseThread function in kernel32. If the stack walking does not reach that function, nothing will happen; the process will just be killed. Check the exception unwind information for problems.