This will be a highly technical post, but as the last one got some positive feedback, why not 😉 Recently a bug was discussed on the QL-users mailing list: if you load new Basic extensions using LRESPR from within a Basic PROCedure or FuNction, SBasic will sometimes crash.
Wolfgang did some initial research and found the place where things eventually go wrong, but wondered if it’s really worth the effort to try and fix it. And rationally speaking it’s absolutely not: the bug is relatively obscure and the SBasic interpreter code in SMSQ/E is so complex that it takes me a huge effort to even barely understand parts of it. But being stubborn I went ahead regardless, starting with the findings Wolfgang had already provided.
During the search I found references in the code to bug fixes I made 10 years ago, about which I have no recollection whatsoever. The fixes are so deep in the innards of the beast that I apparently once understood what it does, but not anymore. So this time I want to record a few thoughts so maybe I don’t have to relearn everything from scratch next time.
Parser
The SBasic main loop starts at sbm_loop (main_asm). It provides the command line interface and it’s the place where I implemented the command line history. Entering a command using the keyboard or loading a program from disc is basically the same: the strings are read from the channel and fed into the parser stage (parse_asm). It does some basic error checking and translates the commands into numerical “parser tokens”. So 5 spaces for example become $8005 and “IF” becomes $8103. The tokens are defined in the file “parser_keys”.
Compiler
When the program is RUN the action starts at sb_execute (execute_asm). First it calls the compiler at sb_compile (compile_asm), which translates the “parser tokens” into “compiler tokens”. Compiler tokens are similar to, but not entirely like, the parser tokens: “END” and “FOR”, for example, are two tokens in the “parser tokens” space, but when preparing for execution they are combined into a single “END FOR” token in the compiler token space. These tokens are found in the comp_keys file.
In the next stage the compiler tokens are compiled into operations (compop_asm). This creates the actual stream of operations that is later executed by the interpreter. Pretty much all control structures like GO TO/GO SUB/PROCedure, FuNctions, SELect, FOR/REPeat loops and even IF THEN ELSE are compiled into (conditional) GOTO jumps. At this stage these jumps are still addressed by line and statement number.
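To make the token merging a bit more tangible, here is a minimal sketch in Python. It is a toy model only: tokens are plain strings rather than the numeric values from parser_keys and comp_keys, and the function name merge_tokens is made up for this illustration.

```python
# Toy model: tokens are strings here, not the real numeric token values.
# Only the END + FOR pair mentioned in the text is modelled.
MERGE_PAIRS = {("END", "FOR"): "END FOR"}


def merge_tokens(parser_tokens):
    """Combine adjacent parser tokens into single compiler tokens."""
    compiler_tokens = []
    i = 0
    while i < len(parser_tokens):
        pair = tuple(parser_tokens[i:i + 2])
        if pair in MERGE_PAIRS:
            # Two parser tokens become one compiler token.
            compiler_tokens.append(MERGE_PAIRS[pair])
            i += 2
        else:
            compiler_tokens.append(parser_tokens[i])
            i += 1
    return compiler_tokens


merged = merge_tokens(["FOR", "i", "END", "FOR", "i"])
```

The leading FOR stays a token of its own; only the END followed by FOR collapses into one entry.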
From the operation stream a statement table is built (cmpstt_asm). This table translates line/statement numbers into absolute addresses within the stream.
During the next stage (cmpadd_asm) and using the just built statement table, each line/statement number in the operation stream is replaced by the absolute address within the stream.
Next a table of the DATA statement locations is created, which concludes the compile phase.
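The statement table and address stages described above can be sketched roughly as follows. This is a toy model in Python, not the SMSQ/E implementation: the tuple layout of the operations, the "STMT" marker and both function names are assumptions made purely for illustration, and "addresses" are simply list indices.

```python
def build_statement_table(ops):
    """Map (line, statement) to the index where that statement starts."""
    table = {}
    for addr, op in enumerate(ops):
        if op[0] == "STMT":  # hypothetical marker for a statement start
            table[(op[1], op[2])] = addr
    return table


def resolve_jumps(ops, table):
    """Replace (line, statement) jump targets by addresses in the stream."""
    resolved = []
    for op in ops:
        if op[0] in ("GOTO", "GOTO_IF_FALSE"):
            resolved.append((op[0], table[op[1]]))
        else:
            resolved.append(op)
    return resolved


ops = [
    ("STMT", 100, 1), ("PUSH", 1),
    ("STMT", 110, 1), ("GOTO", (130, 1)),   # still line/statement addressed
    ("STMT", 120, 1), ("PUSH", 2),
    ("STMT", 130, 1), ("STOP",),
]
table = build_statement_table(ops)
patched = resolve_jumps(ops, table)
```

After the second pass the GOTO no longer refers to line 130, statement 1, but directly to position 6 in the stream, which is what makes the interpreter loop cheap and also what makes the stream sensitive to anything that moves it.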
Interpreter
After compilation the interpreter (inter_asm) is invoked. It sets up the data structures and then, in a tight loop (sb_iloop), uses a jump table to jump to the actual code blocks that execute the operation tokens. The code snippets that correspond to the operations have the prefix bo_, so the “+” operation for example is executed by bo_add. This is all done by a fairly complex macro, so if you search the source for bo_add you will only find the code itself but no location from where it is called.
Complex structures like PROCedure calls are split into many different operations, like bo_spcall to set up the call, bo_dospr to actually do it and a lot of operations to set up the parameters (bo_formp amongst others). bo_return handles RETurn statements and also things like END DEFines.
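For readers who have not seen this technique before, a jump-table interpreter loop looks roughly like the Python sketch below. Everything in it is hypothetical (the handler names, the token numbering, the dictionary used as machine state); it only shows the general shape of "fetch token, index the table, jump", with op_add playing the role that bo_add has in the real code.

```python
def op_push(vm, arg):
    vm["stack"].append(arg)


def op_add(vm, _arg):
    # Stand-in for what bo_add does for the "+" operation.
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append(a + b)


def op_stop(vm, _arg):
    vm["running"] = False


# The jump table: the operation token is the index of its handler.
JUMP_TABLE = [op_push, op_add, op_stop]
PUSH, ADD, STOP = range(3)


def run(stream):
    vm = {"stack": [], "running": True}
    pc = 0
    while vm["running"]:          # the tight loop
        token, arg = stream[pc]
        pc += 1
        JUMP_TABLE[token](vm, arg)
    return vm["stack"]


result = run([(PUSH, 2), (PUSH, 3), (ADD, None), (STOP, None)])
```

Note that nothing in the loop mentions op_add by name, which mirrors why searching the source for bo_add finds the code but no caller.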
The bug
SBasic crashed in the bo_return code because it a) wanted to clean up the PROCedure parameters and b) wanted to return to the location following the PROCedure call in the operation stream. So what happened? The LRESPRed extension calls sb.inipr (inipr_asm) to link in its new commands. This just results in an extension of the name table, which in itself is not a problem and which made me really wonder where things go wrong. It took me a long time to see that these three little lines cause all the ruckus:
        tas    sb_edt(a6)              ; edited! to redo name types
        sf     sb_cont(a6)             ; do not continue
        move.w #sb.nact,sb_actn(a6)    ; but no action
The LRESPR function that loads and executes the extension is itself called in sb_icall, which is called from bo_docpr. This checks the sb_cont flag and, if it is no longer set, returns to sb_istop. As the name suggests, this stops the interpretation of the program and returns all the way up to the main SBasic loop. The loop checks which action is to be done, which is sb.nact (no action), and then it COMPILES THE PROGRAM AGAIN! This was a bit surprising at first, but if you think about it, it’s clear that this needs to be done: we’ve just loaded an extension with new SBasic procedures and functions that weren’t known when we last compiled the program, and the programmer might like to start using them now. But now we’re in the middle of a procedure, so how can we just continue here? Enter the “return stack”.
Return stack
As the name implies, the stack holds all data necessary to facilitate returns from PROCedures, FuNctions and GO SUBs. Mainly it holds the return addresses in the operation stream and more data pertaining to the parameters. These are absolute addresses… which is usually not a problem… unless we recompile the program during execution and it so happens that the whole instruction stream data block moves! 😮
First fix
What do we always do in SBasic if a block can move? Correct, we just make all pointers relative to the base. Actually understanding the code well enough to find all those pointers is a wholly different matter, though. And in the end it was all in vain: this fixed most of the crashes but not all, because not only can the block as a whole move, but the operation boundaries within the block can move as well, so even relative addresses were not good enough. And as I’m not happy when something only works 50, 80 or even 99% of the time, this meant going back to the drawing board.
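A small made-up example may help to see why neither kind of address survives. In the Python sketch below the operation names, the base addresses and the idea that a newly known command compiles into two operations instead of one are all assumptions for illustration; the real operation stream looks different.

```python
# Stream before the recompile: NEWCMD is still an unknown name (one op).
old_stream = [
    "setup_call LRESPR", "do_call LRESPR",
    "name NEWCMD",
    "setup_call PRINT", "do_call PRINT",
]
# Stream after the recompile: NEWCMD is now a known command (two ops).
new_stream = [
    "setup_call LRESPR", "do_call LRESPR",
    "setup_call NEWCMD", "do_call NEWCMD",
    "setup_call PRINT", "do_call PRINT",
]
old_base, new_base = 0x1000, 0x2400   # hypothetical block addresses

saved_abs = old_base + 3              # return address: start of PRINT
saved_rel = saved_abs - old_base      # the "first fix": offset from base

# Absolute address: no longer inside the block at all once it has moved.
abs_still_valid = new_base <= saved_abs < new_base + len(new_stream)

# Relative address: inside the block, but in the middle of another statement.
wanted = old_stream[saved_rel]
rel_target = new_stream[saved_rel]
```

Here the relative offset survives the move of the block but lands on "do_call NEWCMD" instead of the start of the PRINT statement, which is the situation the relative pointers could not handle.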
Final fix
The final code is curiously both more complex and less invasive at the same time: before the program is re-compiled after the LRESPR, the return stack entries are now translated from absolute addresses into Line/Statement numbers (using the previously mentioned statement table). The Line numbers are invariant under the compilation process, so after the compilation they are translated back into absolute addresses, which always gives the correct result, no matter how the compilation changed the operation stream. So the new code should have zero performance and stability implications for ordinary executions and is only active when you LRESPR things. And neither Wolfgang nor I managed to crash it anymore, yay!
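The round trip can be sketched in a few lines of Python. Again this is a toy model under stated assumptions and not the actual fix: compile_program, the two translation helpers and the rule that a known command compiles into two operations are invented for the example, and return addresses are assumed to always sit on a statement boundary.

```python
def compile_program(program, known_procs):
    """Toy compiler: returns (operation stream, statement table)."""
    stream, table = [], {}
    for line, stmt, word in program:
        table[(line, stmt)] = len(stream)
        if word in known_procs:
            stream += [("setup_call", word), ("do_call", word)]
        else:
            stream.append(("name", word))
    return stream, table


def addresses_to_statements(return_stack, table):
    """Before recompiling: address -> (line, statement)."""
    by_address = {addr: key for key, addr in table.items()}
    return [by_address[addr] for addr in return_stack]


def statements_to_addresses(saved, table):
    """After recompiling: (line, statement) -> address."""
    return [table[key] for key in saved]


program = [(100, 1, "LRESPR"), (110, 1, "NEWCMD"), (120, 1, "PRINT")]

old_stream, old_table = compile_program(program, {"LRESPR", "PRINT"})
return_stack = [old_table[(120, 1)]]          # return to line 120

saved = addresses_to_statements(return_stack, old_table)
new_stream, new_table = compile_program(
    program, {"LRESPR", "PRINT", "NEWCMD"})   # NEWCMD is now known
return_stack = statements_to_addresses(saved, new_table)
```

The saved entry moves from position 3 to position 4, yet in both streams it points at the first operation of line 120, because the line/statement pair is the one thing the recompile does not change.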
The end
This, quite frankly, was a lot of work for such an obscure bug. Why did I do it? One reason is that I’ve been suffering from the bug for a decade without even realising it: when I was toying around with ProWesS I always loaded it conditionally using a PROCedure, which sometimes crashed, and I could never explain why and just assumed ProWesS itself was the culprit. The other reason is that stress levels at my real job are currently very high and these puzzles are actually somewhat relaxing sometimes.
No matter the reason, SMSQ/E 3.31 is out and you can enjoy your crash-free BOOTs and whatnot from now on. Have fun!