Debugging Minerva

By | 2018-02-27

Martyn Hill recently discovered a bug in later versions of Minerva, but couldn’t quite pin it down. I thought it might be nice to write up a little tutorial about how I try to tackle these kind of problems.

The most important thing of course is being able to reproduce the problem, which was the hard part here as I haven’t had any networked QLs for 20 years. I didn’t remember any of the commands and devices used and somehow one ULA died along the way, which didn’t exactly help matters. But in the end I got it going and yeah, I could get it to crash:

The “LBYTES” never returns here. The first thing to do in these cases is loading QMON and simply activating it

This way we will ideally break into the debugger if anything fishy happens. So, try the repro again and bingo:

Excuse the quality, at this point I didn’t know I will be writing about this, I made the picture just in case I need any data from it later in the process! Two things will look odd for people experienced in QMON, the “__000000” and the “MULS.L”. The first prefix “__” is an extension that displays addresses relative to the job base, which makes debugging jobs so much easier. In this case the JOB is SuperBasic, which has a start address of 0, so we’re actually at the absolute address 0 here. “MULS.L” is a 68020 instruction and now I can be glad I did this on a BBQL because this way the crash happened as early as possible. I think both the job-relative and the 68020 features were added to QMON by me some 15 years ago and were probably never released, but it doesn’t matter for this exercise.

The next step is to look at the stack, hence the “d2 (a7)”. There we see “00000D28”, which could potentially be a ROM address. In this case I have assembled Minerva myself so I have the corresponding _MAP file ready, which is invaluable when debugging assembler code:

This looks legit, so we open the source file “io_serio_asm” and also have a look into DISA to check which instruction exactly is there

Bingo, there is a jump instruction directly in front of it, so we’re at the right place. We look for it in the source code

It jumps to a4 and, referencing the picture above, a4 is 0. We can also see the contents of the saved registers on the stack “0000000F” for D4, “00000000” for d5, “000b6568” for a1 and again “00000000” for a4. The next long word is “00000d52” which again looks like an address, so where are we coming from? Again, looking it up in DISA and then searching the instruction sequence in the source we end up at the label “headr1”

Ah, so we were calling io_pend, let’s look at this code

Lau commented out the “moveq” because he thinks d0 is 0 here anyway. We have a look at “vectest”

where we see that D0 is used as an index but should be the same going out as going into the function. No we have a look at theย QMON screenshot and see that D0 is actually “00000058”! So the assumption that it would be “0” doesn’t hold and this explains our crash. We include the “moveq” in the source code, reassemble and burn the EPROM and yes, it doesn’t crash anymore. Happy days ๐Ÿ™‚

The whole process maybe took 10 minutes, it was actually a pretty straight forward. For sure it took a lot longer to write it up ๐Ÿ˜‰ I know Martin has been working at this for quite a bit, so I thought I’d share my method in the hopes that more people can find and fix these kind of bugs in the future.

For everybody who is not at home assembling Minerva I have added an English and German version with the fix here:

Minerva 1.98 with I/O fix (English/German)

2 thoughts on “Debugging Minerva

  1. Martyn Hill

    Dear Marcel, what can I say, except thank you!

    Your approach is SO much more efficient than what I was attempting ๐Ÿ™‚

    Reply

Leave a Reply to Norman Dunbar Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.