--- /dev/null
+ EMULATING THE APPLE HIGH SPEED SCSI CARD: AN EXERCISE IN DIGITAL ARCHAEOLOGY
+
+ by James Hammons
+
+ ~~==< Brought to you in Glorious 80-Column Monospace-o-Vision(TM) >==~~
+
+
+Motivations
+-----------
+
+One day, while I was reading 4am's Twitter feed, he talked about his "Pitch
+Dark" hard drive image, which looked incredibly cool and like something I
+would very much be interested in. But in reading about it, I came across a
+seemingly throwaway line about how all decent emulators can run hard drive
+images, which, sadly, Apple2 could not do at the time. And so, in order to
+save Apple2 from indecency (and because I wanted to see if I could get "Pitch
+Dark" to work, as it looked cool and interesting), I set about finding some
+documentation on how hard drives interfaced to Apple IIs--and ran into a
+complete dearth of information. There were little things sprinkled around here
+and there, but nothing of any deep, satisfying, technical significance.
+
+
+In Order To Run A Hard Drive Image, You Must First Create The Universe
+----------------------------------------------------------------------
+
+While it's a nice bit of hyperbole, it's not exactly true that you have to
+first create the Universe, as fortunately, that part has largely been taken
+care of. However, you still have to figure out how to emulate the hardware if
+you are keen on running a hard drive image on your emulator of choice. And in
+so doing, you have to figure out what the requirements are, what minimal
+pieces are needed for a functioning hard drive system, and how that system
+talks to the emulated computer. And all of that requires information. I
+wasn't asking for much, but something along the lines of Jim Sather's
+"Understanding The Apple IIe" for hard drives would have been a nice thing to
+have.
+
+
+The Next Part, In Which Nice Things To Have Are Not Forthcoming
+---------------------------------------------------------------
+
+Unfortunately, neither Jim Sather nor, as far as I can tell, anybody else ever
+wrote such a document, and so I did what any lazy programmer would do: I took a
+look at some other project's source--in this case, AppleWin's source. I didn't
+really *want* to look at it, having looked at it before and recoiled in horror
+at the sight, but my search-fu, apparently not up to the task of finding
+relevant information, drove me to it. And looking at it didn't really provide
+any illumination; to me it looked like some kind of hacky thing and I wasn't
+interested in that kind of approach at all--so I abandoned the idea. As I dug
+a little deeper into what scant literature existed on the subject, I learned
+that pretty much any time you wanted to hook up a hard drive to your Apple II,
+you had to use an interface card, and typically that meant some kind of SCSI
+card. And looking here, there was no shortage of SCSI cards that you could use
+to hook up your hard drive therewith.
+
+So, that being a promising looking path to pursue on the road to this
+particular perdition, the question then became, which one should I choose? At
+first I thought the RAMFast card would fit the bill as it seemed to be very
+popular, but there was literally no technical information on the thing. The
+Apple SCSI card looked promising, but then I saw that it "ghosted" a slot,
+meaning that it would have to occupy two consecutive slots in order to work and
+I didn't much care for that. And so, after looking at, and rejecting, card
+after card for pretty much the same reason, I settled on the Apple High Speed
+SCSI card for a few reasons--one, it was purportedly fast; two, it worked on
+the Apple IIe (as well as the IIgs, but I didn't really care that much about
+that to be honest); three, it had a user's manual that wasn't completely devoid
+of technical information; four, it had a schematic; and five, it had a firmware
+image. This looked like a promising start--how hard could it be to make this
+work?
+
+
+Things Aren't Exactly Hard, But They Aren't Exactly Soft Either
+---------------------------------------------------------------
+
+The one thing all of that didn't give me was good information on how the thing
+actually worked. I knew that it was a SCSI card, and I knew that it talked to
+the SCSI bus using an NCR 53C80 chip, but I had no idea exactly how. But I did
+have something that *did* know how to talk to it: the firmware for the card.
+
+Now when you take a look at the firmware, the first thing you notice is that
+it's 32K in size--which is *much* larger than the typical 256 bytes that you
+encounter when looking at Apple II card drivers. It also happens to be quite a
+bit larger than the 2K "bonus" space that Apple II cards have available to them
+in the $C800 to $CFFF address space. So what gives?
+
+Fortunately for me, Apple2 has a built-in disassembler (which will probably
+stay in for all time, as it turns out to be a very useful thing to have on
+hand), and so I split that out into a stand-alone command line driven program,
+called d65c02, in order to be able to disassemble such things as device driver
+firmware blobs. It isn't fancy, it doesn't do any analysis on what is code and
+what is data, but it gets the job done in turning incomprehensible binary
+gibberish (except to certain mad geniuses who shall remain nameless) into
+human readable ASCII gibberish. Thus I used said tool to disassemble the
+firmware blob.
+
+Pulling up the results in my text editor, I could see that at least the front
+of the listing looked like it could plausibly be code that would go into the
+usual 256 byte card slot address space of $Cx00 to $CxFF, where x ranges from 1
+to 7 depending on the slot number. Looking further, I could see that these
+first 256 bytes of code were repeated three times, meaning that this was a good
+candidate for the slot device code. I could also see that it was written as
+relocatable code, and it contained this little tidbit:
+
+001B: A9 60 LDA #$60 ; Stuff an RTS into RAM somewhere
+001D: 8D F8 07 STA $07F8
+0020: 20 F8 07 JSR $07F8 ; Jump there and return in order to get evidence
+ ; of where in memory we did it from
+0023: BA TSX ; Retrieve the stack pointer
+0024: BD 00 01 LDA $0100,X ; Get the hi byte of the address we just pushed on
+ ; the stack in order to come back here
+0027: 8D F8 07 STA $07F8 ; & save it for later perusal
+
+which meant that it was an excellent candidate for the slot device code. But
+why should that be?
+
+
+A Short Digression Into Why Slot Code Must Be Relocatable
+---------------------------------------------------------
+
+Slot code must be relocatable because such a card may be installed into any
+given slot in an Apple II--which means its code will show up anywhere from
+$C100 to $C700 (it always shows up on a page boundary). By virtue of this, it
+also means that the I/O address for the card will also show up in the
+corresponding $C090 to $C0F0 address range (it always shows up on a 16-byte
+boundary). And so, because of this, you have to write your slot code in such a
+way that it will work regardless of which slot it's installed in, which means
+the code must be relocatable--which ultimately means you can't use any JMP
+instructions to addresses in your driver, and you can't use absolute addressing
+to refer to stuff in the slot address space.
+
+So, using the above code, a clever coder can figure out what slot their code is
+executing in and they can then use that knowledge to figure out which is the
+proper I/O range to use for the card. All of this is necessary to provide a
+seamless experience for the end user of the card.
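+
+To make the slot-finding trick a little more concrete, here is a minimal
+Python sketch that models only the stack side effects of the JSR $07F8 / RTS /
+TSX / LDA $0100,X sequence shown earlier, along with the slot ROM and slot I/O
+ranges for a given slot. The function names and the default stack pointer are
+hypothetical choices for illustration, not anything taken from the firmware:
+
+```python
+# Hypothetical model of the stack side effects of $Cx1B-$Cx27 only.
+
+def probe_slot_page(slot, jsr_lo=0x20, sp=0xFF):
+    """Return the (hi, lo) return-address bytes left behind in page $01."""
+    page = bytearray(0x100)               # stands in for page $01
+    # The JSR at $Cx20 pushes the address of its own last byte, $Cx22
+    ret = (((0xC0 | slot) << 8) | jsr_lo) + 2
+    page[sp] = ret >> 8                   # hi byte is pushed first...
+    sp = (sp - 1) & 0xFF
+    page[sp] = ret & 0xFF                 # ...then the lo byte
+    sp = (sp - 1) & 0xFF
+    sp = (sp + 2) & 0xFF                  # RTS at $07F8 pulls both back off
+    x = sp                                # TSX
+    return page[x], page[(x - 1) & 0xFF]  # LDA $0100,X reads the hi byte
+
+
+def slot_ranges(slot):
+    """Slot ROM page ($Cx00) and slot I/O base ($C080 + slot * 16)."""
+    return 0xC000 + slot * 0x100, 0xC080 + slot * 0x10
+
+
+assert probe_slot_page(7) == (0xC7, 0x22)
+assert slot_ranges(7) == (0xC700, 0xC0F0)
+```
+
+For slot 7 this comes up with $C7 as the hi byte, which is exactly what the
+slot ROM needs, and $22 as the lo byte; both values come up again a bit later.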
+
+
+The Next Part, In Which 32K Is Still Larger Than 256
+----------------------------------------------------
+
+So, in looking at the code that comes after the Code Which Looks Like It
+Belongs In Slot Memory (which makes the wonderful acronym CWLLIBISM), I noticed
+that it seemed to be organized in 1K chunks. And further perusal of said
+chunks made it seem very likely that they resided in the $CC00 to $CFFF memory
+space. However, the "extra" memory space given to cards to use starts 1K
+earlier--at $C800. What could this mean?
+
+Well, in looking at the schematic for the card, one not only finds the 32K ROM
+chip, but also an 8K static RAM. Which means that it's very likely that the
+address space from $C800 to $CBFF is mapped to that 8K static RAM. But 8K is
+larger than 1K; how does that work?
+
+As it turns out, it's bank switched, but I didn't know it at the time--we'll
+get to that eventually. In the meantime, with further perusal of the code (the
+code gets perused quite a bit), it seems very likely that the 1K address range
+from $C800 to $CBFF is said RAM as that range is written to by the 1K code
+chunks quite frequently.
+
+Finding that the code in the firmware is divvied up into 1K chunks would seem
+to imply that it's bank switched into the $CC00 to $CFFF range. And in looking
+at the CWLLIBISM, we see the following:
+
+005C: A9 0B LDA #$0B ; Get 11 in the accumulator
+005E: AE 08 C8 LDX $C808 ; Get offset to proper I/O space in X
+0061: 5A PHY ; Save Y on the stack for later
+0062: A8 TAY ; Copy the accumulator to Y
+0063: 29 1F AND #$1F ; Strip off the upper three bits
+0065: 9D 6E C0 STA $C06E,X ; & write to card I/O location $E
+
+which implies it heavily. Taking the number put into the accumulator and
+keeping only its lower 5 bits yields a range from 0 to 31: 32 distinct values,
+which corresponds neatly to the 32 1K chunks that make up the 32K ROM.
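+
+If the lower 5 bits really do pick one of 32 1K chunks, an emulator has to map
+reads from $CC00-$CFFF into the 32K ROM image accordingly. Here is a minimal
+Python sketch, assuming (as the rest of this write-up does) that bank N is
+simply the Nth 1K chunk of the image; the names are hypothetical:
+
+```python
+ROM_SIZE = 32 * 1024
+BANK_SIZE = 1024
+NUM_BANKS = ROM_SIZE // BANK_SIZE        # 32 banks, so 5 bits of bank number
+
+
+def banked_rom_offset(bank_reg, addr):
+    """Map a read in the $CC00-$CFFF window to an offset in the ROM image.
+
+    Assumes a linear layout: bank N occupies bytes N * 1K .. N * 1K + 1023.
+    Only the lower 5 bits of the value written to register $E are used.
+    """
+    if not 0xCC00 <= addr <= 0xCFFF:
+        raise ValueError("address is outside the banked ROM window")
+    bank = bank_reg & 0x1F
+    return bank * BANK_SIZE + (addr - 0xCC00)
+
+
+assert NUM_BANKS == 32
+assert banked_rom_offset(0x0B, 0xCC00) == 0x2C00   # start of bank 11
+```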
+
+The above code, which is part of the initialization of the card, heavily
+implies that it's selecting a 1K chunk of code from bank 11 (counting from
+zero, naturally) to put into the $CC00 to $CFFF address range. And so we get
+to(*) look there for a start.
+
+(*) While changing 'have to' to 'get to' can make life awesome in many ways,
+this is far from a universal truth. 'Getting to' have one's arm amputated is
+never, ever awesome.
+
+
+The Next Part, In Which We Sadly Bid Adieu To CWLLIBISM
+-------------------------------------------------------
+
+But before we do that, in order to understand what's going on in those wicked
+little 1K chunks of code, we should first take a closer look at CWLLIBISM. So
+let's jump in:
+
+0000: A2 20 LDX #$20 ; The bytes after the LDX # identify this card as
+0002: A2 00 LDX #$00 ; being capable of SmartPort calls, and the $82 at
+0004: A2 03 LDX #$03 ; $FB further identifies it as a SCSI card ($2)
+0006: A2 00 LDX #$00 ; that supports extended calls ($8).
+
+The way that I was able to find out that this seemingly useless bit of code was
+a way of identifying SmartPort capable cards was through the serendipitous find
+of the "Technical Manual for the Apple SCSI Card"(*), which, while helpful in
+some ways, was almost completely useless in trying to figure out what the card
+I/O addresses did.
+
+(*) No relation to the Apple High Speed SCSI Card
+
+0008: 2C 58 FF BIT $FF58 ; Check byte in ROM (usually, an RTS lives here)
+000B: 70 05 BVS $0012 ; Bit 6 set? >> $12 (which means, this branch
+ ; will be taken...)
+
+This little tidbit checks a ROM location that usually holds an RTS (at least
+it does in the Apple IIe), which is $60. Since BIT copies bit 6 of the byte it
+reads into the overflow flag, and $60 has bit 6 set, the BVS that follows will
+always be taken, skipping over this:
+
+000D: 38 SEC ; ProDOS entry point
+000E: B0 01 BCS $0011 ; Branch over the following CLC
+0010: 18 CLC ; SmartPort DISPATCH
+0011: B8 CLV ; Signal we're doing normal I/O, not init code
+
+So this clever little bit here, according to the "Technical Manual for the
+Apple SCSI Card", sets some flags so that later on in the firmware, it can
+discern whether it's being called from ProDOS (in which case the carry flag
+will be set) or as a SmartPort call (in which case the carry flag will be
+clear). Either way, the overflow flag is cleared to let the firmware know that
+this is a request to talk to the drive, and not initialization. Initialization
+skips over this code and ends up here:
+
+0012: D8 CLD ; Clear the decimal flag, to prevent bad math
+0013: 08 PHP ; Save the carry & overflow flags for later
+0014: 78 SEI ; Turn IRQs off
+0015: AD FF CF LDA $CFFF ; Turn INTC8ROM off (puts card in $C800-CFFF)
+0018: 8D 00 CC STA $CC00 ; ???
+
+This bit of code is a bit of housekeeping; making sure the decimal flag isn't
+set so that ADC & SBC both work as expected, saving the flags register so that
+the firmware code later can determine whether it's an initialization call or a
+regular I/O call, making sure that IRQs don't happen while in the firmware
+code, and turning on the "extra" addresses in the $C800 to $CFFF range.
+
+The store to $CC00 is mysterious, as it's a ROM location and stores to ROM
+locations are usually void and of null effect. This likely means that it's
+some kind of soft-switch that controls something in the card, but exactly what
+would require a few things that I don't have, namely: the contents of the two
+PALs on the card (which sit between the address lines of the slot and the rest
+of the card), and a description of what the ports on the Sandwich II do (the
+chip that sits between the Apple IIe proper and the NCR 53C80). So, moving
+right along:
+
+001B: A9 60 LDA #$60 ; See where we're executing from
+001D: 8D F8 07 STA $07F8
+0020: 20 F8 07 JSR $07F8
+0023: BA TSX
+0024: BD 00 01 LDA $0100,X ; Get the address we just pushed on the stack
+0027: 8D F8 07 STA $07F8 ; Save it
+
+We've seen this already; this is the code that determines which slot it's
+sitting in. Say, for example, that it's sitting in slot 7; the byte that it
+will retrieve from the stack will be $C7 (for the sake of completeness, the lo
+byte will be $22--as to why, this is left as an exercise for the reader). In
+order to turn that into something that it can use to hit the proper slot I/O
+addresses, it does the following:
+
+002A: 29 0F AND #$0F ; Get the lo nybble
+002C: 0A ASL A ; Multiply it x16
+002D: 0A ASL A
+002E: 0A ASL A
+002F: 0A ASL A
+0030: 18 CLC
+0031: 69 20 ADC #$20 ; Add $20 to it for some reason
+0033: AA TAX ; & stick in the X register
+
+The important part of the $C7 hi byte of the address we found through
+cleverness and trickery is the slot number, which will always fall in the lower
+4 bits. And, in order to be useful for finding the correct slot I/O address
+range, that slot number needs to be multiplied by 16, as each of the slot I/O
+address ranges covers exactly sixteen bytes. Note that masking off the top 4
+bits, as is done with the AND #$0F instruction, is unnecessary, as the four
+ASL A instructions after it will necessarily shift those bits out of the
+picture.
+
+The one thing that stands out as not typical of this kind of device driver code
+is the adding of $20 to the index. Typically, writers of this kind of I/O code
+will use $C080 to $C08F (plus the contents of the X register to reach the
+correct slot I/O range) as the base address for slot I/O, but, for some reason,
+the writers of this card's firmware chose to use $C060 to $C06F, thus
+necessitating the addition of $20 to the value in the X register to reach the
+correct range for slot I/O.
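+
+The same arithmetic, as a hedged Python sketch that mirrors the instructions
+from $Cx2A through $Cx33 and checks that $C06E plus the resulting offset lands
+on the same address as the conventional $C08E plus slot x 16. The function
+name is hypothetical:
+
+```python
+def slot_io_offset(rom_hi):
+    """Mirror $Cx2A-$Cx33: AND #$0F, four ASL A, CLC, ADC #$20."""
+    a = rom_hi & 0x0F             # AND #$0F (redundant, as noted above)
+    a = (a << 4) & 0xFF           # ASL A x4: multiply by 16, keep 8 bits
+    return (a + 0x20) & 0xFF      # CLC / ADC #$20
+
+
+for slot in range(1, 8):
+    x = slot_io_offset(0xC0 | slot)
+    # $C06E,X is the same location as the conventional $C08E + slot * 16
+    assert 0xC06E + x == 0xC08E + slot * 0x10
+    # Dropping the AND #$0F changes nothing, since the shifts discard it
+    assert x == (((0xC0 | slot) << 4) & 0xFF) + 0x20
+```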
+
+0034: A9 00 LDA #$00 ;
+0036: 9D 6E C0 STA $C06E,X ; Select bank #0 (register $E, lower 5 bits)
+0039: A9 0F LDA #$0F
+003B: 9D 6F C0 STA $C06F,X ; Store a $F in register $F
+003E: 8E 08 C8 STX $C808 ; Put slot # at $C808 (banked RAM in $C800-CBFF)
+0041: 9C 09 C8 STZ $C809 ; Put zero at $C809
+0044: 9C F2 C8 STZ $C8F2 ; & $C8F2
+
+One thing I forgot to mention is that the Apple High Speed SCSI card is only
+usable by enhanced Apple IIe and IIgs machines, and that's because it relies on
+instructions found only in the 65C02, such as STZ and PHY; a regular 6502 will
+not even remotely do the same things that those instructions do on the
+65C02--so unenhanced machines are right out.
+
+At any rate, the above code does some writing to the slot I/O address range and
+sets up some values in the card's static RAM, including saving the contents of
+the X register for later.
+
+0047: A2 22 LDX #$22 ; Transfer 35 bytes from ZP ($40) to $C82D
+0049: B5 40 LDA $40,X
+004B: 9D 2D C8 STA $C82D,X
+004E: CA DEX
+004F: 10 F8 BPL $0049
+
+This bit of code copies 35 bytes of page zero RAM ($40 through $62) to the
+card's static RAM at $C82D through $C84F, presumably to restore them later.
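+
+The loop counts X down from $22 to zero and stops once DEX wraps it around to
+$FF (which BPL sees as negative), so it copies $23, or 35, bytes. A quick
+Python check of which addresses end up where, with a hypothetical function
+name:
+
+```python
+def zp_save_pairs(x_start=0x22, src=0x40, dst=0xC82D):
+    """(source, destination) pairs, in the order the loop copies them."""
+    pairs = []
+    x = x_start                              # LDX #$22
+    while True:
+        pairs.append((src + x, dst + x))     # LDA $40,X / STA $C82D,X
+        x = (x - 1) & 0xFF                   # DEX
+        if x & 0x80:                         # BPL not taken: N is set
+            break
+    return pairs
+
+
+pairs = zp_save_pairs()
+assert len(pairs) == 35
+assert pairs[0] == (0x62, 0xC84F) and pairs[-1] == (0x40, 0xC82D)
+```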
+
+0051: AD F8 07 LDA $07F8 ; Get original $Cx byte again
+0054: 8D 01 C8 STA $C801 ; Put it in $C801
+0057: A9 61 LDA #$61 ;
+0059: 8D 00 C8 STA $C800 ; Put $61 in $C800 (= $Cx61)
+005C: A9 0B LDA #$0B
+005E: AE 08 C8 LDX $C808 ; Get X from $C808
+
+This little bit of code sets up for the code that comes below: it turns
+locations $C800-$C801 into a pointer for an indirect jump that seems to happen
+a lot in the 1K chunks that come later. The address it sets up as the jump
+target is the code that comes next:
+
+0061: 5A PHY ; Save Y (follow on bank, passed in by caller)
+0062: A8 TAY ; Save A register
+0063: 29 1F AND #$1F ; Mask off the lower 5 bits
+0065: 9D 6E C0 STA $C06E,X ; First time, select bank 11:0 (I/O register $E)
+0068: 98 TYA ; Restore the A register
+0069: 29 E0 AND #$E0 ; Mask off the upper 3 bits
+006B: 4A LSR A ; & shift them down
+006C: 4A LSR A
+006D: 4A LSR A
+006E: 4A LSR A
+006F: A8 TAY ; Use as an index into a table (Y x 2)
+
+What this does is save the Y register on the stack, then separate the
+accumulator into an upper 3-bit part and a lower 5-bit part. The lower 5 bits
+go into I/O slot register $E, which presumably selects which 1K chunk of code
+will appear in the $CC00 to $CFFF address range, while the upper 3 bits are
+used as an index into a table that appears near the end of each 1K chunk:
+
+0070: B9 F0 CF LDA $CFF0,Y ; Get address of current 1K bank
+0073: 85 54 STA $54 ; & stuff it into $54/55
+0075: B9 F1 CF LDA $CFF1,Y
+0078: 85 55 STA $55
+
+So it uses the Y register as an index into the table at $CFF0 in the currently
+selected bank, and stuffs the two bytes it finds there into $54 and $55 so
+that it can jump to that address at some point.
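+
+A minimal Python sketch of how the dispatch value gets split, based only on
+the listing above. The values $2B and $4B, which bank 15 uses later on to
+reach functions 1 and 2 of bank 11, make handy test cases. The function name
+is hypothetical:
+
+```python
+def decode_dispatch(a):
+    """Split the value passed to $Cx61 into (bank, table offset in Y)."""
+    bank = a & 0x1F               # AND #$1F -> written to I/O register $E
+    y = (a & 0xE0) >> 4           # AND #$E0, LSR A x4 -> function number x 2
+    return bank, y
+
+
+assert decode_dispatch(0x0B) == (11, 0)   # bank 11, function 0
+assert decode_dispatch(0x2B) == (11, 2)   # bank 11, function 1
+assert decode_dispatch(0x4B) == (11, 4)   # bank 11, function 2
+```
+
+Shifting right four times instead of five leaves the function number already
+doubled, which is just what an index into a table of 2-byte addresses needs.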
+
+007A: AD F8 07 LDA $07F8 ; Get original $Cx byte again
+007D: A8 TAY ; Put it in Y
+007E: 48 PHA ; Put it to the stack
+007F: A9 86 LDA #$86
+0081: 48 PHA ; Push $86: return address is now $Cx87
+
+What this does is set up the stack for what I'm going to name (for lack of a
+better term, or any at all to be honest) an "RTS call". This takes advantage
+of how the CPU uses the stack to return execution to the instruction after a
+JSR instruction: when the CPU encounters a JSR opcode, it pushes the address
+of that opcode plus two (that is, the address of the JSR's last byte) onto the
+stack, hi byte first, before loading the program counter with the address that
+follows the opcode. When an RTS opcode is then encountered, it restores the
+program counter from the stack and adds one to it before resuming execution.
+
+The upshot of this is that you can transfer execution of a program from one
+place to the next, without using JMP, JSR or branch instructions by simulating
+this behavior--which also turns out to be a necessity when you're writing
+relocatable code. So what the above code does is set up the stack so that it
+will jump to location $Cx87 when it encounters an RTS.
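+
+The mechanics can be sketched in a few lines of Python, using a plain list as
+the stack with the top at the end. This models only the two stack rules
+described above, and the helper names are hypothetical:
+
+```python
+def jsr(stack, opcode_addr):
+    """JSR pushes the address of its own last byte (opcode + 2), hi first."""
+    ret = (opcode_addr + 2) & 0xFFFF
+    stack.append(ret >> 8)
+    stack.append(ret & 0xFF)
+
+
+def rts(stack):
+    """RTS pulls lo, then hi, and resumes one byte past that address."""
+    lo = stack.pop()
+    hi = stack.pop()
+    return (((hi << 8) | lo) + 1) & 0xFFFF
+
+
+def push_rts_target(stack, target):
+    """An "RTS call": fake a JSR so that the next RTS lands on target."""
+    ret = (target - 1) & 0xFFFF
+    stack.append(ret >> 8)
+    stack.append(ret & 0xFF)
+
+
+s = []
+jsr(s, 0xC720)                 # the JSR $07F8 at $Cx20, in slot 7
+assert rts(s) == 0xC723        # back at the TSX
+
+push_rts_target(s, 0xC787)     # same bytes as pushing $C7, then $86
+assert s == [0xC7, 0x86]
+assert rts(s) == 0xC787
+```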
+
+0082: 5A PHY ; Push $Cx
+0083: A9 8B LDA #$8B ; Push $8B: return address is now $Cx8C
+0085: 48 PHA
+
+Similarly, this code sets up the stack so it will jump to $Cx8C when it
+encounters an RTS as well. So it will go there first, then to $Cx87 second
+when the routine first called via RTS call, er, uh, returns.
+
+0086: 60 RTS ; First time, will "return" to $Cx8C
+
+Thus, this first RTS transfers control to the JMP ($0054) down below, which
+was set up above to point to an address somewhere in a 1K code chunk. Since
+control reaches the 1K code chunk via a JMP (which pushes nothing onto the
+stack), once that code executes its own RTS, it will find the address that
+was pushed on the stack earlier, and execute the following code:
+
+0087: 68 PLA ; After the $CCxx block is done, it comes here
+0088: 9D 6E C0 STA $C06E,X ; Restore last block (one passed in Y reg)
+008B: 60 RTS ; & return to calling code in that block
+
+This code pops the Y register that was saved way back up at location $Cx61 and
+uses it to set the I/O register at $E, which, presumably, is the bank switch
+I/O address for the card. This will turn out to be of vital importance later,
+but we'll leave it for now. Finally, the RTS returns to the calling code in
+the bank that was just switched back in.
+
+008C: 6C 54 00 JMP ($0054) ; Jump to the $CCxx block code
+
+This indirect JMP instruction, called up above via RTS call, kicks things off.
+
+008F-00FA: 00 ; $6C worth of zeroes
+00FB: 82 00 00 BF 0D ; ID/offset bytes
+
+So these bytes that look like a bit of detritus actually do serve a useful
+function in ProDOS. The $0D at the very end serves as an offset from the
+beginning of the code to the ProDOS entry point, which in this case works out
+to $Cx0D. It also locates the entry point for SmartPort calls, which sits
+three bytes further on, at $Cx10.
+
+Further, the "Technical Manual for the Apple SCSI Card" says the following
+about the byte at $FB: "An additional byte, at $CnFB, should contain $82,
+indicating that the device is the SCSI card ($2) and that it supports extended
+calls ($8)." This just happens to be one of a small handful of those
+aforementioned tiny bits of useful information that I was able to glean from
+that source.
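+
+As a sketch of how software scanning a slot might check these bytes, here is
+some Python that looks for the ProDOS block device signature ($Cx01 = $20,
+$Cx03 = $00, $Cx05 = $03) plus the SmartPort marker ($Cx07 = $00), and then
+derives both entry points from the byte at $CxFF. Reading the "$8" and "$2"
+from the quote as bit 7 and bit 1 of $CxFB is my own interpretation, and the
+function name is hypothetical:
+
+```python
+def describe_slot_rom(rom):
+    """Inspect the 256 bytes that appear at $Cx00-$CxFF."""
+    prodos = rom[0x01] == 0x20 and rom[0x03] == 0x00 and rom[0x05] == 0x03
+    smartport = prodos and rom[0x07] == 0x00
+    entry = rom[0xFF]                   # offset of the ProDOS entry point
+    return {
+        "prodos": prodos,
+        "smartport": smartport,
+        "prodos_entry": entry,          # $Cx00 + entry
+        "smartport_entry": entry + 3,   # three bytes further on
+        "extended_calls": bool(rom[0xFB] & 0x80),   # "$8", assumed bit 7
+        "scsi": bool(rom[0xFB] & 0x02),             # "$2", assumed bit 1
+    }
+
+
+# The relevant bytes of this card's slot ROM, taken from the listing above
+rom = bytearray(0x100)
+rom[0x00:0x08] = bytes([0xA2, 0x20, 0xA2, 0x00, 0xA2, 0x03, 0xA2, 0x00])
+rom[0xFB:0x100] = bytes([0x82, 0x00, 0x00, 0xBF, 0x0D])
+info = describe_slot_rom(rom)
+assert info["smartport"] and info["smartport_entry"] == 0x10
+```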
+
+And so, at last, we come to the realization that this is definitely the slot
+ROM code, and thus CWLLIBISM becomes CWSISM (Code Which Sits In Slot Memory).
+
+
+And Now For Something Not Quite So Completely Different
+-------------------------------------------------------
+
+And with that digression into CWSISM, we turn our attention back to the 1K
+chunk of initialization code that sits in bank 11. In looking at the table
+that we discovered sits at $CFF0, we find the following in the 11th (counting
+from zero) 1K chunk:
+
+CFF0: 00 CC
+CFF2: 91 CE
+CFF4: 9A CD
+CFF6: 00 00 00 00 00 00 00 00 00 00
+
+This tells us that there are only three valid addresses in the table (as the
+zeroes will take you nowhere), and that further, they are $CC00, $CE91 and
+$CD9A. And since the CWSISM set up the $Cx61 dispatch call with $0B (at
+$Cx5C), it will pick the zeroeth address in that list, namely, $CC00. So,
+looking at the code that lies there, what we see looks promising:
+
+CC00: 68 PLA ; Discard the 2nd return path (bank switch back)
+CC01: 68 PLA
+CC02: 68 PLA ; Discard the follow on bank #, as there is none
+
+Since this is initialization code, we can discard the RTS call from the stack
+as we aren't calling this code from another bank. Which also means that we
+can discard that parameter which tells the RTS call what bank to select before
+returning.
+
+CC03: 86 5E STX $5E ; Save slot # (+$20) in $5E
+CC05: 9C 93 C8 STZ $C893 ; Zero out $C893 & $5D
+CC08: 64 5D STZ $5D
+CC0A: 20 C1 CC JSR $CCC1 ; Test for GS hardware + DMA switch
+
+This is basically housekeeping, and the routine called at $CCC1 tests if the
+card is running on an Apple IIgs and sets bit 6 of zero page location $5D if it
+detects that. It also checks the physical DMA on/off switch on the card as
+well; if it's set, it sets bit 5 of $5D. The following bit of code checks $5D
+to see if bit 6 is clear and skips the instructions at $CC11 to $CC19 if
+so--and since I'm emulating an Enhanced Apple IIe, it *will* skip those
+instructions:
+
+CC0D: 24 5D BIT $5D ; Check if bit 6 of $5D is set (means it's a GS)
+CC0F: 50 0B BVC $CC1C ; Skip over if not set (it's not a IIgs)
+CC11: AD 36 C0 LDA $C036 ; IIgs Speed Reg.
+CC14: 8D 96 C8 STA $C896 ; Save it for later...
+CC17: 09 80 ORA #$80 ; Set speed to 2.8 MHz
+CC19: 8D 36 C0 STA $C036 ; & modify
+
+Luckily, there exists a very good technical reference manual for the Apple
+IIgs; unluckily, it's a bit hard to track down. But once you do, the
+information in it is quite good. The above bit of code shows that the card
+firmware shifts the IIgs into high gear while running on the card. However, we
+don't really care about that bit of code, which is why we spent so much time
+explaining what it does.
+
+CC1C: 68 PLA ; Get flags from slot init
+
+Way back in CWSISM, at slot location $Cx13, there was an innocuous looking PHP
+instruction; here is where we finally take a look at what it saved.
+
+CC1D: A8 TAY ; Save them in Y
+CC1E: 29 04 AND #$04 ; Check if I flag is set
+CC20: F0 05 BEQ $CC27 ; Skip if I is not set
+CC22: A9 80 LDA #$80 ; Else, signal I flag is set ($80 -> $C893)
+CC24: 8D 93 C8 STA $C893
+
+Here we look at the interrupt disable bit in the processor flags that we saved
+earlier; if it's not set we skip on over to the next bit of code below.
+Otherwise, the code stores $80 into memory location $C893 to signal that
+initialization code was called with the I flag set.
+
+CC27: 98 TYA ; Restore flags from Y
+CC28: 09 04 ORA #$04 ; Set I flag
+CC2A: 48 PHA ; Push them to the stack
+CC2B: 28 PLP ; & restore flags for real
+
+Since we need to get the values of the overflow and carry flags back, which
+were set way back in CWSISM at addresses $Cx0D through $Cx11, we have to
+retrieve them from the Y register, then push them onto the stack and then use a
+PLP to get them back into the flags register proper. Along the way, we set the
+interrupt disable flag at $CC28 (the ORA #$04 instruction).
+
+And in looking at code as we're doing here, it's hard not to look at it with a
+critical eye and notice that the coder could have saved a byte by deleting the
+ORA #$04 (which takes two bytes) and putting an SEI after the PLP (which takes
+one byte). And, since we don't have any source code to look at, we may never
+know what the intention was, though it's quite likely that this was just a
+simple oversight.
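+
+Both halves of this flag business, how V got set on the init path in the first
+place (the BIT $FF58 back at $Cx08) and what happens to the saved status byte
+here, boil down to simple bit operations. Here they are as a Python sketch,
+with I as bit 2 and V as bit 6 of the status byte. The names are hypothetical,
+and the example values assume that PHP pushes bits 4 and 5 as ones, as it does
+on the 65C02:
+
+```python
+I_FLAG, V_FLAG = 0x04, 0x40
+
+
+def bit_abs_sets_v(m):
+    """BIT abs copies bit 6 of its operand into V (and bit 7 into N)."""
+    return bool(m & V_FLAG)
+
+
+def restore_saved_flags(saved_p):
+    """Mirror $CC1C-$CC2C for the status byte pushed by PHP at $Cx13.
+
+    c893 is what $C893 ends up holding ($CC05 zeroed it beforehand).
+    """
+    c893 = 0x80 if saved_p & I_FLAG else 0x00   # $CC1E-$CC24
+    p = saved_p | I_FLAG                        # $CC28 ORA #$04, then PLP
+    init_path = bool(p & V_FLAG)                # $CC2C BVC not taken
+    return c893, p, init_path
+
+
+assert bit_abs_sets_v(0x60)                     # the RTS opcode at $FF58
+assert restore_saved_flags(0x70) == (0x00, 0x74, True)    # init, I clear
+assert restore_saved_flags(0x31) == (0x00, 0x35, False)   # ProDOS entry
+```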
+
+CC2C: 50 09 BVC $CC37 ; If SmartPort call, skip over
+
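+Here we see that if the card firmware was called via the SmartPort vector at
+$Cx10 (or, for that matter, via the ProDOS entry point at $Cx0D, as both paths
+run the CLV at $Cx11), the overflow flag would be clear and we would skip over
+the following. But since this is the initialization path, where the BIT $FF58
+back at $Cx08 set the flag, we know that we will execute what follows: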
+
+CC2E: BA TSX ; Slot init & regular ProDOS dispatch get here
+CC2F: 8E 07 C8 STX $C807 ; Save stack pointer in $C807
+CC32: A9 0F LDA #$0F
+CC34: 4C 5F CF JMP $CF5F ; Jump to bank 15:0 for rest of init
+
+This saves the stack pointer and sets up to jump to a new bank, which means we
+won't be coming back here. Onward:
+
+CF5F: A6 5E LDX $5E ; Restore slot # (+$20) in X
+CF61: A0 0B LDY #$0B ; Y gets loaded with bank to return to on RTS
+CF63: 6C 00 C8 JMP ($C800) ; & go!
+
+There are variants of this piece of code throughout every 1K bank of firmware
+code. And since we took a good long look at CWSISM, we know that CWSISM set up
+locations $C800 and $C801 to point to $Cx61 in the card's slot ROM, and
+suddenly it becomes clear what that bit of code does.
+
+Since the firmware code bounces around a lot in different banks (as we will
+discover shortly), it needs a mechanism to get back to the place that called it
+in the first place. The problem is this: once a new 1K bank of code is
+switched into the $CC00 to $CFFF address space, there's no way for the 65C02 to
+get back to the caller with a simple RTS; any code that attempted to do so
+would end up executing the wrong code as the 65C02 knows nothing about bank
+switching and has no built-in mechanism to handle such things.
+
+And so, by virtue of this, the code needs a way to do this manually. Which is
+why the $Cx61 code in CWSISM saves the caller's bank number on the stack and
+then sets up a pair of RTS calls: the first selects the target bank and calls
+the requested function in it, and the second switches back to the bank that
+made the call in the first place before executing a final RTS, which then
+returns to the correct address.
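+
+Here is a hedged Python trace of one trip through the $Cx61 dispatcher for a
+card in slot 7. It assumes the caller is bank 15 doing its JSR $CFAF at $CC0D
+(which, as we'll see shortly, is how bank 15 calls bank 3:0), and a banked
+function that leaves the stack as it found it. The names are hypothetical, and
+only the stack and bank register effects are modeled:
+
+```python
+def rts(stack):
+    """RTS pulls lo, then hi, and resumes one byte past that address."""
+    lo = stack.pop()
+    hi = stack.pop()
+    return (((hi << 8) | lo) + 1) & 0xFFFF
+
+
+def trace_dispatch(slot, jsr_addr, caller_bank, a):
+    """List the bank switches and jumps one $Cx61 dispatch goes through."""
+    stack, events = [], []
+    ret = jsr_addr + 2                      # caller's JSR $CFAF pushes this
+    stack += [ret >> 8, ret & 0xFF]
+    stack.append(caller_bank)               # stub's LDY #bank, $Cx61 PHY
+    events.append(("bank", a & 0x1F))       # $Cx65 STA $C06E,X
+    page = 0xC0 | slot
+    stack += [page, 0x86]                   # $Cx7A-$Cx81: later, $Cx87
+    stack += [page, 0x8B]                   # $Cx82-$Cx85: first, $Cx8C
+    events.append(("pc", rts(stack)))       # $Cx86 RTS, then JMP ($0054)
+    # ... the banked function runs here and ends with its own RTS ...
+    events.append(("pc", rts(stack)))       # lands on $Cx87
+    events.append(("bank", stack.pop()))    # $Cx87 PLA, $Cx88 STA $C06E,X
+    events.append(("pc", rts(stack)))       # $Cx8B RTS, back to the caller
+    return events
+
+
+assert trace_dispatch(7, 0xCC0D, 0x0F, 0x03) == [
+    ("bank", 3),        # bank 3 switched in
+    ("pc", 0xC78C),     # JMP ($0054) into bank 3, function 0
+    ("pc", 0xC787),     # the banked function returns to slot ROM
+    ("bank", 15),       # caller's bank switched back in
+    ("pc", 0xCC10),     # resume right after the JSR $CFAF
+]
+```
+
+The last address, $CC10, is the instruction right after that JSR $CFAF, so the
+caller picks up where it left off, in its own bank.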
+
+And since we saw up above that it passed $0F into the calling routine (well,
+actually, it jumped there), we know that it's going to call function #0 in bank
+15. As it turns out, the function table for bank 15 looks like this:
+
+CFF0: 00 CC
+CFF2: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+
+which means bank 15 only contains one function, and it starts at $CC00.
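+
+Reading one of these tables is simple enough to sketch in Python: the 16 bytes
+at $CFF0 hold eight little-endian addresses, one for each value of the upper 3
+bits of the dispatch byte, and a zero entry marks an unused function. The
+function name is hypothetical, and the bank images here contain nothing but
+the table bytes shown above for banks 11 and 15:
+
+```python
+def dispatch_table(bank):
+    """Return the 8 entries of the table at $CFF0 in a 1K bank image."""
+    raw = bank[0x3F0:0x400]                      # $CFF0-$CFFF
+    entries = []
+    for i in range(0, 16, 2):
+        addr = raw[i] | (raw[i + 1] << 8)        # lo byte first
+        entries.append(addr if addr else None)   # $0000 goes nowhere
+    return entries
+
+
+bank11 = bytearray(0x400)
+bank11[0x3F0:0x3F6] = bytes([0x00, 0xCC, 0x91, 0xCE, 0x9A, 0xCD])
+bank15 = bytearray(0x400)
+bank15[0x3F0:0x3F2] = bytes([0x00, 0xCC])
+
+assert dispatch_table(bank11)[:3] == [0xCC00, 0xCE91, 0xCD9A]
+assert dispatch_table(bank15) == [0xCC00] + [None] * 7
+```
+
+The $4B that bank 15 hands over on its success path further on selects entry
+2 of bank 11's table, which is $CD9A.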
+
+
+The Next Part, In Which We Peruse Bank 15
+-----------------------------------------
+
+The story so far: we started in slot ROM, set up a bunch of variables, then
+bounced to bank 11, and just now bounced to bank 15.
+
+CC00: A9 40 LDA #$40
+CC02: 8D 09 C8 STA $C809 ; Put $40 into $C809
+CC05: 8D 32 BF STA $BF32 ; & $BF32(!)
+CC08: 9C 0A C8 STZ $C80A ; Zero out $C80A
+
+So far this is all normal housekeeping boilerplate, though putting the value
+$40 into RAM at address $BF32 makes me raise an eyebrow (to this day, I still
+have no idea what that's supposed to do). So then we come to the heart of the
+matter:
+
+CC0B: A9 03 LDA #$03
+CC0D: 20 AF CF JSR $CFAF ; Call bank 3:0 (enumerate all connected drives)
+
+Here is the first proper JSR into bank switched code, and in taking a cursory
+glance at the code there, well... It's a bit of a Gordian knot. So we'll
+ignore the stones in the field for now, and keep on plowing ahead:
+
+CC10: AE 08 C8 LDX $C808 ; Restore slot # (+$20) to X
+CC13: A5 4F LDA $4F
+CC15: F0 03 BEQ $CC1A ; Skip over if call was successful ($4F == 0)
+CC17: 4C F0 CC JMP $CCF0 ; Else, do a LDA #$2B, JMP $CFAF to bank 11:1
+
+So here the code retrieves the slot I/O offset in X from the location set way
+back in CWSISM, then checks what looks like some kind of error condition. If
+it fails, it skips on over to function 1 in bank 11; otherwise, it keeps going
+here:
+
+CC1A: 24 5D BIT $5D ; Are we running on a IIgs?
+CC1C: 70 05 BVS $CC23 ; If so, skip over & keep going
+
+Since we're not running on a IIgs, this branch is not taken and thus it can be
+safely ignored. Continuing on:
+
+CC1E: A9 4B LDA #$4B ; Else, jump to bank 11:2 (normal success path)
+CC20: 4C AF CF JMP $CFAF
+;
+CFAF: A6 5E LDX $5E ; Restore slot (+$20) in X
+CFB1: A0 0F LDY #$0F ; Make sure we come back here...
+CFB3: 6C 00 C8 JMP ($C800) ; & go!!
+
+So what this means is that if the function call to bank 3:0 succeeded, the code
+will then bounce to function 2 in bank 11. And, as we saw above, function 2
+starts at $CD9A in bank 11.
+
+
+The Next Part, In Which We Bounce Back To Bank 11 And Find Something Familiar
+-----------------------------------------------------------------------------
+
+So far, this little expedition is proving to be circuitous, but not
+impenetrable. And it makes sense that we would come back to bank 11, as that's
+where the initialization code sent us in the first place. And so, pressing on,
+we find:
+
+CD9A: 86 5E STX $5E ; Save X in $5E
+CD9C: A9 01 LDA #$01 ; Put 1 in $43, $44
+CD9E: 85 43 STA $43
+CDA0: 85 44 STA $44
+CDA2: 64 46 STZ $46 ; Zero out $46, $47, $48, $49
+CDA4: 64 47 STZ $47
+CDA6: 64 48 STZ $48
+CDA8: 64 49 STZ $49
+CDAA: A9 08 LDA #$08 ; Put $08 in $41
+CDAC: 85 41 STA $41
+CDAE: 64 40 STZ $40 ; Zero out $40, $42
+CDB0: 64 42 STZ $42
+
+This is again more housekeeping boilerplate, initializing a bunch of zero page
+locations. Then we find this:
+
+CDB2: A9 09 LDA #$09
+CDB4: 20 5F CF JSR $CF5F ; Call bank 9:0 (directly)
+
+So this calls function 0 in bank 9, which lives at $CC00. And looking through
+that code, well, let's just put that aside for now as it's long and involved
+and will require a fair amount of study. Continuing:
+
+CDB7: A5 4F LDA $4F
+CDB9: D0 0C BNE $CDC7 ; Fail if $4F is non-zero
+
+This looks at the error flag we saw up above in bank 15, and jumps to function
+1 in this bank if the error flag is non-zero.
+
+CDBB: AD 01 08 LDA $0801 ; Get byte @ $801 (!)
+CDBE: F0 07 BEQ $CDC7 ; Fail if it's zero
+
+Now here is something interesting! This is interesting because when booting
+from a floppy disk, the disk driver typically loads at least one sector (256
+bytes of data) into location $800. So we can deduce that the above call into
+function 0 in bank 9 is loading something similar from the hard drive into
+memory at a similar address. With this bit of knowledge, looking back at where
+the code put $00 and $08 into zero page locations $40 and $41, we can see that
+those locations must hold a loading address ($0800, stored lo byte first).
+
+CDC0: AD 00 08 LDA $0800 ; Get byte @ $800 (!)
+CDC3: C9 01 CMP #$01
+CDC5: F0 03 BEQ $CDCA ; Keep going if it's equal to 1
+CDC7: 4C 91 CE JMP $CE91 ; Else, jump to function 1 (failure point)
+
+Again, this is interesting because with floppy disks, the first byte of the
+first sector loaded into memory at $800 contains the number of sectors that the
+floppy driver should load into memory. This looks eerily similar, only in this
+case it will jump to the failure path if it sees anything other than exactly
+one block. Assuming all is well, we then have this:
+
+CDCA: 8D 09 C8 STA $C809 ; Put a 1 into $C809
+CDCD: AD F8 07 LDA $07F8 ; Get $7F8
+CDD0: 0A ASL A ; x16
+CDD1: 0A ASL A
+CDD2: 0A ASL A
+CDD3: 0A ASL A
+CDD4: AA TAX ; Store it in X
+CDD5: A9 00 LDA #$00 ; Stuff 0 in $C035 (GS location?)
+CDD7: 8D 35 C0 STA $C035
+CDDA: 8D 01 CC STA $CC01 ; What does this do?
+CDDD: 4C 01 08 JMP $0801 ; Run the code from block 0
+
+And here we see it hand off execution to data that it pulled from the hard
+drive by jumping to $801, and thus we see that this must be the end of the hard
+drive boot logic. As far as the firmware is concerned, its initialization job
+of bootstrapping the hard drive is concluded.
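+
+To pull the pieces of this hand-off together, here is a rough Python model of
+what the firmware checks before it jumps to $0801. It is a sketch, not a
+transcription: the memory arrays, the function name and the sample boot block
+are all made up, and it assumes (per the usual Apple II convention for
+peripheral cards) that $07F8 holds $Cn for the card in slot n, which the four
+ASLs turn into $n0, the slot number times 16, in X.
+
+```python
+def boot_handoff(zero_page, ram, error_flag):
+    """Model of $CDB7-$CDDD: return (load address, X) or None on failure.
+
+    zero_page and ram stand in for the Apple II's memory; error_flag is the
+    value left in $4F by the bank 9 read.
+    """
+    load = zero_page[0x40] | (zero_page[0x41] << 8)   # little-endian pointer
+    if error_flag != 0 or ram[0x0801] == 0 or ram[0x0800] != 1:
+        return None                                   # JMP $CE91 (failure)
+    x = (ram[0x07F8] << 4) & 0xFF                     # four ASLs: $Cn -> $n0
+    return load, x                                    # then JMP $0801
+
+
+zp = bytearray(256)
+zp[0x40], zp[0x41] = 0x00, 0x08         # as set up at $CDAA-$CDB0
+ram = bytearray(0x10000)
+ram[0x07F8] = 0xC7                      # assumed: card in slot 7
+ram[0x0800], ram[0x0801] = 0x01, 0x38   # hypothetical boot block
+print(boot_handoff(zp, ram, 0))         # (2048, 112), i.e. $0800 and $70
+```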
+
+However, we still really don't know anything that tells us what the slot I/O
+addresses do (aside from location $E) and we still have no idea how the card
+talks to the hard drive. At least we have a pretty good idea of where to look.
+
+
+What Are All These Eels, And What Are They Doing In My Hovercraft
+-----------------------------------------------------------------
+
+So at last we get to take a look at function 0 in bank 3. And, much like a
+hovercraft full of eels, it's a twisty mass of slippery, squirming code. And,
+looking at it more closely, it does a bunch of things which don't make much
+sense until you understand other code, which bounces around to lots of other
+banks. And a lot of it is opaque unless you somewhat understand what the ports
+on the NCR 53C80 do and how the SCSI protocol works.
+
+So while we have an excellent start on understanding, for the most part, the
+broad outlines of how the card works, we are still stuck with a profound lack
+of critical knowledge on how the thing talks to the hard drive and,
+conversely, how the hard drive talks to the card. And without that knowledge,
+we perish.
+
+
+The Next Part, In Which We Are Not Ready To Perish
+--------------------------------------------------
+
+Fortunately, the NCR 5380 and, by extension, the 53C80 are well documented and
+said documentation is readily available, and so I availed myself of it. I took
+another look at the schematic for the card and noticed that the 53C80 had three
+address lines on it, which implied that it had eight ports for controlling it.
+Unfortunately, there's an error on the schematic in which they have the address
+lines hooked up in reverse, and this caused me no small amount of consternation.
+
+It seemed obvious that those eight ports were hooked up to the slot I/O
+addresses, and also seemed very plausible, after having looked at and analyzed
+a lot of code heretofore unmentioned, that they were connected to the lower
+half of that address space. So, in order to confirm my suspicions, I started
+writing the hard drive emulator.
+
+This started out, simply, as a bunch of statements that output human readable
+words to a log file whenever the slot I/O addresses were accessed by the card
+firmware; I used the firmware's access to the slot I/O to tell me what it said
+and what it was listening for. Well, that, and some code to properly handle
+the bank selection of the ROM space as well. In this way, I was able to
+enlarge my understanding of what the card expected to see as well as what the
+ports that weren't connected to the 53C80 (which were likely connected to the
+Sandwich II) might be up to.
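+
+A minimal sketch of that sort of logging hook, written in Python rather than
+in the emulator's own code, might look like the following. It assumes the
+standard Apple II layout, in which slot n's sixteen I/O locations live at
+$C080 + n*16, and it assumes, per the guess above, that offsets $0-$7 go to
+the 53C80 while $8-$F go somewhere else; the class and method names are
+hypothetical.
+
+```python
+class SlotIOLogger:
+    """Log every access to one card's 16-byte slot I/O window."""
+
+    def __init__(self, slot: int):
+        self.base = 0xC080 + slot * 16   # standard Apple II slot I/O base
+        self.log = []
+
+    def describe(self, address: int) -> str:
+        offset = address - self.base
+        if not 0 <= offset <= 0xF:
+            raise ValueError(f"${address:04X} is not in this slot's I/O range")
+        chip = "53C80" if offset < 8 else "other"   # assumed split
+        return f"{chip} reg ${offset:X}"
+
+    def read(self, address: int, value: int) -> int:
+        self.log.append(f"R {self.describe(address)} -> ${value:02X}")
+        return value
+
+    def write(self, address: int, value: int) -> None:
+        self.log.append(f"W {self.describe(address)} <- ${value:02X}")
+
+
+io = SlotIOLogger(slot=7)
+io.write(0xC0F2, 0x01)   # slot 7, offset $2
+io.read(0xC0F4, 0x40)    # slot 7, offset $4
+io.write(0xC0FE, 0x80)   # slot 7, offset $E
+print("\n".join(io.log))
+```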
+
+So in fits and starts, I used the code that writes to the Mode Register of the
+53C80 to get the code to successfully... do something. It was at that point I
+could see that it was getting through the initialization phase of the card's
+firmware, as Apple2 would then go on to boot a floppy image inserted into a
+drive in slot 6. But in tracing the reads and writes to the slot I/O address
+space in the log, I could see that it was getting through the card's firmware
+in a failure mode. It was progress, of a sort. Even failure tells you
+something.
+
+And what it told me was that I needed to dig into the SCSI specification to
+figure out how the protocol worked. Looking back I can see that I was getting
+through to the MESSAGE phase and, because of the way I was responding to that
+message, that the firmware would then send an ABORT message, but that's all
+pretty much meaningless as I haven't explained anything about the SCSI protocol
+and how it works.
+
+And here, while there is a lot of information about the latter-day iterations
+of the SCSI protocol, there wasn't much pertaining to the kind of SCSI that the
+Apple High Speed SCSI card spoke, which has since been retroactively labeled
+SCSI-1.
+
+And when looking at the SCSI protocol, the first thing that hits you is that
+it's a very well designed, robust protocol and it's nothing short of a minor
+miracle that it survived and still survives to this day. However, the
+documentation on how it *really* works is a bit lacking. Yes, you can discover
+that there are nine phases, and the first three are fairly easy to understand;
+it's what comes after that where things get murky.
+
+
+Talk SCSI To Me
+---------------
+
+So here is a crash course in the SCSI-1 protocol. The SCSI bus is engineered
+such that it allows for eight devices to connect to said bus; devices connected
+to the bus can have Initiator and/or Target roles. Devices can talk to each
+other by passing messages over this bus; however, only one pair of devices can
+use the bus at any one time. In order to prevent deadlock from happening when
+more than one device attempts to take control of the bus, there is an enforced
+hierarchy of devices wherein they all have a unique ID from 0 to 7, and each ID
+corresponds to one bit on the data bus; a device that contends for use of the
+bus at the same time as another device wins this contention if and only if its
+device ID is higher than the other device's ID (ID 7, which is bit value 128 on
+the data bus, being the highest, and ID 0, which is bit value 1, being the
+lowest). The bus is an 8-bit parallel data bus that is controlled by a variety
+of signals (and these are typically called "lines").
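+
+Since each device drives only its own bit during arbitration, picking the
+winner amounts to finding the highest bit set on the data bus. A tiny Python
+sketch (the function name is ours, not the standard's):
+
+```python
+def arbitration_winner(data_bus: int) -> int:
+    """Return the SCSI ID that wins, given the OR of every asserted ID bit."""
+    if not 0 < data_bus <= 0xFF:
+        raise ValueError("at least one ID bit must be asserted")
+    return data_bus.bit_length() - 1   # highest set bit = highest ID
+
+
+# IDs 7 ($80), 5 ($20) and 0 ($01) all arbitrate at the same moment.
+print(arbitration_winner(0x80 | 0x20 | 0x01))   # 7
+```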
+
+In contending for and utilizing the bus, there are nine phases that all SCSI
+devices must understand and negotiate. They are as follows:
+
+ - Bus Free
+ - Arbitration
+ - Selection
+ - Message In
+ - Message Out
+ - Data In
+ - Data Out
+ - Command
+ - Status
+
+In the Bus Free phase, as one might expect, no devices are using the bus. This
+is the ground state of the SCSI protocol, the phase from whence all
+communication starts and where it all ends. Any device that wishes to talk to
+another device on the bus must start here.
+
+Once a device sees that the bus is free, it can enter the Arbitration phase as
+an Initiator; it does so by first setting the bit that corresponds to its
+device ID on the data bus. If another device tries to do this at the same
+time, the device with the lower ID will remove its bit from the data bus and
+try again when it detects that the bus is free again. When the Initiator has
+waited a certain amount of time with no other contention, it then asserts the
+SEL line and goes into the Selection phase.
+
+In the Selection phase, the Initiator sets the bit that corresponds to the
+device ID it wants to talk to (the Target) on the data bus, along with its own
+ID bit. Every other device on the bus, by virtue of the asserted SEL line, knows
+it's in the Selection phase and can see the device ID bits being asserted on the
+data bus; if none of the bits match its own ID, it will stay silent. If the
+Target device doesn't respond in a timely manner, the device that tried
+"calling" it drops the bits it asserted on the data bus and drops the SEL line.
+Otherwise, if the Target device sees its ID on the data bus, it responds by
+asserting the BSY (BuSY) line.
+
+The device that started all of this (the Initiator) then drops the SEL line and
+the Initiator and Target devices then enter the next phase. What phase that is
+took some teasing out of lots of different papers, datasheets and manuals--as
+well as much trial and error in the emulation code. And what I found was this:
+once the devices are through the Selection phase, they typically(*) dance
+through the following set of phases, in order, before being done with their
+transaction: Message Out(**), Command, Data In/Out, Status, Message In.
+
+(*) One exception to this is the TEST UNIT READY command, which will skip the
+Data In/Out phase
+
+(**) Note that the qualifiers "In" and "Out" come strictly from the perspective
+of the Initiator
+
+Once the devices have successfully negotiated the Message In phase at the end
+of their phase dance, the Target device drops the BSY line and the bus is then
+free again for another transaction.
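+
+The usual sequence is simple enough to write down as data. This Python sketch
+only covers the straightforward case described above (no extra messages and no
+disconnecting mid-command), and the function name is made up for the example:
+
+```python
+def typical_phases(data_direction=None):
+    """Phase order for one simple command, from Bus Free back to Bus Free.
+
+    data_direction is "in", "out", or None for commands such as TEST UNIT
+    READY that skip the data phase.
+    """
+    data = {None: [], "in": ["Data In"], "out": ["Data Out"]}[data_direction]
+    return (["Bus Free", "Arbitration", "Selection", "Message Out", "Command"]
+            + data + ["Status", "Message In", "Bus Free"])
+
+
+print(" -> ".join(typical_phases()))
+```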
+
+One thing I forgot to mention is that once the devices are past the Selection
+phase, every byte that moves across the bus (and thus every phase, since each
+phase moves at least one byte) is punctuated by a REQ/ACK handshake. Typically,
+the Target asserts and drops the REQ line while the Initiator asserts and drops
+the ACK line. Basically, when the Target has signalled the phase it wants and
+is ready to move a byte, it will assert the REQ line; the Initiator will see
+this and then assert the ACK line. Once the Target sees the ACK line asserted,
+it will drop the REQ line; the Initiator, seeing this, will then drop the ACK
+line. And thus hands are shaken, and all are in agreement as to where they are
+and what they are doing.
+
+One interesting consequence of this kind of handshaking is that it means that
+every phase past Selection is driven by the Target device.
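+
+The handshake is easiest to see with two parties polling a shared set of lines.
+The Python below is a simplified model, assuming a single Target sending bytes
+to a single Initiator with no timing, no parity and no phase lines; the class
+and function names are invented for the sketch:
+
+```python
+class Bus:
+    """The handshake lines plus the data lines, shared by both devices."""
+
+    def __init__(self):
+        self.req = False
+        self.ack = False
+        self.data = 0
+
+
+def target_poll(bus, pending, trace):
+    """One look at the bus by the Target; returns the bytes still to send."""
+    if not bus.req and not bus.ack and pending:
+        bus.data = pending[0]            # put the byte on the bus first...
+        bus.req = True                   # ...then raise REQ
+        trace.append(f"T: data=${bus.data:02X}, REQ on")
+        return pending[1:]
+    if bus.req and bus.ack:
+        bus.req = False                  # Initiator has it; drop REQ
+        trace.append("T: saw ACK, REQ off")
+    return pending
+
+
+def initiator_poll(bus, received, trace):
+    """One look at the bus by the Initiator."""
+    if bus.req and not bus.ack:
+        received.append(bus.data)        # latch the byte while REQ is up
+        bus.ack = True
+        trace.append("I: latched byte, ACK on")
+    elif not bus.req and bus.ack:
+        bus.ack = False                  # REQ is gone; finish the handshake
+        trace.append("I: saw REQ drop, ACK off")
+
+
+def transfer(data):
+    """Move bytes from Target to Initiator (as in Data In or Message In)."""
+    bus, pending, received, trace = Bus(), bytes(data), [], []
+    while pending or bus.req or bus.ack:
+        pending = target_poll(bus, pending, trace)
+        initiator_poll(bus, received, trace)
+    return bytes(received), trace
+
+
+received, trace = transfer([0x00, 0x80])
+print(received.hex(), len(trace))   # 0080 8
+```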
+
+
+By Your Command
+---------------
+
+And so having deciphered the proper steps in the post-Selection phase dance, we
+come at last to the heart of the matter: the Command phase. Commands come in a
+few different flavors: the six byte, the ten byte and the twelve byte. The
+flavor is given by the top three bits of the first byte while the command
+itself is given by the bottom five bits. Treating those top three bits as a
+number from zero to seven, the flavors fall into the following groups:
+
+six byte: 0
+ten byte: 1, 2
+twelve byte: 5
+
+Yes, 3, 4, 6 and 7 are all missing, and, for the purposes of this crash course,
+can be safely ignored(*).
+
+(*) For the terminally curious, 3 and 4 are (were?) "reserved", and 6 and 7 are
+for "vendor specific" commands
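+
+The group arithmetic is easy to check in code. Here is a small Python sketch
+(the names are invented, and the length table only covers the groups listed
+above):
+
+```python
+# CDB length in bytes by group code (top three bits of the opcode).
+CDB_LENGTH = {0: 6, 1: 10, 2: 10, 5: 12}
+
+
+def decode_opcode(opcode: int):
+    """Split an opcode into (group, command, CDB length or None)."""
+    group = opcode >> 5                # top three bits
+    command = opcode & 0x1F            # bottom five bits
+    return group, command, CDB_LENGTH.get(group)   # None: reserved/vendor
+
+
+print(decode_opcode(0x00))   # (0, 0, 6): TEST UNIT READY
+print(decode_opcode(0x28))   # (1, 8, 10): the ten byte READ
+```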
+
+Having now discerned their form, the question arises: just what do these
+commands do? Basically, they tell the Target what the Initiator wants from it.
+For example, let's say that the Initiator wants to know if a device on the bus
+is ready to receive commands. It would send out, during the Command phase, a
+TEST UNIT READY command which has the following form:
+
+00 00 00 00 00 00
+
+Assuming the device receiving this command actually is ready to receive
+commands, it would then send back a status byte saying "Good" (which is coded
+as $00) in the Status phase, followed by a COMMAND COMPLETE message (also coded
+as $00) in the Message In phase.
+
+Other commands follow basically the same form; only instead of going directly
+to the Status phase, as the TEST UNIT READY command does, they will go into
+either the Data In or Data Out phase before going to the Status phase,
+depending on what the command does. For example, a READ command will go to the
+Data In phase, because the Initiator is requesting data from the Target;
+likewise, a WRITE command will go to the Data Out phase because the Initiator
+wants to send data to the Target.
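+
+To make the six byte form concrete, here is a Python sketch that builds TEST
+UNIT READY and the six byte READ and WRITE commands (opcodes $08 and $0A), and
+picks the data phase that follows each. It is illustrative only: it makes no
+claim about what the card's firmware actually sends, it fixes the logical unit
+and control byte at zero, and the helper names are hypothetical.
+
+```python
+TEST_UNIT_READY, READ_6, WRITE_6 = 0x00, 0x08, 0x0A
+
+
+def cdb_test_unit_ready() -> bytes:
+    return bytes(6)                        # 00 00 00 00 00 00
+
+
+def cdb_rw6(opcode: int, lba: int, blocks: int) -> bytes:
+    """Six byte READ/WRITE: LUN 0, 21-bit block number, 1-256 blocks."""
+    if not 0 <= lba < (1 << 21):
+        raise ValueError("six byte commands only reach 21 bits of block number")
+    if not 1 <= blocks <= 256:
+        raise ValueError("transfer length must be 1-256 blocks")
+    return bytes([opcode,
+                  (lba >> 16) & 0x1F,      # bits 7-5 hold the LUN in SCSI-1
+                  (lba >> 8) & 0xFF,
+                  lba & 0xFF,
+                  blocks & 0xFF,           # a length of 0 means 256 blocks
+                  0x00])                   # control byte
+
+
+def data_phase(opcode: int):
+    """The data phase (if any) between Command and Status."""
+    return {TEST_UNIT_READY: None, READ_6: "Data In",
+            WRITE_6: "Data Out"}[opcode]
+
+
+read_block_0 = cdb_rw6(READ_6, lba=0, blocks=1)
+print(read_block_0.hex(" "), data_phase(READ_6))   # 08 00 00 00 01 00 Data In
+```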
+
+
+Back To Our Regularly Scheduled Analysis
+----------------------------------------
+
+So, before we digressed into a crash course in the SCSI-1 protocol, we were
+looking at where I had been able to have the card's firmware return back to the
+Apple IIe's Autostart program, but in a failure mode. Which, while ultimately
+unsatisfying, *was* a step in the right direction.
+
+So I could see that with my hard-coded responses to the firmware's inquiries, I
+was getting an IDENTIFY message ($80) followed by an ABORT message ($06). It
+was at this point I could also see that I was going to have to start writing
+the actual hard drive device emulator code as well, as trying to keep track of
+all the phase changes in the slot I/O register code was turning into an
+impenetrable mess and wasn't going to be fruitful in the long run.
+
+This also necessitated a closer look at the code for function 0 in bank 3. I
+took copious notes on where the code went and what it did, and eventually found
+that almost everything, at some point, seemed to end up calling function 0 in
+bank 16.
+
+
+All Roads Lead To Bank 16:0
+---------------------------
+
+The one thing I was trying to figure out from this code was: what was the
+failure mode that would get you out cleanly? Because in order for the code
+that called here to work properly, it would have to have some kind of clean
+failure mode to indicate that there was no drive present at this device ID;
+also in my first attempts to get the firmware code to successfully run (for
+some value of "successfully" > 0), it would hang up somewhere in this code.
+And that meant, since I didn't understand the SCSI chip, that I would have to
+understand the SCSI chip and how it worked to have any hope of untangling the
+tangled mass of code here.
+
+So before we take a quick look at that, let's take a look at the top level code
+that lives at function 0, bank 16. At first glance, it doesn't look all that
+bad:
+
+CC00: 8D 00 CD STA $CD00 ; Write to $CD00 (what does it do?)
+CC03: 20 D0 CD JSR $CDD0 ; Clear DMA bit (1) from reg. $2, init some stuff
+CC06: 20 CE CE JSR $CECE ; Check if reg. $4 has 0, 2 (/SEL) or 4 (/I/O)
+CC09: B0 16 BCS $CC21 ; If failure, skip over
+
+This is pretty straightforward stuff; the routine at $CECE will set the carry
+flag if slot I/O register $4 is not exactly one of: 0, 2, or 4. If the carry
+is set, it bypasses the following sections of code:
+
+CC0B: 20 42 CF JSR $CF42 ; Check if bit 7 in $C893 is set (success == yes)
+CC0E: 20 24 CC JSR $CC24 ; Do Arbitration phase
+CC11: B0 03 BCS $CC16 ; If Arbitration timed out, jump over Selection
+
+It wasn't obvious when I first encountered this code, but once I delved into
+the SCSI protocol I was able to figure out that the code at $CC24 was
+negotiating the Arbitration phase.
+
+CC13: 20 7A CC JSR $CC7A ; Do Selection phase
+
+Likewise, it was not obvious that the code at $CC7A was negotiating the
+Selection phase--but I was able to figure out that the code could cleanly exit
+this bank (in a failure mode, naturally) if the BSY line was not asserted.
+
+CC16: 20 58 CF JSR $CF58 ; Check if bit 7 in $C893 is set (success = yes)
+CC19: B0 06 BCS $CC21 ; Skip over if it failed
+
+Since the address at $C893 got loaded with $80 way back in function 0 in bank
+11, the carry flag will be clear and we will execute the following:
+
+CC1B: 20 E4 CC JSR $CCE4 ; Do SCSI communication with target
+CC1E: 20 A0 CD JSR $CDA0 ; Do nothing if $C88F is nonzero, else check on
+ ; $C8EC
+
+The code at $CCE4 was quite mystifying for some time, even after I had educated
+myself on the intricacies of the SCSI protocol and the ins and outs of the NCR
+53C80's ports. I wasn't able to make sense of this until I was able to
+understand the phases after Selection and how they were expected to be
+negotiated.
+
+CC21: 4C 18 CE JMP $CE18 ; Do some post cleanup before returning
+
+The code at $CE18 basically does some error checking and cleanup before
+returning back to whence it came; it's fairly easy to digest. But before we
+dig into the subroutines of bank 16:0, we need to take a short digression into
+how the ports of the 53C80 work.
+
+
+A Somewhat Brief Digression Into The 53C80's Ports
+--------------------------------------------------
+
+And so, having avoided looking into the 53C80 and how it works up until this
+point, we find we can no longer avoid it and thus, finally bite the bullet.
+The 53C80 has eight ports (also called registers) with which the Apple IIe's
+CPU can communicate. They are:
+
+$0 - Data on the SCSI bus
+$1 - Initiator Command
+$2 - Mode
+$3 - Target Command
+$4 - Current SCSI Bus Status (R), Select Enable (W)
+$5 - Bus and Status (R), Start DMA Send (W)
+$6 - Input Data (R), Start DMA Target Receive (W)
+$7 - Reset Parity/Interrupt (R), Start DMA Initiator Receive (W)
+
+Note too that there is a one-to-one correspondence between the port numbers as
+they appear on the 53C80 and their location in the slot I/O address range.
+What follows is an explanation of what the registers do:
+
+Register $0 is pretty much what it says it is: reading it returns whatever is
+currently on the SCSI data bus, while a byte written here only gets driven onto
+the bus when bit 0 of register $1 (ASSERT DATA BUS) is set. Which brings us
+to...
+
+Register $1 is used to monitor and assert signals on the SCSI bus. The bits
+are:
+
+7         6         5         4         3         2         1         0
+RST       AIP/TEST  LA/DIFF   ACK       BSY       SEL       ATN       DATA
+          MODE      ENBL                                              BUS
+
+RST (ReSeT) sets the RST signal on the SCSI bus and resets the internal state
+of the 53C80; it stays in the reset state until this bit is cleared. AIP/TEST
+MODE (Arbitration In Progress) is a bit that is split between two functions:
+when read, it signals whether or not the Arbitration phase is in progress; when
+a one is written to it, it disables all output from the chip (zero restores
+output). LA/DIFF ENBL (Lost Arbitration) is another split signal: when read,
+it signals whether or not Arbitration was lost; writing has no effect. ACK
+(ACKnowledge) sets or clears the ACK line, and BSY (BuSY), SEL (SELect) and ATN
+(ATteNtion) do the same for their respective lines, while DATA BUS (ASSERT DATA
+BUS) drives the contents of register $0 onto the SCSI data bus.
+
+The important thing to note here is that by setting the ATN line on the SCSI
+bus, the initiator signals to the Target that it wants to send a message and
+so, at the appropriate time, the Target will then assert the MSG and C/D lines
+in response.
+
+Register $2 controls various modes of the 53C80, as well as whether or not
+certain interrupts will be triggered. The bits are:
+
+7         6         5         4         3         2         1         0
+BLOCK     TARGET    ENABLE    ENABLE    ENABLE    MONITOR   DMA       ARBITRATE
+MODE      MODE      PARITY    PARITY    EOP       BUSY      MODE
+DMA                 CHECKING  INTERRUPT INTERRUPT
+
+The only two of real interest are bits 1 (DMA MODE) and 0 (ARBITRATE); the
+former sets the chip into DMA mode, readying it for a DMA transfer while the
+latter tells the chip to start the Arbitration phase.
+
+Register $3 is used mainly if the chip is operating in Target mode, as all the
+lines controlled by it are typically only controllable by the Target device.
+The only exception is that, when the chip is acting as an Initiator, bits 0, 1
+and 2 must match the lines being asserted by the Target; otherwise, the PHASE
+MATCH bit in register $5 will not be set. The bits are (where X means unused):
+
+7         6         5         4         3         2         1         0
+LAST BYTE X         X         X         REQ       MSG       C/D       I/O
+SENT
+
+Register $4 is another split register. When read, it returns the state of the
+following lines on the SCSI bus:
+
+7         6         5         4         3         2         1         0
+RST       BSY       REQ       MSG       C/D       I/O       SEL       DBP
+
+When written to, it enables an interrupt to occur if the device ID written to
+this register appears on the SCSI data bus while BSY is clear and SEL is set.
+
+The important thing about this register is that it allows monitoring of the
+MSG, C/D and I/O lines of the SCSI bus. These three bits are what the Target
+uses to signal moves from phase to phase; without these three bits it would be
+impossible, as an Initiator, to figure out what to do once past the Selection
+phase.
+
+And with three bits, you would expect there to be eight phases controlled here,
+but only six are controlled from these signals--having MSG set to 1 while C/D
+is set to 0 is an illegal combination, and that knocks two of the combinations
+right out of contention. Each legal combination corresponds to a phase, and
+this is, as it turns out, vital information:
+
+Data Out: MSG = 0, C/D = 0, I/O = 0 (0)
+Data In: MSG = 0, C/D = 0, I/O = 1 (1)
+Command: MSG = 0, C/D = 1, I/O = 0 (2)
+Status: MSG = 0, C/D = 1, I/O = 1 (3)
+Message Out: MSG = 1, C/D = 1, I/O = 0 (6)
+Message In: MSG = 1, C/D = 1, I/O = 1 (7)
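+
+The number in parentheses is just MSG*4 + C/D*2 + I/O, so decoding the phase
+from a read of register $4 comes down to a few shifts and a lookup. A Python
+sketch, with names of our own choosing:
+
+```python
+# Phase names keyed by MSG*4 + C/D*2 + I/O, as in the table above.
+PHASES = {0: "Data Out", 1: "Data In", 2: "Command", 3: "Status",
+          6: "Message Out", 7: "Message In"}
+
+
+def phase_from_reg4(reg4: int) -> str:
+    """Decode the bus phase from a read of 53C80 register $4."""
+    msg = (reg4 >> 4) & 1   # bit 4
+    cd = (reg4 >> 3) & 1    # bit 3
+    io = (reg4 >> 2) & 1    # bit 2
+    code = (msg << 2) | (cd << 1) | io
+    return PHASES.get(code, "illegal (MSG without C/D)")
+
+
+print(phase_from_reg4(0x68))   # BSY, REQ and C/D: Command
+print(phase_from_reg4(0x7C))   # BSY, REQ, MSG, C/D and I/O: Message In
+```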
+
+Note that there's nothing magical about the order of these three lines; they
+could be in any order whatsoever and they would still work the same way. The
+only reason that they are presented this way is, one, this is how they are laid
+out in the NCR 53C80 chip (in this register in particular) and two, this is the
+order that they are used in the firmware.
+
+Register $5 is--you guessed it--another split register. When read, it returns
+some internal state registers as well as a couple more SCSI bus lines:
+
+7         6         5         4         3         2         1         0
+END OF    DMA       PARITY    IRQ       PHASE     BUSY      ATN       ACK
+DMA       REQUEST   ERROR     ACTIVE    MATCH     ERROR
+
+When written to, it initiates a DMA send transfer from memory to the SCSI bus.
+
+Register $6, another split register, when read, holds data coming from the SCSI
+bus during a DMA transfer. When written to, it initiates a DMA receive
+transfer from the SCSI bus to memory while the 53C80 is acting as the Target.
+
+And finally, register $7 is yet another split register that, when read, resets
+the internal PARITY ERROR, IRQ ACTIVE and BUSY ERROR bits in register $5; when
+written to, it initiates a DMA receive transfer from the SCSI bus to memory
+while the 53C80 is acting as the Initiator.
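+
+For an emulator, the split registers mean that a read and a write to the same
+slot I/O location have to be routed to different handlers. A table like the
+one below is one way to keep them straight; it is a sketch whose names follow
+the list at the top of this section (with register $0's two halves named as in
+NCR's documentation), and the function name is hypothetical.
+
+```python
+# Read and write meanings of the 53C80's eight registers, by offset.
+NCR53C80_REGISTERS = {
+    0x0: ("Current SCSI Data", "Output Data"),
+    0x1: ("Initiator Command", "Initiator Command"),
+    0x2: ("Mode", "Mode"),
+    0x3: ("Target Command", "Target Command"),
+    0x4: ("Current SCSI Bus Status", "Select Enable"),
+    0x5: ("Bus and Status", "Start DMA Send"),
+    0x6: ("Input Data", "Start DMA Target Receive"),
+    0x7: ("Reset Parity/Interrupt", "Start DMA Initiator Receive"),
+}
+
+
+def register_name(offset: int, is_write: bool) -> str:
+    """Name the 53C80 register that an access to offset $0-$7 reaches."""
+    if not 0 <= offset <= 7:
+        raise ValueError("only offsets $0-$7 are wired to the 53C80")
+    read_name, write_name = NCR53C80_REGISTERS[offset]
+    return write_name if is_write else read_name
+
+
+print(register_name(0x4, is_write=False))   # Current SCSI Bus Status
+print(register_name(0x4, is_write=True))    # Select Enable
+```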
+
+
+Back To Bank 16
+---------------
+
+So, with that info-dump out of the way, let's return back to the first
+subroutine of the initial code of bank 16:0. We start with the routine at
+$CC24:
+
+CC24: 9E 63 C0 STZ $C063,X ; Zero reg $3 (Target Command)
+CC27: 20 2F CF JSR $CF2F ; Toggle bit 7 of reg. $E (ON-off-ON)
+CC2A: AD DA C8 LDA $C8DA ; Get SCSI ID of initiator device
+CC2D: 9D 60 C0 STA $C060,X ; & put it in reg. $0 (Output Data)
+;
+CC30: 9E 62 C0 STZ $C062,X ; Zero out reg. $2 (Mode)
+CC33: A9 01 LDA #$01
+CC35: 9D 62 C0 STA $C062,X ; Set bit 0 (ARBITRATE) of reg. $2
+
+This code zeroes out the Target Command register, then toggles bit 7 of
+register $E on, then off, then back on. It then puts the SCSI ID of the
+initiator device into the Output Data register, then zeroes out the Mode
+register and sets its ARBITRATE bit. This is the start of the Arbitration
+phase.
+
+CC38: BD 6C C0 LDA $C06C,X ; Get reg. $C
+CC3B: 89 10 BIT #$10 ; Check bit 4
+CC3D: D0 05 BNE $CC44 ; Skip over this if it's set
+CC3F: 20 0C CF JSR $CF0C ; Toggle bit 7 of register $E ON-off-ON
+ ; # of times before C is set is in $C817/8
+CC42: B0 2E BCS $CC72 ; Signal failure if C is set
+
+There is a lot of this code and variants thereof sprinkled liberally throughout
+the firmware code. I'm still not sure what bit 4 of register $C is a signal
+for, but it seems clear that a zero there indicates some kind of error or
+not-ready condition, because whenever it's zero, the code toggles bit 7 of
+register $E and, when this has happened enough times, signals an error and
+exits.
+
+CC44: 3C 61 C0 BIT $C061,X ; Check bit 6 (AIP) of reg. $1
+CC47: 50 E7 BVC $CC30 ; Try again if it's not set
+
+This little bit of code checks the AIP (Arbitration In Progress) bit, and loops
+back to try again if it's not set.
+
+CC49: EA NOP ; Do a small delay
+CC4A: EA NOP
+CC4B: A9 20 LDA #$20
+CC4D: 3D 61 C0 AND $C061,X ; Check if bit 5 (LA) of reg. $1 is set
+CC50: D0 DE BNE $CC30 ; Try again if it's set
+
+After checking to see if the AIP bit is set, it then waits a short amount of
+time before checking to see if the LA (Lost Arbitration) bit is set; if it's
+set, it loops back to try again.
+
+CC52: BD 60 C0 LDA $C060,X ; Get reg. $0
+CC55: 4D DA C8 EOR $C8DA ; EOR it with what we put there to begin with
+CC58: F0 05 BEQ $CC5F ; If it's the same, bypass (we won arbitration)
+CC5A: CD DA C8 CMP $C8DA ; Otherwise, see if the EORed value is >= orig
+CC5D: B0 D1 BCS $CC30 ; Try again if so
+
+Here we look at the data on the SCSI bus and see if there were any other
+devices attempting to arbitrate at the same time. If there were, and their
+SCSI ID was higher than ours, then loop back and try again; otherwise, we won
+arbitration and continue on:
+
+CC5F: A9 20 LDA #$20
+CC61: 3D 61 C0 AND $C061,X ; Check if bit 5 (LA) of reg. $1 is set
+CC64: D0 CA BNE $CC30 ; Try again if so
+
+We check the LA bit one more time to ensure it's not set; if it is, then loop
+back and try again.
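+
+The EOR/CMP pair at $CC52-$CC5D works because any one bit on the data bus is
+worth more than all of the lower bits combined. Once our own bit is EORed
+away, whatever is left is at least as large as our bit if and only if a device
+with a higher ID is also arbitrating. The same test in Python (the function
+name is ours):
+
+```python
+def won_arbitration(data_bus: int, my_id_bit: int) -> bool:
+    """Model of $CC52-$CC5D: True if we won, False if we must retry."""
+    others = data_bus ^ my_id_bit   # EOR $C8DA: strip out our own bit
+    if others == 0:                 # BEQ $CC5F: nobody else is arbitrating
+        return True
+    return others < my_id_bit       # CMP $C8DA / BCS $CC30: >= means retry
+
+
+print(won_arbitration(0x81, 0x80))   # True: we are ID 7, ID 0 also arbitrating
+print(won_arbitration(0xC0, 0x40))   # False: we are ID 6, ID 7 also arbitrating
+```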
+
+CC66: A9 06 LDA #$06 ; Set bits 1-2 (ASSERT /ATN, /SEL) of reg. $1
+CC68: 1D 61 C0 ORA $C061,X
+CC6B: 29 9F AND #$9F ; And clear bits 5-6 (TEST MODE, DIFF ENBL) of $1
+CC6D: 9D 61 C0 STA $C061,X
+CC70: 18 CLC ; Signal success
+CC71: 60 RTS ; & return
+
+Now that we've won the Arbitration phase, we assert the ATN and SEL lines and
+make sure that the TEST MODE and DIFF ENBL bits are cleared. By setting the
+ATN line, we signal to the Target that we want to go to the Message Out phase
+after the Selection phase is done. Once that's done, we signal success and
+return.
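+
+Why mask bits 5 and 6 at all? As described in the rundown of register $1,
+reading it returns AIP and LA in those two positions, while writing them sets
+TEST MODE and DIFF ENBL. A read-modify-write that didn't strip them would copy
+"arbitration in progress" straight into "disable all outputs". A small Python
+model of the value stored at $CC6D (the constant and function names are
+ours):
+
+```python
+# Register $1 bits that read back differently from how they are written.
+AIP_OR_TEST_MODE = 0x40   # bit 6: AIP when read, TEST MODE when written
+LA_OR_DIFF_ENBL = 0x20    # bit 5: LA when read, DIFF ENBL when written
+ASSERT_SEL = 0x04         # bit 2
+ASSERT_ATN = 0x02         # bit 1
+
+
+def cc66_value(reg1_read: int) -> int:
+    """Value stored to reg. $1 at $CC6D: (read | $06) & $9F."""
+    return (reg1_read | ASSERT_ATN | ASSERT_SEL) & 0x9F
+
+
+def unmasked_value(reg1_read: int) -> int:
+    """The same read-modify-write without the AND #$9F, for comparison."""
+    return reg1_read | ASSERT_ATN | ASSERT_SEL
+
+
+# Having passed the check at $CC44, AIP reads back as set here.
+after_arbitration = AIP_OR_TEST_MODE
+print(hex(cc66_value(after_arbitration)))      # 0x6: only ATN and SEL
+print(hex(unmasked_value(after_arbitration)))  # 0x46: TEST MODE would go on
+```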
+
+CC72: A9 80 LDA #$80
+CC74: 8D 8F C8 STA $C88F
+CC77: 4C 91 CD JMP $CD91 ; Signal failure
+
+This bit is called if the code that checks register $C fails; this is the only
+failure path for the Arbitration phase code.
+
+
+A Fine SELECTion Of Devices
+---------------------------
+
+Now that the Initiator (us) has won the Arbitration phase, it's time to see if
+the device we want to talk to exists, and is ready and able to talk.
+
+CC7A: 9E 64 C0 STZ $C064,X ; Zero out reg. $4 (Select Enable)
+CC7D: AD DA C8 LDA $C8DA ; Host ID
+CC80: 0D DB C8 ORA $C8DB ; Target ID
+CC83: 9D 60 C0 STA $C060,X ; Store $C8DA & DB (ORed) into reg. $0 (Data Bus)
+CC86: A9 41 LDA #$41 ; Set bits 0 (DATA BUS) & 6 (TEST MODE) in reg. $1
+CC88: 1D 61 C0 ORA $C061,X ; Then clear bits 5-6 (DIFF ENBL, TEST MODE) in $1
+CC8B: 29 9F AND #$9F
+CC8D: 9D 61 C0 STA $C061,X
+
+The code here clears the Select Enable register to ensure no IRQs are generated
+during the Selection phase, then puts both the Initiator's SCSI ID and the
+Target's SCSI ID into the 53C80's data register. It then does something that
+at first doesn't seem to make any sense: the value it loads has both the DATA
+BUS and TEST MODE bits set. The former puts the 53C80's data register onto the
+SCSI data bus, while the latter disables all outputs of the 53C80. However,
+the AND #$9F clears bits 5 and 6 before the one and only store to register $1,
+so TEST MODE never actually gets written as a one. That mask is needed anyway,
+since reading register $1 returns AIP and LA in bits 6 and 5, and writing those
+straight back would turn on TEST MODE and DIFF ENBL. Why bit 6 is in the LDA
+at all remains a mystery.
+
+The net effect is that DATA BUS is asserted with the 53C80's outputs enabled,
+and thus the Target's SCSI ID is then visible to all the devices connected to
+the SCSI bus.
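+
+The firmware ORs the bytes at $C8DA and $C8DB together, which suggests they
+already hold ID bits rather than ID numbers. The Python sketch below starts
+from ID numbers instead, and also models the Target's side of the exchange
+using the conditions given for register $4's Select Enable interrupt; the
+host ID of 7 in the example and the function names are assumptions made for
+illustration.
+
+```python
+def selection_data_bus(host_id: int, target_id: int) -> int:
+    """Byte the Initiator drives during Selection: one bit per SCSI ID."""
+    for scsi_id in (host_id, target_id):
+        if not 0 <= scsi_id <= 7:
+            raise ValueError(f"SCSI IDs run from 0 to 7, not {scsi_id}")
+    if host_id == target_id:
+        raise ValueError("a device cannot select itself")
+    return (1 << host_id) | (1 << target_id)
+
+
+def is_selected(data_bus: int, my_id: int, sel: bool, bsy: bool) -> bool:
+    """A Target responds only if SEL is up, BSY is down and its bit is set."""
+    return sel and not bsy and bool(data_bus & (1 << my_id))
+
+
+bus = selection_data_bus(host_id=7, target_id=0)             # assumed IDs
+print(hex(bus), is_selected(bus, 0, sel=True, bsy=False))   # 0x81 True
+```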
+
+CC90: A9 FE LDA #$FE ; Clear bit 0 (ARBITRATE) in reg. $2
+CC92: 3D 62 C0 AND $C062,X
+CC95: 9D 62 C0 STA $C062,X
+CC98: A9 02 LDA #$02 ; Set bit 1 (ASSERT /ATN) in reg. $1
+CC9A: 1D 61 C0 ORA $C061,X
+CC9D: 9D 61 C0 STA $C061,X
+CCA0: AD DC C8 LDA $C8DC ; Get $C8DC, set hi bit, save in $C821
+CCA3: 09 80 ORA #$80
+CCA5: 8D 21 C8 STA $C821
+CCA8: A9 F7 LDA #$F7 ; Clear bit 3 (ASSERT /BSY) in reg. $1
+CCAA: 3D 61 C0 AND $C061,X
+CCAD: 9D 61 C0 STA $C061,X
+
+This is all pretty straightforward stuff. It clears the ARBITRATE bit in the
+Mode register, sets bit 1 of register $1 (ASSERT ATN, which was already set
+back at $CC66), and clears BSY (if it was set before; more likely than not, it
+will have been cleared already). It also sets bit 7 of $C8DC and saves it in
+$C821, but it's not clear just why yet.
+
+CCB0: 20 51 CD JSR $CD51 ; Wait for bit 6 (/BSY) of reg. $4 to be set
+CCB3: 90 03 BCC $CCB8 ; Skip over JSR if success
+CCB5: 20 75 CD JSR $CD75 ; Shorter wait for bit 6 in reg. $4 to be set
+
+This bit of code waits for the Target to assert the BSY line; if it fails after
+the first attempt, it will try again with a shorter wait time.
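+
+The general shape of that wait-then-retry logic can be sketched in Python as
+follows. The poll counts are invented (the listing doesn't show the loop
+counts used by the routines at $CD51 and $CD75), as are the function names;
+the fake register reader at the end just stands in for a drive that answers
+late, or not at all.
+
+```python
+def wait_for_bsy(read_reg4, polls: int) -> bool:
+    """Poll reg. $4 until bit 6 (BSY) is set; True means success."""
+    for _ in range(polls):
+        if read_reg4() & 0x40:
+            return True
+    return False
+
+
+def wait_for_target(read_reg4, long_polls=1000, short_polls=100) -> bool:
+    """Model of $CCB0-$CCB5: one long wait, then one shorter retry."""
+    return (wait_for_bsy(read_reg4, long_polls)
+            or wait_for_bsy(read_reg4, short_polls))
+
+
+def target_that_answers_after(n):
+    """Fake reg. $4 reader: BSY shows up on read number n + 1."""
+    reads = 0
+
+    def read():
+        nonlocal reads
+        reads += 1
+        return 0x40 if reads > n else 0x00
+    return read
+
+
+print(wait_for_target(target_that_answers_after(1050)))   # True, on the retry
+print(wait_for_target(lambda: 0x00))                       # False: no drive
+```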
+
+CCB8: A9 FB LDA #$FB ; Clear bit 2 (ASSERT /SEL) in reg. $1
+CCBA: 3D 61 C0 AND $C061,X
+CCBD: 9D 61 C0 STA $C061,X
+CCC0: 90 10 BCC $CCD2 ; Skip over if the JSR was successful
+
+This code drops the SEL line, and depending on whether or not the Target
+asserted the BSY line, will either drop through to the failure path or skip
+over to the success path.
+
+CCC2: A9 FE LDA #$FE ; Clear bit 0 (DATA BUS) in reg. $1
+CCC4: 3D 61 C0 AND $C061,X
+CCC7: 9D 61 C0 STA $C061,X
+CCCA: A9 81 LDA #$81 ; Put $81 in $C88F
+CCCC: 8D 8F C8 STA $C88F
+CCCF: 4C 91 CD JMP $CD91 ; Signal failure
+
+This is the only failure path in the Selection phase code, but, unlike the
+Arbitration phase code, this code path will *not* lock up waiting for signals.
+It will wait only so long for the Target to assert the BSY line before giving
+up and signalling failure. It will also bail out of this bank completely, so
+it will not try any further communication--for now.
+
+CCD2: A9 9D LDA #$9D ; Clear bits 1, 5-6 (ATN, DIFF E., TEST) in $1
+CCD4: 3D 61 C0 AND $C061,X
+CCD7: 9D 61 C0 STA $C061,X
+CCDA: A9 FE LDA #$FE ; Then clear bit 0 (DATA BUS) in $1
+CCDC: 3D 61 C0 AND $C061,X
+CCDF: 9D 61 C0 STA $C061,X
+CCE2: 18 CLC ; Signal success
+CCE3: 60 RTS ; & return
+
+Otherwise, the code drops ATN and clears TEST MODE and DIFF ENBL before
+clearing DATA BUS, signalling success and returning.
+
+
+The Next Part, In Which We Find Ourselves In A Maze Of Twisty Code
+------------------------------------------------------------------
+
+Now that we've successfully navigated the Selection phase, it's time to talk
+SCSI. For the sake of brevity, we will refer to this code as The Code That
+Comes After Selection, or TCTCAS for short. This bit of code calls a bunch of
+other code which in turn calls even more code; keeping it all straight was
+quite the challenge.
+
+CCE4: BD 6C C0 LDA $C06C,X ; Get $C
+CCE7: 89 10 BIT #$10 ; Is bit 4 set?
+CCE9: D0 05 BNE $CCF0 ; Skip ahead if so
+CCEB: 20 0C CF JSR $CF0C ; Else, toggle bit 7 of $E (ON-off-ON) w/countdown
+CCEE: B0 40 BCS $CD30 ; Exit if countdown hit zero
+
+Here again we see the boilerplate checking of bit 4 of register $C.
+
+CCF0: BD 64 C0 LDA $C064,X ; Get reg. $4
+CCF3: 29 42 AND #$42 ; Are bits 1 (/SEL) & 6 (/BSY) clear?
+CCF5: F0 3A BEQ $CD31 ; If so, we're done (jump down, signal error)
+
+Here we're checking the BSY and SEL lines; if both have been dropped after the
+last phase, we jump down to $CD31 and do some final checking before exiting.
+
+CCF7: C9 40 CMP #$40 ; Is only bit 6 (/BSY) set?
+CCF9: D0 E9 BNE $CCE4 ; Loop back if not...
+
+The second check looks to see if only BSY is set; if not, it loops back to the
+start of this subroutine; otherwise, it continues on:
+
+CCFB: BD 62 C0 LDA $C062,X ; Clear bit 1 (DMA MODE) of reg. $2
+CCFE: A8 TAY
+CCFF: 29 FD AND #$FD
+CD01: 9D 62 C0 STA $C062,X
+CD04: 98 TYA ; Then restore its previous state
+CD05: 1D 62 C0 ORA $C062,X
+CD08: 9D 62 C0 STA $C062,X
+
+This little bit of code toggles the DMA MODE bit off then on if it was set to
+begin with; otherwise it does nothing. Well, it doesn't *do* nothing, but the
+effect is null and void.
+
+CD0B: BD 64 C0 LDA $C064,X ; Is bit 5 (/REQ) of reg. $4 clear?
+CD0E: A8 TAY
+CD0F: 29 20 AND #$20
+CD11: F0 D1 BEQ $CCE4 ; Loop back if so...
+
+This checks to see if the REQ line has been asserted by the Target yet, and if
+not, loops back to the beginning of the subroutine.
+
+CD13: AD 1F C8 LDA $C81F ; Save $C81F in $C820 (last 3-bit pattern we saw)
+CD16: 8D 20 C8 STA $C820
+
+Here we save the last phase that was seen in $C820.
+
+CD19: 98 TYA ; Restore reg. $4 from Y
+CD1A: 29 1C AND #$1C ; Keep only bits 2-4 (/I/O, /C/D, /MSG)
+CD1C: 8D 1F C8 STA $C81F ; & save in $C81F
+
+Earlier we saved the contents of register $4 (which holds the MSG, C/D and I/O
+bits) in the Y register; now we retrieve them, mask off everything but the MSG,
+C/D and I/O bits and save the result for later. By virtue of this, every time
+we get here $C820 holds the phase we saw last time through and $C81F the one
+we see now, and the firmware expects those two values to be different.
+
+As to why: when I first encountered this code, I approached it the way I
+usually approach unknown code: by feeding it zeroes. However, when I did that,
+these lines of code led to a failure further on, and so I had to dig a little
+deeper into all things SCSI and 53C80 to figure out why. We'll see exactly how
+that failure comes about a bit later.
+
+CD1F: 4A LSR A
+CD20: 8D 2B C8 STA $C82B ; & put /2 in $C82B
+
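+Here we shift it right one bit and stuff it into $C82B; this is also a clever
+way of making it into an index for a jump table. With the phase bits sitting in
+bits 2-4, one shift to the right leaves an even number from 0 to 14, which is
+just what you want for indexing a table of two-byte addresses.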
+
+CD23: A8 TAY ; & use as index into jump table
+CD24: 4A LSR A ; & /2 again
+CD25: 9D 63 C0 STA $C063,X ; Write it to reg. $3 (Target Command)
+
+Here we put it into the Y register and then shift it to the right one more
+time, which lines the MSG, C/D and I/O bits up with bits 2, 1 and 0 of the
+Target Command register. The Initiator needs to set this register properly at
+each phase change; otherwise, the 53C80 will signal a phase mismatch.
+
+CD28: 20 48 CD JSR $CD48 ; Use Y as idx to jump table and go there
+
+So here the code uses the three phase bits (MSG, C/D and I/O) as an index into
+a jump table to handle the six phases after the Selection phase (Data Out, Data
+In, Command, Status, Message Out, Message In). We'll have more to say about
+this shortly.
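+The bit shuffling from $CD19 through $CD28 can be sketched in C as below. The
+register $4 bit positions come from the listing, and the 53C80's Target
+Command register expects I/O, C/D and MSG in bits 0-2, which is why the second
+shift lines things up. The handler names are hypothetical, and an array of C
+function pointers stands in for the firmware's jump table; since that table
+presumably holds two-byte addresses, the even offset is halved here to get an
+array index.
+
```c
#include <stdint.h>

typedef const char *(*PhaseHandler)(void);

/* Hypothetical handlers; the real firmware jumps to code in the slot ROM. */
static const char *on_data_out(void)    { return "Data Out"; }
static const char *on_data_in(void)     { return "Data In"; }
static const char *on_command(void)     { return "Command"; }
static const char *on_status(void)      { return "Status"; }
static const char *on_reserved(void)    { return "Reserved"; }
static const char *on_message_out(void) { return "Message Out"; }
static const char *on_message_in(void)  { return "Message In"; }

/* Indexed by MSG:C/D:I/O (bit 2 = MSG, bit 1 = C/D, bit 0 = I/O). */
static PhaseHandler const phase_table[8] = {
    on_data_out, on_data_in, on_command, on_status,
    on_reserved, on_reserved, on_message_out, on_message_in
};

/* reg4 is the value read from 53C80 register $4 (Current SCSI Bus Status). */
uint8_t phase_bits(uint8_t reg4)   { return reg4 & 0x1C; }             /* AND #$1C       */
uint8_t table_offset(uint8_t reg4) { return phase_bits(reg4) >> 1; }   /* $C82B: 0..14   */
uint8_t target_cmd(uint8_t reg4)   { return phase_bits(reg4) >> 2; }   /* written to $3  */

const char *dispatch_phase(uint8_t reg4)
{
    /* Even byte offset into a table of 2-byte addresses -> array index. */
    return phase_table[table_offset(reg4) / 2]();
}
```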
+
+CD2B: 2C 06 C8 BIT $C806 ; Is bit 7 of $C806 clear?
+CD2E: 10 B4 BPL $CCE4 ; Loop back if so...
+CD30: 60 RTS
+
+This simply checks bit 7 of $C806, which only gets set under very specific
+circumstances: MSG, C/D and I/O are all asserted (Message In phase), the value
+returned from the Target is a "Good" message, and the prior phase was Message
+In, Message Out or Status. If the bit is still clear, it loops back for
+another phase; otherwise, it returns.
+
+CD31: AD 8F C8 LDA $C88F ; Get $C88F
+CD34: D0 08 BNE $CD3E ; If $C88F is != 0, just return
+CD36: A9 82 LDA #$82 ; Stuff $82 into $C88F
+CD38: 8D 8F C8 STA $C88F
+CD3B: 4C 91 CD JMP $CD91 ; Signal failure (?) & return
+CD3E: 80 F0 BRA $CD30
+
+This is the code path taken if the BSY and SEL lines are dropped. If $C88F is
+still zero, it stuffs $82 into it and jumps to $CD91, which appears to signal
+that something went wrong; otherwise, it simply returns.
+
+
+The Next Part, In Which Things Start To Make Sense
+--------------------------------------------------
+
+So TCTCAS is, as it turns out, where the Target drives the Initiator, which in
+this case means the hard drive driving the card. As I mentioned above, when I
+first started poking around at this code, I was feeding it zeroes just to see
+if I could get it to do something meaningful. However, when you try that, you
+run into the following bit of code, which says, "No, fuggetaboutit."
+
+CEE5: AD 1F C8 LDA $C81F ; Get the current MSG, C/D, I/O values
+CEE8: CD 20 C8 CMP $C820 ; Compare it to the previous values
+CEEB: D0 05 BNE $CEF2 ; If they're different, skip over
+CEED: A9 27 LDA #$27 ; (This is ignored by the jump target)
+CEEF: 4C 6C CE JMP $CE6C ; Else, do a soft, then a hard reset of the card
+CEF2: ...
+
+And so, after looking over the SCSI documentation for the umpteenth time, I
+realized that what this code was telling me is that you can't do a Data Out
+phase directly after the Selection phase; it has to be Something Else. This is
+because $C81F gets initialized with zero (which corresponds to the Data Out
+phase), so a first phase of Data Out looks like no phase change at all, which
+means starting with zero Won't Work.
+
+As luck would have it, however, we know that in the Selection phase, the
+firmware asserted the ATN line, which in turn tells the Target to assert the
+MSG and C/D lines (but not I/O). Which means that we *know* that the Target
+will first go to the Message Out phase, every time.
+
+And so, by writing the hard drive emulator to properly respond to the MSG, C/D
+and I/O lines I got it to handshake the Message Out phase properly. But I
+could see that after that, it wasn't exiting; it was running through another
+round of seeing what was in MSG, C/D and I/O and running the appropriate
+handler.
+
+Now I was a bit stuck here, as there was *no* documentation on how a Target
+device, such as a hard drive, would drive the handshaking for the Initiator
+device. And it wasn't clear what phase the firmware was expecting to come
+next, so guessing wasn't likely to yield positive results.
+
+So, by the serendipitous luck of the Search Engine gods, I stumbled upon a page
+which looked like a scan of a book mixed with some bespoke images made by
+someone whose primary language was not English. One of the images, which had
+misaligned text set next to it, was, however, suggestive. It showed a sequence
+of phases that went from Bus Free to Arbitration to Selection to Message Out to
+Command to Data In to Status to Message In to Bus Free. This was the first
+time I had seen anything like this; in all of the SCSI literature that I had
+surveyed, there was nothing beyond the vaguest hints that there was a typical
+order to the phases. Sure, they would say that one *could* go from one phase
+to another, and how the handshaking worked, but there was *nothing* saying that
+there was a definite order to the phases that should be observed.
+
+So, as I said, this image was highly suggestive. Could this be the key to the
+whole thing that I was missing?
+
+I had set things up in the hard drive emulation to go to the Message Out phase
+after the Selection phase, and so I added code to go to the Command phase after
+that. I could see that the firmware was sending something in the Command phase
+at this point, which was the following six bytes: 00 00 00 00 00 00. And
+looking that up in the SCSI literature showed that to be the TEST UNIT READY
+command. But the firmware was still looking for more.
+
+From what I saw in the logs, it didn't look like it was going for a Data In
+phase next, so I set it up to go to the Status phase, and that got things going
+a little bit further. To me, this looked like it should be the end of the
+dance, but the firmware was *still* looking for more.
+
+But even though a byte was sent from the Target to the Initiator during the
+Status phase, it seemed that the Status response was actually sent in the
+Message In phase. Once I had coded this into the hard drive emulation, I could
+see the TEST UNIT READY command going into TCTCAS and coming out of it in a
+non-failure mode.
+
+The dance has steps, and they must be followed in order.
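+For anyone writing the Target side, the order of phases that finally satisfied
+the firmware can be written down as a small state machine. This is a minimal C
+sketch, not Apple2's actual code: the enum names are hypothetical, Arbitration
+and Selection are really driven by the Initiator, and the only optional step
+assumed here is a Data In or Data Out phase for commands that move data (TEST
+UNIT READY has none).
+
```c
/* Bus phases in the order the card's firmware was observed to accept. */
typedef enum {
    PH_BUS_FREE, PH_ARBITRATION, PH_SELECTION, PH_MESSAGE_OUT,
    PH_COMMAND, PH_DATA_IN, PH_DATA_OUT, PH_STATUS, PH_MESSAGE_IN
} ScsiPhase;

/* Hypothetical: which way the current command moves data, if at all. */
typedef enum { XFER_NONE, XFER_TO_INITIATOR, XFER_FROM_INITIATOR } Xfer;

ScsiPhase next_phase(ScsiPhase cur, Xfer xfer)
{
    switch (cur) {
    case PH_BUS_FREE:    return PH_ARBITRATION;   /* Initiator-driven */
    case PH_ARBITRATION: return PH_SELECTION;     /* Initiator-driven */
    case PH_SELECTION:   return PH_MESSAGE_OUT;   /* ATN was asserted */
    case PH_MESSAGE_OUT: return PH_COMMAND;
    case PH_COMMAND:
        if (xfer == XFER_TO_INITIATOR)   return PH_DATA_IN;   /* e.g. INQUIRY */
        if (xfer == XFER_FROM_INITIATOR) return PH_DATA_OUT;  /* e.g. WRITE   */
        return PH_STATUS;                                     /* e.g. TEST UNIT READY */
    case PH_DATA_IN:
    case PH_DATA_OUT:    return PH_STATUS;
    case PH_STATUS:      return PH_MESSAGE_IN;
    case PH_MESSAGE_IN:  return PH_BUS_FREE;
    }
    return PH_BUS_FREE;
}
```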
+
+
+Dancing In The Dark
+-------------------
+
+However, something is still not quite right; my assumption--that all the
+firmware needed to do to see if there was a drive on the bus was to probe
+through to the Selection phase and then, if anything responded, to see if it
+successfully responded to the TEST UNIT READY command--turned out to be wrong.
+How wrong? Let's take a look back at the code in bank 3:0 which attempts to
+enumerate all devices it can see on the SCSI bus:
+
+CC55: A0 07 LDY #$07
+CC57: 8C 73 C8 STY $C873 ; Save Y in $C873
+CC5A: 9C DC C8 STZ $C8DC ; Zero out $C8DC
+CC5D: B9 F4 CF LDA $CFF4,Y ; Get SCSI ID from table into A
+CC60: CD DA C8 CMP $C8DA ; Compare it to our SCSI ID (default is $01)
+CC63: F0 1F BEQ $CC84 ; Skip over if it's equal (don't query our SCSI ID)
+
+So here it's looping through all eight SCSI IDs, starting with the highest
+priority and working its way down to the lowest (for reference, the table at
+$CFF4 has the following values: $01, $02, $04, $08, $10, $20, $40, $80, and
+since Y counts down from 7, $80, or ID 7, comes first). It compares the SCSI
+ID from the table to the SCSI ID of the card, and skips over the following
+code (down to $CC84) if it's the same.
+
+CC65: 8D DB C8 STA $C8DB ; Else, put SCSI ID to look at in $C8DB
+CC68: 64 4F STZ $4F ; Zero out $4F (error flag)
+CC6A: 20 5F CF JSR $CF5F ; Do TEST UNIT READY (calls bank 16:0)
+
+This is the code that I was now able to successfully navigate with my hard
+drive emulation. It emulated exactly one SCSI ID, and that one ID returned
+here successfully (every other ID, obviously with nothing connected to the bus,
+returned failure). However, I could see from the log file that it was trying
+to issue some more commands--which was puzzling, but told me that I needed to
+dig even deeper into the code.
+
+CC6D: A5 4F LDA $4F ; Get error code
+CC6F: D0 0F BNE $CC80 ; Skip over if error occurred
+
+This is fairly straightforward; it checks the error code returned from the call
+we made to bank 16:0, and if it's anything but zero, skips over the following
+code:
+
+CC71: EE 0D C8 INC $C80D ; Success means add one to $C80D (# of devices)
+CC74: 20 9F CC JSR $CC9F ; & call Function 1 in this bank (INQUIRY + MORE)
+CC77: 90 0B BCC $CC84 ; Check next ID if C == 0
+
+So here we increment a counter, which we presume to be a count of the number of
+valid devices we have found on the SCSI bus. And here, we come to the
+realization that it isn't just hard drives that can talk to the Apple High
+Speed SCSI card, it's also printers, scanners, tape drives and whatnot. And
+so, it makes perfect sense that TEST UNIT READY is only the first step in
+discovering if a device is a hard drive or not, because here it calls function
+1 of bank 3 (the bank we're currently in), which issues more commands to
+figure out what the device it's talking to actually *is*.
+
+CC79: A9 99 LDA #$99 ; Else, stuff $99 into $C887
+CC7B: 8D 87 C8 STA $C887
+CC7E: 80 17 BRA $CC97 ; & signal success
+
+So if the call to $CC9F (INQUIRY + MORE) returned with the carry flag set, it
+stuffs a magic number into $C887, signals success and returns.
+
+CC80: C9 80 CMP #$80 ; Was error $80?
+CC82: F0 16 BEQ $CC9A ; Signal NoDrive error if so
+
+This is where it lands if the TEST UNIT READY call returned a non-zero result
+in the "error code" memory location. If it equals $80, it puts the ProDOS error
+code for a "NoDrive" error into the error code and returns right away, without
+checking any of the remaining IDs.
+
+CC84: AC 73 C8 LDY $C873 ; Restore Y
+CC87: 88 DEY ; Done looking at all IDs?
+CC88: 10 CD BPL $CC57 ; Go back if not.
+
+Here we restore Y, decrement it, and loop back if we haven't looked at all
+eight SCSI IDs (less the card's own). Otherwise, we've finished, and fall
+through to the following:
+
+CC8A: A9 77 LDA #$77 ; Else, stuff $77 into $C80A & $C887
+CC8C: 8D 0A C8 STA $C80A
+CC8F: 8D 87 C8 STA $C887
+CC92: AD 0D C8 LDA $C80D ; Did we find any devices?
+CC95: F0 03 BEQ $CC9A ; Signal NoDrive if not
+CC97: 64 4F STZ $4F ; Else, signal success
+CC99: 60 RTS ; & return
+
+So here it stuffs the magic number $77 into $C887 and $C80A; it also checks the
+"number of devices found" memory location, and signals a "NoDrive" error if the
+count is equal to zero.
+
+CC9A: A9 28 LDA #$28 ; Return $28 (NoDrive) in $4F
+CC9C: 85 4F STA $4F
+CC9E: 60 RTS
+
+This is the landing location for the various failure modes seen up above; it
+simply puts the ProDOS "NoDrive" error into the error flag and returns.
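+Pulling the listing from $CC55 to $CC9E together, the scan's control flow looks
+roughly like the C sketch below. The BusModel struct is a hypothetical stand-in
+for the bank 16:0 TEST UNIT READY call and the bank 3:1 call, and we assume the
+device counter in $C80D starts out at zero, since its initialization isn't
+shown above. Note that an error of $80 ends the scan immediately, while any
+other error just moves on to the next ID.
+
```c
#include <stdint.h>

#define PRODOS_NO_DRIVE 0x28

/* The table at $CFF4: one bit per SCSI ID. */
static const uint8_t id_mask[8] = { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 };

/* Hypothetical bus model, indexed by SCSI ID. */
typedef struct {
    uint8_t tur_error[8]; /* what TEST UNIT READY leaves in $4F (0 = OK) */
    int     accepted[8];  /* nonzero if the bank 3:1 call returns carry set */
} BusModel;

/* Returns the final value of $4F (0 or $28); *devices mirrors $C80D. */
uint8_t scan_bus(const BusModel *bus, uint8_t own_mask, int *devices)
{
    *devices = 0;                          /* assumed initial value */
    for (int y = 7; y >= 0; y--) {         /* LDY #$07 ... DEY / BPL */
        if (id_mask[y] == own_mask)
            continue;                      /* never probe our own ID */
        uint8_t err = bus->tur_error[y];
        if (err == 0) {
            (*devices)++;                  /* INC $C80D */
            if (bus->accepted[y])
                return 0;                  /* $99 -> $C887, success */
        } else if (err == 0x80) {
            return PRODOS_NO_DRIVE;        /* bail out immediately */
        }
    }
    /* $77 -> $C80A and $C887; success only if something answered. */
    return *devices ? 0 : PRODOS_NO_DRIVE;
}
```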
+
+So now I get to figure out what the commands are in that call to 3:1 that are
+causing the card to return in a failure mode.
+
+
+The Test Is Easy, When You Have The Answer Key
+----------------------------------------------
+
+At this point, even though I had the hard drive emulation doing a proper dance
+through the TEST UNIT READY command, it was in a very crude state and couldn't
+really do anything else. And so I had to take a closer look at the seemingly
+impenetrable code that set up a bunch of memory locations before calling bank
+16:0 to see if I could make sense of it.
+
+Rather than go through every last one, I will go through part of the first such
+piece of code, as it's instructive:
+
+CD0E: 20 A4 CF JSR $CFA4 ; Set $60/1 to $C923, $56/7 to $C92F
+CD11: 20 B9 CF JSR $CFB9 ; Put $C9C3 into $C92F/30, zero $C931
+CD14: A9 12 LDA #$12 ; Put $12 into $C923
+CD16: 8D 23 C9 STA $C923
+CD19: 9C 24 C9 STZ $C924 ; Zero out $C924-6, $C928
+CD1C: 9C 25 C9 STZ $C925
+CD1F: 9C 26 C9 STZ $C926
+CD22: 9C 28 C9 STZ $C928
+CD25: A9 1E LDA #$1E ; Put $1E in $C927, $C933 (length of reply, 30)
+CD27: 8D 27 C9 STA $C927
+CD2A: 8D 33 C9 STA $C933
+
+So we can see right off the bat that it's setting up zero page locations $60
+and $61 to point to memory at $C923, and that it sets up six bytes at that
+location with the following:
+
+C923: 12 00 00 00 1E 00
+
+Reaching back to our crash course on SCSI commands, we can see by the first
+byte, since the top three bits are all zero, that this must be a six-byte
+command. And after that, uh, well, we don't really know much of anything. So
+after digging around some more for something even remotely relevant, I found a
+document dealing with SCSI-2 and SCSI-3 hard disk interfacing--which told me,
+first of all, that $12 was the INQUIRY command, and second, that the fifth byte
+in the command was the length of the reply that the Initiator was expecting
+back from the Target in response to this command. Progress!
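+The group code rule is easy to capture in code. Here is a small C sketch using
+the SCSI-2 grouping (groups 1 and 2 are ten bytes, group 5 is twelve; the rest
+are reserved or vendor specific and return zero here), plus a helper that
+builds the same six INQUIRY bytes the firmware places at $C923. The function
+names are our own.
+
```c
#include <stdint.h>

/* CDB length from the group code in the top three bits of the opcode. */
int cdb_length(uint8_t opcode)
{
    switch (opcode >> 5) {
    case 0:         return 6;
    case 1: case 2: return 10;
    case 5:         return 12;
    default:        return 0;   /* reserved or vendor specific */
    }
}

/* The six bytes the firmware sets up at $C923 for INQUIRY. */
void make_inquiry(uint8_t cdb[6], uint8_t alloc_len)
{
    cdb[0] = 0x12;       /* INQUIRY (group 0, so six bytes) */
    cdb[1] = 0x00;       /* LUN and flags */
    cdb[2] = 0x00;
    cdb[3] = 0x00;
    cdb[4] = alloc_len;  /* how many bytes the Initiator wants back */
    cdb[5] = 0x00;       /* control */
}
```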
+
+CD2D: 20 CB CF JSR $CFCB ; Call bank 16:0 (Do INQUIRY command)
+CD30: A5 4F LDA $4F
+CD32: F0 05 BEQ $CD39 ; Skip over if no error
+
+And this, as we now know, does the phase to phase dance from start to finish,
+and checks the resulting error code to do any necessary error handling. But
+what of the response? How do we know what to say from our emulated hard disk
+back to the firmware? The hard disk interface document had something that
+looked plausible, if overlong (it seems that latter day SCSI drives are
+expected to return 148 bytes instead of 30). So I expected that I could adapt
+that to suit the purposes of the emulation.
+
+It was obvious that I had to write code to handle more than just the TEST UNIT
+READY command, and that it had to be able to send and receive data over the
+SCSI bus, which it, in its current state, couldn't do. Eventually I was able
+to get that working and I could see that the firmware was successfully
+negotiating the INQUIRY command *and* coming to the conclusion that it was
+talking to a hard disk. More progress!
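+As a rough illustration of such a reply, here is a C sketch that fills in the
+standard 36-byte INQUIRY data for a direct-access device and then trims it to
+the allocation length from byte 4 of the command, which is $1E (30) in the
+firmware's request. The vendor and product strings are placeholders, and the
+version and response-format bytes are assumptions rather than values verified
+against the card's firmware.
+
```c
#include <stdint.h>
#include <string.h>

/* Build INQUIRY data for a hard disk and copy at most alloc_len bytes
   (and at most out_size) into out. Returns the number of bytes to send. */
size_t build_inquiry_reply(uint8_t *out, size_t out_size, uint8_t alloc_len)
{
    uint8_t data[36];
    memset(data, 0, sizeof data);
    data[0] = 0x00;                          /* peripheral type: direct access */
    data[1] = 0x00;                          /* bit 7 clear: not removable */
    data[2] = 0x01;                          /* version: SCSI-1 (assumed) */
    data[3] = 0x01;                          /* response format: CCS (assumed) */
    data[4] = (uint8_t)(sizeof data - 5);    /* additional length: bytes 5-35 */
    memcpy(&data[8],  "EXAMPLE ", 8);        /* vendor ID, space padded */
    memcpy(&data[16], "EMULATED DISK   ", 16); /* product ID */
    memcpy(&data[32], "1.0 ", 4);            /* product revision */

    size_t n = alloc_len < sizeof data ? alloc_len : sizeof data;
    if (n > out_size)
        n = out_size;
    memcpy(out, data, n);
    return n;
}
```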
+
+And, as it turns out, this first call in bank 3:1 is what determines what the
+device we're talking to actually is, and it sets up appropriate memory
+locations to signal that to other parts of the firmware. This is another one
+of those places where the "Technical Manual for the Apple SCSI Card" had a
+useful tidbit, namely a small table that looked something like this:
+
+Code Device Type
+------------------------------
+$03 Nonspecific SCSI
+$05 CD-ROM
+$06 Direct-access tape drive
+$07 Hard disk
+$08 Scanner
+$09 Printer
+
+These device codes are different from the device codes that the INQUIRY command
+returns, and this bit of code also does the translation from one to the other.
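+To make the idea concrete, here is a hypothetical C sketch of such a
+translation. It pairs the standard peripheral device type (the low five bits
+of INQUIRY byte 0) with the codes in the table above; it was *not*
+disassembled from the firmware, so the real mapping may differ in its details.
+
```c
#include <stdint.h>

/* Card device codes, from the table above. */
enum {
    CARD_NONSPECIFIC = 0x03, CARD_CDROM = 0x05, CARD_TAPE = 0x06,
    CARD_HARD_DISK = 0x07, CARD_SCANNER = 0x08, CARD_PRINTER = 0x09
};

/* Hypothetical INQUIRY-type -> card-code translation. */
uint8_t card_device_code(uint8_t inquiry_byte0)
{
    switch (inquiry_byte0 & 0x1F) {
    case 0x00: return CARD_HARD_DISK;   /* direct-access device */
    case 0x01: return CARD_TAPE;        /* sequential-access device */
    case 0x02: return CARD_PRINTER;
    case 0x05: return CARD_CDROM;       /* CD-ROM (read-only direct access) */
    case 0x06: return CARD_SCANNER;
    default:   return CARD_NONSPECIFIC;
    }
}
```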
+
+
+The Next Part, In Which More Progress Is Made
+---------------------------------------------
+
+And so, in using similar analysis in the other parts of the code called by bank
+3:1, I was able to discern that after the INQUIRY command, it was calling the
+MODE SENSE, MODE SELECT, READ CAPACITY and READ commands. And since I didn't
+know exactly what these commands returned, I used the time-honored method of
+returning messages consisting of all zeroes.
+
+And, in fixing up the hard drive emulation to respond to these commands, I
+could see the firmware was making it all the way through the bank 3:1 code
+successfully, and not in a failure mode. It didn't boot anything yet, as I
+hadn't written the code to load a hard disk image much less dole it out over
+the SCSI bus, but it was a good result and I could finally see the end of this
+Herculean task coming into view.
+
+However, I could see from the log file that something still wasn't quite right.
+
+
+The Next Part, In Which Things Start Getting LUN-ey
+---------------------------------------------------
+
+The problem was one of too much success. It wasn't going through the set of
+INQUIRY, MODE SENSE, MODE SELECT, READ CAPACITY and READ commands just once, it
+was doing it *eight* times. And in looking for the culprit, I found the
+following tidbit:
+
+CCE5: EE DC C8 INC $C8DC ; Increment a counter
+CCE8: AD DC C8 LDA $C8DC
+CCEB: C9 08 CMP #$08
+CCED: D0 B0 BNE $CC9F ; Loop back if we haven't checked 8 times yet
+
+It wasn't obvious on first examination, but I eventually figured out that the
+counter in location $C8DC was being put into the top three bits of byte one of
+every command being sent over the SCSI bus, as I could see the INQUIRY command
+changing every time it was called like so:
+
+12 00 00 00 1E 00
+12 20 00 00 1E 00
+12 40 00 00 1E 00
+12 60 00 00 1E 00
+12 80 00 00 1E 00
+12 A0 00 00 1E 00
+12 C0 00 00 1E 00
+12 E0 00 00 1E 00
+
+And so, after more digging into the hard disk interface document, I could see
+that the field being modified was called the Logical Unit Number, or LUN for
+short. Further, hard disks conforming to the SCSI-2 and SCSI-3 standards had a
+commandment, that being as follows:
+
+The LUN Shall Be Zero, And Zero Shall The LUN Be. It Shall Be No Other Number
+Save For Zero, For Any Other Number Shall Be An Abomination Before The Drive.
+
+Well, going by simple logic, it would appear that the SCSI-1 protocol was not
+bound by such a rule, and so you could have eight Logical Units for each SCSI
+device on the bus. But this presents an interesting challenge. We need to
+tell the firmware to pound sand for all but one LUN.
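+In code, the LUN is just the top three bits of byte one. A short C sketch: the
+first function shows how a 0-7 counter like the one in $C8DC yields the $00,
+$20 ... $E0 values seen above (however the firmware actually computes them),
+and the second is what an emulated Target might use to pull the LUN back out.
+Both names are hypothetical.
+
```c
#include <stdint.h>

/* SCSI-1 puts the Logical Unit Number in bits 5-7 of CDB byte 1. */
uint8_t lun_field(uint8_t counter)       /* 0..7 -> $00, $20, ... $E0 */
{
    return (uint8_t)((counter & 0x07) << 5);
}

uint8_t cdb_lun(const uint8_t *cdb)      /* the Target's side */
{
    return (uint8_t)(cdb[1] >> 5);
}
```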
+
+
+Failure Is An Option
+--------------------
+
+And so I found myself in the position of needing to have the hard drive
+emulation fail in a meaningful way, which sounds like an oxymoron but really
+isn't. I needed to code the hard drive emulation to respond with a CHECK
+CONDITION status, which is how, I eventually discovered, you signal an error
+condition in the SCSI protocol. When I did this, the firmware then sent a
+REQUEST SENSE command, and I wasn't sure how to craft a response to it that
+would signal failure for an invalid LUN. Responding with all zeroes didn't
+signal failure as I hoped it would, so it was back to the hard disk interface
+document to find the missing information.
+
+There I found out that the low four bits of byte two of the response hold the
+"Sense Key", and that zero corresponds to "No Sense", which means the command
+was successful. Which, as it turns out, is no way to signal failure. The one
+that fit the bill was five, which corresponds to "Illegal Request".
+
+And so it seems that 16 Sense Keys were not enough for the designers of the
+SCSI protocol, so those Sense Keys correspond to broad categories. To give even
+more fine-grained responses to what went wrong, there are at least two more
+bytes, the "Additional Sense Code" and "Additional Sense Code Qualifier",
+which, taken together, provide for 65,536 different combinations. And, in the
+interface document, I found $08 $00, which corresponds to "Logical Unit
+Communication Failure" and seemed like a reasonable message for this failure
+mode.
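+A minimal C sketch of that failure path, assuming the usual fixed-format
+sense data (response code $70, sense key in the low four bits of byte 2, ASC
+and ASCQ in bytes 12 and 13). CHECK CONDITION ($02) is the standard SCSI name
+for the status byte that flags an error; the function names and the choice to
+prepare the sense data up front are our own.
+
```c
#include <stdint.h>
#include <string.h>

#define STATUS_GOOD            0x00
#define STATUS_CHECK_CONDITION 0x02
#define SENSE_ILLEGAL_REQUEST  0x05

/* Fixed-format sense data; returns the number of bytes to send. */
size_t build_sense(uint8_t out[18], uint8_t key, uint8_t asc, uint8_t ascq)
{
    memset(out, 0, 18);
    out[0]  = 0x70;         /* current error, fixed format */
    out[2]  = key & 0x0F;   /* sense key lives in the low nybble */
    out[7]  = 10;           /* additional sense length: bytes 8-17 */
    out[12] = asc;          /* Additional Sense Code */
    out[13] = ascq;         /* Additional Sense Code Qualifier */
    return 18;
}

/* Status byte for a command aimed at `lun`. For anything but LUN 0 this
   also prepares the sense data that the follow-up REQUEST SENSE returns:
   Illegal Request, ASC/ASCQ $08/$00 (Logical Unit Communication Failure). */
uint8_t status_for_lun(uint8_t lun, uint8_t sense[18])
{
    if (lun == 0)
        return STATUS_GOOD;
    build_sense(sense, SENSE_ILLEGAL_REQUEST, 0x08, 0x00);
    return STATUS_CHECK_CONDITION;
}
```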
+
+Coding up the meaningful failure path and running the emulation showed that
+this mostly satisfied the firmware; it would almost get all the way to the
+point where it attempted to read block zero from the hard drive in a
+non-failure mode, but there was still a small problem.
+
+
+Every Problem Is Small, From A Certain Point Of View
+----------------------------------------------------
+
+There is a call in the bank 3:1 code that calls bank 4:0 to read a block from
+the disk and do some analysis on what it finds. The logs showed that this code
+was also doing a lot of writing to slot I/O register $F, much of it through
+calls to the following brief routine:
+
+CFB4: AD 86 C8 LDA $C886 ; Get the value in $C886
+CFB7: 4A LSR A ; Shift the hi nybble to the lo nybble
+CFB8: 4A LSR A
+CFB9: 4A LSR A
+CFBA: 4A LSR A
+CFBB: 09 08 ORA #$08 ; Set the high bit of the lo nybble
+CFBD: 9D 6F C0 STA $C06F,X ; & store it in slot I/O register $F
+CFC0: 60 RTS
+
+This was some highly suggestive code, and what it suggested was that it was
+using three bits of a value set up elsewhere (bits 4-6 of $C886, since bit 3 of
+the result is always forced on), which made for eight combinations.
+The only significant loose end, as far as the hardware was concerned, was the
+8K static RAM; in all of the analysis I had done up to this point, it *seemed*
+that only 1K of it was ever used. But this code suggested otherwise.
+
+It was suggesting that slot I/O register $F was a bank select soft switch for
+the 8K static RAM; once I coded it up as such, the firmware was then completely
+satisfied and would get all the way to where it attempted to read block zero
+from the hard drive in a non-failure mode.
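+Here is a C sketch of how an emulator might model that. It assumes the RAM is
+seen through a 1K window at $C800-$CBFF (where the firmware's working storage
+lives) and that bits 0-2 of register $F pick which 1K bank shows up there; what
+bit 3 does is unknown, so it is simply stored. The CardRam type and its
+functions are hypothetical.
+
```c
#include <stdint.h>

/* Hypothetical model: 8K of static RAM seen 1K at a time at $C800-$CBFF. */
typedef struct {
    uint8_t sram[8 * 1024];
    uint8_t reg_f;              /* last value written to slot I/O reg. $F */
} CardRam;

/* What the $CFB4 routine writes: hi nybble of $C886, with bit 3 forced on. */
uint8_t reg_f_value(uint8_t c886)
{
    return (uint8_t)((c886 >> 4) | 0x08);
}

void write_reg_f(CardRam *c, uint8_t v) { c->reg_f = v; }

/* Assumption: bits 0-2 choose the bank; bit 3's meaning is unknown. */
static uint32_t sram_offset(const CardRam *c, uint16_t addr)
{
    return (uint32_t)(c->reg_f & 0x07) * 1024u + ((addr - 0xC800u) & 0x3FFu);
}

uint8_t read_window(const CardRam *c, uint16_t addr)
{
    return c->sram[sram_offset(c, addr)];
}

void write_window(CardRam *c, uint16_t addr, uint8_t v)
{
    c->sram[sram_offset(c, addr)] = v;
}
```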
+
+
+The End Is Nigh
+---------------
+
+And so, having studiously and painstakingly laid the foundation for the actual
+purpose of the hard drive emulation--that being the transfer of data to and
+from the thing--I came at last to the part where I had to actually write code
+to have real data flowing to and from the emulated hard disk. And this, as it
+turns out, was the least interesting part of the whole thing; getting the
+contents of files into memory and parsing them is a really trivial thing and
+usually quite boring.
+
+So in writing this bit of code, I used 4am's "Pitch Dark" hard drive image, and
+added the necessary code to serve up appropriate slices of it in response to
+the firmware's READ command. And, of course, after running the new emulation
+it failed to load anything.
+
+It was then that I remembered that, with a few exceptions, I was answering the
+firmware's commands with replies of all zeroes. One of these that was sure to
+cause problems without a proper response was the READ CAPACITY command. When
+the firmware inquired about the size of the hard drive, the emulator would
+happily tell it that it had zero capacity--which meant that any attempted reads
+by the firmware would be out of range.
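+For reference, the reply to READ CAPACITY is eight bytes: the address of the
+*last* block (not the number of blocks), then the block length, both as
+big-endian 32-bit values. A small C sketch, assuming 512-byte blocks as used
+by ProDOS; the function name is ours.
+
```c
#include <stdint.h>

/* Eight-byte READ CAPACITY reply: last LBA, then block length, big-endian. */
void build_read_capacity(uint8_t out[8], uint32_t block_count, uint32_t block_len)
{
    uint32_t last = block_count ? block_count - 1 : 0; /* empty image: report 0 */
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(last >> (24 - 8 * i));
        out[4 + i] = (uint8_t)(block_len >> (24 - 8 * i));
    }
}
```
+
+For a 32 MB image (65,536 blocks of 512 bytes), the reply comes out as
+00 00 FF FF 00 00 02 00.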
+
+So I coded up a proper response for the size of the hard drive image I was
+using and fired up the emulator and... It still didn't work. The logs told me
+that it was sending a ten-byte command, and one I hadn't seen before, which was
+basically the ten-byte variant of the READ command. Once I had *that* coded up
+properly, I fired up the emulator and after a few seconds, found myself in the
+monitor.
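+The two variants differ only in how they pack their fields. Here is a C sketch
+of a decoder an emulated Target could use (the ReadRequest type and function
+name are hypothetical): READ(6), opcode $08, carries a 21-bit block address in
+bytes 1-3 and an eight-bit length in byte 4 where zero means 256, while
+READ(10), opcode $28, carries a 32-bit address in bytes 2-5 and a 16-bit
+length in bytes 7-8.
+
```c
#include <stdint.h>

typedef struct { uint32_t lba; uint32_t blocks; } ReadRequest;

/* Decode READ(6) ($08) or READ(10) ($28); returns -1 for anything else. */
int decode_read(const uint8_t *cdb, ReadRequest *r)
{
    switch (cdb[0]) {
    case 0x08:  /* READ(6): LUN in bits 5-7 of byte 1, so mask it off */
        r->lba = ((uint32_t)(cdb[1] & 0x1F) << 16) | ((uint32_t)cdb[2] << 8) | cdb[3];
        r->blocks = cdb[4] ? cdb[4] : 256;
        return 0;
    case 0x28:  /* READ(10) */
        r->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16)
               | ((uint32_t)cdb[4] << 8) | cdb[5];
        r->blocks = ((uint32_t)cdb[7] << 8) | cdb[8];
        return 0;
    default:
        return -1;
    }
}
```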
+
+What? Why? How does this even--
+
+To quell the questions that were pooling up in my head, I wrote some hooks into
+the emulator to trigger a code trace at the appropriate time, that being where
+the code transferred control to memory address $801: the entry point of the
+boot code that the firmware supposedly read from block zero and placed in
+memory at $800. And I knew that it was getting to that point successfully
+because the firmware doesn't get there unless everything is working on the
+SCSI bus as it should, and the trace in the log file confirmed this.
+
+There are worse things than being dumped into the Apple II monitor; at least I
+could poke around memory and disassemble things to try to figure out what was
+going wrong. And I could see that the block that was loaded into memory was
+looking at the slot ROM for a certain value that caused it to take a branch
+that landed it in a crash zone. This made no sense whatsoever.
+
+Fortunately for me though, I have the ability to disassemble a snapshot of any
+memory range that I desire--so I disassembled the entire block from $800 to
+$9FF. And what I saw there was still strange; near the end of the block it
+just kind of ran out of instructions, like something was missing. And looking
+near the middle of the block, I saw something eerily similar to what I saw at
+the end.
+
+Then I realized it wasn't similar, it was *identical*. Looking through the
+hard drive emulator code, I was not surprised to find this:
+
+static uint8_t * buf;
+static uint8_t bufPtr;
+
+Yes, I had made the rookie mistake of using too small a type for my buffer
+pointer; it was loading the correct block, but, because the buffer pointer was
+only eight bits wide, it only copied the first 256 bytes out of the hard drive
+image *twice*.
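+The failure is easy to reproduce in isolation. In this C sketch (hypothetical
+functions, not the actual Apple2 code), copying a 512-byte block through an
+eight-bit index wraps around after 255, so the second half of the output is a
+copy of the first half; a 16-bit index fixes it.
+
```c
#include <stdint.h>

/* Buggy: an 8-bit index wraps from 255 back to 0 halfway through. */
void copy_block_8bit(uint8_t *dst, const uint8_t *src)
{
    uint8_t bufPtr = 0;
    for (int i = 0; i < 512; i++)
        dst[i] = src[bufPtr++];
}

/* Fixed: a 16-bit index covers all 512 bytes of the block. */
void copy_block_16bit(uint8_t *dst, const uint8_t *src)
{
    uint16_t bufPtr = 0;
    for (int i = 0; i < 512; i++)
        dst[i] = src[bufPtr++];
}
```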
+
+As embarrassing as this was, it was also good news, as it meant that the
+firmware's bootstrap code was working; it was reading real data from the hard
+drive emulation and running correctly. Which meant that once I fixed the size
+of my buffer pointer, the emulated hard drive should boot up correctly.
+
+And once I coded up the fix and started up the emulator once more, after six or
+so seconds, "Pitch Dark" came up on the screen and it was glorious...
+
+
+Sic Transit Gloria Mundi
+------------------------
+
+I was able to navigate forward and back through the various games on the hard
+drive image; I could even view the artwork that came with each one. And lo and
+behold: the games worked!
+
+I was playing through a bit of Wishbringer when I got to a point where I
+wanted to save my game. And, even though there was no WRITE command hooked up
+yet, I tried it anyway and got a nice hard lockup on the emulator. This would
+never do--to have a hard disk that was read-only--so I coded up the WRITE
+command handler.
+
+And upon booting up the hard drive, it looked like it was OK, only there were
+problems; namely, while you could navigate through the various games, you could
+not launch them. As a matter of fact, the only game that *could* be launched
+was Zork I, which was the first game to pop up on the menu. So after looking at
+the code, I noticed that there was an asymmetry in the ports used for reading
+and writing to the SCSI bus. Which requires a brief digression into data
+transfer.
+
+
+To DMA, Or Not To DMA, That Is The Question
+-------------------------------------------
+
+As it turns out, I was finally able to figure out that the physical DMA on/off
+switch on the card was wired to bit 6 of slot I/O register $C. I further found
+out that, since I was defaulting to zero for any unknown bit in the slot I/O
+registers, the firmware was treating the DMA switch as if it were in the off
+position. Even so, it was still treating this as a DMA transfer.
+
+And, looking at the 53C80 manual, I could see that it supported three distinct
+kinds of bus I/O: Programmed I/O (or PIO for short), Direct Memory Access (or
+DMA for short) and Pseudo DMA. Of these three, PIO is the slowest, as it
+relies 100% on handshaking on the SCSI bus for data transfer, while DMA is the
+fastest, as all you need to do is set some registers and tell the 53C80 to go
+and it handles the transfer all in the background without the need for any
+intervention from the CPU whatsoever. But what the firmware was doing with the
+DMA switch in the off position was Pseudo DMA.
+
+For reading data from the SCSI bus, the CPU monitors bit 6 (DMA REQ) in slot
+I/O register $5, then reads the data that shows up in slot I/O register $6 when
+the DMA REQ bit is asserted. For this kind of transfer to work, however, there
+must be some kind of address decoding that will assert the DACK (DMA
+ACKnowledge) line once the data is read. Because this code works, we can
+logically deduce that the read of slot I/O register $6 is wired to produce
+this signal, even if we can't prove it conclusively through the schematic of
+the card.
+
+Writing works in a similar manner by monitoring the DMA REQ line, but instead
+of writing to slot I/O register $6 (which is a trigger for starting a DMA
+transfer), it writes to slot I/O register $0. And, by the same logic we applied
+to the DACK line in the reading case, we can infer that the write to register
+$0 is what asserts DACK in the writing case.
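+From an emulator's point of view, that boils down to something like the C
+sketch below. It models only what the text describes: bit 6 of register $5 as
+DMA REQ, a read of register $6 (Input Data) handing over the next byte and, by
+our inference, DACK, and a write to register $0 (Output Data) accepting one.
+The PseudoDma type and the function names are hypothetical; where real
+hardware would spin on DMA REQ, the CPU-side loops simply stop when no more
+data is pending.
+
```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical emulator model of a 53C80 Pseudo DMA transfer. */
typedef struct {
    const uint8_t *in;  size_t in_len,  in_pos;   /* Target -> Initiator */
    uint8_t       *out; size_t out_len, out_pos;  /* Initiator -> Target */
    int writing;                                  /* 1 during a data-out transfer */
} PseudoDma;

uint8_t pdma_read_reg(PseudoDma *d, int reg)
{
    switch (reg) {
    case 5: {                     /* Bus and Status: only DMA REQ (bit 6) modeled */
        int req = d->writing ? d->out_pos < d->out_len : d->in_pos < d->in_len;
        return req ? 0x40 : 0x00;
    }
    case 6:                       /* Input Data: the read itself acts as DACK */
        return d->in_pos < d->in_len ? d->in[d->in_pos++] : 0x00;
    default:
        return 0x00;
    }
}

void pdma_write_reg(PseudoDma *d, int reg, uint8_t v)
{
    if (reg == 0 && d->writing && d->out_pos < d->out_len)
        d->out[d->out_pos++] = v; /* Output Data: the write itself acts as DACK */
    /* A write to reg. $6 would start a DMA receive instead; not modeled. */
}

/* The firmware's read loop: poll DMA REQ, then read reg. $6. */
size_t cpu_pseudo_dma_read(PseudoDma *d, uint8_t *dst, size_t n)
{
    size_t got = 0;
    while (got < n && (pdma_read_reg(d, 5) & 0x40))
        dst[got++] = pdma_read_reg(d, 6);
    return got;
}

/* The firmware's write loop: poll DMA REQ, then write reg. $0. */
size_t cpu_pseudo_dma_write(PseudoDma *d, const uint8_t *src, size_t n)
{
    size_t sent = 0;
    while (sent < n && (pdma_read_reg(d, 5) & 0x40))
        pdma_write_reg(d, 0, src[sent++]);
    return sent;
}
```
+
+Pointing the write loop at register $6 instead of $0 is exactly the kind of
+asymmetry that bit the WRITE handler: in this model such writes are simply
+never accepted as data.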
+
+The upshot is, even though Pseudo DMA transfers are still CPU intensive, they
+are faster than PIO transfers. And when it comes to relatively slow CPUs like
+the 65C02, faster is better.
+
+
+And They All Lived Happily Ever After-ish
+-----------------------------------------
+
+So in looking at the code for the WRITE command, I could see that I had it
+using register $6 for the data transfer, which, as we can see from the short
+digression above, won't work. Fixing this to look at the correct register ($0)
+brought things into alignment, and a thorough test of "Pitch Dark" confirmed
+that I had indeed solved the problem.
+
+So, in the final analysis, I was finally able to restore decency to Apple2 and
+play "Pitch Dark" on it to boot. But was it worth it? In my opinion the
+answer is an unequivocal "yes", and not just because it enables the use of hard
+drive images in emulators.
+
+The reason this little exercise in digital archaeology was worth the effort
+expended is that it underscores a problem that seems to have gone largely
+underappreciated: the early microcomputers, in some respects, are very well
+documented; however, in many others, they are not--and the knowledge of exactly
+how they worked is in danger of disappearing. The documentation for the Apple
+High Speed SCSI card is consumer oriented, with very little technical content;
+it was of little use in figuring out how the card really worked, and it stands
+in marked contrast to the early days of Apple, when they published very
+detailed information about their computers and how they worked, including
+schematics and source code.
+
+All that is to say that unless those of us who still remember these artifacts
+and have the ability to analyze them to tease out their inner workings actually
+*do* so, these things *will* disappear, and they will pass out of human memory
+forever.
+
+
+--------------
+v1.0: 6/3/2019
+v1.1: 1/10/2020