The goal of this project was to develop a system capable of emulating an Apple II personal computer.
This project attempted to reconstruct a functional Apple II emulated on Atmel ATmega32 processors. Due to time constraints, a fully functional Apple II was not produced; however, a working emulated 6502 processor, memory subsystem, and partial GPU were produced.
High Level Design
The Apple II was the first popular microcomputer manufactured by Apple. It used a 1MHz MOS Technology 6502 processor, had 4KB of RAM (with a 64KB address space), an audio cassette interface for data storage (later replaced by a floppy disk drive), and color graphics capability via NTSC composite video output. Originally, it ran IntegerBASIC, a BASIC interpreter that served as its operating system. This project attempted to emulate an Apple II system without sound, using Atmel ATmega32 processors and some external hardware. Constructing a fully functional Apple II would result in a usable personal computer, capable of running vintage personal computing software as well as any new software compiled for the 6502.
On a personal level, taking on this project imposed very clear constraints on its direction, since every non-Apple II component had to be compatible with the original Apple II in some form to qualify. The net result was a relative paucity of major hardware and software tradeoffs, which simplified the design stage. In addition, the availability of original software specific to the platform was a large draw.
In order to provide a more challenging project and to provide a modern update for future compatibility, the CPU was emulated in AVR assembly. The project fulfills the Apple II's capacity for SRAM through the use of two external 32KB SRAM chips. A GPU was implemented in AVR assembly to use DMA (Direct Memory Access) to access the common RAM for a 40×40, 16-color NTSC display, as occurred in the original Apple II. Finally, PS/2 keyboard input was processed by the CPU and translated to equivalent Apple II keyboard input access on the fly. The computer as a whole can be viewed as being under the central control of the CPU, with each subsystem as a subsidiary component.
Although the prototype Apple II was not fully polished due to timing constraints, each component had at least basic functionality. User interaction with the Apple II occurs through loading 6502 machine-language programs over RS-232 serial at 9600 baud, and watching the output on the TV. Once the CPU has booted, it waits for a 4-byte program size and a machine-language program to be loaded, and then commences operation. While the program size is in hex, sizes containing hex digits larger than 9 (A through F) are not accepted; thus, programs are generally padded with 6502 "nops" to avoid this constraint. In practice, these values are calculated beforehand and sent over serial as prepared files. If serial debugging is enabled, the progress of the emulated program may be viewed in Codevision Terminal in ASCII mode, with automatic appending of linefeeds to carriage returns.
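The padding step described above can be sketched as follows. This is only an illustration: the wire format (four ASCII hex digits followed by the raw program bytes) is an assumption, and `pad_to_decimal_digit_size` and `build_payload` are hypothetical helper names, not part of the project. The only fixed fact used is that the 6502 NOP opcode is `0xEA`.

```python
NOP = 0xEA  # 6502 NOP opcode


def pad_to_decimal_digit_size(program):
    """Append NOPs until the length, written as 4 hex digits, uses only 0-9."""
    padded = bytearray(program)
    while set(f"{len(padded):04X}") & set("ABCDEF"):
        padded.append(NOP)
    if len(padded) > 0x9999:
        raise ValueError("program too large for a 4-digit size field")
    return bytes(padded)


def build_payload(program):
    """Assumed wire format: 4 ASCII size digits, then the padded program."""
    padded = pad_to_decimal_digit_size(program)
    return f"{len(padded):04X}".encode("ascii") + padded


# A 10-byte program has size 0x000A, so it is padded up to 0x0010 bytes.
payload = build_payload(bytes(10))
print(payload[:4], len(payload) - 4)
```

A prepared file built this way could then be sent unchanged over the serial link.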
The main hardware tradeoff encountered was found in the GPU. Earlier projects implementing color NTSC had used a chip specifically for the generation of sync pulses and the NTSC color burst (the ELM304). It was determined that this chip would simplify the GPU software, but was strictly unnecessary and increased cost and hardware complexity. Therefore, it was not included in the final design.
The major standards involved in the project were the NTSC color television standard, the RS-232 serial standard, and the PS/2 serial standard. As no devices using these standards function correctly without proper implementation, not following these standards was not an option.
The usability and safety of this device are comparable to those of the Apple II and Macintosh computers on which it is based. The original Apple II had some important usability advancements over previous personal computers. For instance, the color display and computer casing were revolutionary at the time and have important usability implications over previous designs. Although the current prototype does not have a casing, as work continues it is not expected to be difficult to procure a satisfactory enclosure. In general, computers are enabling devices that allow disabled persons to live more productive lives.
Intellectual property considerations are important to take into account for this project. The Apple II was a proprietary system and its source code was copyrighted by Apple. In addition, the 6502 is still in production, but the patent (issued in 1975) is believed to be expired. It is also believed that a prototype emulator of this system constitutes fair use for the copyrights involved.
Software Design
The software design consisted of three segments: the 6502 emulation on the CPU, the video generation on the GPU, and the keyboard input on the CPU. As each was implemented separately and independently from each other, they will be discussed in turn.
CPU
The basic structure of the 6502 emulation is to load programs into RAM, and then begin execution of individual 6502 instructions. Each 6502 register was mapped to a dedicated AVR hardware register, as shown in Table 1.
6502 Register | Description | AVR Register(s) |
---|---|---|
S | Status Register. Bit 7: N (Negative); 6: V (Overflow); 5: Unused; 4: B (Break flag, 1 = BRK, 0 = IRQB); 3: D (Decimal Mode Enabled); 2: I (IRQB disable); 1: Z (Result Zero); 0: C (Carry) | R19 |
A | Accumulator (used in many instructions by default) | R8 |
X | Index Register (general purpose register or index value of effective address) | R2 |
Y | Same as X | R3 |
IR | Instruction Register (stores the current opcode) | R4 |
SP | Stack Pointer | R5 |
PC (16 bits) | Program Counter | R12(Low):R13(High) |
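The status register layout in Table 1 can be written down as bit masks. The following is a minimal sketch in Python rather than the AVR assembly the project used; the constant names and the `set_nz` helper are hypothetical, and only the bit positions come from the table.

```python
# Bit masks for the 6502 status register, following the layout in Table 1.
FLAG_C = 1 << 0  # Carry
FLAG_Z = 1 << 1  # Result zero
FLAG_I = 1 << 2  # IRQB disable
FLAG_D = 1 << 3  # Decimal mode enabled
FLAG_B = 1 << 4  # Break flag
FLAG_V = 1 << 6  # Overflow
FLAG_N = 1 << 7  # Negative


def set_nz(status, value):
    """Return status with N and Z updated from an 8-bit result."""
    status &= ~(FLAG_N | FLAG_Z) & 0xFF
    if value & 0x80:
        status |= FLAG_N
    if (value & 0xFF) == 0:
        status |= FLAG_Z
    return status


# Loading zero sets Z and leaves the other flags untouched.
print(f"{set_nz(FLAG_C, 0x00):08b}")
```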
Although the majority of the CPU was initially intended to be written in C, the Codevision compiler was found to optimize very poorly and to give only limited access to AVR features that resemble those of the 6502 and would have been helpful. Therefore, the vast majority was written in AVR assembly. To execute an instruction, the value at the PC is loaded from RAM and then used as an AVR address to index into a hand-coded jump table. The jump table then redirects the instruction to the appropriate execution block.
There are 151 separate operations that can be executed on a 6502 when all addressing modes are accounted for. However, there are a great number of similarities between opcodes, which allows for easy implementation. Addressing modes are shared between opcodes, and execution macros are shared between opcodes of the same type but different addressing modes. Not only does this allow ease of coding, but it also reduces the necessary number of test cases, since each addressing mode and each instruction need only be tested once.
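The sharing of addressing modes and execution code between opcodes can be illustrated with a table-driven dispatcher. This is a sketch in Python, not the project's AVR jump table: the `Cpu` class, the function names, and the `OPCODES` dictionary are assumptions made for illustration, status flags are omitted, and only a handful of opcodes are included. The opcode values themselves are standard 6502 encodings.

```python
class Cpu:
    def __init__(self):
        self.a = self.x = 0
        self.pc = 0
        self.mem = bytearray(0x10000)  # full 64KB address space

    def fetch(self):
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value


# Addressing modes: each returns the effective address of the operand.
def immediate(cpu):
    addr = cpu.pc
    cpu.pc = (cpu.pc + 1) & 0xFFFF
    return addr

def zero_page(cpu):
    return cpu.fetch()

def absolute(cpu):
    lo = cpu.fetch()
    hi = cpu.fetch()
    return (hi << 8) | lo


# Operations: each works on an effective address, whatever mode produced it.
def lda(cpu, addr):
    cpu.a = cpu.mem[addr]

def ldx(cpu, addr):
    cpu.x = cpu.mem[addr]

def sta(cpu, addr):
    cpu.mem[addr] = cpu.a


# Opcode -> (operation, addressing mode); flag updates are left out.
OPCODES = {
    0xA9: (lda, immediate), 0xA5: (lda, zero_page), 0xAD: (lda, absolute),
    0xA2: (ldx, immediate), 0xA6: (ldx, zero_page), 0xAE: (ldx, absolute),
    0x85: (sta, zero_page), 0x8D: (sta, absolute),
}


def step(cpu):
    operation, mode = OPCODES[cpu.fetch()]
    operation(cpu, mode(cpu))


cpu = Cpu()
cpu.mem[0:6] = bytes([0xA9, 0x42, 0x85, 0x10, 0xA6, 0x10])  # LDA #$42; STA $10; LDX $10
for _ in range(3):
    step(cpu)
print(hex(cpu.a), hex(cpu.x), hex(cpu.mem[0x10]))
```

With this structure, eight opcodes need only three operations and three addressing modes, which is the same economy the test strategy relies on.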
Although most instructions were trivial to implement, some caused considerable difficulty. The AVR and 6502 ISAs have different conventions for status register flags, and existing documentation was often inconsistent or incomplete. For instance, the 6502 natively implemented support for Binary Coded Decimal (BCD), a format where decimal numbers are stored in nibbles of binary bytes, allowing precise decimal calculations while sacrificing range. However, there are many incompatible forms of BCD, and specific documentation for the 6502 was extremely lacking. Eventually, enough references were found to ensure a precise determination of the behavior of the processor when in Decimal mode. When that was determined, it then was necessary to implement that behavior, which was fairly complicated. In the end, AVR Application Note 204, a reference implementation for BCD on AVR, was used in a modified form. This pattern, with the exception of the helpful Application Note, was repeated for many opcodes.
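To make the BCD format concrete, here is a minimal sketch of packed-BCD addition, where each nibble holds one decimal digit. It is not the Application Note 204 routine and not the project's code: `bcd_add` is a hypothetical helper that assumes both operands are valid BCD and does not model how the 6502 sets the N, V, and Z flags in Decimal mode.

```python
def bcd_add(a, b, carry_in=0):
    """Add two packed-BCD bytes; return (result, carry_out).

    Assumes both inputs are valid BCD (each nibble in 0..9).
    """
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    hi = (a >> 4) + (b >> 4)
    if lo > 9:          # low digit overflowed: carry into the high digit
        lo -= 10
        hi += 1
    carry_out = 0
    if hi > 9:          # high digit overflowed: carry out of the byte
        hi -= 10
        carry_out = 1
    return (hi << 4) | lo, carry_out


result, carry = bcd_add(0x19, 0x28)   # decimal 19 + 28
print(f"{result:02X} carry={carry}")  # prints "47 carry=0"
```

The range tradeoff mentioned above is visible here: one byte covers only 00 to 99, so 99 + 01 wraps to 00 with the carry set.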
The final design saw most of the instruction execution code as assembly macros, and most of the addressing code as C macros.
Testing was accomplished on the CPU through an extensive suite of 6502 assembly programs, which would automatically trigger a breakpoint interrupt on an error, or enter an infinite loop if the error was a branching problem. Each instruction was tested separately, but without all addressing modes. Then, all addressing modes were tested for a few instructions. Because of the modular structure of the emulator, this was sufficient for ensuring a high level of compatibility with the 6502.
Testing the interface to the RAM was accomplished through a simple, separate test program that, using identical code to the main CPU program, wrote values to every byte in memory and read them back, checking for accuracy. This proved invaluable when faulty connectors disrupted good dataflow between the RAM and CPU.
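A write-then-read-back test of this kind can be sketched as below. The details are assumptions rather than the project's test program: the address-derived pattern, the second inverted pass, and the simulated `GoodRam` and `StuckBitRam` classes are all hypothetical, chosen so that a stuck data line shows up at every address.

```python
def expected(addr, invert):
    """Address-derived test pattern, optionally inverted."""
    value = (addr ^ (addr >> 8)) & 0xFF
    return value ^ 0xFF if invert else value


def ram_test(read, write, size):
    """Write a pattern to every byte, read it back, return failing addresses."""
    failures = set()
    for invert in (False, True):
        for addr in range(size):
            write(addr, expected(addr, invert))
        for addr in range(size):
            if read(addr) != expected(addr, invert):
                failures.add(addr)
    return sorted(failures)


class GoodRam:
    def __init__(self, size):
        self.cells = bytearray(size)

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value


class StuckBitRam(GoodRam):
    """Simulates a bad connector: data line 3 always reads low."""

    def read(self, addr):
        return self.cells[addr] & 0xF7


good = GoodRam(0x1000)
bad = StuckBitRam(0x1000)
print(len(ram_test(good.read, good.write, 0x1000)),
      len(ram_test(bad.read, bad.write, 0x1000)))
```

Running both the pattern and its inverse means every data bit is written as both 0 and 1 at every address, so a line stuck in either state is caught.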
Video Generation
Although monochrome video generation was done in the previous labs, and several previous groups had done color video generation in the past, the unique nature of the GPU necessitated a unique design. In most previous designs, the video generation processor either only used internal RAM, was the sole master of the external RAM, or had the contents of the RAM fed explicitly to it. The GPU in this project was required to have secondary, synchronized bus access to the RAM, and retrieve data itself when the CPU was not actively using it. In addition, although it was not implemented for this prototype, the high resolution mode on the Apple II GPU required much tighter control of time than an interrupt-based approach would allow.
Therefore, all timing was done through clock-tick counting. This, combined with the lack of an external sync pulse generator, meant that the exact specifications of NTSC had to be known precisely, rather than just the basic data period and wait period used by many other projects. This caused many difficulties in producing proper video output, as most documentation available on the NTSC standard only describes a part of the picture.
Nearly all of the GPU software was written in AVR assembly. Several timing constructs were introduced by concatenating "nops" to produce precise delays, up to a delay of 31.25 us (500 clock ticks) at a time. Ignoring some basic initialization, the program can be seen as essentially consisting of a loop to output frames, with internal loops to output display lines and vertical synchronization and blanking lines. Inside each display line, a horizontal pulse was first emitted for 4.75 us, followed by the "back porch" (containing the color burst along with blanking for synchronization) for 4.75 us. Then came the data region of 45 us, which was shortened to compensate for the selected television's large out-of-bounds scan.
Following that came the "front porch" of 4.5625 us, which either looped back to make another of the 242 data lines or continued on to the first 3 vertical blanking lines. These were like data lines but with a continuous black level. They were followed by three vertical synchronization lines, which were identical to blanking lines except that the vertical sync line was high, which effectively inverted the horizontal sync and suppressed the color burst. This was followed by 14 more lines of vertical blanking to make up the remainder of the full 262 lines of interlaced output.
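Because all timing was done by counting clock ticks, the durations above translate directly into tick counts. The arithmetic below is a sketch: the 16 ticks per microsecond figure is derived from the stated 31.25 us = 500 ticks, and the dictionary names are hypothetical labels for the segments and line groups described in the text.

```python
# Clock rate implied by "31.25 us (500 clock ticks)": 16 ticks per microsecond.
TICKS_PER_US = 500 / 31.25

# Segments of one display line, in microseconds, as given in the text.
segments_us = {
    "hsync": 4.75,
    "back_porch": 4.75,
    "data": 45.0,
    "front_porch": 4.5625,
}
ticks = {name: round(us * TICKS_PER_US) for name, us in segments_us.items()}

# Line groups making up one frame, as given in the text.
lines = {"data": 242, "vblank_before_sync": 3, "vsync": 3, "vblank_after_sync": 14}

print(ticks)
print(sum(lines.values()))       # total lines per frame
print(ticks["data"] // 40)       # ticks available per pixel on a 40-pixel line
```

The data region works out to 720 ticks, which divides evenly into 18 ticks for each of 40 pixels, and the line groups sum to 262.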
Inside the data lines, work was constantly being done. Unpacked data was stored in a special line buffer for immediate display, while raw data was stored in a larger, packed array. During the horizontal sync pulse, back porch, and front porch, the line buffer was populated anew from packed data stored in the large buffer. During the data portion, that data was read out on PORTD to produce the appropriate output. All of these operations were computationally balanced to ensure consistent timing.
In addition, since the theoretical output resolution of the GPU was much higher than the 40×40 resolution used by the basic Apple II graphics mode, each pixel stored in the line buffer had to be extended to a width of 18 clock ticks horizontally and 6 lines vertically. Extending horizontally was done by delaying between changing the outgoing color value, while vertical stretching was accomplished by maintaining a counter and only unpacking the next line when the counter indicated to.
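The unpack-and-stretch scheme can be sketched as follows. The packing format is an assumption made for illustration (two 4-bit colors per byte, low nibble first, horizontally adjacent), and `unpack_line`, `scanlines`, and `line_to_ticks` are hypothetical names; the real GPU did this in cycle-counted AVR assembly rather than with lists.

```python
PIXELS_PER_LINE = 40
TICKS_PER_PIXEL = 18   # horizontal stretch, from the text
LINES_PER_PIXEL = 6    # vertical stretch, from the text


def unpack_line(packed):
    """Expand packed bytes into one color per pixel (assumed low nibble first)."""
    line = []
    for byte in packed:
        line.append(byte & 0x0F)
        line.append(byte >> 4)
    return line


def scanlines(packed_rows):
    """Yield a line buffer per scanline, unpacking only when the counter wraps."""
    counter = 0
    row = 0
    line_buffer = None
    for _ in range(len(packed_rows) * LINES_PER_PIXEL):
        if counter == 0:
            line_buffer = unpack_line(packed_rows[row])
            row += 1
        yield line_buffer
        counter = (counter + 1) % LINES_PER_PIXEL


def line_to_ticks(line_buffer):
    """Hold each color on the output for TICKS_PER_PIXEL ticks."""
    out = []
    for color in line_buffer:
        out.extend([color] * TICKS_PER_PIXEL)
    return out


rows = [bytes([0x21] * 20), bytes([0x43] * 20)]   # two packed 40-pixel rows
frame = list(scanlines(rows))
print(len(frame), len(frame[0]), len(line_to_ticks(frame[0])))
```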
Testing the GPU was accomplished by writing test patterns to the screen and observing the results. Since the results were always clearly right or clearly wrong, no automated testing was necessary. An oscilloscope was also used to monitor the composite video output. Debugging was often difficult due to a lack of hard information about the allowable video waveforms and obscure assembly errors. Stripping out almost all traces of C eliminated many bugs, as it was determined that the Codevision compiler was stepping on important status registers by using them as temporary registers to perform simple operations.
Parts List:
Part | Quantity | Price Per Unit | Total |
STK500 | 1 | free (donated by CUSat) | 0 |
ATMEGA32 | 2 | free (donated by Kris Young) | 0 |
TC55328 | 2 | 1.99 | 3.98 |
Small PCB | 1, plus some scraps | 1 | 1 |
Large PCB | 1 | 2.5 | 2.5 |
Custom PCB | 1 | 5 | 5 |
Color Television | 1 | free (borrowed from girlfriend) | 0 |
SN74HC157 Multiplexer | 4 | 0.53 | 1.12 |
MM74HC4049N Inverter | 1 | 0.6 | 0.6 |
AD724 NTSC Encoder | 1 | 9.65 | 9.65 |
3.58MHz Oscillator | 1 | 0.4 | 0.4 |
PS2 Keyboard | 1 | free (already owned) | 0 |
PS2 Keyboard Socket | 1 | free (found on scrap motherboard) | 0 |
Total Cost | | | $24.25 |