Emulating video game consoles
Luke Zapart
May 4, 2013

Contents

1 Introduction
2 Emulation
  2.1 Accuracy
  2.2 System emulation
  2.3 CPU emulation
    2.3.1 Interpretation
    2.3.2 Recompilation
    2.3.3 Static recompilation
    2.3.4 Dynamic recompilation
    2.3.5 Hotspot method
  2.4 Graphics emulation
    2.4.1 Framebuffers
    2.4.2 Tile-based graphics
    2.4.3 Sprites
    2.4.4 Vector displays
    2.4.5 Image scaling
  2.5 Sound emulation
    2.5.1 Sampling
    2.5.2 Frequency modulation
    2.5.3 Aliasing
  2.6 Documentation
3 The Kiwi emulator
  3.1 Architecture
  3.2 Challenges
    3.2.1 Line rendering
    3.2.2 Cycle accuracy
    3.2.3 Video game bugs
4 References
  4.1 Information
  4.2 Emulators
1 Introduction

A video game console is a computer designed primarily for playing video games. As technology progresses, consoles become more capable and run ever more realistic games. As this happens, old consoles become obsolete and unsupported, making it increasingly difficult to experience old games. One solution is to emulate old consoles in software.

A video game emulator is a piece of software that reproduces the workings of a video game console well enough to run games designed for that console on a different system, usually a personal computer or a modern console. Fans also use emulators to run video game modifications not approved by the console's manufacturer, such as fan translations and hacks. Emulators also aid game development: they offer enhanced debugging and shorten the build-deploy-test cycle, because testing happens on the same machine as development.

This thesis is an introduction to emulating video games and an attempt to systematize knowledge about emulation. The information provided herein is based on literature about specific aspects of emulation as well as my own experience writing emulators. I used what I learned while writing the thesis to write my own emulator, Kiwi, which is capable of running a number of games for the Sega Genesis, a console popular in the early 1990s.

Figure 1: The game Aladdin running in the Kiwi emulator

2 Emulation

Most computers produced after the 1940s implement the Von Neumann architecture (Fig. 2), which consists of a processing unit (CPU), memory, and input and output mechanisms. These components share a common bus, a subsystem that transfers data between them. An emulator has to emulate all of the components of a computer architecture as well as their interoperation.
An example of such an architecture is the one implemented in the Sega Genesis. The architecture, as shown in Fig. 3, consists of two general-purpose processors, three Random Access Memory (RAM) devices, a Video Display Processor (VDP), a sound processing unit, a Programmable Sound Generator (PSG), input and output (I/O), peripherals and a replaceable cartridge that holds the program data. As some of the components are 8-bit and some are 16-bit, there are 8-bit and 16-bit buses and a bus arbiter. Each bus consists of control, address and data lines.

Figure 2: Von Neumann architecture (diagram labels: Memory, CPU, Bus, I/O)

Figure 3: Simplified Sega Genesis architecture (diagram labels: Z80 CPU, 8K RAM, Bus (8-bit), Sound, Bus Arbiter, M68000 CPU, I/O, 64K RAM, Cartridge, VDP, Bus (16-bit), Peripherals, 64K Video RAM, PSG)

Emulation is primarily the domain of hobbyists. Because of this, there is no unifying theory of how to emulate computer systems. There is, however, one thing that all emulators share: the main loop. Typically, the top-level logic of the emulated architecture is a loop, as shown in Fig. 4. Each component is run for a number of steps, usually equivalent to a fraction of a second, e.g. a frame (there are 50 or 60 frames per second on a 50 Hz or 60 Hz display). Then the components are synchronized with each other.

    while not finished:
        for component in components:
            component.run_for_some_time()
        synchronize()

Figure 4: Main emulation loop pseudo-code

2.1 Accuracy

Although at first glance it may seem that the best way to emulate each component is to emulate it as accurately as possible, accuracy comes at a significant cost in performance. Whether performance or fidelity is more important depends on the requirements of a given emulation project. The possible accuracy levels have been classified in [15] into the following classes, in order of decreasing accuracy.

Data-path Accuracy is the highest level of accuracy.
Emulators in this class simulate the physical characteristics of a given component, down to the data-path level. This is mostly used in hardware development for prototyping integrated circuits. Another use is historical preservation, as seen in the interactive Visual 6502 simulator [26], which simulates a 6502 CPU down to the transistor level by working with digitized microscopic photographs of the CPU (Fig. 5). This is also the slowest method of emulating a chip; the Visual 6502 simulator runs at 27 Hz on a modern 2 GHz computer, which corresponds to a slowdown factor (host clock rate divided by emulated clock rate) of approximately 75 million.

Figure 5: Vectorized layers of the MOS 6502 CPU, courtesy of Visual6502.org [27]

Cycle Accuracy forgoes emulating the precise workings of a chip, instead only requiring faithful emulation of the underlying Instruction Set Architecture (ISA) and the timings of instructions relative to each other. Typically, precise timings are only required on old 8-bit and 16-bit architectures, where programmers could depend on the timings for certain effects. An example is VICE [37], which emulates the Commodore 64 cycle-accurately.

Instruction-level Accuracy is concerned with accurately emulating instructions and their side effects while ignoring their precise timing. This is a fairly common level of accuracy, as it is a reasonable trade-off between accuracy and speed.

Basic-block Accuracy allows replacing larger blocks of code with native code equivalents. This method is faster than instruction-level accuracy, but it does not allow single-stepping through code and does not work with code that relies on its original form, such as self-modifying code. The Amiga emulator UAE [31] is an example.

High-Level (HLE) Accuracy identifies structures in code that can be replaced by native high-level functionality. Recent gaming consoles are often emulated by detecting audio and video library code calls and replacing them by native library implementations without ever executing the original library code.
The UltraHLE [29] Nintendo 64 emulator is famous for being the first emulator of this kind. At the time of its release in 1999, a mere 3 years after the release of the Nintendo 64, it ran commercial titles at a playable frame rate on the hardware of the time. This approach had its drawbacks: at the time of release, UltraHLE was only able to emulate approximately 20 games to a playable standard.

Overall, faster emulation means less accuracy. Fig. 6 compares the speed of Visual 6502, VICE and UltraHLE. Some consideration should be given to undocumented CPU instructions, as games might depend on undocumented behavior [21].

    Emulator      Accuracy class   Speed                  Slowdown factor
    Visual 6502   Data-path        27 Hz @ 2 GHz          75 million
    VICE          Cycle            2x1 MHz @ 500 MHz      250
    UltraHLE      HLE              93.75 MHz @ 350 MHz    3.7

Figure 6: Speed comparison of three emulators from distinct classes

2.2 System emulation

Typically a computer system has one or more address buses, on which addresses and address ranges are assigned to components. For example, the Sega Genesis has two address buses, as shown in Fig. 7.

    Device             Address (hex)
    16-bit bus
    Cartridge          000000-3FFFFF
    8K RAM             A00000-A01FFF
    Sound processor    A04000-A04003
    I/O                A10000-A1001F
    VDP Data           C00000
    VDP Control        C00004
    PSG                C00011
    64K RAM            FF0000-FFFFFF
    8-bit bus
    8K RAM             0000-1FFF
    Sound processor    4000-4003
    Bank register      6000
    PSG                7F11
    Banked memory      8000-FFFF

Figure 7: Simplified Sega Genesis address map

Memory mappings fall under two categories:

Linear mapping: a mapping where an address range is translated to a corresponding physical address range and address lines are used directly without any decoding logic, for example in accessing physical memory.

Direct mapping: a 1:1 mapping of a unique address to one hardware register or a physical memory location.

As seen in the Sega Genesis example (Fig. 7), memory maps often contain holes.
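As a rough illustration of decoding against a memory map, the 16-bit bus half of Fig. 7 could be expressed as a lookup table. This is only a sketch and is not taken from Kiwi: the names M68K_MAP and decode are hypothetical, the device labels are plain strings chosen for the example, and mirrored ranges are not modelled.

```python
# Hypothetical sketch: decode a 68000 bus address using the ranges in Fig. 7.
# Each entry is (first address, last address, device label); labels are
# illustrative only.
M68K_MAP = [
    (0x000000, 0x3FFFFF, "cartridge"),
    (0xA00000, 0xA01FFF, "8k_ram"),
    (0xA04000, 0xA04003, "sound"),
    (0xA10000, 0xA1001F, "io"),
    (0xC00000, 0xC00000, "vdp_data"),
    (0xC00004, 0xC00004, "vdp_control"),
    (0xC00011, 0xC00011, "psg"),
    (0xFF0000, 0xFFFFFF, "64k_ram"),
]


def decode(address):
    """Return (device, offset) for a mapped address, or None for a hole."""
    for first, last, device in M68K_MAP:
        if first <= address <= last:
            return device, address - first
    return None


print(decode(0xFF1234))  # ('64k_ram', 4660), i.e. offset 0x1234
print(decode(0x400000))  # None: this address falls in a hole
```

A linear scan is adequate for a handful of ranges; an emulator that decodes every memory access would more likely index a table by the upper address bits.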
For simplicity of circuit design, certain mappings are mirrored; for example, on the Sega Genesis, the RAM address 1234 can be accessed as both FF1234 and EF1234. An emulator must decode all memory accesses according to the appropriate memory map.

Interrupts come in two types. A hardware interrupt is a signal from a component indicating that it needs attention. A software interrupt is triggered by a CPU instruction. When a CPU receives an interrupt, it suspends its current execution and begins executing an interrupt handler, usually by raising the exception that corresponds to the kind of interrupt. Interrupts are typically used for signaling from components, timers and traps (user interrupts). A simplified exception table for the main CPU in the Sega Genesis system is shown in Fig. 8.

On the Genesis, the VDP fires a Vertical Blank interrupt every vertical blank, which occurs once per displayed frame (50 times a second on a PAL TV and 60 on an NTSC TV) and is useful for timing-dependent functions. The VDP also signals a Horizontal Blank interrupt every horizontal blank, which occurs after a line has been displayed (about every 64 µs). Finally, there is an external interrupt, used for example to tell the game that a light gun has been fired [11].

2.3 CPU emulation

Three approaches to emulating CPUs are described in [18]: interpretation, recompilation and a hybrid method.

2.3.1 Interpretation

Interpretation is the simplest way of emulating a CPU. As shown in Fig. 10, an interpreter directly simulates the fetch-decode-execute cycle (Fig. 9). That is, until the emulator is shut down, it fetches the next instruction, decodes it and executes it. Interpretation is slow, as for every original instruction a couple dozen native instructions are executed.
Additionally, modern CPUs take advantage of a technique called pipelining, which allows them to execute many instructions at once, as long as it is possible to predict in advance which instructions will be executed. A simulated fetch-decode-execute loop makes predicting future instructions hard, so modern CPUs often cannot pipeline interpreter code effectively.

    Type                Number   Exception
    Reset               0        Reset Initial Stack Pointer
                        1        Reset Initial Program Counter
    Errors and traps    2        Access Fault
                        3        Address Error
                        4        Illegal Instruction
                        5        Integer Divide by Zero
                        6        CHK Instruction
                        7        TRAPV Instruction
                        8        Privilege Violation
                        9        Trace
                        10       Line 1010 Emulator
                        11       Line 1111 Emulator
    Interrupts          24       Spurious Interrupt
                        25       Level 1 Interrupt (Unused)
                        26       Level 2 Interrupt (External)
                        27       Level 3 Interrupt (Unused)
                        28       Level 4 Interrupt (Horizontal Blank)
                        29       Level 5 Interrupt (Unused)
                        30       Level 6 Interrupt (Vertical Blank)
                        31       Level 7 Interrupt (Unused)
    Traps               32-47    Trap #0-15 Instructions

Figure 8: Sega Genesis exception vector table for the 68000 CPU [9]

Figure 9: Fetch-decode-execute cycle (diagram labels: Fetch, Decode, Execute)

2.3.2 Recompilation

Recompilation translates emulated code to native code. It takes advantage of the fact that it is unnecessary to decode instructions every time they need to be executed, and moves the decoding part to a pre-processing phase of the emulator.

Threaded recompilation is the simplest form of recompilation, as it simply replaces each original instruction with a library call to a native code implementation of the original instruction. An example translation can be seen in Fig. 11. Real recompilation, on the other hand, translates original code into native code directly, without the need for per-instruction function calls. Recompilation can be done ahead-of-time (AOT), i.e. before execution, or just-in-time (JIT), i.e. during execution.

2.3.3 Static recompilation

Static recompilation is ahead-of-time recompilation.
A statically recompiled program is indistinguishable from a native program and does not contain any translation logic. Like normal ahead-of-time compilers, static recompilers can perform global optimizations on programs, resulting in fast execution. This comes at a cost, however, as static recompilation is a hard problem. Programs on Von Neumann architectures frequently intertwine code and data, and recognizing what is code and what is data might be hard or impossible, depending on the architecture.

    while not finished:
        # fetch
        instruction = memory[IP]
        IP += 2  # increment the instruction pointer register
        # decode
        if instruction == 0x4e71:
            # execute NOP instruction
            pass
        elif (instruction >> 8) == 0:
            # decode and execute an ORI instruction
            # decode the appropriate effective address
            size = (instruction >> 6) & 3
            ea_mode = (instruction >> 3) & 7
            ea_register = instruction & 7
            ea = get_ea(ea_mode, ea_register, size)
            # OR the effective address by an immediate value
            *ea |= get_immediate(size)
            set_flags()
        elif (instruction >> 8) == 1:
            ...

Figure 10: 68000 CPU interpreter pseudo-code

To illustrate this, a jump table, as shown in Fig. 12, is a piece of code that loads an address based on an index and then jumps to it. In the example, if index holds the number 0, the code will jump to Level, and if index holds 1, the jump will be to Title. The problem is that a recompiler does not know how many entries there are in the jump table, so it cannot know that 02427fff (the machine code representation of the andi.w instruction) is an instruction and not an address to jump to.

    68000 code          Threaded code
    move.w #7, d0       move(1, 7, 0)
    addi.w #1, d0       addi(1, 1, 0)
    ori.w  #1, d0       ori(1, 1, 0)

    ori(size, immediate, dest):
        d[dest] |= immediate
        set_flags()

Figure 11: Sample 68000 code and resulting threaded code (pseudo-code)

        move.w  (index).w,d0
        lsl.w   #2,d0                   ; d0 = d0*4 (addresses are 4 bytes long)
        movea.l JumpTable(pc,d0.w),a0   ; load the target address from the table
        jmp     (a0)
    JumpTable:
        dc.l    Level
        dc.l    Title
        dc.l    Ending
        dc.l    Credits
        andi.w  #$7FFF,d2               ; machine code = 02427fff
        ...

Figure 12: Example 68000 assembly jump table

Video games up until the 1990s were often written in assembly, and used techniques harmful to static analysis, such as self-modifying code. Some seemingly harmless practices, such as decompressing, decrypting or loading code into memory, are basically equivalent to self-modifying code for recompilation purposes. These issues make it difficult to write a completely accurate static recompiler that works on arbitrary input code, but often such generality is unnecessary. If only a single game needs to be recompiled, it suffices to play through the game completely in an interpreting emulator, logging what is code and what is data, and then to perform static recompilation [1].

2.3.4 Dynamic recompilation

A dynamic, or just-in-time (JIT), recompiler translates original code to native code on the fly and then caches the resulting native code for later execution. Although translating a single piece of code is more computationally expensive than interpreting it once, most of the execution time in a program is usually spent in relatively small loops. A JIT recompiler only has to translate a loop once, but it can run the translated loop thousands of times, so for an average program this is an overall speed improvement.

JIT recompilers do not need advance knowledge of what is code and what is data, since only code that is actually reached during execution is translated. JIT also solves the problem of self-modifying code: if a code modification is detected, the affected code can be marked as dirty for later recompilation. For this reason and for the speed improvements, JIT recompilers are popular in modern emulators.
2.3.5 Hotspot method

The hotspot method, first used in the eponymous HotSpot Java virtual machine [3], aims to combine the best of both worlds: the speed of interpreters in rarely executed code and the speed of JITs in often executed code. It does so by interpreting by default and identifying hot spots, that is, often-run code, and marking them for recompilation. This is best seen by examining the cost formulas in Fig. 13 [16]. The average cost per execution is visualized in Fig. 14, in which it can be seen that for typical values of $c_e$, $c_i$ and $c_r$ (2, 10 and 50 respectively) the hotspot method approaches dynamic recompilation in terms of cost for often executed code, while keeping the low cost of interpretation for code that is only run once.

    Method               Cost
    Interpreter          $n c_i$
    Dynamic recompiler   $c_r + n c_e$
    Hotspot method       $t c_i + c_r + (n - t) c_e$ if $n > t$, else $n c_i$

    where
    $c_i$   cost of a single interpretation of the code
    $c_r$   cost of translating the code
    $c_e$   cost of running the translated code
    $n$     number of times the code is to be emulated
    $t$     number of times the code is interpreted before it is translated

Figure 13: Cost formulas for an interpreter, a dynamic recompiler and the hotspot method

2.4 Graphics emulation

Graphics and sound hardware is varied and it is impractical to enumerate all of the different variations; it suffices to introduce the more common and general concepts. Once the functionality of the given hardware is known, emulating it is a fairly straightforward process. Optimization of the emulation depends largely on the specific components used.

A video display (such as a TV) displays graphics composed of pixels and refreshes at a specific rate (typically 50 or 60 times per second). A pixel is the smallest controllable element of a picture. A game console processes graphics and then sends them to the video display at a specific resolution.
For example, the Sega Genesis is capable of outputting 240 lines, each consisting of 320 pixels (this is typically referred to as a resolution of 320x240). Video game consoles usually have a distinct video output device which processes graphics and outputs a video signal.

Figure 14: Average cost graph for different emulation methods (x-axis: number of executions, up to 20; y-axis: average cost per execution, up to 60; curves: interpreter, dynamic recompiler, hotspot method)

2.4.1 Framebuffers

The simplest video output device is the framebuffer, which drives a video display from a memory buffer containing a complete frame of data. The memory buffer typically consists of color values for each pixel on the screen. The number of bits used for each color value is the color depth; it determines how many colors the device can display simultaneously ($2^d$ colors for a depth of $d$ bits). The bigger the color depth, the more bits it takes to store a single color value, and in turn, the more memory is needed to store the framebuffer. For example, a 24-bit 800x600 framebuffer needs $800 \times 600 \times 24 / 8 = 1440000$ bytes, or about 1.4 megabytes of memory, while a 4-bit one with the same resolution only needs $800 \times 600 \times 4 / 8 = 240000$ bytes, or about 234 kilobytes.

In order to save memory and storage, a technique called indexed color is used, where an indexed buffer takes the place of a full color buffer. With this technique, color information is stored in a buffer called a palette, in which every color is identified by its position. Instead of full color values, an indexed buffer stores a palette index for each pixel. An example indexed image in 1, 2 and 8-bit color depths is shown in Fig. 15.

Figure 15: 1, 2, 8-bit indexed color pictures and original 24-bit color picture. Panels: (a) 1-bit, (b) 24-bit (truecolor), (c) 2-bit and palette, (d) 8-bit and palette

As shown in Fig. 16, this method was used in numerous systems.
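The framebuffer arithmetic and the palette lookup above can be checked with a few lines of Python. This is a minimal sketch; the function names and the sample palette are made up for the example.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed for one full frame at the given color depth."""
    return width * height * bits_per_pixel // 8


def expand_indexed(indices, palette):
    """Turn palette indexes into full color values, pixel by pixel."""
    return [palette[i] for i in indices]


print(framebuffer_bytes(800, 600, 24))  # 1440000
print(framebuffer_bytes(800, 600, 4))   # 240000

# A 2-bit indexed image: four palette entries of 24-bit (R, G, B) colors.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
pixels = [0, 1, 1, 3]
print(expand_indexed(pixels, palette))
```

The indexed version stores only 2 bits per pixel plus the palette itself, which is where the memory saving comes from.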
For example, the Sega Genesis has a palette that can hold up to 64 different colors selected from 512 different color values.

    Color depth   Number of colors   Used in
    1-bit         2                  Atari ST
    3-bit         8                  ZX Spectrum
    4-bit         16                 Amstrad CPC
    6-bit         64                 Sega Genesis
    8-bit         256                Atari TT, AGA, Falcon030
    11-bit        2048               Sega Saturn
    12-bit        4096               Neo Geo

Figure 16: Indexed color depth used in a number of systems

2.4.2 Tile-based graphics

To save even more memory and storage, some systems replaced directly controllable framebuffers with a tile-based method of storing graphics data in memory. Using this method, screens are composed of tiles instead of individual pixels. In turn, the tiles are composed of pixels or even smaller tiles. This is illustrated by Fig. 17. The video hardware composes the tiles on the fly for each frame or line instead of keeping the composed image in memory. This approach made it possible to reuse frequently occurring tiles, such as the sky, empty space and other repeating patterns.

As an example, on the Sega Genesis, the screen is composed of two main tile planes and one auxiliary plane. The main planes can hold 4096 tiles each. Each tile is 8x8 pixels and video memory can hold approximately 1300-1800 distinct tiles at a time. Each tile can be displayed in one of four palette lines, further promoting reuse of tiles. Fig. 18 shows the planes that the screenshot in Fig. 17a is composed of.

2.4.3 Sprites

Apart from backgrounds and foregrounds, objects (such as the player character, enemies and items) were a common occurrence in video games. Unlike backgrounds and foregrounds, which typically fit well into tile-based graphics, objects could move in arbitrary ways, making tiling difficult. Because of this, consoles used to incorporate sprites into their video graphics hardware.
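The contrast between a tiled background (Section 2.4.2) and a freely positioned object can be sketched as follows. This is an illustration only, not how any particular VDP works: the data layout, the function names and the convention that color index 0 is transparent are all assumptions made for the example.

```python
def render_plane(tile_map, tiles, size=8):
    """Compose a frame (rows of palette indexes) from a grid of tile numbers."""
    frame = []
    for map_row in tile_map:
        for y in range(size):
            line = []
            for tile_number in map_row:
                line.extend(tiles[tile_number][y])
            frame.append(line)
    return frame


def draw_sprite(frame, sprite, left, top, transparent=0):
    """Overlay a sprite bitmap at any pixel position, skipping transparent pixels."""
    for y, row in enumerate(sprite):
        for x, color in enumerate(row):
            inside = 0 <= top + y < len(frame) and 0 <= left + x < len(frame[0])
            if color != transparent and inside:
                frame[top + y][left + x] = color


# Two 2x2 tiles (a small size keeps the output readable): "sky" and "ground".
tiles = {0: [[1, 1], [1, 1]], 1: [[2, 2], [2, 2]]}
frame = render_plane([[0, 0], [1, 1]], tiles, size=2)
draw_sprite(frame, [[0, 7], [7, 7]], left=1, top=1)
for line in frame:
    print(line)
```

The tile map stores one number per tile, so the repeated "sky" tile costs almost nothing, while the sprite is placed at a pixel position that does not line up with the tile grid.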
Figure 17: Screenshot of the game Ristar and tiles in video memory at the time. Panels: (a) Frame, (b) Tiles in video memory

Generally, sprites are bitmaps which can be moved on the screen without altering the data defining the rest of the screen. This introduced considerable space and bandwidth savings: instead of modifying the entire affected area of the screen every time an object was to be drawn, the game only had to send a small sprite frame to the graphics hardware. This was useful for animated objects, as only the sprite frame had to be changed in an animation. Fig. 19 shows distinct frames of a sprite that appears in the frame in Fig. 17a.

Implementing sprites in hardware was an opportunity to relieve software of work that is common when dealing with sprites. One example is scaling and rotation, which was implemented on the Game Boy Advance. Another common feature was collision detection, as the sprite hardware already had all the sprite position information. An overview of sprite hardware capabilities for several systems is shown in Fig. 20.

Figure 18: Planes of a single frame on the Sega Genesis. Panels: (a) Foreground plane, (b) Background plane, (c) Sprites plane

Figure 19: Sprite frames for an enemy in the game Ristar

                          Game Boy Advance   Neo Geo     Sega Genesis   SNES
    Sprites on screen     128                384         80             128
    Sprites on line       128                96          20             34
    Max sprite size       64x64              16x512      32x32          64x64
    Colors                15, 255            15          15             15
    Scaling               Yes                Shrinking   -              Limited
    Rotation              Yes                -           -              Limited
    Collision detection   -                  -           Yes            Yes

Figure 20: Sprite hardware capabilities for several systems

2.4.4 Vector displays

A vector display was a display device in which the image was composed of drawn lines instead of a grid of pixels. The electron beam would follow an arbitrary path tracing the connected lines, instead of following the standard horizontal raster path. To utilize this kind of display, video hardware would send drawing commands that steered the electron beam directly, instead of image frames.
The video game console Vectrex was entirely based upon this concept. The popular game Asteroids used vector graphics, as shown in Fig. 21.

Figure 21: Screenshot of the game Asteroids

2.4.5 Image scaling

Old video game consoles ran at much lower resolutions than screens today are capable of displaying. It is therefore necessary to upscale the image, which is a non-trivial task. The naive way of upscaling an image is to perform nearest-neighbor interpolation on the pixels. However, this method is not well suited to the task, as its result looks jagged. There are a number of pixel art scaling algorithms designed specifically to address this problem. Fig. 22 compares several of them.

Figure 22: Comparison of image scaling algorithms performed on a dolphin sprite from the game Super Mario World (4x magnification) [10]. Panels: (a) Nearest-neighbor, (b) EPX, (c) Super 2xSaI, (d) hq4x, (e) Kopf-Lischinski

The first known algorithm is EPX, developed to port LucasArts games to early Macintosh computers, which had about double the resolution of the original system. The algorithm replaces every pixel by a 2x2 block of the same color. Then the top-left pixel of the block is set to the color of the original pixel's left and upper neighbors if those two neighbors have the same color, and the other three pixels of the block are treated in the same way using their own pairs of adjacent neighbors. Further magnification can be obtained by running the algorithm several times.

A refinement on EPX is the Eagle algorithm, which works similarly but also considers diagonally adjacent pixels. A number of similar algorithms exist which differ in the way they calculate the resulting pixels but are similar in principle, such as 2xSaI [5], Super 2xSaI and Super Eagle. All of these algorithms suffer from locality issues, as they only consider adjacent pixels.

A more sophisticated approach is used in the hqnx family of algorithms [19]. First, the color difference is calculated between the magnified pixel and its 8 nearest neighbors.
Then, each difference is compared to a predetermined threshold to determine whether the two pixels are close or distant. Hence, there are 256 possible combinations of the closeness of the neighboring pixels. Second, a lookup table with 256 entries is used, one entry for each combination. Each entry describes how to mix the colors of the source pixels to interpolate the pixels of the filtered image. As the most computationally intensive part of the algorithm is generating the lookup table, which is done ahead of time, the algorithm performs very well in real time. Because of that, and because it delivers the best results when compared to other fast algorithms, it is used in numerous emulators, such as bsnes and FCE Ultra. Algorithms in the hqnx family still, however, suffer from strict locality issues and cannot resolve certain ambiguities.

All of the algorithms presented so far suffer from locality issues. They also produce bitmaps and only work for predetermined degrees of magnification. To address these problems, the Kopf-Lischinski algorithm was developed [10]. It uses a novel way to extract resolution-independent vector images from pixel images. The vector image can then be used to magnify the image by an arbitrary amount without feature degradation. The algorithm resolves features in the input image and converts them into regions with smoothly varying shading that are separated by piecewise-smooth contour curves. Although it produces some of the best-looking results, the algorithm is quite complex and it is uncertain how it would perform in a real-time setting, as it has never been used in an emulator to date.

2.5 Sound emulation

2.5.1 Sampling

Sound hardware typically processes digital audio, performs digital-to-analog conversion and outputs the analog signal. Sound waves are continuous signals, and to process them digitally it is necessary to convert them to a discrete signal.
A process called sampling takes measurements of a continuous signal at regular intervals and produces a set of digital samples, as shown in Fig. 23. An example sample is shown in Fig. 24.

Figure 23: Sampling of a simple sine wave. Samples are the black dots.

Figure 24: Sample of rain, thunder and chirping birds and a 0.1 second excerpt of the sample

The simplest form of sound processing was to store samples and use them directly to generate sound. The Nyquist sampling theorem states that the analog signal is completely determined by its samples if the sampling frequency is more than twice the highest frequency present in the signal. Because the upper end of the range of human hearing is approximately 20 kHz, and for technical reasons, sound is typically sampled at 44.1 kHz, a bit above double 20 kHz. This means that storing a single second of 8-bit mono sound takes 44100 bytes, which was quite prohibitive when every single kilobyte mattered.

As with graphics, space and bandwidth concerns meant that chips for generating sound on the fly had to be developed. One type was the programmable sound generator (PSG), which generated sound by synthesizing several basic waveforms and combining them. Often PSGs would have a noise generator, to generate explosion and percussion sounds. An example is the SN76489 chip used in the Sega Genesis. As PSGs needed very few bits of input data to operate, they were used in the years before memory became large and affordable enough to store sound samples. The downside of PSGs was that the sound generated was rather bland.

2.5.2 Frequency modulation

A very popular technique called frequency modulation [2] was used to generate better sounding audio.
The equation for a frequency-modulated wave of peak amplitude $A$ is

$F = A \sin(\omega_c t + I f(t))$

An arbitrary modulating function $f(t)$ is multiplied by the modulation (change) index $I$ and added to the angle ($\omega_c t$) of the carrier sine wave. In the case that the modulating wave is also a sine wave, the equation becomes

$F = A \sin(\omega_c t + I \sin \omega_m t)$

To create realistic sounding instruments, operators are used, each possessing a frequency, an envelope and the capability to modulate its input using the frequency and envelope. Operators are arranged in circuits. Several such circuits are shown in Fig. 25. Fig. 26 shows which circuits are well suited for which use. To simulate chords, multiple FM channels are used.

This method was so pervasive in the 80s and early 90s that the patent Stanford licensed to Yamaha for it generated over 20 million US dollars for the university in licensing fees alone before it expired in 1994, making it the second most lucrative licensing agreement in Stanford's history to that date [22].

An example of a frequency modulation synthesis chip was the Yamaha YM2612, which was used in the Sega Genesis and a number of arcade systems. It was capable of six concurrent FM channels, using four operators per channel (as shown in Fig. 25). One of the channels could be set to play digital samples instead [14].

Figure 25: FM operator circuits (a) to (h) in the YM2612 chip, each connecting operators 1 to 4 [14]

    Circuit (a)   distortion guitar, bass
    Circuit (b)   harp, PSG-like sound
    Circuit (c)   bass, electric guitar, brass, piano, woods
    Circuit (d)   strings, folk guitar, chimes
    Circuit (e)   flute, bells, chorus, bass drum, snare drum
    Circuit (f)   brass, organ
    Circuit (g)   xylophone, vibraphone, snare drum, tom-tom
    Circuit (h)   pipe organ

Figure 26: FM operator circuits in Fig. 25 and what instruments they can be used to mimic

2.5.3 Aliasing

Typically, in sound hardware, the generated sound was fed directly to the speakers.
An emulator, however, simulates the generation of sound and then stores the result as a sequence of samples. This results in aliasing: harmonics above the Nyquist frequency of the sampled output (half the sample rate) fold into the audible range, producing rough-sounding aliases. The solution to this problem is to perform band-limited synthesis [7].

2.6 Documentation

As discussed before, in order to emulate a specific system, the emulator has to emulate all of its components as well as the system itself, so information is needed on how they operate. For publicly available components such as most CPUs and popular video and sound chips, there is usually ample technical documentation available from the manufacturer. However, for proprietary components (such as the lockout chip on the Nintendo SNES), peripherals and the inner workings of the system, the only primary documentation is usually confidential and not always possible to find. Reverse engineering is therefore sometimes necessary, that is, discovering how the system works by taking it apart and analyzing its workings.

Another source of information is the source code of the many open source emulators, particularly the MAME project [30], whose stated goal is to be a reference to the inner workings of various arcade systems, and its sister project, MESS [36], which does the same for console systems. Although the licenses for these projects are incompatible with commonly used free software and open source licenses, and their source code therefore cannot be mixed with commonly-licensed code, they serve as secondary documentation for the systems they emulate.

3 The Kiwi emulator

Using the information I gathered about emulation, I set out to write my own emulator. The result is the Kiwi emulator, which emulates most of the features of the Sega Genesis console and runs most of the games I tested to a high degree of accuracy. It has been tested on several Linux and OS X machines.
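The aliasing effect described in Section 2.5.3 can be checked numerically. The following Python sketch is an illustration only and is not part of Kiwi; the function names sample_sine and alias_of are hypothetical, and the 44.1 kHz rate is the one quoted in Section 2.5.1. It samples a 30 kHz sine, which lies above the Nyquist frequency, and compares it with a sine at the folded frequency.

```python
import math

SAMPLE_RATE = 44100  # Hz, the rate quoted in Section 2.5.1


def sample_sine(freq_hz, count, rate=SAMPLE_RATE):
    """Return `count` samples of a unit-amplitude sine wave at `freq_hz`."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(count)]


def alias_of(freq_hz, rate=SAMPLE_RATE):
    """Frequency in the range 0..rate/2 that `freq_hz` folds to once sampled."""
    folded = freq_hz % rate
    return rate - folded if folded > rate / 2 else folded


harmonic = 30000                 # above the 22050 Hz Nyquist frequency
heard = alias_of(harmonic)       # 44100 - 30000 = 14100 Hz

above = sample_sine(harmonic, 64)
below = sample_sine(heard, 64)

# The folded tone has the same samples with the opposite sign,
# so the sum of corresponding samples is zero up to rounding error.
max_diff = max(abs(a + b) for a, b in zip(above, below))
print(heard, max_diff < 1e-9)
```

The two sample sequences agree up to a sign flip, so once the sound has been stored as samples the 30 kHz component is indistinguishable from a 14.1 kHz tone. This is why the synthesis has to be band-limited before sampling.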
3.1 Architecture

Figure 3 in Chapter 2 showed a diagram of the Sega Genesis architecture. Kiwi emulates everything in the diagram except for sound and the Z80 CPU, which is mainly used for sound.

The architecture of the emulator, as seen in Fig. 27, consists of several parts. The main kiwi.py module interprets keyboard input and passes it to the input module, which emulates the Sega Genesis joypad controllers. The vdp module handles the emulation of the Sega Genesis Video Display Processor (VDP), as well as rendering the raw screen output. The CPU core is emulated by Musashi [34], an open-source library used in several popular Sega Genesis emulators, including DGen [24] and Genesis Plus [28]. Musashi is an interpreting emulator of the Motorola 68000 CPU with accurate cycle timing.

Figure 27: Kiwi emulator architecture (components shown: kiwi.py, input.c, megadrive.c, vdp.c, scale.c, the Musashi 68K library and the hqx library)

The megadrive module is responsible for the high-level emulation of the Sega Genesis architecture. Every 1/60th of a second, kiwi.py calls on the megadrive module to emulate a single frame. Emulating a frame consists of rendering the screen line by line and executing the CPU core for a short time in between each line. Having rendered the frame line by line, the megadrive module returns the single frame image to kiwi.py.

If the user has chosen a resolution higher than the native 320x240, the scale module performs the necessary upscaling of the image. The user can choose from three of the image scaling algorithms seen in Fig. 22: nearest-neighbor scaling, the EPX algorithm and the hqx algorithm.
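The frame loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Kiwi's actual C code: ToyMachine, its methods and the 488-cycle per-line budget are hypothetical stand-ins for the Musashi core and the vdp module, and each rendered line is represented only by the name of the palette in force. The toy game changes its palette partway through the frame, the situation discussed in Section 3.2.1, and the sketch contrasts rendering line by line with running the CPU for a whole frame before rendering.

```python
LINES_PER_FRAME = 224   # visible lines in NTSC mode (Section 3.2.1)
CYCLES_PER_LINE = 488   # assumed per-line CPU budget, for illustration only


class ToyMachine:
    """Hypothetical stand-in for the CPU core and the VDP state."""

    def __init__(self, switch_line):
        self.palette = "surface"
        self.cycles = 0
        # The toy "game" swaps palettes once this many cycles have run.
        self.switch_at = switch_line * CYCLES_PER_LINE

    def execute(self, cycles):
        """Run the toy CPU; the game code changes the palette mid-frame."""
        self.cycles += cycles
        if self.cycles >= self.switch_at:
            self.palette = "underwater"

    def render_line(self, y):
        """Return the palette line `y` is drawn with (instead of pixels)."""
        return self.palette


def emulate_frame_by_line(machine):
    """Render one line, then run the CPU briefly, for every line."""
    frame = []
    for y in range(LINES_PER_FRAME):
        frame.append(machine.render_line(y))
        machine.execute(CYCLES_PER_LINE)
    return frame


def emulate_frame_at_once(machine):
    """Run the CPU for the whole frame, then render every line."""
    machine.execute(CYCLES_PER_LINE * LINES_PER_FRAME)
    return [machine.render_line(y) for y in range(LINES_PER_FRAME)]


by_line = emulate_frame_by_line(ToyMachine(switch_line=75))
at_once = emulate_frame_at_once(ToyMachine(switch_line=75))
print(by_line.count("surface"), by_line.count("underwater"))  # 75 149
print(at_once.count("surface"), at_once.count("underwater"))  # 0 224
```

Interleaving rendering and execution preserves the point in the frame at which the palette changed, whereas the frame-at-once variant only ever sees the final palette.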
The hqx algorithm is provided by the open-source library hqx [35].

I decided to write the emulator in C with a bit of Python. C was used for the emulation part of the program because of its low CPU and memory overhead. Experiments with writing some parts of the emulation in Python showed that Python was too slow for the task. In the end, Python was used for the graphical user interface of the program.

3.2 Challenges

While writing Kiwi, I encountered some interesting trade-offs between speed and accuracy, as well as other curiosities.

3.2.1 Line rendering

The Sega Genesis outputs graphics based on a color palette consisting of 64 9-bit colors. Under normal circumstances, only 64 colors can therefore be drawn on the screen at the same time. Certain games, however, use a clever trick that changes the palette mid-frame. For example, one-third of the screen is drawn using one set of 64 colors, then the palette is changed and a different set of 64 colors is put in its place, allowing the remaining two-thirds of the screen to be drawn using the different set of colors.

This means an emulator that renders the screen frame by frame is inadequate, because it misses the mid-frame palette change. Fig. 28 shows a water level in the game Sonic the Hedgehog drawn correctly and incorrectly side by side. The game uses the trick by drawing the part of the screen that is above water with one palette and drawing the underwater part with another.

Figure 28: A screenshot of a water level in the game Sonic the Hedgehog. The screen on the left uses correct line rendering. The screen on the right uses incorrect frame rendering.

The simple but incorrect way to emulate a Sega Genesis frame would be to run the CPUs for 1/60th of a second and then render the frame. To emulate
the trick properly, my emulator renders each line of the screen separately and after each line runs the CPU for 1/13440th of a second (13440 is 60 times the number of lines in the screen, which in NTSC mode is 224).

3.2.2 Cycle accuracy

The main CPU used in the Sega Genesis, the Motorola 68000, does not execute all instructions equally fast. For example, a multiplication instruction might take 64 clock cycles, while a subtraction instruction might take only 8 clock cycles to execute. Because some games depend on these differences, it is necessary to emulate the correct cycle timing of CPU instructions, instead of just emulating their functionality. An example of this is a stage in the game Sonic the Hedgehog 2, as seen in Fig. 29, which plays too fast when precise cycle timing is disabled.

Figure 29: A screenshot of a level in the game Sonic the Hedgehog 2 which requires precise instruction timing.

3.2.3 Video game bugs

Sometimes video game programmers make coding mistakes which go unnoticed for one reason or another. One such bug persisted in a game called Bass Masters Classic Pro Edition. The Sega Genesis video processor (VDP) normally operates in what is called Mode 5. For backwards compatibility, the video chip includes functionality for a mode used in earlier hardware, called Mode 4. At one point the game mistakenly enters Mode 4, sets the screen resolution, and then switches back to Mode 5. The Mode 4 resolution setting is ignored on real hardware, and if this is not emulated (so that the resolution setting sticks instead), the logo screen is garbled, as illustrated in Fig. 30.

Figure 30: A screenshot from the game Bass Masters Classic Pro Edition: (a) incorrect emulation, (b) correct emulation

A similar thing happened when Microsoft was developing Windows 95. The game SimCity would not work in beta versions of Windows 95. Microsoft tracked down the bug and it turned out SimCity was reading memory after freeing its allocation.
On modern operating systems, this results in an error, but on Windows 3.x, the operating system preceding Windows 95, this worked fine as the memory was never used again. So the Windows 95 programmers put in code that looks for SimCity and puts the memory allocator in a special mode that doesn't free the memory right away [17].

4 References

4.1 Information

[1] Neil Bradley. Ms Pacman lives! (Yahoo Groups). URL http://tech.groups.yahoo.com/group/staticrecompilers/message/287.
[2] John Chowning. The Synthesis of Complex Audio Spectra by Means of Frequency Modulation. Journal of the Audio Engineering Society, 21(7):526–534, 1973.
[3] Oracle Corporation. Java SE HotSpot at a Glance. URL http://java.sun.com/products/hotspot/.
[4] Victor Moya del Barrio. Study of the techniques for emulation programming. Master's thesis, FIB UPC, June 2001.
[5] Derek Liauw Kie Fa. 2xSaI: The advanced 2x Scale and Interpolation Engine. URL http://vdnoort.home.xs4all.nl/emulation/2xsai/.
[6] Marat Fayzullin. How to write a computer emulator. URL http://fms.komkon.org/EMUL8/HOWTO.html.
[7] Shay Green. Band-Limited Sound Synthesis. URL http://www.slack.net/~ant/bl-synth/.
[8] LLVM Developer Group. The LLVM Compiler Infrastructure. URL http://llvm.org/.
[9] Motorola Inc. M68000 Family Programmer's Reference Manual. Prentice Hall, 1992.
[10] Johannes Kopf and Dani Lischinski. Depixelizing Pixel Art. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2011), 30(4):99:1–99:8, 2011.
[11] Charles MacDonald. Sega Genesis VDP documentation, 2000. URL http://cgfm2.emuviews.com/txt/genvdp.txt.
[12] Anders Montonen. antime's feeble Sega Saturn page. URL http://koti.kapsi.fi/~antime/sega/index.html.
[13] Mark Probst. Fast Machine-Adaptable Dynamic Binary Translation, 2002.
[14] Roger Sanders. New Documentation: An authorative reference on the YM2612. URL http://gendev.spritesmind.net/forum/viewtopic.php?t=386.
[15] Joshua H. Shaffer.
A Performance Evaluation of Operating System Emulators. Master's thesis, Bucknell University, May 2004.
[16] David Sharp. Tarmac, Dynamically Recompiling ARM Emulator, 2001.
[17] Joel Spolsky. Strategy Letter II: Chicken and Egg Problems, May 2000. URL http://www.joelonsoftware.com/articles/fog0000000054.html.
[18] Michael Steil. Dynamic Re-compilation of Binary RISC Code for CISC Architectures. PhD thesis, Technische Universität München, Institut für Informatik, September 2004.
[19] Maxim Stepin. hq2x, hq3x, hq4x filters. URL http://www.hiend3d.com/demos.html.
[20] Bart Trzynadlowski. Sega Genesis Emulator Save State Reference, February 2002.
[21] Bart Trzynadlowski. 68000 Undocumented Behavior Notes, May 2003.
[22] Stanford University. Music synthesis approaches sound quality of real instruments, June 1994. URL http://news.stanford.edu/pr/94/940607Arc4222.html.

4.2 Emulators

[23] Stéphane Dallongeville. Gens emulator. URL http://www.gens.me/.
[24] Dave. The DGen emulator. URL http://dgen.sourceforge.net.
[25] Aaron Giles. Aaron's MAME Memories. URL http://www.aarongiles.com/mamemem/index.html.
[26] Greg James, Barry Silverman, and Brian Silverman. Visual 6502 in JavaScript. URL http://www.visual6502.org/JSSim/index.html.
[27] Greg James, Barry Silverman, and Brian Silverman. Visual Transistor-level Simulation of the 6502 CPU and other chips. URL http://www.visual6502.org/.
[28] Charles MacDonald and Eke-Eke. The Genesis Plus GX emulator. URL http://code.google.com/p/genplus-gx/.
[29] RealityMan and Epsilon. UltraHLE. URL http://www.emuunlim.com/UltraHLE/.
[30] Nicola Salmoria and the MAME Team. MAME, Multiple Arcade Machine Emulator. URL http://mamedev.org/.
[31] Bernd Schmidt. UAE Amiga Emulator. URL http://www.amigaemulator.org/.
[32] Steve Snake. The Kega Fusion emulator. URL http://www.eidolons-inn.net/tiki-index.php?page=Kega.
[33] Michael Steil, Orlando Bassotto, and Gianluca Guida et al. Libcpu. URL http://www.libcpu.org/wiki/Main_Page.
[34] Karl Stenerud. The Musashi 68000 emulator. URL http://www.zophar.net/linux/68000/musashi.html.
[35] Maxim Stepin and Cameron Zemek. The hqx scaling library. URL http://code.google.com/p/hqx/.
[36] MESS team. MESS, Multi Emulator Super System. URL http://www.mess.org/.
[37] The VICE team. The VICE Emulator. URL http://www.viceteam.org/.
[38] VBA Team. VisualBoyAdvance emulator. URL http://vba.ngemu.com/.