Arbitrary Code Execution in Pokemon Red

Posted on 5/26/2015

The Gameboy Color is a nifty piece of hardware. After reading about several projects related to the original Pokemon games, I decided to give it a go myself. I dusted off my old Gameboy Color and ordered a copy of Pokemon Red off eBay and set off to do something.

While I waited, I began to sift through the awesome Pokemon R/B/Y disassembly project, and found the section dedicated to the Cable Club. The Cable Club is the room inside Pokemon Centers where you can trade Pokemon with another person using a link cable. This part intruiged me since data is sent over the link cable and curiosity lead me to search for some way to manipulate the data other than spoofing a trade. Using the BGB emulator and its documentation on its link cable protocol, I slapped together a Python script to interact with the Pokemon ROM.

The Exchange

The initial communication between Gameboys is outlined here pretty well. It's not very interesting or relevant to the exploit, but basically the first Gameboy to talk to the Cable Club lady is the master (the emulator or the actual Gameboy later, referred to as Player 1 from here on out) while the other is the slave (Player 2, the script/Arduino). The menu options are then transmitted back and forth repeatedly as players choose what to do. Once the trade option is selected you are taken to a room to begin sending the other player your Pokemon data. It is sent in three parts:

Random seed value: 10 bytes
Pokemon party structure: 415 bytes
Patch list: 196 bytes

The random seed value is used so random events have the same outcome on both Gameboys. The pokemon party data section includes the other player's name, party (species of Pokemon, nicknames, HP, attacks etc.), as well as the original trainer's names. And finally, since the byte 0xFE has a special meaning when received as serial data (it is the designated "no data" byte, along with 0xFD being the preamble byte), the patch list is simply the offset of bytes in the party that should have the byte 0xFE. When these are transmitted, the sections of data are preceded by an arbitray number of 0xFD bytes.

The party structure is organized like so: (credit to Bulbapedia)

Name Length Comment
Player Name 11 bytes Terminated with 0x50
Size of Pokemon Party 1 byte Normally max size of 6
Pokemon Species IDs 7 bytes List is terminated with 0xFF, padded with 0x00 if less than 6 Pokemon
Pokemon #1 44 bytes First Pokemon's data
... 44 bytes
Pokemon #6 44 bytes Last Pokemon's data
Nicknames 66 bytes 11 bytes each, terminated with 0x50
Original Trainer Names 66 bytes 11 bytes each, terminated with 0x50
??? 3 bytes Unknown

When this party data is sent over the link cable, the Player 2's name is stored separately in memory at address 0xD887 and the rest of the data at 0xD89C:

0xD887 | 80 - First character of the Player 2's name
...
0xD89C | 06 - Number of Pokemon in Player 2's party
0xD98D | 01 - Species ID of 1st Pokemon
0xD98E | 02 - Species ID of 2nd Pokemon
0xD98F | 03 - Species ID of 3rd Pokemon
0xD990 | 04 - Species ID of 4th Pokemon
0xD991 | 05 - Species ID of 5th Pokemon
0xD992 | 06 - Species ID of 6th Pokemon
0xD993 | FF - List terminator
0xD994 | 01 - Start of 1st Pokemon data
...

The Buffer Overflow

When you finally enter the trading room, each of the player's original Pokemon names are printed to the screen (not their nicknames for some reason). The way older video game consoles displayed some graphics is by using tiles. The video buffer is simply an array of tile IDs that are to be displayed on screen. This applies to the text on screen as well. Each letter is its own tile, and as a result the function to write strings to the screen simply writes bytes to the corresponding area of video buffer starting at 0xC4A0 (1 byte per tile, 20x16 in total):

...
	hlCoord 2, 1	; $C4A0
	ld de, wPartySpecies ; $D89D
	call TradeCenter_PrintPartyListNames
...

TradeCenter_PrintPartyListNames:
	ld c, $0
.loop
	ld a, [de]
	cp $ff
	ret z
	ld [wd11e], a
	push bc
	push hl
	push de
	push hl
	ld a, c
	ld [$ff95], a
	call GetMonName
	pop hl
	call PlaceString
	pop de
	inc de
	pop hl
	ld bc, 20
	add hl, bc
	pop bc
	inc c
	jr .loop

Or in C, roughly:

TradeCenter_PrintPartyListNames(0xD89D, 0xC4A0);

void TradeCenter_PrintPartyListNames(char* start_ids, char* dest) {
	char* current_id = start_ids;
	while (*current_id != 0xFF) {
		char* name_str = GetMonName(*current_id); // Returns pointer to a string
		PlaceString(name_str, dest); // Essentially strcpy, but 0x50 terminated
		dest += 20; // Move to next line in buffer
	}
}

The TradeCenter_PrintPartyListNames function is passed the address of Player 2's first Pokemon ID and begins to write all of the names to the video buffer. But the way the function is written, a Pokemon's name can be written outside the intended area of memory if the byte 0xFF is not encountered within 7 bytes. Past that, we can write outside of the video buffer all together and even further down memory to where the stack is located (0xD000-0xDFFF). What good is writing Pokemon names to the stack though?

GetMonName:: ; 2f9e (0:2f9e)
	push hl
	ld a,[H_LOADEDROMBANK]
	push af
	ld a,BANK(MonsterNames) ; 07
	ld [H_LOADEDROMBANK],a
	ld [$2000],a
	ld a,[wd11e]
	dec a
	ld hl,MonsterNames ; 421E
	ld c,10
	ld b,0
	call AddNTimes
	ld de,wcd6d
	push de
	ld bc,10
	call CopyData
	ld hl,wcd77
	ld [hl], "@"
	pop de
	pop af
	ld [H_LOADEDROMBANK],a
	ld [$2000],a
	pop hl
	ret

C pseudo-code:

char* GetMonName(char id) {

	char* buffer, current, i;

	// Switch to the bank that stores the Pokemon names
	// along with other data...
	switchBank(7);

	// Names start at 421E and contain 10 bytes each
	current = 0x421E + (id * 10);
	buffer  = 0xCD77; // buffer

	// CopyData, essentially memcpy
	for (i = 0; i < 10; i++) {
		buffer[i] = current[i];
	}

	buffer[i] = 0x50; // Make sure string is terminated

	return buffer;
}

The GetMonName function looks up the Pokemon's name in ROM bank 7 (one of the extra sections of memory of the Gameboy) and copies it into a buffer. There are only 151 pokemon in the game and it makes sense that there should only be that many names too. Space is quite limited in Gameboy games so there is some unrelated code/data directly after the table of names:

This data can be copied to the buffer just like a Pokemon name if we pass a value greater than 151 for a species ID. We can now wreck havoc on the stack and write whatever 10 byte sequence we find in the table to certain parts of the stack. We'll now try to overwrite something interesting.

Setting a breakpoint inside the PlaceString function, we see that the return address stored in the stack is at address 0xDFDD. When the PlaceString function finishes and executes its RET instruction, it will jump to this memory address and continue executing from there. Could we possibly write to this address and have the Gameboy execute other code elsewhere?

I choose you, @(*#&!

TradeCenter_PrintPartyListNames(0xD89D, 0xC4A0);

void TradeCenter_PrintPartyListNames(char* start_ids, char* dest) {
	char* current_id = start_ids;
	while (*current_id != 0xFF) {
		char* name_str = GetMonName(*current_id); // Returns pointer to a string
		PlaceString(name_str, dest); // Essentially strcpy, but 0x50 terminated
		dest += 20; // Move to next line in buffer
	}
}

Taking a look at this code again, we can write up to 10 bytes every 20 bytes starting at 0xC4A0. Doing some math (20n + 0xC4A0), we see that we can indeed start writing at address 0xDFD6 which is 7 bytes before the return address (n = 346 which is less than the total party size of 415). Perfect! Now, what are the chances that an address to some place useful is located in the unrelated data past the Pokemon names, in little endian format no less?

...
0xcc   4A 23 23 78 22 79 22 C9 17 AD
0xcd   5D 22 50 CD 3C 3C 21 26 D1 CB
0xce   EE 21 96 D7 CB 86 21 A3 D7 CB
0xcf   8E 21 34 4A FA 39 D6 C3 97 3D
0xd0   38 4A 73 4A 06 2B CD 93 34 C0
...

(left column is the species ID and the corresponding "name")

The address 0xD7A3 is at the right location to be injected into the return value. Player 2's name just so happens to be located at 0xD887, about 230 bytes after the potential return value. By sheer luck, the portion of memory between just happens to be filled with mostly 0's (the nop opcode!) and makes a nice nop sled for us! What are the odds?

There is still one issue. To get this far down the stack we spray Pokemon names all over the place (since we have to print 346 of them to get this far) and potentially over important parts of the game. Let's do something about that.

...
0xe1   49 3C C3 D7 24 17 BF 55 23 50
0xe2   17 43 56 23 0B 50 17 52 56 23
0xe3   50 17 7C 56 23 50 17 9F 56 23
0xe4   50 17 20 57 23 50 05 06 05 C0
0xe5   41 82 50 0E 4B 00 0A 54 FA 4B
0xe6   D7 CB 77 C4 76 50 3E 01 EA 0C
...

(left column is the species ID and the corresponding "name")

There are two "names" that begin with 0x50, the string terminating byte. Calling PlaceName on these means that nothing will be written to memory for that species ID. At this point we have enough to write a bootstrapping program to execute arbitary code.

The Payload

By sending Player 2's name as assembled Z80 code, we can jump to any part of memory (and fix the stack if we'd like to return to the game). And as a consequence of our crafted party that lacks any 0xFE bytes, the patch list will be mostly empty, save for two 0xFF terminating bytes at the beginning. We can then stash our shell code here. We include a jump instruction to the patch list in Player 2's name, C3 D6 C5 (JP $C5D6) and our party looks like this:

# Fix stack to previous state, jump to 0xc5d6, the patch list
name         | f8 00 36 fd 01 3e 58 c5 c3 d6 c5
party size   | 06                   # not actualy used, can be any value
species IDs  | 15 15 15 15 15 15    # arbitrary species IDs, in this case Mew's
pokemon data | e3 (346x) ... ce ff 00 00 00 ...
# crafted party structure, terminated with 0xff byte and followed by 0's

For the shellcode, there are a couple of assemblers for the Gameboy's Z80 like processor, but I found that only WLA DX allows you to easily assemble code for the WRAM section of memory (0xC000-0xCFFF). I have put together a simple animated hello world proof of concept with a simple build file to put it all together. There are only 194 bytes available in the patch list for our shell code but its enough to do some interesting things. One thing to remember is that we can't send any 0xFE bytes in the patch list or party, so any instructions using that byte can't be sent (the compare immediate value instruction is one of them).

Now for Real

So far we've been using an emulator for this hack, but wouldn't it be much more satisfying to see this happen on actual hardware?

The Gameboy uses a primitive serial protocol for the link cable that runs at 5V. The four main connections are serial in, serial out, clock and ground. The master Gameboy initiates the serial transfer and a byte is simultaneously pushed out as one is read in. This can be implemented easily with a microcontroller and has been done so already for the Arduino. Building off that project, it was easily modified to send the patch list as well. All that was required was cut up a Link Cable that I got off eBay for $2, no resistors or anything much else. I recommend you break the plastic off one of the connectors and then pry the metal to expose the easily connectable prongs.

Conclusion

I didn't have as much time as I'd like to explore possibilites of taking this further, but having done nothing like this before this project was quite an adventure. It can be extended to send over even larger programs as a payload, as there are portions of memory that can be overwritten (without side effects, like Player 2's party data) by sending a bootstrap program that requests more data over serial. It can be used as a "GameShark" that allows you to modify parts of the game (more money, add arbitary Pokemon to your party, etc). You could dump your save file and keep a backup on your computer. Or even dump the whole ROM!

Even though this project has been written for Pokemon Red, I'm sure it could run on Pokemon Blue, maybe even Yellow. If you do anything interesting, I'd like to hear about it.