About Donkey Kong Land 3 (and possibly more)

Share and discuss all facets of DKC ROM hacking...

About Donkey Kong Land 3 (and possibly more)

Postby Raccoon Sam » July 15th, 2012, 1:32 am

I've looked into DKL3 in the past but only with relatively low success. I know where Red Dwarf's level data starts and pretty sure where it ends but when it comes to editing it, I've always been kinda stumped.
Wasn't until a few days ago when I think I made some serious progress. Here's what I've got.

First of all, if you go to 0x5C001, you'll find three bytes. $97, which is the width of level in 32x32 blocks. $0C, which is the height of level in 32x32 blocks (+2). And finally $BF, which has something to do with the sprite array. More on that later.
After there bytes, the level data begins. It is compressed in a way that I have yet to FULLY document, but I've made some changes in the level already. Editing the first byte, for example, will change one of the cloud blocks to something else. I tried $00, $01, $02, $03, all the way to $7F, screenshotting my way to a full tileset rip. Observe:
Image

Each level has only $7F blocks to choose from. If the tile to use goes above $7F, the level breaks, but only in the ROM. What I did next was compare the level data dumped to a KiGB (possibly every other GBA emulator too) save state, and found that one byte consists of STTT TTTT bits, where T means tile to use ($00-$7F) and S being the sprite flag. If the bit is set, a block is printed but with a sprite on top of it. So, for example, $2E is the rooftop block, but $AE is the rooftop block WITH a sprite on top of it.

This is the level data in the FRZ (0x2660, $716 bytes long) overlaid on top of Blaziken's fantastic rip of the map:
Spoiler!
Image

If you pay attention to the blocks with sprites, you'll get the idea behind the sprite flag bit.

So, a thing worth mentioning is that the whole level with sprites (and possibly even bonus levels) are loaded all at once, so in theory, it could be very possible to make a program that reads the Save state of the level as input and spits out a complete map of the level as PNG. Automated mapping in a way.


But in the end, I'm pretty confused. How does the game transfer the level data from the ROM into the complete block array seen in the save state? There are similarities, but I've never dealt with compression algorithms (except RLE) unless there's an automated tool to do it. Although I doubt it, it might be the same/similar compression as Donkey Kong Country 3. I know Kirby Super Star (SNES game) uses the same compression as Kirby's Dream Land 2 (Game Boy game) and Kirby's Adventure (NES game).

If anyone here has binary file comparison software, could he/she look into the compressed and uncompressed level files for analysis?
Here is the uncompressed file from the FRZ and here is the compressed file, from the ROM.
Trailblazer
Bananas received 35
Posts: 265
Joined: 2008

Re: About Donkey Kong Land 3 (and possibly more)

Postby Tonberry2000 » July 15th, 2012, 3:29 am

You leave a wake of thoroughly impressed people wherever you go.
Trailblazer
Bananas received 30
Posts: 292
Joined: 2011

Re: About Donkey Kong Land 3 (and possibly more)

Postby Blaziken257 » July 15th, 2012, 3:32 am

I've actually tried to look into this about a half a year ago, with some success. Since the levels are compressed in ROM, it's almost impossible to figure out the algorithm without assembly knowledge, and naturally, using a disassembler. bgb has a great disassembler/debugger, although I don't know if you can get it for Mac. (Is there an emulator for Mac that has a disassembler and debugger?) In addition, when looking at assembly code, it also helps to know how the GB hardware works, namely, the instructions (two useful links I found are here and here. This page also helps to know the hardware in general -- knowing how ROM and RAM work can help.

So I have some notes, full of pseudo-code, although since I spent lots of time trying to reverse engineer the compression algorithm, it isn't as clean as it could be. Also, as you can see, there are lots of control codes in this algorithm, and each one takes a lot of time for me to figure out, that I haven't figured out all the cases yet. So it's incomplete as well.

In addition, I didn't know that the most significant bit of a 32x32 tile makes a sprite appear, so it's not mentioned in my notes. It's good that you figured that out, though.

So... here is what I wrote, hopefully it makes some sense. I'll clean it up later, and I'll definitely finish all the compression cases later too.

Code: Select all
This is the Japanese version of the game!

Terminology used here: Square brackets = ROM address; angle brackets = RAM address.

ASM code (at least from what I documented here) starts at [3130] in ROM

Red Wharf example below:

First, use the level widths and height to grab the pointers for each row. Each row is 32 pixels in height, and has the data for each 32x32 tile.

[5C001] -> <C5EB>                                               //<C5EB> = 97 (Level width)
[5C002] -> <C5EC>                                               //<C5EC> = 0C (Level height)
for i = 0 to <C5EC> - 1:
    <C600:C601>+2i = C600 + (<C5EC>+4)*2 + <C5EB> * i           //<C600:C601> = C620
                                                                //<C602:C603> = C6B7
                                                                //...
                                                                //<C616:C617> = CC9D
In English: Make x big-endian pointers, where x is the value of <C5EC>. First one should be at C600 + (<C5EC>+4)*2, then increment each by x.
These seem to represent the tiles in each row.
Notice that in this Red Wharf example, the pointers increase by 97 every time since that is the level width.
Create four more pointers, fill them with value of last one.    //<C618:C619> = CC9D
The game appears to fill four rows off-screen, which use the    //<C61A:C61B> = CC9D
same tiles as the last on-screen row??                          //<C61C:C61D> = CC9D
                                                                //<C61E:C61F> = CC9D
                                                               
Bottom line: Use VBA's memory viewer to view RAM starting at <C600>. You'll see some big-endian pointers, followed by the level data!!

-----------------------------------------------------------------------------------------------------------------------

Decompression:

[Note: This is incomplete!!!]

Documented ASM code starts at [3A0B]. bgb's disassembler/debugger REALLY helps here!!!

Start reading nibbles at [5C003] for Red Wharf. (Yes, only ONE nibble at a time, that's half a byte)
Function to read nibbles takes place at [3B34], so in the ASM code, you'll see the "call 3B34" instruction a LOT

if nibble < C: (comparison takes place at [3A13])
    Treat as uncompressed nibble, read next one??
    If nibble = B, and if next nibble is E or F, this is a special case (see below)

Magic hex values: C, D, E, F, BE, and BF (B0, B1, B2, ..., BD doesn't do anything special)
    When these are read, some weird compression occurs. See below.
   
BE: (ASM code at [3A27])
    Get two nibbles -> ab
    Get another nibble -> c
    for i = 0; i < c + 2; i++
    {
        Output ab
        ab += 01
    }
   
    (Essentially, it outputs ab c+2 times, but ab increases by 1 each time it is output. This is useful for long structures all with different 16x16 tiles.
    For example, a nibble string of BE, followed by 4A3 would output 4A 4B 4C 4D 4E to RAM.)
   
BF: (ASM code at [3A44])
    Get two nibbles -> ab
    Get two more nibbles -> cd
    Get another nibble -> e
    for i = 0; i < e + 2; i++:
        output ab, cd
   
    (Essentially, it outputs ab and cd e+2 times. Useful when there are repeated 64x32-pixel structures...
    For example, a nibble string of BF, followed by 404B2 would output 40 4B 40 4B 40 4B 40 4B to RAM.)
   
 C: (ASM code at [3A75])
    Get two nibbles -> ab
    If ab = odd: (ASM code at [3A89])
        Unknown
    If ab = even: (ASM code at [3A8F])
    {
        rew = ab/2
        Get another nibble -> c
        if c = F:
            Unknown
        else:
        {
            for i = 0; i < c+4; i++:
            {
                temp_ptr = cur_ptr - rew
                rew -= 1
                read at temp_ptr
                output 2 nibbles at temp_ptr
            }
        }
    }
       
 D: (ASM code at [3AC2])
 
 E: (ASM code at [3AE1])
       
 F: (ASM code at [3B06])
    Get two nibbles = ab
    Get third nibble = c
    If c >= 8: (ASM code at [3B1E])
        Unknown
    else if c < 8: (ASM code at [3B1A])
    {
        for i = 0; i < c + 3; i++:
            output ab
    }
   
    (When c < 8, outputs the same tile c+3 times. So it can output the same tile anywhere from 3 to 11 times.
    For example, a nibble string of F, followed by 3C3 would output 3C 3C 3C 3C 3C 3C to RAM.)

When bytes are output to RAM, they occur just after the pointers at 0xC600.
Treasure Hunter
Bananas received 108
Posts: 333
Joined: 2008

Re: About Donkey Kong Land 3 (and possibly more)

Postby Raccoon Sam » July 15th, 2012, 3:47 am

I kinda figured that it had to be read nibble by nibble, but had no clue about the magic nibbles.
While I can't understand your pseudocode too well, your commentary is thorough and I understand so much more now. "(When c < 8, outputs the same tile c+3 times. So it can output the same tile anywhere from 3 to 11 times. For example, a nibble string of F, followed by 3C3 would output 3C 3C 3C 3C 3C 3C to RAM.)" was very informative. Thank you!

And no, I have no debugger. Only a hex editor, tile molester and a text editor. :/ GameLad and NO$GMB work through a WINE-like app though, but haven't dealt with them too much. My only experience with assembly is with Geiger's fantastic SNES9X debugger. So thanks for the links!

Though, would it be wiser to ditch the magic nibbles-idea entirely and just expand the ROM so one byte would equal one block, easily editable via hex editor or a program that doesn't have to deal with any of the compression jazz? If that's possible, of course. I've never expanded a GB ROM before but hey, what do I know. Whatever the case, we don't have to keep the ROM size as small as possible like the folks at Rare back then.
Trailblazer
Bananas received 35
Posts: 265
Joined: 2008

Re: About Donkey Kong Land 3 (and possibly more)

Postby Blaziken257 » August 6th, 2012, 5:26 am

Yeah, the magic nibbles make the compression algorithm very hard to figure out, but it does save a lot of space.

In any case, I did more research and figured out all the different cases. I added more examples as well, because I find everything easier to figure out that way. So, here are my new notes:

Code: Select all
This is the Japanese version of the game!

Terminology used here: Square brackets = ROM address; angle brackets = RAM address.

ASM code (at least from what I documented here) starts at [3130] in ROM

Red Wharf example below:

First, use the level widths and height to grab the pointers for each row. Each row is 32 pixels in height, and has the data for each 32x32 tile.

[5C001] -> <C5EB>                                               //<C5EB> = 97 (Level width)
[5C002] -> <C5EC>                                               //<C5EC> = 0C (Level height)
for i = 0 to <C5EC> - 1:
    <C600:C601>+2i = C600 + (<C5EC>+4)*2 + <C5EB> * i           //<C600:C601> = C620
                                                                //<C602:C603> = C6B7
                                                                //...
                                                                //<C616:C617> = CC9D
In English: Make x big-endian pointers, where x is the value of <C5EC>. First one should be at C600 + (<C5EC>+4)*2, then increment each by <C5EB>.
These seem to represent the tiles in each row.
Notice that in this Red Wharf example, the pointers increase by 97 every time since that is the level width.
Create four more pointers, fill them with value of last one.    //<C618:C619> = CC9D
The game appears to fill four rows off-screen, which use the    //<C61A:C61B> = CC9D
same tiles as the last on-screen row??                          //<C61C:C61D> = CC9D
                                                                //<C61E:C61F> = CC9D
                                                               
Bottom line: Use VBA's memory viewer to view RAM starting at <C600>. You'll see some big-endian pointers, followed by the level data!!

-----------------------------------------------------------------------------------------------------------------------

Decompression:

[Note: This is incomplete!!!]

Documented ASM code starts at [3A0B]. bgb's disassembler/debugger REALLY helps here!!!

Start reading nibbles at [5C003] (yes, only ONE nibble at a time)
Function to read nibbles takes place at [3B34], so in the ASM code, you'll see the "call 3B34" instruction a LOT

if nibble < C: (comparison takes place at [3A13])
    Treat as uncompressed nibble, read next one??
    If nibble = B, and if next nibble is E or F, this is a special case (see below)

Magic hex values: C, D, E, F, BE, and BF (B0, B1, B2, ..., BD don't do anything special)
    When these are read, some weird compression occurs. See below.
   
BE: (ASM code at [3A27])
    Get two nibbles -> ab
    Get another nibble -> c
    for i = 0; i < c + 3; i++
    {
        Output ab
        ab += 01
    }
   
    (Essentially, it outputs ab c+3 times, but ab increases by 1 each time it is output. This is useful for long structures all with different 16x16 tiles (like the planks in Red Wharf).
    For example, a nibble string of BE, followed by 4A 3 would output 4A 4B 4C 4D 4E 4F to RAM.)
   
BF: (ASM code at [3A44])
    Get two nibbles -> ab
    Get two more nibbles -> cd
    Get another nibble -> e
    for i = 0; i < e + 2; i++:
        output ab, cd
   
    (Essentially, it outputs ab and cd e+2 times. Useful when there are repeated 64x32-pixel structures...
    For example, a nibble string of BF, followed by 40 4B 2 would output 40 4B 40 4B 40 4B 40 4B to RAM.)
   
 C: (ASM code at [3A75])
    Get two nibbles -> ab
    If ab = odd: (ASM code at [3A89])
    {
        rew = ab/2
        Get another nibble -> c
        rew = rew + c*0x80
    }
    Else if ab = even: (ASM code at [3A8F])
    {
        rew = ab/2
    }
    Get another nibble -> d
    if d = F (ASM code at [3A96]:
    {
       get two nibbles -> ef
      for i = 0; i < ef+4; i++:
        {
            temp_ptr = cur_ptr - rew - 1
            rew -= 1
            read at temp_ptr
            output 2 nibbles at temp_ptr
        }
   }
    else (ASM code at [3AA0]:
    {
        for i = 0; i < d+4; i++:
        {
            temp_ptr = cur_ptr - rew - 1
            rew -= 1
            read at temp_ptr
            output 2 nibbles at temp_ptr
        }
    }
   
    (Examples:
    [5C049] in ROM: C followed by 61 1 3 -- int(0x61/2) + 1 = 0x31 is rewind pointer. 0x61 is odd, so read 1. 1*0x80 = 0x80, so rewind pointer = 0x31+0x80, which is 0xB1. Next, read 3, which means we're going to read 0x3+0x4 = 0x7 bytes.
    If current RAM pointer (for example) is at <C6E3>, then read from 0xC6E3 - 0xB1 = <0xC632>, and read for 0x7 bytes. If <C632>-<C638>, for example, contains 67 68 69 01 01 01 01, then <C6E3>-<C6E9> will also contain 67 68 69 01 01 01 01.
   
    [5C054] in ROM: C followed by 7C 5 -- int(0x7C/2) + 1 = 0x3F is rewind pointer. 0x7C is odd, so do not add anything to rewind pointer. Next, read 5, which means we're going to read 0x5+0x4=0x9 bytes in RAM.
    If current RAM pointer (for example) is at <C701>, then read from 0xC701 - 0x3F = <0xC6C2>, and read for 0x9 bytes. If <C6C2>-<C63C>, for example, contains 67 6A 01 01 67 6A 01 01 01, then <C701>-<C709> will also contain 67 6A 01 01 67 6A 01 01 01.
   
    [5C39C] in ROM: C followed by CE F 12 -- int(0xCE/2) + 1 = 0x68 is rewind pointer. 0xCE is odd, so do not add anything to rewind pointer. Next, read F! Since F (a special case) was read, next, read 12, which means we're going to read 0x12+0x4 = 0x16 (22 decimal) bytes in RAM.
    If current RAM pointer (for example) is at <CD1E>, then read from CD1E - 68 = <0xCCB6>, and read for 22 (decimal) bytes. If <CCB6>-<CCCB>, for example, contains 55 57 57 55 57 57 57 55 57 57 57 55 57 57 55 57 57 57 55 57 57 57, then <CD1E>-<CD33> will also contain 55 57 57 55 57 57 57 55 57 57 57 55 57 57 55 57 57 57 55 57 57 57.)
       
 D: (ASM code at [3AC2])
    Get nibble = a
    This will be a constant nigh nibble
    Get two nibbles = bc
    for i = 0; i < bc + 0x14; i++:
    {
        Read nibble = d
        Output ad
    }
    (Example:
    [5C36F] in ROM: D followed by 5 2E 57 77 57 75 77 75 77 57 77 55 77 57 75 77 75 77 75 77 57 77 57 77 57 75 77 57 75 77 57 75 77 75 75
    Read 5, this is the constant upper nibble (e.g. everything will be 5x)
    Read 0x2E, add to 0x14, this is the number of nibbles to read (0x2E+0x14=0x42 bytes (66 decimal) will be output to RAM)
    Read 0x42 (66 decimal) nibbles
    Output to RAM, ORing with 0x50 (so for example, 55 57 57 57 ... 57 55 57 55))
 
 E: (ASM code at [3AE4])
    Get nibble = a
    This will be a constant high nibble
    if a = E:
       Stop generating the map and start the routine that loads sprites!
    else:
    {
        Get nibble = b
        for i = 0; i < b + 4; i++:
        {
            Read nibble = c
            Output ac
        }
    }
    (Example:
    [5C04C] in ROM: E followed by 60788A
   
    Read 6, bit shift by 4 (get 0x60) -- 6x is constant upper nibble
    Read 0, add by 4 to get 4 -- this determines how many bytes to output!!
    Read 7, output 0x67
    Read 8, output 0x68
    Read 8, output 0x68
    Read A, output 0x6A)
       
 F: (ASM code at [3B06])
    Get two nibbles = ab
    Get third nibble = c
    If c >= 8: (ASM code at [3B1E])
    {
        c = (c & 0x7) << 4
        Get fourth nibble = d
        Add e = c+d
        for i = 0; i < e + 11, i++:
            output ab
    }
    else if c < 8: (ASM code at [3B1A])
    {
        for i = 0; i < c + 3; i++:
            output ab
    }
   
    (When c < 8, output the same tile c+3 times. So it can output the same tile anywhere from 3 to 11 times.
    For example, a nibble string of F, followed by 3C3 would output 3C 3C 3C 3C 3C 3C to RAM.
   
    When c >= 8, read another nibble d. Output the same tile (c - 8) * 16 + d + 11 times. This can output anywhere from 11 to 138 times.
    For example, a string of F, followed by AB81 (@ [5C3E8] -- Ford Knocks) would output AB AB AB AB AB AB AB AB AB AB AB to RAM.)

When bytes are output to RAM, they occur just after the pointers at 0xC600.


But yeah, with the compression algorithm, making a tool to rip the maps will be hard, but probably not impossible. I'll see which method ends up better... And it may be possible to hack the ROM so that it has one byte per 32x32-pixel tile, but the thing is that there seem to be three levels (all of the same setting, like Stilt or Coral or whatever) in each ROM bank (one bank = 0x4000 bytes), and expanding the bytes may make that difficult. Of course, it may be possible to make each level in a different ROM bank and expand the ROM, but I currently don't know how to do that.

EDIT: I took a quick look at the DKL2 compression algorithm and compared the code to the DKL3 algorithm, and the code seems to be completely identical (except the addresses are in different spots). That's good.

EDIT 2: Looks like I made a mistake in the 0xBE case: It's supposed to output c+3 times, not c+2.
Treasure Hunter
Bananas received 108
Posts: 333
Joined: 2008

Re: About Donkey Kong Land 3 (and possibly more)

Postby Raccoon Sam » August 6th, 2012, 6:36 pm

I applaud you! That's fantastic!

Making a tool to rip maps from the ROM would be really hard like you said, but ripping from save states would be a million times easier.
What comes to level editing, I suggest that if one begins coding a comprehensive hacking tool, one should definitely go for a universal level extractor and a inserter-approach. This would be beneficial because uncompressed levels are easier to work with, and a universal tool would work on all games (although we're not sure about DKL1 or DKCGBC yet). When the levels are individual files, you could trade/show/read/send/edit them easily with a specially designed, separate level editor, or maybe even just a hex editor.

Fantastic job, Blaziken!
Trailblazer
Bananas received 35
Posts: 265
Joined: 2008


Return to ROM Hacking

Who is online

Users browsing this forum: No registered users and 1 guest