Decompiling/Control Flow

From OpenDUNE

Intro

A normal assembly application, even a 32-bit or 64-bit one, is 'stupid': there is no real control flow. The CPU executes the instruction at cs:ip and that is it. If that is a normal instruction, the ip is advanced to just past it, and on the next cycle the CPU reads the instruction found there. If it is a control instruction, things differ.
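As a rough illustration of the paragraph above, the sketch below computes where the CPU would fetch from next. The function toy_next_ip and its parameters are hypothetical names made up for this example; they do not model a real x86 encoding and are not part of the decompiler.

```c
#include <stdint.h>

/* Hypothetical helper: return the ip the CPU would fetch from next.
 * length  - size of the current instruction in bytes
 * is_jump - non-zero if the instruction is a control instruction
 * target  - new ip, only used when is_jump is set */
uint16_t toy_next_ip(uint16_t ip, uint8_t length, int is_jump, uint16_t target) {
    if (is_jump) {
        return target;               /* control instruction: ip is replaced */
    }
    return (uint16_t)(ip + length);  /* normal instruction: ip moves past it */
}
```

A normal 3-byte instruction at 0x0100 leads to 0x0103, while a jump leads to whatever target it names.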

Control Instructions

There are 4 basic types of control instructions:

* Jump
* Call
* Interrupt
* Return

Each one translates to different code. If the decompiler has seen the jump before, it replaces most of the complexity with a function call. If it has not, the result looks a bit ugly. Both cases are covered below for each type.

Jumps

Jumps are the easy ones. They only change the ip, and sometimes the cs, and that is all: they don't alter any other state. There are several types of jumps, which can be categorised in two ways. By distance:

* Jump Short
* Jump Near
* Jump Far

And by where the target comes from:

* Jump Immediate Value
* Jump Memory/Register

Jump Short / Near (with Immediate Value)

A short or near jump lands close to the current instruction, either before or after it, and stays inside the current cs. An unresolved version looks like this:

emu_ip = 0x027A;
emu_last_cs = 0x01F7;
emu_last_ip = 0x007E;
emu_last_length = 0x0059;
emu_last_crc = 0xCF03;
emu_call();

Simply put: it sets the address it wants to jump to (0x027A) inside this cs. The emu_last values tell the JIT where the jump came from. Then it tries to jump. Because the jump is unresolved, this will always fail in the static version. You can of course try to resolve it yourself. In Dune2, code is never rewritten on the fly, so if you have a function at 0x027A, you can simply make it call that function. The target may also lie inside a block whose contents you already know; in such cases it is equally safe to resolve the jump.
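A minimal sketch of resolving such a jump by hand, assuming you have identified the function that starts at 0x027A. The names target_at_027A, jump_source and demo_resolved_jump are placeholders invented for this example, and target_calls exists only so the effect can be observed.

```c
/* Example-only counter, so the effect of the jump can be observed. */
int target_calls = 0;

/* Placeholder for the function you identified at offset 0x027A. */
void target_at_027A(void) {
    target_calls++;
}

/* Hand-resolved form of an unresolved jump to 0x027A. */
void jump_source(void) {
    /* ... code of the block the jump came from ... */
    target_at_027A();  /* the jump becomes a plain call ... */
    return;            /* ... followed by a return: nothing runs after a jump */
}

/* Example-only helper: run the source block and report how often
 * the target was reached. */
int demo_resolved_jump(void) {
    target_calls = 0;
    jump_source();
    return target_calls;
}
```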

The resolved form is also simple:

if (!emu_flags.zf) { f__01F7_0037_004A_77E8(); return; }

A resolved jump always has this structure: a call to the target function, followed by a return.

Jump Far (with Immediate Value)

The same as above, except that it also changes the cs, so it jumps to another segment. In the unresolved form you will see 'emu_cs = 0xvalue' before those lines. The resolved case looks like this:

emu_cs = 0x01F7; f__01F7_0037_004A_77E8(); return;

This is only an example. The cs has to be set before making the jump: cs is always important, ip rarely is.

Jump Memory/Register

Finally, a jump can also take its target from memory or from a register. In those cases there will be a big switch. If there are 'emu_push' calls above the switch, it is a call; otherwise it is a jump. The rest is the same, only the end-point can vary.
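The idea behind such a switch can be sketched as follows. This is an illustration only: the exact shape of the generated switch is not shown on this page, and the names jump_indirect, target_0100, target_0140 and last_target are made up for the example.

```c
#include <stdint.h>

int last_target = 0;  /* example-only marker of which end-point ran */

void target_0100(void) { last_target = 0x0100; }
void target_0140(void) { last_target = 0x0140; }

/* Jump through a value read from memory or a register: one case per
 * known end-point. Returns 1 if the end-point was known, 0 if not. */
int jump_indirect(uint16_t target_ip) {
    switch (target_ip) {
        case 0x0100: target_0100(); return 1;
        case 0x0140: target_0140(); return 1;
        default:     return 0;  /* end-point not known: still unresolved */
    }
}
```

Only end-points that appear as a case can be followed; any other value stays unresolved.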

Calls

Calls are very similar to Jumps, except that they have no Short version. (The decompiler does not really distinguish between Near and Short anyway, as that is more of a low-level detail.) Calls do have one essential difference: they push the (cs:)ip pair of the return point on the stack (see Decompiling/Stack).

emu_push(0x0352); f__01F7_01C0_000A_2FC7();

This, for example, is a call to a function in the same cs.

emu_push(emu_cs); emu_push(0x0352); f__01F7_01C0_000A_2FC7();

This would be a far call, to a function outside the cs we are in.
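To make the difference visible, here is a small sketch that counts what each kind of call leaves on the stack. The stack array, stack_top, callee and the two *_depth helpers are stand-ins written for this example; the real emu_push and emu_cs may be declared differently.

```c
#include <stdint.h>

/* Stand-in stack and register, for this example only. */
uint16_t stack_words[16];
int      stack_top = 16;      /* index of the top word; grows downwards */
uint16_t emu_cs    = 0x01F7;

void emu_push(uint16_t value) { stack_words[--stack_top] = value; }

void callee(void) { /* body omitted */ }

/* Near call: only the return ip is pushed.
 * Returns the number of words now on the stack. */
int near_call_depth(void) {
    stack_top = 16;
    emu_push(0x0352); callee();
    return 16 - stack_top;
}

/* Far call: the cs is pushed first, then the return ip. */
int far_call_depth(void) {
    stack_top = 16;
    emu_push(emu_cs); emu_push(0x0352); callee();
    return 16 - stack_top;
}
```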

Interrupts

Interrupts work the same as Calls, but they first push the flags on the stack, before the cs:ip pair.
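A sketch of the resulting push order, under the assumption that the flags are available as a single 16-bit word. The name emu_flags_word and its value are hypothetical, as are interrupt_handler and interrupt_depth; how the decompiler actually writes the flags push is not shown on this page.

```c
#include <stdint.h>

/* Stand-in stack and registers, for this example only. */
uint16_t stack_words[16];
int      stack_top      = 16;      /* grows downwards */
uint16_t emu_cs         = 0x01F7;
uint16_t emu_flags_word = 0x0246;  /* hypothetical packed flags value */

void emu_push(uint16_t value) { stack_words[--stack_top] = value; }

void interrupt_handler(void) { /* body omitted */ }

/* Interrupt entry: flags first, then cs, then the return ip.
 * Returns the number of words now on the stack. */
int interrupt_depth(void) {
    stack_top = 16;
    emu_push(emu_flags_word);
    emu_push(emu_cs); emu_push(0x0352);
    interrupt_handler();
    return 16 - stack_top;
}
```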

Returns

Returns 'undo' a call. They pop the ip from the stack. If it is a far return, it also pops the cs. If it is an interrupt return, it pops the flags as well.

/* Return from this function */
emu_pop(&emu_ip);
emu_pop(&emu_cs);
return;

A return generated by the decompiler always carries a comment like this, to tell you what happened.
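The three kinds of return can be sketched side by side. Everything here is a stand-in written for the example: the stack, emu_flags_word, the return_* helpers and interrupt_round_trip are assumptions, not the real emulator code.

```c
#include <stdint.h>

/* Stand-in stack and registers, for this example only. */
uint16_t stack_words[16];
int      stack_top = 16;  /* grows downwards */
uint16_t emu_ip, emu_cs;
uint16_t emu_flags_word;  /* hypothetical packed flags value */

void emu_push(uint16_t value) { stack_words[--stack_top] = value; }
void emu_pop(uint16_t *value) { *value = stack_words[stack_top++]; }

/* Near return: ip only. */
void return_near(void) { emu_pop(&emu_ip); }

/* Far return: ip, then cs. */
void return_far(void) { emu_pop(&emu_ip); emu_pop(&emu_cs); }

/* Interrupt return: ip, then cs, then the flags. */
void return_interrupt(void) {
    emu_pop(&emu_ip); emu_pop(&emu_cs); emu_pop(&emu_flags_word);
}

/* Example-only helper: push what an interrupt pushes, return from it,
 * and report how many words are left on the stack. */
int interrupt_round_trip(void) {
    stack_top = 16;
    emu_ip = 0; emu_cs = 0; emu_flags_word = 0;
    emu_push(0x0246); emu_push(0x01F7); emu_push(0x0352);
    return_interrupt();
    return 16 - stack_top;
}
```

The pops mirror the pushes in reverse order, so a matching return leaves the stack exactly as it was before the call.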

Control Flow

Normally, when there is a call, you should make the called function a real C function. Such functions are either used by other code, or they really are something that should stand on its own. As Dune2 is compiled from C, you can be sure the original programmers wrote it as a function, so it is wise to write it as a function too. Interrupts are rarely used, and returns simply return from a function.

Jumps are another story. A jump at the bottom of a function mostly means that execution continues in the next function. In general it is perfectly safe to merge those two functions.

  emu_push(0x0FCC); f__01F7_06C0_000B_E2C7();
  f__01F7_0FCC_001C_02CF();
}

void f__01F7_0FCC_001C_02CF() {

Take this example. Here the second function can simply be merged into the first. Every function has @implements headers; don't forget to merge those too. Every entry with a () after it can be called; the ones without cannot. So in this case you have to remove the () from the 0FCC entry, as it is no longer directly callable. If anything does still try to call it, the static analyser will tell you. This avoids common mistakes.
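A sketch of what the merged result could look like. The name merged_function is a placeholder, since the name of the first function is not shown above; the stack, the steps trace and the stub body of f__01F7_06C0_000B_E2C7 are stand-ins so the example is complete.

```c
#include <stdint.h>

/* Stand-in stack and register, for this example only. */
uint16_t stack_words[16];
int      stack_top = 16;  /* grows downwards */
uint16_t emu_ip;

void emu_push(uint16_t value) { stack_words[--stack_top] = value; }
void emu_pop(uint16_t *value) { *value = stack_words[stack_top++]; }

int steps[4];             /* example-only trace of what ran, in order */
int step_count = 0;

/* Stub for the called function: records itself and returns. */
void f__01F7_06C0_000B_E2C7(void) {
    steps[step_count++] = 0x06C0;
    emu_pop(&emu_ip);
    return;
}

/* Placeholder name for the first function after the merge. */
void merged_function(void) {
    emu_push(0x0FCC); f__01F7_06C0_000B_E2C7();

    /* Former body of f__01F7_0FCC_001C_02CF continues here; the jump to
     * it and its separate definition are gone. */
    steps[step_count++] = 0x0FCC;
}

/* Example-only helper: run the merged function, report the step count. */
int demo_merged(void) {
    step_count = 0;
    stack_top = 16;
    merged_function();
    return step_count;
}
```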

Structures like the following are more complex:

if (!emu_flags.zf) { f__01F7_0FD7_0011_70B5(); return; }

In this case it is in general also fine to include the content of the 0FD7 function inside that if(). As long as they are jumps, this is mostly possible. But you have to be careful here. You will notice that at some point both flows end up at the same place, so you have to work a bit to make that look good. The target can also contain code that already appears above this if statement, which indicates a loop. There is not much more to say about this: you will have to work with it for a while to recognise such things and to handle them nicely. When in doubt, just leave the function as a separate function. But, again, as this is compiled from C, all normal jumps should be resolvable into one function. Just follow the rabbit ;)
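A minimal sketch of such a merge, comparing the decompiler-style form with a hand-merged one. All names here (shared_tail, f_0FD7_example, before_merge, after_merge, run_with_zf, result) are invented for the example, and the arithmetic only stands in for real work so that both versions can be compared.

```c
/* Stand-in for the emulator flags; only zf is modelled here. */
struct { unsigned zf : 1; } emu_flags;

int result;  /* example-only observable state */

/* Before: the jump target is a separate function, and both flows end
 * by jumping to the same tail. */
void shared_tail(void)    { result += 100; }
void f_0FD7_example(void) { result += 1; shared_tail(); return; }

void before_merge(void) {
    if (!emu_flags.zf) { f_0FD7_example(); return; }
    result += 2;
    shared_tail(); return;
}

/* After: both flows are inlined and meet again at the common tail. */
void after_merge(void) {
    if (!emu_flags.zf) {
        result += 1;   /* former body of the jump target */
    } else {
        result += 2;   /* former fall-through path */
    }
    result += 100;     /* the point where both flows rejoin */
}

/* Example-only helper: run one version with a given zf value. */
int run_with_zf(void (*fn)(void), int zf) {
    emu_flags.zf = zf ? 1u : 0u;
    result = 0;
    fn();
    return result;
}
```

Running both versions with zf clear and with zf set should give the same result, which is a cheap way to check a merge like this.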

For now it is better to leave the pushes and pops of function calls/returns in place. When you are very sure nothing else calls the function, you can mostly remove both (the pushes on the caller's side, the pops in the function itself).
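The reason both have to go together can be sketched like this. The stack and all function names below are stand-ins invented for the example; the point is only that the push and the pop cancel each other out, so removing the pair leaves the stack balanced while removing just one of them would not.

```c
#include <stdint.h>

/* Stand-in stack and register, for this example only. */
uint16_t stack_words[16];
int      stack_top = 16;  /* grows downwards */
uint16_t emu_ip;

void emu_push(uint16_t value) { stack_words[--stack_top] = value; }
void emu_pop(uint16_t *value) { *value = stack_words[stack_top++]; }

/* Decompiler-style pair: the caller pushes, the callee pops. */
void callee_with_pop(void)  { emu_pop(&emu_ip); return; }
void caller_with_push(void) { emu_push(0x0352); callee_with_pop(); }

/* Cleaned-up pair: push and pop removed together. */
void callee_plain(void) { return; }
void caller_plain(void) { callee_plain(); }

/* Example-only helper: words left on the stack after running a caller. */
int stack_depth_after(void (*caller)(void)) {
    stack_top = 16;
    caller();
    return 16 - stack_top;
}
```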

Don't forget to name your functions clearly, or leave them unchanged if you don't have a clue. The @name has to match the name of the function. Add comments on what you think the function does. Don't overdo it. Be clear.


Extra information

Reference information for instructions and their purpose(s): [1]
