TransWikia.com

What are my options to add instructions to a binary?

Reverse Engineering Asked by jjmcc on February 10, 2021

I am fairly new to reversing so apologies in advance if any terminology is incorrect.

I am currently using Ghidra on Windows to look at the instructions/decompilation of a binary, and I am looking to add some instructions to an existing function to change its behaviour. In this case it is fairly trivial, as I only want to add a fixed value to an existing function parameter, but I would also like some information on more advanced cases where the inserted code is slightly more complex.

I made a small test program to test this out and I managed to add the instruction by editing the binary in a hex editor and simply shifting the function bytes and inserting my own. However I realized that this is only possible because there was a bunch of empty memory following the function, so I could just shift them all down but this isn’t always the case.

                         **************************************************************
                         *                          FUNCTION                          *
                         **************************************************************
                         ulonglong __fastcall FUN_140011810(int param_1, int para
         ulonglong         RAX:8          <RETURN>
         int               ECX:4          param_1
         int               EDX:4          param_2
         undefined4        Stack[0x10]:4  local_res10                             XREF[3]:     140011810(W), 
                                                                                               140011835(R), 
                                                                                               140011848(R)  
         undefined4        Stack[0x8]:4   local_res8                              XREF[2]:     140011814(W), 
                                                                                               14001184e(R)  
         undefined1        Stack[-0x10]:1 local_10                                XREF[1]:     140011867(*)  
         undefined4        Stack[-0xf4]:4 local_f4                                XREF[4]:     140011858(W), 
                                                                                               14001185b(R), 
                                                                                               140011861(W), 
                                                                                               140011864(R)  
         undefined1        Stack[-0xf8]:1 local_f8                                XREF[1]:     140011821(*)  
                         FUN_140011810                                   XREF[1]:     thunk_FUN_140011810:14001104b(T), 
                                                                                      thunk_FUN_140011810:14001104b(j)  
   140011810 89 54 24 10     MOV        dword ptr [RSP + local_res10],param_2
   140011814 89 4c 24 08     MOV        dword ptr [RSP + local_res8],param_1
   140011818 55              PUSH       RBP
   140011819 57              PUSH       RDI
   14001181a 48 81 ec        SUB        RSP,0x108
             08 01 00 00
   140011821 48 8d 6c        LEA        RBP=>local_f8,[RSP + 0x20]
             24 20
   140011826 48 8b fc        MOV        RDI,RSP
   140011829 b9 42 00        MOV        param_1,0x42
             00 00
   14001182e b8 cc cc        MOV        EAX,0xcccccccc
             cc cc
   140011833 f3 ab           STOSD.REP  RDI
   140011835 8b 8c 24        MOV        param_1,dword ptr [RSP + local_res10]
             28 01 00 00
   14001183c 48 8d 0d        LEA        param_1,[DAT_140021008]                          = 01h
             c5 f7 00 00
   140011843 e8 44 f8        CALL       thunk_FUN_140011e80                              undefined thunk_FUN_140011e80(ch
             ff ff
   140011848 8b 85 08        MOV        EAX,dword ptr [RBP + local_res10]
             01 00 00
   14001184e 8b 8d 00        MOV        param_1,dword ptr [RBP + local_res8]
             01 00 00
   140011854 03 c8           ADD        param_1,EAX
   140011856 8b c1           MOV        EAX,param_1
   140011858 89 45 04        MOV        dword ptr [RBP + local_f4],EAX
   14001185b 8b 45 04        MOV        EAX,dword ptr [RBP + local_f4]
   14001185e 83 c0 0a        ADD        EAX,0xa
   140011861 89 45 04        MOV        dword ptr [RBP + local_f4],EAX
   140011864 8b 45 04        MOV        EAX,dword ptr [RBP + local_f4]
   140011867 48 8d a5        LEA        RSP=>local_10,[RBP + 0xe8]
             e8 00 00 00
   14001186e 5f              POP        RDI
   14001186f 5d              POP        RBP
   140011870 c3              RET
   140011871 cc              ??         CCh
   140011872 cc              ??         CCh
   140011873 cc              ??         CCh
   140011874 cc              ??         CCh
   140011875 cc              ??         CCh
   140011876 cc              ??         CCh
   140011877 cc              ??         CCh
   140011878 cc              ??         CCh
   140011879 cc              ??         CCh
   14001187a cc              ??         CCh
   14001187b cc              ??         CCh
   14001187c cc              ??         CCh
   14001187d cc              ??         CCh
   14001187e cc              ??         CCh
   14001187f cc              ??         CCh
   140011880 cc              ??         CCh
   140011881 cc              ??         CCh
   140011882 cc              ??         CCh

Specifically, 14001185e 83 c0 0a ADD EAX,0xa
I could duplicate this instruction and change 0xa to alter the output value.

In the more complex binary I have a larger function with similar parameters, except there is no additional memory at the end of the function so this approach to shift the remaining bytes wouldn’t work as there is another function directly below. I also can’t remove any of the current instructions to make space as that might break existing functionality. There is plenty of empty memory elsewhere in the binary so I thought of adding a jmp instruction to perform some instructions, and then jumping back but some of the instructions use local variables so I’m unsure if this will work.

So given the above example, and none of the extra memory at the end of the function, how can I insert some custom instructions?

3 Answers

I believe you have to overwrite the instruction at that address with an appropriate JMP and place your code at the end of the executable file: the ADD EAX,0xa, any original instructions that the JMP overwrote, and whatever else you want to do. When finished, JMP back to the address just after the patched bytes. You will also need to correct the executable's header for the modified file length. And when adding your own instructions, remember to restore any registers or pointers you change, e.g. the stack pointer.

Answered by Zurab on February 10, 2021

You are asking how to insert a "code cave" or a "balcony" into existing code. You could proceed like so:

  1. Select the location in the code from which you would like to branch into your code cave. Be prepared to overwrite existing code with five bytes (the JMP opcode plus a four-byte offset), and to fill any leftover bytes of a partially overwritten instruction with NOPs so that no garbage code remains.
  2. Carefully note the assembler mnemonics as well as the code bytes of the code to be replaced by your JMP.
  3. Calculate the offset and patch your existing code, e.g. in a hex editor of your choice. Keep in mind that you are in the 64-bit world where RIP-relative addressing is applied (your example shows it).
  4. At the location of your code cave, re-enter the instruction which you replaced by the JMP. If addresses are involved, re-calculate according to RIP-relative addressing.
  5. Save all registers/flags which you intend to modify, and which should have their original values back when leaving the code cave.
  6. Enter your new code bytes.
  7. Restore registers/flags as appropriate.
  8. JMP back to the next statement of your original code.
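Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name apply_patch and the toy buffer are made up here, and in a real file the offset would be the file offset of the instruction, not its virtual address. The LEA and JMP bytes are the ones from the question's listing and from the example further down.

```python
def apply_patch(image: bytearray, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Overwrite len(expected) bytes at `offset`, padding the replacement with NOPs.

    Returns the original bytes so they can be re-created inside the code cave.
    """
    if len(replacement) > len(expected):
        raise ValueError("replacement is longer than the bytes being replaced")
    original = bytes(image[offset:offset + len(expected)])
    if original != expected:
        # Refuse to patch if the file does not contain what we noted down in step 2.
        raise ValueError(f"unexpected bytes at {offset:#x}: {original.hex(' ')}")
    padded = replacement + b"\x90" * (len(expected) - len(replacement))
    image[offset:offset + len(expected)] = padded
    return original


# Toy buffer standing in for the file: two filler bytes, the 7-byte LEA, one more byte.
demo = bytearray.fromhex("cc cc 48 8d 0d c5 f7 00 00 e8")
saved = apply_patch(demo, 2,
                    bytes.fromhex("48 8d 0d c5 f7 00 00"),   # bytes noted in step 2
                    bytes.fromhex("e9 bf 67 00 00"))         # 5-byte JMP
print(saved.hex(" "))  # 48 8d 0d c5 f7 00 00
print(demo.hex(" "))   # cc cc e9 bf 67 00 00 90 90 e8
```

Checking the expected bytes first is a cheap guard against patching the wrong offset or the wrong build of the binary.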

Here is an example of how to calculate the target address of the JMP statement:

Assume you wish to replace the

LEA param_1, [DAT_140021008]

statement with a JMP to 140018000 where you might have free space for the code cave, and the subsequent re-insertion of that LEA command at the new location.

Of course

  • that address must allow code to be executed (beware of DEP "Data Execute Prevention").
  • and you should be able to find that location in your hex editor. I do not know whether Ghidra allows patching directly in the code.
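Both points come down to the section table of the PE file. The sketch below shows the usual arithmetic for turning a virtual address into a file offset and checking the execute flag. The section values are hypothetical (they are not taken from the asker's binary), the image base 0x140000000 is an assumption based on the addresses in the listing, and Section and locate are names invented for this example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # PE section flag: section may be executed


class Section(NamedTuple):
    name: str
    rva: int       # VirtualAddress
    vsize: int     # VirtualSize
    raw_ptr: int   # PointerToRawData
    raw_size: int  # SizeOfRawData
    flags: int     # Characteristics


def locate(va: int, image_base: int,
           sections: Sequence[Section]) -> Optional[Tuple[str, Optional[int], bool]]:
    """Return (section name, file offset or None, is_executable) for a virtual address."""
    rva = va - image_base
    for s in sections:
        delta = rva - s.rva
        if 0 <= delta < max(s.vsize, s.raw_size):
            # Bytes beyond SizeOfRawData exist only in memory, not in the file.
            file_offset = s.raw_ptr + delta if delta < s.raw_size else None
            return s.name, file_offset, bool(s.flags & IMAGE_SCN_MEM_EXECUTE)
    return None


IMAGE_BASE = 0x140000000  # assumed
SECTIONS = [              # hypothetical values for illustration only
    Section(".text", 0x11000, 0x8000, 0x400, 0x8000, 0x60000020),
    Section(".data", 0x21000, 0x1000, 0x8400, 0x200, 0xC0000040),
]

for va in (0x14001183C, 0x140018000, 0x140021008):
    print(hex(va), locate(va, IMAGE_BASE, SECTIONS))
```

With the real section table (Ghidra's Memory Map or any PE viewer shows it), the same calculation gives the position to look for in the hex editor and tells you whether the cave location is executable at all.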

Calculate the address offset: take your destination address and subtract the address of the instruction that follows the JMP. Your replaced code will look similar to the following (syntax possibly not correct):

14001183c E9 xx xx xx xx    JMP 140018000       ; the xx's to be calculated
140011841 90                NOP
140011842 90                NOP
140011843 e8 44 f8 ff ff    CALL thunk_FUN_140011e80 ;existing code

Offset: 140018000 - 140011841 = 67bf, the JMP line becoming
14001183c E9 bf 67 00 00    JMP 140018000       
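The same calculation as a small Python helper, in case you want to script it. The name jmp_rel32 and its patched_len parameter are made up for this sketch. It encodes a near JMP (opcode E9 followed by a little-endian signed 32-bit offset) and pads with NOPs up to the length of the instruction being replaced.

```python
import struct


def jmp_rel32(source_va: int, target_va: int, patched_len: int = 5) -> bytes:
    """Encode `JMP target_va` placed at source_va, NOP-padded to patched_len bytes."""
    if patched_len < 5:
        raise ValueError("a near JMP needs at least 5 bytes")
    # The offset is relative to the address of the instruction after the JMP.
    rel = target_va - (source_va + 5)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return b"\xE9" + struct.pack("<i", rel) + b"\x90" * (patched_len - 5)


patch = jmp_rel32(0x14001183C, 0x140018000, patched_len=7)
print(patch.hex(" "))  # e9 bf 67 00 00 90 90
```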

At address 140018000 you might wish to re-insert the LEA statement:

140018000 48 8d 0d 01 90 00 00      LEA param_1, [DAT_140021008]
140018007 Your new code
...
JMP back to 140011843

The offset for the relocated LEA is calculated in the same way, relative to the address of the instruction that follows it:

140021008 - 140018007 = 9001

In a plain disassembler the mnemonic param_1 will probably appear as RCX, as that register holds param_1 (the 48 prefix makes this a 64-bit LEA).
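As a cross-check, this sketch recomputes the displacement for the LEA exactly as it appears in the question's listing (48 8d 0d c5 f7 00 00, target DAT_140021008), first at its original address and then at the cave address 140018000. The helper name rip_disp32 is made up. The opcode bytes 48 8d 0d are copied from the listing.

```python
import struct

LEA_RCX_RIP = bytes.fromhex("48 8d 0d")  # LEA RCX,[RIP+disp32], as in the listing


def rip_disp32(instr_va: int, instr_len: int, target_va: int) -> bytes:
    """Displacement bytes for a RIP-relative operand: target minus next-instruction address."""
    disp = target_va - (instr_va + instr_len)
    return struct.pack("<i", disp)


# At the original location this must reproduce the bytes Ghidra shows.
original = LEA_RCX_RIP + rip_disp32(0x14001183C, 7, 0x140021008)
# Moved into the cave, the same target needs a different displacement.
moved = LEA_RCX_RIP + rip_disp32(0x140018000, 7, 0x140021008)

print(original.hex(" "))  # 48 8d 0d c5 f7 00 00
print(moved.hex(" "))     # 48 8d 0d 01 90 00 00
```

Reproducing the original bytes first is a handy sanity check that the formula and the instruction length are right before trusting the relocated encoding.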

At the end of your code cave you have to calculate the JMP back to your original code in just the same way.
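Putting the pieces together, a cave could be assembled roughly like this. Treat it as a sketch under assumptions: build_cave is a made-up helper, the relocated LEA bytes are recomputed for DAT_140021008 from the question's listing, and the payload (ADD dword ptr [RBP+0x100],5, which bumps the stored copy of param_1) was hand-assembled for illustration, so verify it in a disassembler before using it. PUSHFQ/POPFQ (9C/9D) save and restore the flags around the payload.

```python
import struct

PUSHFQ = b"\x9c"
POPFQ = b"\x9d"


def build_cave(cave_va: int, relocated: bytes, payload: bytes,
               return_va: int, save_flags: bool = True) -> bytes:
    """Relocated instruction + (flag-protected) payload + JMP back to return_va."""
    body = relocated
    body += PUSHFQ + payload + POPFQ if save_flags else payload
    jmp_va = cave_va + len(body)
    # Same rule as for the JMP into the cave: relative to the instruction after the JMP.
    rel = return_va - (jmp_va + 5)
    return body + b"\xE9" + struct.pack("<i", rel)


cave = build_cave(
    cave_va=0x140018000,
    relocated=bytes.fromhex("48 8d 0d 01 90 00 00"),  # LEA RCX,[DAT_140021008], re-encoded
    payload=bytes.fromhex("83 85 00 01 00 00 05"),    # ADD dword ptr [RBP+0x100],5 (assumed)
    return_va=0x140011843,
)
print(cave.hex(" "))
# 48 8d 0d 01 90 00 00 9c 83 85 00 01 00 00 05 9d e9 2e 98 ff ff
```

The payload addresses the local through RBP, so the temporary change to RSP caused by PUSHFQ does not affect it. A payload that uses RSP-relative operands would have to account for the extra 8 bytes on the stack.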

You might have noticed that due to the necessary target address re-calculations your simple "shift down" method also needs careful attention in the general case.

Remark: If you look in your example code at the statement at address 140011843

CALL       thunk_FUN_140011e80

you might note the "thunk_" prefix. It means that the CALL does not go to 140011e80 directly. Its immediate target is a "proxy", probably a JMP stub inserted by the compiler, which in turn leads to the address shown in Ghidra's label. Ghidra resolves this for you.
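You can confirm this from the bytes of the CALL in the listing (e8 44 f8 ff ff at 140011843). The helper below is a small sketch (rel32_target is a made-up name) that decodes the relative operand of a near CALL or JMP:

```python
import struct


def rel32_target(instr_va: int, instr_bytes: bytes) -> int:
    """Absolute target of a 5-byte near CALL (E8) or JMP (E9)."""
    if len(instr_bytes) != 5 or instr_bytes[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte E8/E9 instruction")
    (rel,) = struct.unpack("<i", instr_bytes[1:5])
    return instr_va + 5 + rel


target = rel32_target(0x140011843, bytes.fromhex("e8 44 f8 ff ff"))
print(hex(target))  # 0x14001108c
```

The result, 14001108c, is the address of the thunk and not 140011e80, which matches what the "thunk_" prefix suggests.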

The method outlined here is only a sketch of the general construction of a code cave. Problems such as local variables located on the stack (keeping the stack consistent) or items listed in the relocation table of the PE64 header must also be considered and handled properly.

Answered by josh on February 10, 2021

What you are really looking for is a code stud. A code cave uses unused space in the binary: you jump to it, run your own code, and jump back. The problem with code caves is that the unused space in a PE binary is limited, so you won't get a lot of room.

A code stud, on the other hand, means adding an additional .TEXT section. With this method you can avoid the pitfalls of DEP and have much more space. I have easily had up to 8 MB of space to work in.

All you need to do is open the binary in StudPE, add a section, make sure it is executable, and then just jump to it, do anything you want, and jump back.
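For anyone curious what the tool has to write, here is a rough Python sketch of the 40-byte section header for a new executable section. It is not how StudPE itself is implemented. The helper names and the example numbers for the last existing section are hypothetical. The field layout and the flag values follow the PE format's IMAGE_SECTION_HEADER.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def new_code_section(name: str, size: int, last_rva: int, last_vsize: int,
                     last_raw_ptr: int, last_raw_size: int,
                     section_alignment: int, file_alignment: int) -> bytes:
    """Pack a section header placed directly after the last existing section."""
    if len(name.encode("ascii")) > 8:
        raise ValueError("section names are at most 8 bytes")
    rva = align_up(last_rva + last_vsize, section_alignment)
    raw_ptr = align_up(last_raw_ptr + last_raw_size, file_alignment)
    raw_size = align_up(size, file_alignment)
    flags = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
    # Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
    # PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
    # NumberOfLinenumbers, Characteristics
    return struct.pack("<8sIIIIIIHHI", name.encode("ascii"), size, rva,
                       raw_size, raw_ptr, 0, 0, 0, 0, flags)


# Hypothetical values for the last existing section and the alignments.
header = new_code_section(".stud", 0x2000, last_rva=0x26000, last_vsize=0x1234,
                          last_raw_ptr=0x9000, last_raw_size=0x1400,
                          section_alignment=0x1000, file_alignment=0x200)
print(len(header), header.hex(" "))
```

Besides writing this header, the file also needs NumberOfSections and SizeOfImage updated and the section's raw bytes appended, which is the bookkeeping the tool does for you.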

Also, don't worry about inserting bytes by hand: just use Ram Michael's multi assembler tool in a debugger (OllyDbg or x64dbg) and copy and paste the assembly code in. It needs to match MASM syntax, but converting is easy.

I have done whole projects like this, so rest assured this is the easy way of doing things, and it will save you hours of work.

You can use Stud PE to create one.

Answered by LUser on February 10, 2021
