
Reconstructing Assembly Truth Map

Reverse Engineering: asked by fxorf on January 16, 2021

I recently read [1], which evaluates several disassemblers. Its ground-truth test binaries are built from the SPEC CPU 2006 benchmark suite. The authors provide detailed build instructions for the ELF set inside a VirtualBox VM. Reconstructing the Windows binaries, however, seems impossible for me, because the authors did not provide the SPEC configuration files used to build them. Building binaries with Visual Studio 2015 that actually match the provided mapping files [2] has not worked, even after brute-forcing/guessing the compilation settings.

For example, most of the samples in "SPEC/vs15-32/C++" and "SPEC/vs15-32/C" have the following prologue at the .text base (see the property descriptions at the end):

[2]
@0x0000000000401000:  [FBIC]
@0x0000000000401001:  [IC]C
@0x0000000000401003:  [IC]CC
@0x0000000000401006:  [IC]
@0x0000000000401007:  [IC]
@0x0000000000401008:  [IC]CC
@0x000000000040100b:  [IC]C
@0x000000000040100d:  [JIC]C
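Reading the listing, each line seems to describe one instruction: its address, the properties of its first byte in brackets, and then one letter per remaining byte. That interpretation is my assumption, not something documented here, but the addresses agree with it (for example, [IC]CC at 0x401003 is followed by the next entry at 0x401006, i.e. a 3-byte instruction). A minimal Python sketch that parses such lines under that assumption and recovers the instruction lengths:

```python
import re
from typing import NamedTuple

# Assumed line format: "@0x<address>:  [<first-byte flags>]<one letter per trailing byte>"
ENTRY_RE = re.compile(r"^@0x([0-9a-fA-F]+):\s+\[([A-Za-z]+)\]([A-Za-z]*)$")


class Entry(NamedTuple):
    address: int
    flags: str   # properties of the first byte, e.g. "FBIC"
    length: int  # 1 (first byte) + number of trailing byte letters


def parse_entries(text: str) -> list[Entry]:
    """Parse map lines, skipping anything that does not look like an entry."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            addr, flags, tail = m.groups()
            entries.append(Entry(int(addr, 16), flags.upper(), 1 + len(tail)))
    return entries


def check_contiguous(entries: list[Entry]) -> bool:
    """True if every entry ends exactly where the next one starts."""
    return all(a.address + a.length == b.address for a, b in zip(entries, entries[1:]))


sample = """\
@0x0000000000401000:  [FBIC]
@0x0000000000401001:  [IC]C
@0x0000000000401003:  [IC]CC
@0x0000000000401006:  [IC]
@0x0000000000401007:  [IC]
@0x0000000000401008:  [IC]CC
@0x000000000040100b:  [IC]C
@0x000000000040100d:  [JIC]C
"""

entries = parse_entries(sample)
print([e.length for e in entries])  # [1, 2, 3, 1, 1, 3, 2, 2]
print(check_contiguous(entries))    # True
```

If the contiguity check fails on a full map, the length interpretation above is probably wrong for that file (or the map contains gaps such as data or padding).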

All of the 64-bit samples have the following prologue:

<Segment .text, vaddr 0x0000000000001000, size 1600112, flag [RX]>
@0x0000000140001000:  [FBIC]CCCC
@0x0000000140001005:  [IC]CCCC
@0x000000014000100a:  [IC]
@0x000000014000100b:  [IC]CCC
@0x000000014000100f:  [IC]CC
@0x0000000140001012:  [IC]CC
@0x0000000140001015:  [IC]CC
@0x0000000140001018:  [JIC]C
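The segment header lists vaddr 0x1000 for .text, while the first entry sits at 0x140001000. If vaddr is a relative virtual address (my assumption), the difference, 0x140000000, is the image base, which is the default base address for 64-bit PE executables; the same subtraction on the 32-bit listing gives 0x400000, the default for 32-bit executables, provided .text also starts at RVA 0x1000 there. A small sketch of that arithmetic (the function names are hypothetical):

```python
import re

# Assumed header format, copied from the 64-bit map:
# "<Segment .text, vaddr 0x0000000000001000, size 1600112, flag [RX]>"
SEGMENT_RE = re.compile(
    r"<Segment\s+([^,\s]+),\s+vaddr\s+0x([0-9a-fA-F]+),\s+size\s+(\d+),\s+flag\s+\[([A-Z]*)\]>"
)


def parse_segment(line: str) -> dict:
    """Split a segment header into name, vaddr, size (bytes) and permission flags."""
    m = SEGMENT_RE.search(line)
    if m is None:
        raise ValueError(f"not a segment header: {line!r}")
    name, vaddr, size, flags = m.groups()
    return {"name": name, "vaddr": int(vaddr, 16), "size": int(size), "flags": flags}


def implied_image_base(first_entry_address: int, segment_rva: int) -> int:
    """Image base, assuming vaddr is an RVA and the first entry is at the segment start."""
    return first_entry_address - segment_rva


seg = parse_segment("<Segment .text, vaddr 0x0000000000001000, size 1600112, flag [RX]>")
print(seg["name"], hex(seg["vaddr"]), seg["size"], seg["flags"])  # .text 0x1000 1600112 RX
print(hex(implied_image_base(0x0000000140001000, seg["vaddr"])))  # 0x140000000
print(hex(implied_image_base(0x0000000000401000, 0x1000)))        # 0x400000 (RVA 0x1000 assumed)
```

A default image base only suggests that the linker's base address was not overridden; it says little about the compiler settings themselves.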

I tried several configurations of the SPEC build with the VS2015 compiler and the Intel compiler. However, none of the resulting binaries match the ground truth provided by [1].

So two concrete questions:

  1. If all samples share similar .text prologues, does that mean they
    all share similar libraries?

  2. Could someone infer the compilation details from the repeated
    prologues?
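On the second question, one crude way to see whether prologues repeat is to take the lengths of the first few instructions of every function as a fingerprint and count them across a map; comparing those counts between the ground-truth map and a map in the same format for your own build shows where the two diverge. This sketch assumes each entry looks like "@0x<address>:  [<flags>]<trailing byte letters>" and that an F flag marks a function start; the function names are hypothetical. Equal length sequences do not imply identical instructions, so this can narrow down compiler settings but not identify them.

```python
import re
from collections import Counter

# Assumed entry format: "@0x<address>:  [<first-byte flags>]<one letter per trailing byte>"
ENTRY_RE = re.compile(r"^@0x([0-9a-fA-F]+):\s+\[([A-Za-z]+)\]([A-Za-z]*)$")


def prologue_signatures(text: str, k: int = 4) -> Counter:
    """Count functions by the lengths of their first k instructions.

    A function is assumed to begin at an entry whose flags contain 'F'.
    Entries before the first function start are ignored.
    """
    sigs = Counter()
    current = None  # instruction lengths collected for the current function
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m is None:
            continue  # skip segment headers, blank lines, etc.
        _, flags, tail = m.groups()
        if "F" in flags.upper():
            if current:
                sigs[tuple(current)] += 1
            current = []
        if current is not None and len(current) < k:
            current.append(1 + len(tail))
    if current:
        sigs[tuple(current)] += 1
    return sigs


def compare_maps(truth_text: str, build_text: str, k: int = 4):
    """Return (signatures missing from the build, signatures only in the build)."""
    truth = prologue_signatures(truth_text, k)
    build = prologue_signatures(build_text, k)
    return truth - build, build - truth  # Counter subtraction keeps positive counts


x86_head = """\
@0x0000000000401000:  [FBIC]
@0x0000000000401001:  [IC]C
@0x0000000000401003:  [IC]CC
@0x0000000000401006:  [IC]
"""
x64_head = """\
<Segment .text, vaddr 0x0000000000001000, size 1600112, flag [RX]>
@0x0000000140001000:  [FBIC]CCCC
@0x0000000140001005:  [IC]CCCC
@0x000000014000100a:  [IC]
@0x000000014000100b:  [IC]CCC
"""

print(prologue_signatures(x86_head))  # Counter({(1, 2, 3, 1): 1})
print(prologue_signatures(x64_head))  # Counter({(5, 5, 1, 4): 1})
```

With full maps, a signature that dominates the ground truth but is rare in your build is a reasonable place to start disassembling and comparing actual instructions.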


Property descriptions:

  d - data
  c - code
  i - instruction boundary
    Note that if a byte is an instruction boundary (start of an instruction),
    this implies that it is a code byte
  o - instruction boundary (start of overlapping instruction)
  b - basic block start
    Basic block boundaries are not always explicitly listed, as they can usually
    be found by parsing the instruction/function listing into a control-flow graph
  f - function start
  e - program/binary entry point
  r - function end (return, tail call, etc.)
  j - control-flow instruction (jmp, call, ret, ...)
  x - crossref/call instruction
  n - NOP or other function padding
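The listings print these letters in uppercase (e.g. [FBIC]), so I assume they correspond case-insensitively to the descriptions above. A small Python decoder under that assumption, which also checks the stated rule that an instruction boundary (i) implies a code byte (c); the names are hypothetical:

```python
# Property letters from the descriptions above; matching is case-insensitive
# because the maps print them in uppercase (an assumption about the notation).
PROPERTIES = {
    "d": "data",
    "c": "code",
    "i": "instruction boundary",
    "o": "instruction boundary (overlapping instruction)",
    "b": "basic block start",
    "f": "function start",
    "e": "program/binary entry point",
    "r": "function end",
    "j": "control-flow instruction",
    "x": "crossref/call instruction",
    "n": "NOP or other function padding",
}


def decode_flags(flags: str) -> list[str]:
    """Translate a flag string such as "FBIC" into property descriptions."""
    out = []
    for letter in flags.lower():
        if letter not in PROPERTIES:
            raise ValueError(f"unknown property letter: {letter!r}")
        out.append(PROPERTIES[letter])
    return out


def boundary_implies_code(flags: str) -> bool:
    """Check the documented rule: an 'i' byte must also carry 'c'."""
    f = flags.lower()
    return "i" not in f or "c" in f


print(decode_flags("FBIC"))
# ['function start', 'basic block start', 'instruction boundary', 'code']
print(boundary_implies_code("JIC"))  # True
```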

[1] Dennis Andriesse et al., "An In-Depth Analysis of Disassembly on Full-Scale x86/x64 Binaries," USENIX Security 2016.

One Answer

We implemented a generator for creating such ground-truth mappings and recently published it.

You can find further details in the GitHub repository:

https://github.com/LL-MM/approxis-groundtruth

Answered by kn000x on January 16, 2021
