TransWikia.com

Review a collection of executable binaries to determine similarity

Reverse Engineering Asked by solumnant on August 23, 2020

I have a collection of different binaries I want to review for code similarity and classification, but I would rather not have to open each of them in IDA and manually review the code in 15 different samples.

What tools or techniques can I use to automatically classify samples and determine code similarity/reuse between them? My first thought is to use ssdeep, but I was wondering whether there are any other open source tools or frameworks that can do the same.

I would also like to request that a classification tag be added in case there are other people who want to ask questions about classifying files into different groups based on different binary features.

One Answer

I have done some academic research in the field of malware classification. It is not an easy task, and I don't know what kind of similarity you are looking for, but I can list the most commonly used static features for binary classification, at least for malicious/unknown software.

  • Header information: dynamic libraries, sections
  • File size
  • Debug information
  • Executable code: disassembled instructions, opcodes, bytes (as arrays, probability vectors, and n-grams), control flow graph
  • Data and strings
  • Entropy, computed per section or over the whole file
  • Grayscale image: mapping the one-dimensional byte array to a two-dimensional image and applying image classification methods
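
To make two of these features concrete, here is a minimal sketch using only the Python standard library. It computes byte entropy and compares two byte strings by the Jaccard overlap of their byte n-grams. The function names and the n-gram length of 4 are my own choices for illustration, not taken from any particular tool, and in practice you would apply this to individual sections or the code bytes rather than a raw file.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Set of distinct byte n-grams (n=4 is an arbitrary illustrative choice)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Stand-in byte strings; replace with open(path, "rb").read() for real samples.
    first = b"\x55\x8b\xec\x83\xec\x10\x53\x56\x57\x8b\x7d\x08"
    second = b"\x55\x8b\xec\x83\xec\x20\x53\x56\x57\x8b\x7d\x0c"
    print("entropy:", shannon_entropy(first))
    print("n-gram similarity:", jaccard(byte_ngrams(first), byte_ngrams(second)))
```

High entropy (close to 8 bits per byte) often points to packed or encrypted data, in which case byte-level n-grams say little about the underlying code and the sample may need unpacking first.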

We are not yet at a point where these binary features and machine learning models can be relied on fully in commercial products. They may still help you if you can tolerate some false positives. GitHub has lots of malware classification examples you can work from.
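
As a rough illustration of the false-positive trade-off, the sketch below groups samples whose pairwise similarity reaches a threshold (single-link grouping with a small union-find). The similarity measure, the `group_samples` name, and the default threshold of 0.5 are assumptions made for this example. A lower threshold merges more samples into the same group and therefore produces more false positives.

```python
from itertools import combinations


def ngram_set(data: bytes, n: int = 4) -> set:
    """Distinct byte n-grams of a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two samples' byte n-gram sets."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def group_samples(samples: dict, threshold: float = 0.5) -> list:
    """Put two samples in the same group if they are linked by a chain of
    pairs whose similarity is at least `threshold` (single-link grouping)."""
    parent = {name: name for name in samples}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if similarity(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Stand-in data; for real use, map file names to open(path, "rb").read().
    demo = {
        "a": b"AAAABBBBCCCCDDDD" * 4,
        "b": b"AAAABBBBCCCCDDDD" * 4 + b"X",
        "c": bytes(range(64)),
    }
    print(group_samples(demo, threshold=0.5))
```

With 15 samples this is only 105 pairwise comparisons, so the quadratic cost is not a concern at that scale.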

Answered by de6f on August 23, 2020
