TransWikia.com

Confusion between different ARM architectures

Reverse Engineering Asked by hEShaN on August 21, 2020

I am trying to develop a machine learning model to help with the process of reverse engineering. As the first stage, we are currently trying to train an embedding model that understands the dependencies and relationships between instructions. As usual, an embedding model needs a huge amount of data to perform well.
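For concreteness, one common way to set up such an instruction embedding is word2vec-style skip-gram training over normalized instruction tokens. This is an assumption: the question does not say which model is used. The minimal Python sketch below covers only the data-preparation step. The normalization rules and the names `normalize` and `skipgram_pairs` are hypothetical, and the resulting pairs would be fed to whatever embedding trainer is actually used.

```python
import re
from typing import Iterator

# Hypothetical operand normalization: immediates and bare hex addresses become
# placeholder tokens so the vocabulary does not explode with constants.
_IMM = re.compile(r"#-?(0x[0-9a-f]+|\d+)")
_ADDR = re.compile(r"\b0x[0-9a-f]+\b")


def normalize(insn: str) -> str:
    """Turn one disassembled instruction into a single vocabulary token."""
    s = insn.strip().lower()
    s = _IMM.sub("IMM", s)
    s = _ADDR.sub("ADDR", s)
    return re.sub(r"[\s,]+", "_", s)


def skipgram_pairs(tokens: list[str], window: int = 2) -> Iterator[tuple[str, str]]:
    """Yield (center, context) pairs for every token within +/- window positions."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


if __name__ == "__main__":
    listing = ["push {r4, lr}", "ldr r0, [r1, #0x10]", "bl 0x8000130", "pop {r4, pc}"]
    tokens = [normalize(i) for i in listing]
    for pair in skipgram_pairs(tokens, window=1):
        print(pair)
```

Collapsing constants to `IMM`/`ADDR` is a design choice that trades precision for a smaller vocabulary. Whether it helps depends on the downstream task.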

Our work is focused on microcontroller-based systems, and the ARMv7-M and ARMv6-M architectures are the most important for us. However, since it is hard to find enough binaries for these architectures to train the embedding model, I was thinking of using the ARMv7 Debian packages so that I could train the embedding model on already-compiled binaries.

However, these binaries are compiled for the ARMv7-A architecture, and as far as I know ARMv7-A does not include the Thumb instruction set (correct me if I am wrong). Could somebody explain whether ARMv7-A includes Thumb instructions in its ISA? Would training on ARMv7-A help the model understand the relationships between ARMv7-M and ARMv6-M instructions? Is it going to be an issue if I use ARMv7-A when my goal is to work with ARMv7-M systems?

I have really had trouble understanding this, so any help or thoughts are much appreciated.

One Answer

While ARMv7-A does include the Thumb-2 subset used in ARMv7-M, the instructions actually used in ARMv7-A binaries will likely be quite different from those used on ARMv7-M microcontrollers.
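Whether a particular armhf binary starts in Thumb state can be checked from its ELF header. Arm toolchains set bit 0 of the entry address when the entry point is Thumb code, which is the same convention the Arm ELF ABI uses for Thumb function symbols. The top byte of `e_flags` carries the EABI version, and `0x400` marks the hard-float ABI. Below is a minimal Python sketch, assuming a 32-bit little-endian ELF. `arm_elf_info` and `ArmElfInfo` are hypothetical names. The check only describes the entry point, because a binary can still mix ARM and Thumb functions. A fuller check would inspect the symbol table or a disassembly.

```python
import struct
from dataclasses import dataclass

EM_ARM = 40                    # e_machine value for 32-bit Arm
EF_ARM_ABI_FLOAT_HARD = 0x400  # EABI v5 e_flags bit: hard-float calling convention


@dataclass
class ArmElfInfo:
    entry: int          # e_entry, with the Thumb bit left in place
    thumb_entry: bool   # bit 0 of e_entry set -> entry point is Thumb code
    eabi_version: int   # top byte of e_flags (5 for current EABI toolchains)
    hard_float: bool    # EF_ARM_ABI_FLOAT_HARD set (as for Debian armhf)


def arm_elf_info(data: bytes) -> ArmElfInfo:
    """Read the ELF32 header fields that hint at how an Arm binary was built."""
    if len(data) < 40 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:  # EI_CLASS == ELFCLASS32, EI_DATA == ELFDATA2LSB
        raise ValueError("expected a 32-bit little-endian ELF")
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags follow e_ident
    _, machine, _, entry, _, _, flags = struct.unpack_from("<HHIIIII", data, 16)
    if machine != EM_ARM:
        raise ValueError(f"e_machine is {machine}, not EM_ARM")
    return ArmElfInfo(
        entry=entry,
        thumb_entry=bool(entry & 1),
        eabi_version=(flags >> 24) & 0xFF,
        hard_float=bool(flags & EF_ARM_ABI_FLOAT_HARD),
    )


if __name__ == "__main__":
    # Synthetic header: ELF32, little-endian, ET_EXEC, EM_ARM, odd entry, EABI5 + hard-float
    hdr = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    hdr += struct.pack("<HHIIIII", 2, EM_ARM, 1, 0x10321, 52, 0, 0x05000400)
    print(arm_elf_info(hdr))
```

On a real file, reading the first 64 bytes (`open(path, "rb").read(64)`) is enough for this check.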

For one example, microcontrollers rarely use the VFP floating-point or NEON vector instructions available on ARMv7-A (Cortex-M4F has an FPU, but it is single precision only). Conversely, some instructions are used only on Cortex-M and not on Cortex-A (e.g. MRS/MSR forms that access M-profile special registers such as PRIMASK or CONTROL). And of course the A profile supports the “classic” 32-bit ARM instruction set, which the M profile does not support at all. ARMv6-M is even more different, since it mostly uses the 16-bit (Thumb-1) subset rather than the more powerful Thumb-2.
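The Thumb-1 versus Thumb-2 difference is visible directly in the encodings. A halfword whose top five bits are `0b11101`, `0b11110` or `0b11111` is the first half of a 32-bit Thumb-2 instruction, and any other halfword is a complete 16-bit instruction. Below is a minimal Python sketch that measures the share of 32-bit encodings in a raw little-endian Thumb code stream, such as bytes taken from `.text`. The function names are hypothetical. The sketch does not skip literal pools embedded in the code, which a real tool would locate with a disassembler. ARMv6-M code should come out mostly 2-byte, though even ARMv6-M has a few 32-bit instructions such as BL.

```python
import struct


def thumb_instruction_lengths(code: bytes) -> list[int]:
    """Split a little-endian Thumb code stream into instruction sizes (2 or 4 bytes)."""
    sizes = []
    i = 0
    while i + 2 <= len(code):
        (hw,) = struct.unpack_from("<H", code, i)
        if (hw >> 11) in (0b11101, 0b11110, 0b11111):
            if i + 4 > len(code):
                break  # truncated 32-bit instruction at the end of the buffer
            sizes.append(4)
            i += 4
        else:
            sizes.append(2)
            i += 2
    return sizes


def wide_fraction(code: bytes) -> float:
    """Fraction of instructions that use a 32-bit Thumb-2 encoding."""
    sizes = thumb_instruction_lengths(code)
    return sizes.count(4) / len(sizes) if sizes else 0.0


if __name__ == "__main__":
    # movs r0, #1 (0x2001); bl <offset 0> (0xF000 0xF800); bx lr (0x4770)
    code = bytes.fromhex("0120" "00f000f8" "7047")
    print(thumb_instruction_lengths(code), wide_fraction(code))
```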

One more difference: Debian packages mostly contain user-mode code running in an OS environment, whereas microcontrollers usually run monolithic firmware on “bare metal”. That firmware implements either a simple state machine or some kind of RTOS, with the OS functionality, interrupt handlers and “user payload” tasks all combined in the same binary.
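A practical consequence is that a raw Cortex-M firmware dump usually begins with the exception vector table rather than an ELF header. Word 0 is the initial stack pointer, and the following words are handler addresses with bit 0 set for Thumb state. Below is a minimal heuristic sketch in Python. The function name and the default RAM window are assumptions: the window is the SRAM region of the default ARMv7-M/ARMv6-M memory map, and many parts place RAM elsewhere.

```python
import struct


def looks_like_cortex_m_vector_table(image: bytes,
                                     ram_start: int = 0x2000_0000,
                                     ram_end: int = 0x4000_0000) -> bool:
    """Heuristic: does a raw little-endian image start with a Cortex-M vector table?

    Word 0 is the initial main stack pointer; words 1, 2 and 3 are the Reset,
    NMI and HardFault handler addresses. The RAM window is an assumption.
    """
    if len(image) < 16:
        return False
    sp, reset, nmi, hardfault = struct.unpack_from("<4I", image, 0)
    # The initial SP is word-aligned and may point one past the top of RAM.
    if sp % 4 != 0 or not (ram_start < sp <= ram_end):
        return False
    # Vector entries must have bit 0 set (Thumb state); the reset entry is mandatory.
    if (reset & 1) == 0:
        return False
    # Tolerate unused entries left as 0.
    return all(v == 0 or (v & 1) == 1 for v in (nmi, hardfault))


if __name__ == "__main__":
    # Synthetic table: SP in SRAM, handlers in flash with the Thumb bit set
    table = struct.pack("<4I", 0x2000_5000, 0x0800_0131, 0x0800_0139, 0x0800_013B)
    print(looks_like_cortex_m_vector_table(table))
```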

Summary: while there may be some intersection between ARMv7-A and ARMv7-M, they’re used in quite different environments and training on armv7 Debian packages is unlikely to give good results on ARMv7-M firmware.
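Before committing to such a corpus, the difference in instruction mix can be measured directly. Disassemble both corpora, count mnemonics, and compare the two frequency distributions. Below is a minimal Python sketch using Jensen-Shannon divergence. The metric is one reasonable choice made here as an assumption, and the mnemonic lists in the example are made-up placeholders, not measured data.

```python
import math
from collections import Counter
from typing import Iterable


def mnemonic_distribution(mnemonics: Iterable[str]) -> dict[str, float]:
    """Turn a stream of disassembled mnemonics into relative frequencies."""
    counts = Counter(m.lower() for m in mnemonics)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical mixes, 1 for disjoint ones."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict[str, float]) -> float:
        # KL(a || mid); terms with zero probability contribute nothing.
        return sum(pa * math.log2(pa / mid[k]) for k, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


if __name__ == "__main__":
    # Made-up mnemonic streams; real ones would come from a disassembler.
    linux_user = mnemonic_distribution(["ldr", "bl", "vldr", "vmul.f64", "push", "pop", "ldr"])
    firmware = mnemonic_distribution(["ldr", "str", "cpsid", "msr", "bl", "push", "pop"])
    print(f"JS divergence: {js_divergence(linux_user, firmware):.3f} bits")
```

A value near 0 would suggest the two corpora use similar instruction mixes. Values approaching 1 bit indicate little overlap.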

P.S. The compilers are usually different as well. Linux software is usually compiled with GCC (sometimes Clang), but firmware for consumer devices is often compiled with commercial compilers such as ARM’s own Keil, IAR, or GHS (Green Hills Software).

Answered by Igor Skochinsky on August 21, 2020
