Internet of Things (IoT) devices are already a significant part of our day-to-day life, work environments, hospitals, government facilities, and vehicle fleets. They are represented by Wi-Fi printers, smart door locks, alarm systems, and so on. In 2020, the average US resident had access to more than ten connected devices. But users who choose IoT devices for their usefulness also need to be sure these devices are secure.
Since IoT devices are usually connected to internal home or corporate networks, compromising such devices can provide criminals with access to the entire system. During the first six months of 2021, there were around 1.5 billion attacks on smart devices, with attackers looking to steal data, mine cryptocurrency, or build botnets.
One way to mitigate IoT security risks is to reverse engineer a device: research how it is built, and then analyze the device and its firmware in depth.
In this article, we show a practical example of reverse engineering firmware for a smart air purifier, highlighting the importance of researching its architecture. This article will be helpful for development teams working on cybersecurity projects who want to learn about the nuances and steps for reverse engineering IoT devices.
The importance of researching the firmware architecture
The process of reverse engineering IoT firmware varies significantly depending on the device under research.
IoT devices evolve quite fast, and the dominant architecture in the market changes all the time. Less than ten years ago, the most popular choices were x86 and ARM, and less often MIPS or PowerPC. But now there is a great variety of microcontroller architectures you need to know to reverse engineer embedded devices: TriCore, RH850, i8051, PowerPC VLE, etc.
Going deep into learning a single architecture isn’t enough to succeed in IoT reverse engineering. And if it’s necessary for developers to start reverse engineering as fast as possible, they should start by learning the basics of the firmware’s architecture and structure.
This is exactly what we want to describe in this article: the way reverse engineers can study new architectures and the format of firmware they have never seen before.
For this article, we used a firmware dump of the Xiaomi Air Purifier 3H. We chose it because it’s a firmware dump for the ESP32 CPU, which is based on the Tensilica Xtensa architecture. This is a pretty exotic choice of architecture, but it’s common in IoT devices that require Wi-Fi communication. You can find the firmware we will reverse engineer as an example for this article (ESP-32FW.bin) on this GitHub page.
The challenge for this case is that there’s no existing decompiler for the firmware architecture and disassemblers barely support it. However, this is a pretty accurate example of what reverse engineers face nowadays.
The IoT firmware reverse engineering process consists of the following five stages:
1. Determine the architecture
The first question to answer before reverse engineering an IoT device is which architecture its firmware targets.
The most straightforward way to find out is to read the datasheet for the CPU and learn the answer from there. But there are situations when all you have is the firmware itself. In this case, you can use one of two options:
1. String search may allow you to find some leftover compilation strings that contain information about the compiler name and architecture.
2. Binary pattern search requires you to know instructions that are often used in different types of microcontroller architectures. You can search the firmware for binary patterns common for a specific architecture and then try to load the firmware into a disassembler that supports such an architecture to validate your guess.
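The string search option can be prototyped outside a disassembler. Below is a minimal sketch, assuming the firmware is available as a raw binary blob; the keyword list and the function names are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical keyword list; extend it for the toolchains you expect to meet.
ARCH_HINTS = ("xtensa", "esp32", "esp-idf", "gcc", "mips", "powerpc")


def printable_strings(blob: bytes, min_len: int = 6):
    """Yield (offset, text) for every run of printable ASCII of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    for match in re.finditer(pattern, blob):
        yield match.start(), match.group().decode("ascii")


def architecture_hints(blob: bytes, hints=ARCH_HINTS):
    """Return the (offset, text) pairs whose text mentions one of the hint keywords."""
    found = []
    for offset, text in printable_strings(blob):
        lowered = text.lower()
        if any(hint in lowered for hint in hints):
            found.append((offset, text))
    return found
```

Running `architecture_hints(open("firmware.bin", "rb").read())` on a dump (the file name is a placeholder) lists leftover compiler or SDK strings that are worth reading manually; a hit is only a hint and still needs to be validated in a disassembler.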
Once you determine the architecture type, you can start choosing the toolset for further reversing. For ESP-32FW.bin, we already know that it’s going to be the Tensilica Xtensa architecture, so we need to select the disassembler we’re going to use for the research.
2. Choose the disassembler tool
After researching an appropriate disassembler that could support Xtensa, we ended up with three options: IDA, Ghidra, and Radare.
We decided to try using Ghidra and IDA first because we already have vast experience successfully applying these tools in different reverse engineering projects. And since IDA doesn’t have a decompiler for Xtensa, only a CPU module for the disassembler, we decided to first try working with Ghidra (we used version 10.0).
Ghidra doesn’t support Xtensa by default, so we needed to install the Tensilica Xtensa module for Ghidra first.
The disassembler for Xtensa works, but there are some issues with the decompiler, as you can see in the screenshot below:
After some time disassembling, we realized that Ghidra’s processor module for Xtensa had trouble determining the instruction length in multiple cases. Therefore, we dropped Ghidra and went to IDA (we used version 7.7).
It was challenging at first to find Xtensa in the list of processor modules, but finally we found it here:
The processor module in IDA appeared to be stable enough, so we decided to stick with IDA.
3. Load the firmware
The first step is to load the firmware at the right image base address so that all pointers to global variables resolve to valid addresses. To do this, it’s necessary to learn where the code is in the binary.
We start by loading the firmware at the base address 0 and try to mark as much code as possible. To be able to properly mark the code in IDA, we need to learn the typical instruction sequences common to Xtensa firmware. To find out which instructions are used in function prologs, we took a sample from GitHub: esp8266/Arduino: ESP8266 core for Arduino.
It appears the compiler uses the following instruction: entry a1, XX. This instruction translates into byte sequences such as 36 41 00 / 36 61 00 / 36 81 00, depending on the value of the XX argument.
By implementing a simple IDA script to search for such a pattern, it’s possible to mark about 90% of the code:
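The pattern search itself can be sketched in plain Python, independent of IDA. This is not the script we used: the function name is hypothetical, and the frame size decoding relies on our reading of the Xtensa ENTRY encoding (first byte 0x36, register number in the low nibble of the second byte, a 12-bit immediate in the remaining bits, multiplied by 8), which goes beyond the three byte sequences quoted above.

```python
def find_entry_prologs(blob: bytes, max_frame: int = 0x1000):
    """Yield (offset, frame_size) for byte triples that look like `entry a1, N`.

    max_frame is a hypothetical sanity limit used to reduce false positives.
    """
    for off in range(len(blob) - 2):
        b0, b1, b2 = blob[off], blob[off + 1], blob[off + 2]
        # 0x36 is the opcode byte; the low nibble of the next byte selects a1.
        if b0 != 0x36 or (b1 & 0x0F) != 0x01:
            continue
        imm12 = (b1 >> 4) | (b2 << 4)   # assumed layout of the 12-bit immediate
        frame = imm12 * 8               # assumed scaling of the frame size
        if 0 < frame <= max_frame:
            yield off, frame
```

Each reported offset is a candidate function start. In IDA, the same offsets would then be turned into code and functions, and candidates that fail to disassemble cleanly would be discarded.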
Once we’ve found the code, it’s time to explore and see whether it looks correct.
Looking at the screenshot below, it’s obvious that something is wrong. The string resources are referenced properly, but call8 instructions point to strings, not the code. And some of the call8 instructions point to non-existent addresses. Usually this means that the image base is wrong and the firmware must be loaded to some other base address, not 0.
A common way to determine the base address is to:
- Pick a string.
- Use the low part of this string’s address to find the code that references it.
- Find the difference between the real string address and the address we see in the code. Thus, we can understand how to shift the address of the code to match the current address of the string.
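The three steps above can be sketched as a small voting procedure. This is a simplified illustration rather than our exact workflow: it assumes the references are 32-bit little-endian pointers stored at 4-byte aligned file offsets, and that the low 16 bits of a pointer equal the low 16 bits of the string’s file offset. The function name is hypothetical.

```python
import struct
from collections import Counter


def vote_for_base(image: bytes, string_offsets, low_bits: int = 16) -> Counter:
    """Count how often each candidate base address explains a pointer to a known string."""
    mask = (1 << low_bits) - 1
    by_low_part = {}
    for off in string_offsets:
        by_low_part.setdefault(off & mask, []).append(off)

    votes = Counter()
    for pos in range(0, len(image) - 3, 4):
        (value,) = struct.unpack_from("<I", image, pos)
        # A pointer whose low part matches a string offset suggests base = pointer - offset.
        for off in by_low_part.get(value & mask, ()):
            if value > off:
                votes[value - off] += 1
    return votes
```

The base with the most votes is the best candidate. Feeding in the file offsets of several distinct strings makes accidental matches less likely.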
In this case, we found that the base address must be 0x3F3F0000, but even when using it the call8 instructions are still invalid. This could mean that the binary data is segmented and that the code from the flash memory is being mapped to RAM in pieces. Thus, it will be necessary to split the firmware into pieces and load these pieces into IDA in appropriate segments.
We looked at the strings in the firmware and discovered it was indeed segmented:
After additional research, we discovered the ESP IDF framework. Since our target firmware contains some version of this framework, we can try to use its source code to learn about the firmware structure.
We found an interesting bootloader_utility_load_partition_table() function in the bootloader_utility.c source code file within ESP IDF, which means the firmware must contain a partition table.
To identify the partition table, we continued exploring the source code and finally found the esp_partition_table_verify() function, which is called by the bootloader_utility_load_partition_table() function:
So there must be ESP_PARTITION_MAGIC and ESP_PARTITION_MAGIC_MD5:
#define ESP_PARTITION_MAGIC 0x50AA
#define ESP_PARTITION_MAGIC_MD5 0xEBEB
A binary search for AA 50 (the value 0x50AA as stored in little-endian byte order) gave us good results:
Both ESP_PARTITION_MAGIC and ESP_PARTITION_MAGIC_MD5 can be seen nearby. And most likely sub_3F3F4848 is esp_partition_table_verify().
Since we already know where the esp_partition_table_verify function is, we are able to find the bootloader_utility_load_partition_table function and the ESP_PARTITION_TABLE_OFFSET file offset:
ESP_PARTITION_TABLE_OFFSET is the file offset in the ESP32-FW.bin file. Now we just need to know the structure of the partition table entries. The source code of the ESP IDF framework helps us again:
typedef struct {
    uint32_t offset;
    uint32_t size;
} esp_partition_pos_t;

/* Structure which describes the layout of the partition table entry.
 * See docs/partition_tables.rst for more information about individual fields.
 */
typedef struct {
    uint16_t magic;
    uint8_t type;
    uint8_t subtype;
    esp_partition_pos_t pos;
    uint8_t label[16];
    uint32_t flags;
} esp_partition_info_t;
We’ve imported these structures to IDA and applied them to the partition table data:
As you can see, esp_partition_pos_t.offset is the file offset for each partition, and we can now split ESP32-FW.bin into the partitions.
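Splitting the file by partition can also be scripted. The sketch below mirrors the esp_partition_info_t layout shown above (32 bytes per entry, little-endian) and stops at the first entry that does not start with ESP_PARTITION_MAGIC. The function names and the entry limit are hypothetical, and the table offset has to be supplied by the caller.

```python
import struct

ESP_PARTITION_MAGIC = 0x50AA
ENTRY_FORMAT = "<HBBII16sI"  # magic, type, subtype, pos.offset, pos.size, label, flags
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 32 bytes


def parse_partition_table(blob: bytes, table_offset: int, max_entries: int = 96):
    """Return a list of dicts describing partition table entries found at table_offset."""
    entries = []
    for index in range(max_entries):  # max_entries is an assumed safety limit
        start = table_offset + index * ENTRY_SIZE
        if start + ENTRY_SIZE > len(blob):
            break
        magic, ptype, subtype, offset, size, label, flags = struct.unpack_from(
            ENTRY_FORMAT, blob, start
        )
        if magic != ESP_PARTITION_MAGIC:
            break  # an MD5 record (0xEBEB), erased flash, or garbage ends the table
        entries.append({
            "type": ptype,
            "subtype": subtype,
            "offset": offset,
            "size": size,
            "label": label.split(b"\x00", 1)[0].decode("ascii", "replace"),
            "flags": flags,
        })
    return entries


def split_partitions(blob: bytes, entries):
    """Map each partition label to the bytes it occupies in the firmware file."""
    return {e["label"]: blob[e["offset"]:e["offset"] + e["size"]] for e in entries}
```

Writing each value returned by `split_partitions` to its own file gives one binary per partition, ready for the next step.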
But how can we load each of the partitions to the appropriate address? It appears there’s an image_load() function that is responsible for mapping the firmware partitions onto address space:
And each partition has a header:
typedef struct {
    uint8_t magic;              /*!< Magic word ESP_IMAGE_HEADER_MAGIC */
    uint8_t segment_count;      /*!< Count of memory segments */
    uint8_t spi_mode;           /*!< flash read mode (esp_image_spi_mode_t as uint8_t) */
    uint8_t spi_speed: 4;       /*!< flash frequency (esp_image_spi_freq_t as uint8_t) */
    uint8_t spi_size: 4;        /*!< flash chip size (esp_image_flash_size_t as uint8_t) */
    uint32_t entry_addr;        /*!< Entry address */
    uint8_t wp_pin;             /*!< WP pin when SPI pins set via efuse (read by ROM bootloader,
                                 * the IDF bootloader uses software to configure the WP
                                 * pin and sets this field to 0xEE=disabled) */
    uint8_t spi_pin_drv[3];     /*!< Drive settings for the SPI flash pins (read by ROM bootloader) */
    esp_chip_id_t chip_id;      /*!< Chip identification number */
    uint8_t min_chip_rev;       /*!< Minimum chip revision supported by image */
    uint8_t reserved[8];        /*!< Reserved bytes in additional header space, currently unused */
    uint8_t hash_appended;      /*!< If 1, a SHA256 digest "simple hash" (of the entire image) is appended after the checksum.
                                 * Included in image length. This digest
                                 * is separate to secure boot and only used for detecting corruption.
                                 * For secure boot signed images, the signature
                                 * is appended after this (and the simple hash is included in the signed data). */
} __attribute__((packed)) esp_image_header_t;
Next, each partition is split into segments. After the image header, each segment starts with the following structure, which is followed by the segment’s actual data:
typedef struct {
    uint32_t load_addr;     /*!< Address of segment */
    uint32_t data_len;      /*!< Length of data */
} esp_image_segment_header_t;
Here, esp_image_segment_header_t.load_addr is the virtual address for the segment data in the CPU address space.
The segments within the partition look like this:
esp_image_header_t
esp_image_segment_header_t
<segment data>
esp_image_segment_header_t
<segment data>
...
Now, having full information about the segments, we can split the partitions into segments and load them to the appropriate addresses in IDA. We can do this extraction work manually or try to automate it via the IDA loader plugin.
Nevertheless, it appears that such a loader is already implemented for Ghidra.
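For a quick manual extraction, the layout above can be walked with a short script. This is a sketch under two assumptions that come from our reading of ESP IDF rather than from the structures quoted here: ESP_IMAGE_HEADER_MAGIC is 0xE9, and esp_chip_id_t occupies two bytes, which makes the header 24 bytes long. The function name is hypothetical.

```python
import struct

ESP_IMAGE_HEADER_MAGIC = 0xE9  # assumption: value taken from ESP IDF, not from the dump

# esp_image_header_t: magic, segment_count, spi_mode, spi_speed/spi_size (one byte),
# entry_addr, wp_pin, spi_pin_drv[3], chip_id (assumed 2 bytes), min_chip_rev,
# reserved[8], hash_appended
IMAGE_HEADER = struct.Struct("<BBBBIB3sHB8sB")
SEGMENT_HEADER = struct.Struct("<II")  # load_addr, data_len


def parse_image_segments(partition: bytes):
    """Return (entry_addr, [(load_addr, data), ...]) for one application partition."""
    if len(partition) < IMAGE_HEADER.size:
        raise ValueError("partition is too small to hold an image header")
    fields = IMAGE_HEADER.unpack_from(partition, 0)
    magic, segment_count, entry_addr = fields[0], fields[1], fields[4]
    if magic != ESP_IMAGE_HEADER_MAGIC:
        raise ValueError("unexpected image magic: 0x%02X" % magic)

    segments = []
    pos = IMAGE_HEADER.size
    for _ in range(segment_count):
        load_addr, data_len = SEGMENT_HEADER.unpack_from(partition, pos)
        pos += SEGMENT_HEADER.size
        data = partition[pos:pos + data_len]
        if len(data) != data_len:
            raise ValueError("segment at 0x%08X is truncated" % load_addr)
        segments.append((load_addr, data))
        pos += data_len
    return entry_addr, segments
```

Each `(load_addr, data)` pair can then be saved to a file and loaded into IDA as an additional binary segment at `load_addr`.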
4. Study Xtensa architecture features
Now that we have all the segments loaded to the appropriate addresses, we can start the reverse engineering.
But to do it efficiently, we need to learn more about the Xtensa architecture, including:
- Argument order in instructions
- Execution specifics of conditional jumps
- Compiler calling convention
- Stack organization
The first thing to explore is the argument order in instructions. Take MOV R1, R2, for example. You can find this kind of instruction in all architectures, yet it may mean either copying R1 to R2 or copying R2 to R1. Thus, it’s crucial to know which operand is the source and which is the destination register. You can find the Xtensa instruction set description on GitHub.
As for the MOV instruction, in Xtensa, it means that R2 is copied to R1. Thus, the first argument will be the destination in most simple instructions, such as math-related ones. For example, the instruction addi a14, a1, 0x38 would mean that a14 = a1 + 0x38.
But for instructions that store data, it will be the opposite. For example, the instruction s32i.n a5, a1, 0x10 means that the value of a5 must be stored at the address (a1 + 0x10).
The second thing to learn is the way conditional jumps are done. There are two ways to do it:
- Use a dedicated comparison instruction that sets the flags register, followed by a separate conditional jump instruction.
- Use a single instruction that does all those actions at once.
Xtensa does the latter: beqz a10, loc_400E1C54. A single instruction is used to check if a10 equals zero, and then it either jumps to loc_400E1C54 or doesn’t.
The third step is to examine the calling convention used by the compiler: the way arguments are passed to the function and how the value is returned.
Xtensa passes arguments in quite an unusual way. Arguments are put into registers before the call instruction. But the registers in which they appear within the function are not the same as those they were in before the call, because call8 and callx8 rotate the register window by eight registers:
Argument index | Register before the call | Register after the call |
0 | a10 | a2 |
1 | a11 | a3 |
2 | a12 | a4 |
… | … | … |
Here’s an example of how to pass arguments to a function on the assembler level:
movi.n a12, 0x14
l32r a11, off_40080490
mov.n a10, a1
l32r a8, memcpy
callx8 a8
Here we have three arguments:
- a10 is a destination address
- a11 is a source address
- a12 is the size to copy
Yet as soon as the code enters the memcpy function, these values are automatically transferred into the a2, a3, and a4 registers respectively.
The same trick is used for returned values. Inside the memcpy function, the value is stored in the a2 register, yet after returning from the function, the value appears in a10.
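The register renaming is simple arithmetic, which the following sketch makes explicit. The table above only covers call8; the call4 and call12 variants and the a2..a7 range for register arguments are our assumptions based on the Xtensa windowed calling convention. The function name is hypothetical.

```python
def callee_register(caller_reg: int, window_shift: int = 8) -> int:
    """Translate the caller's register number into the number the callee sees.

    window_shift is 8 for call8/callx8; 4 and 12 are assumed to correspond
    to the call4 and call12 variants.
    """
    if window_shift not in (4, 8, 12):
        raise ValueError("window shift must be 4, 8, or 12")
    callee = caller_reg - window_shift
    if not 2 <= callee <= 7:
        raise ValueError("a%d does not carry an argument for this call type" % caller_reg)
    return callee
```

For the memcpy example, `callee_register(10)`, `callee_register(11)`, and `callee_register(12)` give 2, 3, and 4. Read in the opposite direction, the same shift explains why a value returned in a2 shows up in the caller’s a10.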
Here’s what return 0 looks like:
movi.n a2, 0
retw.n
And this is what checking the returned value looks like:
call8 jsmi_parse_params
bnez.n a10, loc_400E1B15
bnez.n checks the value of the a10 register upon returning from the call.
Finally, it’s necessary to learn how the stack is organized.
Xtensa uses the a1 register as the stack pointer to create the stack frame. Each function starts with the entry instruction, for example entry a1, 0xC0, where 0xC0 is the size of the stack frame, i.e. the amount of stack the function requires for the stack variables.
And often, the functions start with initializing stack variables:
movi.n a5, 0
s32i.n a5, a1, 0x10
s32i.n a5, a1, 0x14
s32i.n a5, a1, 0x18
s32i.n a5, a1, 0x1C
s32i.n a5, a1, 0x20
s32i.n a5, a1, 0x24
s32i.n a5, a1, 0x28
s32i.n a5, a1, 0x2C
s32i.n a5, a1, 0x30
s32i.n a5, a1, 0x34
The zero value from the a5 register is being written to stack variables addressed relative to the a1 register.
After gaining all necessary knowledge about the Xtensa architecture, we can finally start reversing its code.
5. Reverse engineer Xtensa code in IDA
Xtensa isn’t the most popular architecture, and tools don’t support it as fully as ARM, MIPS, and PowerPC. Therefore, there are some limitations in the IDA processor module that we need to overcome.
The major limitations of the Xtensa processor module in IDA are:
- No automatic comments for function arguments
- Stack frame is not created automatically
- Some ESP32 functions belong to IROM, so there are calls to hardcoded addresses
- Some Xtensa instructions are not disassembled
Let’s discuss some tricks to overcome these challenges.
5.1. Type system and comments for function arguments
A type system for Xtensa is available starting from IDA 7.7. Having an available type system in IDA is very important, as it makes reversing convenient. In particular, it allows you to import the definitions of C structures and specify the function prototypes used by IDA to put automatic comments near the instructions that transfer the function arguments.
However, if you don’t have a type system, there’s a workaround.
First, let’s look at what functions look like when there’s a type system:
The function prototype is set with the names and types of the arguments so that IDA can use this information to comment the arguments at the call site:
But without a type system for Xtensa, there will be no such comments. An alternative is to use the repeatable comments feature in IDA. If you set a repeatable comment at the very beginning of the function, it will be shown at all of the call sites.
Thus, we can use this feature to define function arguments:
The call site will look like this:
You may select the register name in the comment and IDA will highlight it in the code. Thus, you can easily find an argument value.
5.2. Recover the stack frame
To recover the stack frame, you’ll need to manually specify the stack size and then show IDA where it’s used by pressing K at each instruction that works with the stack.
Let’s explore the config_router_safe function, for example:
It’s obvious that the stack frame size here is 0xC0. We use this value in the stack settings for the function (Alt+P):
Visually, nothing will happen, but if you go to the stack frame for the function by pressing Ctrl+K, you’ll notice that stack space is now allocated:
The next thing to do is specify the stack shift using the entry instruction. Before doing that, we suggest enabling the stack pointer visualization as shown in the screenshot below:
Now, the code should look like this:
000 is the current stack pointer shift value, and we need to shift it by 0xC0. To do that, set the cursor at the entry instruction and press Alt+K to see the following window, where you can specify the desired difference between the old and new stack pointer:
As a result of this operation, the code will look like this:
Now, if you start pressing K at each instruction that works with the a1 register, IDA will create stack variables:
It’s also possible to write an IDA script to automate these actions.
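As a starting point for such automation, the sketch below only covers the analysis half: it collects a1-relative load and store offsets from a disassembly listing exported as text. It deliberately contains no IDA API calls, and the regular expression and function name are hypothetical. It also assumes operands are formatted as in the listings in this article.

```python
import re

# Matches loads/stores such as "s32i.n a5, a1, 0x10" or "l8ui a3, a1, 4".
STACK_ACCESS = re.compile(
    r"^\s*(?P<op>[ls](?:8|16|32)[us]?i(?:\.n)?)\s+"
    r"(?P<reg>a\d+)\s*,\s*a1\s*,\s*(?P<off>0x[0-9A-Fa-f]+|\d+)\s*$"
)


def stack_offsets(listing: str):
    """Return the sorted, de-duplicated a1-relative offsets used by loads and stores."""
    offsets = set()
    for line in listing.splitlines():
        match = STACK_ACCESS.match(line)
        if not match:
            continue
        text = match.group("off")
        value = int(text, 16) if text.lower().startswith("0x") else int(text)
        offsets.add(value)
    return sorted(offsets)
```

Each returned offset corresponds to one stack variable that would otherwise be created by pressing K manually.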
5.3. Calls to IROM
It’s not uncommon to see calls into some low-level API situated in the IROM part of the CPU and not in the firmware. In such a case, the firmware is just linked with a special linker definitions file containing defined IROM function addresses.
During reversing, IROM function calls look like this:
40058E4C is the address within IROM. But it’s impossible to know from the firmware alone which function is called. So it’s necessary to inspect the ESP32 toolchain to find the linker definitions.
The IDE for the ESP32 chip is Espressif IDE. And searching for the IROM addresses within the IDE files brings us to: C:\Espressif\frameworks\esp-idf-v4.4.2\components\esptool_py\esptool\flasher_stub\ld\rom_32.ld
These values can be easily converted into the enum data type:
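The conversion can be scripted along these lines. The sketch assumes the linker file defines each function with a line of the form PROVIDE ( name = 0xADDRESS ); and the enum name, the ROM_ prefix, and the function name are hypothetical choices.

```python
import re

# Assumed line format: PROVIDE ( some_function = 0x40001234 );
PROVIDE_LINE = re.compile(r"PROVIDE\s*\(\s*(\w+)\s*=\s*(0x[0-9A-Fa-f]+)\s*\)\s*;")


def ld_to_enum(ld_text: str, enum_name: str = "esp32_rom_functions", prefix: str = "ROM_") -> str:
    """Convert PROVIDE(...) definitions into the text of a C enum, sorted by address."""
    members = [(name, int(addr, 16)) for name, addr in PROVIDE_LINE.findall(ld_text)]
    members.sort(key=lambda item: item[1])
    lines = [f"enum {enum_name} {{"]
    lines += [f"    {prefix}{name} = 0x{addr:08X}," for name, addr in members]
    lines.append("};")
    return "\n".join(lines)
```

The prefix keeps the enum members from clashing with function names that already exist in the database. The resulting text can be saved as a header file for import.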
Then, we need to import this enum into IDA so that it can be applied to the IROM address values:
If we add the repeatable comment near the IROM address, it’ll make everything much easier to read:
5.4. Unrecognized instructions
It often happens that a processor module has been implemented for some specific variant of the instruction set. Then manufacturers create new CPUs with a 99% compatible instruction set plus over ten new instructions that nobody initially expected. So tools like IDA, Ghidra, and Radare may not be able to disassemble some new instructions.
The proper way to overcome this challenge is to extend the processor module and add support for new instructions. This requires profound knowledge of disassembler APIs, which are not that easy to comprehend.
Let’s discuss a possible workaround for a case when you just want IDA to create the function despite the existence of some unrecognized instruction. Say IDA doesn’t know about the RER instruction and fails to create the function if it contains RER opcodes:
You can press P as many times as you like. Nothing will happen except errors appearing in the console window:
However, it doesn’t mean that IDA can’t create instructions which follow RER instructions. You can skip three bytes of the RER instruction and create the code afterwards:
Next, you can select the whole piece of code from entry to retw.n and press P:
After that, IDA will create the function:
Usually, extended instructions that were not recognized by the disassembler don’t make too much difference during reversing. What can cause problems are new instructions that perform actions like a call, a jump, or a load/store, as the code flow is lost and the references to data are missing.
Conclusion
Researching unknown hardware architectures before moving to business logic is essential for projects that involve reverse engineering IoT firmware. Even though it can take reverse engineers a few weeks to learn the architecture, such profound research helps to improve the speed of further work in the long run.
At Apriorit, we have a professional reverse engineering team with rich experience using various reverse engineering tools and techniques. Having expertise in various fields including cybersecurity, cryptography, and embedded software, we can help your business with a reverse engineering project of any complexity.