Microcontrollers have limited protection against cybersecurity threats and attacks. As a result, the security of Internet of Things (IoT) devices and embedded systems that rely on them can be compromised.
To improve the security of IoT devices or their firmware, you need to know exactly what microcontrollers are used. Having this knowledge opens new possibilities for deeper software analysis and, therefore, more efficient improvement of your solution's performance and security. In this article, we show a way to identify a microcontroller model using firmware analysis.
This article will be useful for embedded software developers and reverse engineering specialists looking for efficient ways to automate the process of microcontroller identification.
Contents:
- What a microcontroller is and how to identify one
- Identifying the MCU model based on hardware addresses
- 1. Analyze the microcontroller source code
- 2. Parse and sort information from headers
- 3. Dump the binary to C-style pseudo code
- 4. Search for the peripheral addresses in the pseudo code
- 5. Identify the microcontroller model
- Conclusion
What a microcontroller is and how to identify one
These days, people use embedded systems and IoT devices every day, and these devices are filled with all sorts of microcontrollers. Microcontrollers, or microcontroller units (MCUs), are small integrated circuits designed to execute a specific operation in an embedded system.
Engineers often use MCUs in medical IoT devices, automotive systems, and even the space industry. Whether a device is measuring your heart rate, detecting smoke in the air, or managing the energy consumption of your smart car, there will be a whole set of microcontrollers involved in the process.
A basic microcontroller usually consists of three core components:
- Processor
- Memory
- Input/output (I/O) peripherals
Microcontrollers often have limited memory and bandwidth resources and are dedicated to a specific task. Most microcontrollers are custom-made and don't have a frontend operating system.
Having limited resources, microcontrollers can become an easy target for cybercriminals. That's why, to ensure an IoT device's proper protection, it's vital to address specific vulnerabilities and weaknesses of the microcontrollers the device relies on.
When you have a custom device you've built from scratch, you'll probably know all the ins and outs of it. However, if your project is poorly documented or you have to work with an IoT device you know nothing about, the first task is to identify the models of microcontrollers it uses.
How to identify a microcontroller model?
Trying to distinguish different binaries from one another can be challenging, especially if the binaries use a lot of similar code or perform a similar function. And when it comes to identifying binaries related to microcontrollers, you can easily run into functions performing similar tasks, which can make things even more obscure than in an executable binary.
However, there's an efficient way not only to identify the model of a microcontroller but also to automate the process.
First, let's take a look at the key steps you need to take in order to identify the model of an MCU:
- Analyze the microcontroller source code
- Parse and sort information from headers
- Dump the binary to C-style pseudo code
- Search for the peripheral addresses in the pseudo code
- Identify the microcontroller model
To automate this process and be able to quickly and easily identify microcontroller models, you need to:
- Automate the generation of the C-style pseudo code. You can do this using tools like IDA Pro or Ghidra.
- Gather the headers for all the microcontrollers you want to search, either from your microcontroller's code or from a database.
Now let's see how to identify unknown microcontrollers using this approach.
Identifying the MCU model based on hardware addresses
In this section, we discuss how you can identify the model of a microcontroller by analyzing the hardware addresses of its peripheral devices.
Since peripheral devices are accessed through hardware addresses, you should be able to see accesses to static hardware addresses in the binary code of the microcontroller, and these addresses should be defined in a C header. Also, because these addresses are hardcoded, they shouldn't be heavily impacted by address space layout randomization (ASLR) in the binary.
To follow our guide, you'll need some knowledge of C and Python, as well as the tools used in the steps below: GCC, grep, and IDA Pro or Ghidra.
We'll be working with the S32 PPC microcontrollers from NXP, specifically the MPC5 timer demo. However, keep in mind that the approach described below can be applied to any MCUs you come across. The only time this approach may turn out to be inefficient is if there's a microcontroller very similar to the model of your MCU; for instance, if two or more MCU models were made by the same company and have the same peripheral devices.
Now let's get started.
1. Analyze the microcontroller source code
Take the MPC5 microcontroller demo and look at its source code in IDA Pro. You should see some references to peripheral devices in a struct-like pattern. When you dive deeper, you will see references to the MPC5744P.h header, which contains definitions like these:
#define SRAM0_START 0x40000000UL;
#define ADC_0 (*(volatile struct ADC_tag *) 0xFBE00000UL)
#define ADC_1 (*(volatile struct ADC_tag *) 0xFFE04000UL)
#define ADC_2 (*(volatile struct ADC_tag *) 0xFBE08000UL)
#define ADC_3 (*(volatile struct ADC_tag *) 0xFFE0C000UL)
These are the hardcoded references to the microcontroller's peripheral devices.
Did you notice that they are structs? This means they will look slightly different in memory due to padding and alignment.
These struct addresses will be the key source of information for you, so you need to get all of them without spending too much time and effort by doing some parsing.
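To make the struct point more concrete, here is a minimal sketch that uses Python's ctypes module to show how a field inside a memory-mapped struct ends up at the base address plus an offset. The AdcTag layout and its field names are assumptions made for illustration; the real ADC_tag in MPC5744P.h has a different and much larger layout. Only the ADC_0 base address comes from the header excerpt above.

```python
import ctypes

# Hypothetical, simplified register block for illustration only.
# The real ADC_tag struct in MPC5744P.h is laid out differently.
class AdcTag(ctypes.Structure):
    _fields_ = [
        ("MCR", ctypes.c_uint32),   # 4-byte register at offset 0
        ("MSR", ctypes.c_uint32),   # 4-byte register at offset 4
        ("FLAG", ctypes.c_uint8),   # 1-byte field, followed by padding
        ("ISR", ctypes.c_uint32),   # aligned up to the next 4-byte boundary
    ]

ADC_0_BASE = 0xFBE00000  # base address of ADC_0 from the header excerpt

def field_address(base, struct_type, field_name):
    """Return the absolute address of a field inside a memory-mapped struct."""
    return base + getattr(struct_type, field_name).offset

for name, _ in AdcTag._fields_:
    print(f"{name}: {hex(field_address(ADC_0_BASE, AdcTag, name))}")
```

In this sketch the ISR field lands at offset 12 rather than 9 because of alignment padding. This is why an access seen in a binary rarely equals the base address from the header exactly and is better matched against an address range.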
2. Parse and sort information from headers
As finding and copying hardware addresses manually is a tedious and time-consuming task, you can try automating this process.
Since struct addresses are definitions, you can use GCC to parse them with the following command:
gcc -E -dM /path/to/file
where /path/to/file is the path to your microcontroller's header file. If the header includes other files, add the -I option followed by the directory that contains them.
Executing this command dumps all the #define preprocessor values. You can then filter the output down to your target structures with the grep utility.
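If you'd rather keep the whole pipeline in Python, the dump and the grep-like filter could be sketched as follows. The function names dump_defines and keep_struct_address_defines are our own illustrative choices, and the filter assumes that peripheral definitions follow the (volatile struct X *) 0x... pattern seen in MPC5744P.h; other vendors' headers may need a different pattern.

```python
import re
import subprocess

def dump_defines(header_path, include_dirs=()):
    """Run the GCC preprocessor on a header and return its #define lines."""
    cmd = ["gcc", "-E", "-dM"]
    for directory in include_dirs:
        cmd += ["-I", directory]
    cmd.append(header_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

# Assumed pattern: a cast to a volatile struct pointer followed by a hex literal
STRUCT_ADDRESS = re.compile(
    r"\(\s*volatile\s+struct\s+\w+\s*\*\s*\)\s*0[xX][0-9a-fA-F]+"
)

def keep_struct_address_defines(define_lines):
    """Keep only the #define lines that map a struct onto a hardware address."""
    return [line for line in define_lines if STRUCT_ADDRESS.search(line)]

# Sample lines standing in for real GCC output
sample = [
    "#define ADC_0 (*(volatile struct ADC_tag *) 0xFBE00000UL)",
    "#define __GNUC__ 12",
    "#define SRAM0_START 0x40000000UL",
]
print(keep_struct_address_defines(sample))
```

Filtering on the struct cast rather than on any hex literal also drops the compiler's own predefined macros, which GCC prints alongside the header's definitions.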
Once you have the output from GCC, you can use regular expressions (regex) to extract the definition names and addresses. In our example, we'll be working with regex in Python, using the re module from the standard library.
We chose to work with Python tools because Python is easy to read and intuitive to learn. There's also no need to manage memory manually, in contrast to some other languages, so we can accomplish our task simply and conveniently.
Here's how you can filter data with regex in Python:
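import re

def strip_defs_addrs(self, definitions_list):
    """Extract the name and hex address from each #define line."""
    jlist = []
    for definition in definitions_list:
        # Raw strings keep regex escapes such as \w intact
        variable_name = re.findall(r"(?<=#define )\w+", definition)
        address = re.findall(r"0[xX][0-9a-fA-F]+(?:[-'!` ]?[0-9a-fA-F]+)", definition)
        # Skip lines that lack a macro name or a hex address
        if not variable_name or not address:
            continue
        jlist.append({'definition': variable_name[0], 'address': address[0]})
    return jlist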
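The later steps rely on the definitions being sorted by address, so that each entry and its successor delimit one peripheral's address range. A minimal standalone sketch of that sorting step is shown below. The helper names parse_definition and sort_by_address are hypothetical, and the simplified address regex is an assumption rather than the pattern used in the original script.

```python
import re

def parse_definition(line):
    """Return {'definition', 'address'} for one #define line, or None."""
    name = re.search(r"(?<=#define )\w+", line)
    address = re.search(r"0[xX][0-9a-fA-F]+", line)
    if not name or not address:
        return None
    return {"definition": name.group(0), "address": address.group(0)}

def sort_by_address(definitions):
    """Sort parsed definitions by their numeric address, lowest first."""
    return sorted(definitions, key=lambda d: int(d["address"], 16))

lines = [
    "#define ADC_1 (*(volatile struct ADC_tag *) 0xFFE04000UL)",
    "#define ADC_0 (*(volatile struct ADC_tag *) 0xFBE00000UL)",
    "#define SRAM0_START 0x40000000UL",
    "#define ADC_2 (*(volatile struct ADC_tag *) 0xFBE08000UL)",
]
parsed = [d for d in map(parse_definition, lines) if d is not None]
for entry in sort_by_address(parsed):
    print(entry["address"], entry["definition"])
```

Sorting on the integer value rather than on the string avoids surprises when addresses differ in length or letter case.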
Note: If you run into scenarios where you need to evaluate definitions (for example, #define SRAM0_START BASE_ADDR + SOMEADDR;), you will need a preprocessor. A preprocessor resolves the #define values before compilation, so you don't have to search for them by hand through multiple headers.
Here's how you can do it using Python's py-c-preprocessor:
from preprocessor import Preprocessor
p = Preprocessor()
p.include('/path/to/your/header.h')
print(p.expand('SRAM0_START'))
Now, let's go back to our analysis.
3. Dump the binary to C-style pseudo code
If you have your hardware addresses ordered properly, you should see where the address of each new peripheral device begins and thus be able to find the peripheral struct in the binary.
However, trying to parse the binary in a hex editor might be difficult. Instead, you can open the binary and analyze it in IDA Pro.
Once you open the binary in IDA Pro, you can generate a file with C-style pseudo code by clicking File → Produce file → Create C file. The file with the C-style pseudo code will hold the addresses you'll need for microcontroller model identification.
4. Search for the peripheral addresses in the pseudo code
Once you have the C file, you can begin searching for the peripheral device addresses. If you look at the file with the C-style pseudo code, you'll see multiple lines of the form MEMORY[0xDEADBEEF] = 0. Note that these addresses are slightly off from the base addresses in the header, because each access targets a specific field inside the peripheral struct rather than the start of the struct.
You should be able to simply load the header file, parse and sort the definitions, and then search for them in the file with C-style pseudo code. As in the previous step, you can use regex to parse the C file and use the parse_memory_locations_from_C_file function to filter out the addresses you need:
def parse_memory_locations_from_C_file(self, c_file):
    """Collect the hex addresses written via MEMORY[...] in the pseudo code."""
    # Capture only the hex literal so every finding can be passed to int(x, 16)
    regex_addr_definition = r"MEMORY\[\s*(0[xX][0-9a-fA-F]+)\s*\] = \d"
    findings = []
    with open(c_file, "r") as code_file:
        for line in code_file:
            address = re.findall(regex_addr_definition, line)
            if not address:
                continue
            findings.append(address[0])
    return findings
Then you need to pass the sorted addresses you pulled from the header file and the addresses found in the C-style pseudo code file to the perform_search function.
def perform_search(self, sorted_header_definitions, found_pseudo_code_addrs):
    """Count header definitions whose address range contains a pseudo code address."""
    finding_count = 0
    # The last definition has no upper bound, so stop one entry early
    for idx in range(len(sorted_header_definitions) - 1):
        current = sorted_header_definitions[idx]
        following = sorted_header_definitions[idx + 1]
        int_mem_def = int(current['address'], 16)
        next_int_mem_def = int(following['address'], 16)
        for finding in found_pseudo_code_addrs:
            int_finding = int(finding, 16)
            # The lower bound is inclusive: the first field of a struct
            # sits exactly at the struct's base address
            if int_mem_def <= int_finding < next_int_mem_def:
                print("Finding in pseudo code: ", finding, "perf",
                      current['definition'], " between ",
                      current['address'], " and ", following['address'])
                self.total_finding_score += 1
                finding_count += 1
                # One match per peripheral is enough
                break
    return finding_count
Note: Technically, if you know the size of the struct, you can automatically find all structs with the same size at the exact location. However, we won't describe this process, as it goes beyond the scope of the current guide.
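For large headers, the range lookup can also be sketched with a binary search instead of nested loops. The version below is an alternative of our own and not the code from the original script: it maps every pseudo code address to the peripheral whose range contains it, using Python's bisect module. The first three base addresses come from the example results in this article, while NEXT_PERIPHERAL is a placeholder name for whatever definition starts at 0xFC07C000.

```python
from bisect import bisect_right

def build_bases(sorted_definitions):
    """Precompute the numeric base addresses of address-sorted definitions."""
    return [int(d["address"], 16) for d in sorted_definitions]

def find_peripheral(sorted_definitions, bases, address):
    """Return the definition whose range [base, next_base) contains address."""
    idx = bisect_right(bases, address) - 1
    # Reject addresses below the first base, and those at or past the last
    # base, because the final entry has no upper bound to compare against.
    if idx < 0 or idx >= len(bases) - 1:
        return None
    return sorted_definitions[idx]

definitions = [
    {"definition": "INTC_0", "address": "0xFC040000"},
    {"definition": "SWT_0", "address": "0xFC050000"},
    {"definition": "STM_0", "address": "0xFC068000"},
    {"definition": "NEXT_PERIPHERAL", "address": "0xFC07C000"},  # placeholder name
]
bases = build_bases(definitions)

for finding in ["0xFC040010", "0xFC050010", "0xFC068018"]:
    match = find_peripheral(definitions, bases, int(finding, 16))
    print(finding, "->", match["definition"] if match else "no match")
```

Unlike perform_search, which stops at the first match per peripheral, this sketch classifies each address individually, so the two approaches can produce different counts.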
5. Identify the microcontroller model
Once you've searched for peripheral addresses in the file with the C-style pseudo code, you can try to identify your microcontroller. To do so, compare the hardware addresses from each candidate header file to the addresses found in the C file. By calculating the weighted score for the different microcontrollers, you'll be able to identify the one you're dealing with.
Here are the results from our example:
Matches found for MPC5744P.h
Finding in pseudo code: 0xFC040010 perf INTC_0 between 0xFC040000 and 0xFC050000
Finding in pseudo code: 0xFC050010 perf SWT_0 between 0xFC050000 and 0xFC068000
Finding in pseudo code: 0xFC068018 perf STM_0 between 0xFC068000 and 0xFC07C000
Finding in pseudo code: 0xFFFB0108 perf PLLDIG between 0xFFFB0100 and 0xFFFB8000
Finding in pseudo code: 0xFFFB8030 perf MC_ME between 0xFFFB8000 and 0xFFFC0000
Finding in pseudo code: 0xFFFC0240 perf SIUL2 between 0xFFFC0000 and 0xFFFD0000
Completed
Weighted score: MPC5646C.h 24%
Weighted score: MPC5601D.h 18%
Weighted score: MPC5744P.h 35% <--------- This is our winner.
Weighted score: MPC5777C.h 24%
The microcontroller with the highest weighted score is the one you're looking for. In our case, it's the MPC5744P, the microcontroller described by the MPC5744P.h header.
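This guide doesn't spell out how the weighted score is calculated, so the sketch below is only one plausible interpretation. It assumes the score is the percentage of pseudo code addresses that fall inside a header's peripheral ranges. The function names match_ratio and rank_headers are hypothetical, and the percentages passed to rank_headers are the ones reported in the results above.

```python
def match_ratio(matched_count, total_findings):
    """Assumed score: share of pseudo code addresses matched by a header, in percent."""
    if total_findings == 0:
        return 0.0
    return 100.0 * matched_count / total_findings

def rank_headers(scores):
    """Return (header, score) pairs ordered from best to worst match."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Scores taken from the example results in this article
scores = {
    "MPC5646C.h": 24,
    "MPC5601D.h": 18,
    "MPC5744P.h": 35,
    "MPC5777C.h": 24,
}

ranking = rank_headers(scores)
best_header, best_score = ranking[0]
print("Best match:", best_header, f"({best_score}%)")
```

When the top two scores are close, the result should be treated as ambiguous, which is the situation that can arise for MCU models from the same vendor that share peripheral devices.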
To see the full code for the script used in our guide, go to the Apriorit GitHub page.
Conclusion
Microcontrollers are a vital part of today's IoT devices. Knowing which microcontrollers a particular device contains is necessary to effectively improve the device's performance and security.
With reverse engineering tools and techniques, developers can automate the process of microcontroller identification using firmware analysis, saving their time and effort for more complex and important tasks.
At Apriorit, we have experts who know how to reverse engineer software and can help you with a project of any complexity. With our experts in embedded and IoT development and reverse engineering, you can ensure the top-notch performance and security of your company's embedded solutions.