内容简介:As many of you may or may not be aware of, I have a serious obsession with embedded systems security. It wasn’t until about two years ago where I started my journey of incorporating my knowledge in reverse engineering software applications into pulling apa
As many of you may or may not be aware of, I have a serious obsession with embedded systems security. It wasn’t until about two years ago where I started my journey of incorporating my knowledge in reverse engineering software applications into pulling apart firmware from embedded devices. Additionally, I also started learning hardware security concepts such as side-channel attacks, fault-injections/glitch attacks, bit flipping, and more.
Anyone who has began their careers in embedded security will tell you that the Arduino board is the best device to get started on. They have a pretty intricate IDE, and an abundant amount of HAL (Hardware Abstraction Layer) API calls to use. Needless to say, writing your first hello world
application in the embedded systems industry is fairly rudimentary, which is just blinking an LED.
The goal of this paper to show you how to use the Z3 formula (with python) to break weak password checkers that use linear inequalities or equalities such as satisfiability modulo theories .
AVR Architecture
To give you some background on what Arduino uses as their main MCU (Microcontroller Unit), it will help to understand what AVR is. AVR is an 8-bit RISC architecture (Reduced Instruction Set Computing). AVR based microcontrollers became widely popular for being one of the first in the game to have on-chip flash storage. For the lab exercise, we will be using an Arduino Nano board with an ATmega328P MCU.
What we have here is a 28-pin chip with an operating voltage that ranges from +1.8V to +5.5V. Only 23 of the 28 pins are programmable as GPIOs (General Purpose Input/Output). It does come with your standard set of communication interfaces such as Master/Slave SPI (Serial Peripheral Interface), USB, USART (Universal Synchronous/Asynchronous Receiver Transmitter), and Two-Wire peripheral protocols such as I2C (Inter-Integrated Circuit). We will discuss USB-to-USART more in depth later on as this will become important when flashing our chip with the target firmware and communicating with our chip by sending and receiving data.
Let’s go over this diagram above;
- The data bus handles data in 8-bit chunks that travel inside the microcontroller, as the bus line is 8-bits wide.
- General Purpose Registers are used for arithmetic operations such adding and subtracting numbers, setting jump pointers, and compare operations. From the previous article I wrote, we discussed registers for x86 architecture, and in this case of AVR we have R0…R31.
- ALU stands for Arithmetic Logic Unit. This works in direct connection with the general purpose registers. Arithmetic operations between a GPR and an immediate operand can be executed within a single clock cycle. We can divide these operations into three categories: arithmetic, logical, and bit-wise functions.
- Instruction Decoder is a combinational circuit whose purpose is to translate an instruction code into the address at micro-memory.
- Program Counter (PC) is 14 bits wide and addresses the program memory locations.
- As we can see from the picture above, there are three different types of memory being used here. 1) Flash Memory , which is programmable read-only memory (ROM), and firmware can only be changed using a programmer or bootloader. 2) SRAM , an area for volatile memory that holds data only when electrically powered. 3) Lastly, EEPROM (Electronically Erasable Programmable Read Only Memory), a semi permanent data storage that can only be accessed using special registers inside the AVR. It has the ability to control things like addresses to be written, data to be written, and flags used to read data.
- I/O Lines are used for controlling applications through the digital input and output pins. These pins can detect any voltage changes such as HIGH or LOW electromagnetic force digital output. All I/O pins are connected to diodes that trace to a VCC and GND line.
Now that we have a better understanding of AVR architecture, let’s briefly go over how we are going to communicate with our ATmega328P chip using USB-to-USART.
USART
The USB-to-USART converter/bridge is a serial port to your computer and is responsible for sending serial data over two wires. There is no clock signal and no parleying between the two devices. To communicate accurately, both devices must be configured beforehand to use the same speed of communication, otherwise known as the baud rate. The baud rate is simply the rate at which information is transferred in a communication channel. In the serial port context, 9600 baud
means that the serial port is capable of transferring at a maximum of 9600 bits per second. It can be calculated in two ways:
= ⋅( /( + )) = ⋅(( + )/ )
Where B
is the defined as units of total bits per second, D
is the application of data bits per second, and N
is the bits that are conveyed per symbol, and the gross bit rate is R
, inclusive of channel coding overhead, the symbol rate fs
can be calculated as:
f⋅(s) = R / N
The USART clock must operate at 16 times the desired baud rate. The clock is based around the operation of a crystal oscillator which, in the case of the ATmega328P USART, is set to a constant 3.6881 MHz.
Lab Assignment
There are two files we are going to need for this exercise. Both of these files can be found at VirusTotal .
7afa940272694061bde3d1eea7f4827a 4f055c15d3841872ba8156ffb968a8ab
The next step to getting our board ready will be to upload the hex file we just downloaded by using avrdude
. Connect the board to your computer via USB, and be sure to have installed avr-tools
. The typical baud rate upload speed is going to be 57,600. We are going to run this at the command line to get the firmware on the board:
➜ jumpy git:(master) avrdude -c arduino -p atmega328p -P /dev/cu.wchusbserial1410 -b57600 -u -V -U flash:w:jumpy.hex
At this point we need to figure out the what is the baud rate the USART is using to communicate all of its serial data. After running baudrate.py , you should get the result of 19,200. Now we can use the application screen
to communicate with the board from our machine.
screen /dev/cu.wchusbserial1410 19200
Looks like a pretty straightforward single-password check. Let’s move on over to the jumpy.bin
file and start our reverse engineering process. We are going to use Ghidra for this project as I found it to be a lot cleaner when using the AVR disassembler. After we load the file into a project directory, specifying the language is a must; so in our case, AVR8 Little Endian 16-bit assembly was the best option.
Ghidra will automatically analyze all the code including the vector table at the start of the disassembly output. One thing to note: you will see a lot of opcodes that have not been analyzed in code section, and in that case we will highlight all those bytes and press D
, this will automatically disassemble the opcodes and you can use the F
key afterwards to create the newly examined code into a function.
After Ghidra has completed analyzing the binary file, the output can look a bit convoluted. The first set of subroutines are responsible for initializing the peripherals such as the USART/TX-RX, SPI, TIMER, PCINT, Interrupts, and Reset table.
Side note:If you ever want to find the main()
function in firmware code, trace to the base address to RESET -> code:000000
of the bootloader. It will always contain a simple jump (I.E. jmp lab_000045
) to an address outside of the bootloader that will directly lead to a start routine that holds main()
after performing a lot of operations to setup the environment.
We will most likely have better luck doing dynamic analysis, which won’t be easy, but will definitely give a lot more satisfying results. The strings table doesn’t show much and there are no XREF’s to any of the offsets in code segment.
Emulator
The goal right now is to emulate the code and figure out where the firmware is asking for input from the user in the USART. We have to find an offset in the beginning to set a breakpoint at so we can commence step-tracing. The first offset ( 0xaa
) I noticed that shows logical code is at function 0x003cc
.
FUN_code_0003cc:0003d2(c) code:0000aa cf 93 push Ylo code:0000ab df 93 push Yhi code:0000ac cd b7 in Ylo,SPL code:0000ad de b7 in Yhi,SPH code:0000ae 60 97 sbiw Y,0x10
Before we start up our debugger, there are three tools you are going to need to download:
- SimAvr => https://github.com/buserror/simavr
- Picocom => https://github.com/npat-efault/picocom
- Avr-Gdb => https://github.com/igormiktor/build-avr-gcc
Open three terminal windows and let’s start with the first one. Run this simavr-simduino
command:
$ board_simduino/obj-x86_64-linux-gnu/simduino.elf -d jumpy.hex atmega328p booloader 0x00000: 3402 bytes avr_special_init avr_gdb_init listening on port 1234 uart_pty_init bridge on port *** /dev/pts/1 *** uart_pty_connect: /tmp/simavr-uart0 now points to /dev/pts/1 note: export SIMAVR_UART_XTERM=1 and install picocom to get a terminal
Next is picocom, which will be used as a bridge for the USART communication. In the next terminal window, run this command:
$ picocom /tmp/simavr-uart0 picocom v2.2port is : /tmp/simavr-uart0 flowcontrol : none baudrate is : 9600 parity is : none databits are : 8 stopbits are : 1 escape is : C-a local echo is : no noinit is : no noreset is : no nolock is : no send_cmd is : sz -vv receive_cmd is : rz -vv -E imap is : omap is : emap is : crcrlf,delbs,Type [C-a] [C-h] to see available commandsTerminal ready
This setup will now get us ready to connect to a GDB server. We can now run avr-gdb
in the third terminal.
$ avr-gdb ./jumpy.bin (gdb) target remote localhost:1234 Remote debugging using localhost:1234 0x00000000 in ?? () (gdb) break *($pc + 0x000000aa) Breakpoint 1 at 0xaa (gdb) c Continuing.Breakpoint 1 hit, 0x000000aa in ?? ()
Looking at picocom, it doesn’t look like we have hit any sort of text output yet, so we haven’t yet reached the point where the code is sending data through the USART terminal. Using the gdb command stepi 20
is a good trial and error technique to see how much we should step in the beginning before the first letter is output to the pico terminal. Interestingly, we hit the breakpoint again at 0xaa
, and if we switch back to picocom, we can see the letters In
being printed on the screen. This function looks like it is responsible for sending data over TX. Tracing back into Ghidra, it looks like we are hitting this peripheral component:
undefined USART2_RX() undefined Wlo:1 <RETURN> undefined1[255] Stack[-0x100 buf USART2_RX XREF[1]: Entry Point(*) code:000066 1f 92 push R1 code:000067 cd b7 in Ylo,SPL
After stepping a few more times, we come to a point where we can find directly what is printing to the picocom terminal.
(gdb) x/10i $pc => 0xd0: st Z, r18 0xd2: pop r0 0xd4: pop r29 0xd6: pop r28 0xd8: ret 0xda: push r28 0xdc: push r29 0xde: in r28, 0x3d ; 61 0xe0: in r29, 0x3e ; 62 0xe2: ldi r24, 0xC5 ; 197 (gdb) i r $r18 r18 0x75 11
Hex 0x75
is ‘u’
which makes sense if it’s printing the string Input
. Whats happening here in the code is the register r18
holds the next byte to store into Z
using the st
instruction. Looking at the Atmel instruction manual , we can see exactly what this operation is doing. It indirectly stores from the register to the data space using index X
or Z
. If you look how Ghidra decompiles this logic, it will look like this:
undefined2 USART3_TX(byte param_1) { ... R18 = *(undefined *)(Y + 1); write_volatile_1(UDR0,R18); }
Y
is the array index that holds our string table, and the st
instruction is seen as a volatile write to the DR0
(Data/Debug Register). Now we can make note that in GDB offset 0xaa
( 0x66
in Ghidra) is part of the USART-RX, and offset 0xbe
( 0x70
in Ghidra) is responsible for the USART-TX. Side Note: Some register pairs can be used for 16-bit operations. These can only be used with the register pairs R26:R27 (X)
, R28:R29 (Y)
, and R30:R31 (Z)
. I’m clarifying this because there was some confusion around debugging information in memory; so if we wanted to dump contents of Y
using GDB, the command would be crafted like this -> x/10x $R28
.
Now that the TX/RX interrupts are located, we need to be able to find where our input is actually processed. After stepping through a bunch of the code, it looks like we may have found in Ghidra where our input gets processed:
code:000171 8a 30 cpi r24,0xa code:000172 99 f0 brbs LAB_code_000186,Zflg code:000173 8b 81 ldd r24,Y+0x3 code:000174 8d 30 cpi r24,0xd
CPI
is responsible for comparing an immediate value. What sparked my interest when looking at this was the value 0xd
and 0xa
which is \r
and \n
in ascii. My theory is, if we set a breakpoint at the start of this function which is offset 0x2ac
( 0x167
in Ghidra), the picocom terminal will prompt us for an input and allow us to enter a string. After hitting enter, we should hit the breakpoint.
/* GDB Window */ (gdb) break *($pc + 0x000002ac) Breakpoint 1 at 0x2ac (gdb) c Continuing. Note: automatically using hardware breakpoints for read-only addresses.Breakpoint 1, 0x000002ac in ?? () (gdb)/* Pico Terminal */ Terminal ready Input:
It looks like the breakpoint was hit without giving us the opportunity to input a string. Let’s continue browsing the stack:
process_input code:000167 1f 92 push R1 code:000168 cd b7 in Ylo,SPL code:000169 de b7 in Yhi,SPH code:00016a 1a 82 std Y+0x2,R1 code:00016b 19 82 std Y+0x1,R1 code:00016c 15 c0 rjmp LAB_code_000182 LAB_code_00016d code:00016d 0e 94 54 01 call FUN_code_000154
From where we set our breakpoint, it looks like there is a function call at offset 0x154
. Decompiling this function, we get the output:
uint FUN_code_000154(void) { byte bVar1; do { W._0_1_ = uart_read_char(); } while ((char)W == '\0'); bVar1 = read_volatile_1(UDR0); W = (uint)bVar1; Z = 0xc6; return W; }
Subroutine looks simple enough to understand… it sets up the W
register to interpret bits from the USART status and control register, if successful, it then takes a byte at a time and reads from the USART data register. Until a null terminating character is found, W
will hold the entire value. Knowing this, we have to set a breakpoint after this function call, which will be at offset 0x2bc
( 0x15b
in Ghidra). Let’s try this again, set a breakpoint at the new offset, and see if it works this time.
/* GDB Window */ (gdb) break *($pc + 0x000002bc) Breakpoint 1 at 0x2bc (gdb) c Continuing. Note: automatically using hardware breakpoints for read-only addresses./* Pico Terminal */ Terminal ready Input:helloworld/* Breakpoint hit in GDB Window after hitting enter */ Breakpoint 1, 0x000002bc in ?? () (gdb)
Perfect! From what we saw before $r24
holds each byte and compares it to an \n
or an \r
. To confirm we can see if the register holds our first byte ‘h’
, (gdb) i r $r24 == 0x68 <=> 104
; and it does! At this offset, we will be hitting this breakpoint X
amount of times, X
being equal to the size of our string + 1
. Once the code processes the null terminating string, the function exits. Our input string will be held at this address below.
(gdb) x/1s 0x80013e 0x80013e: "helloworld"
Time to step out of this function and see where this routine is being cross-referenced. Once we reach the return instruction, the $pc
will be redirected to a module at:
undefined FUN_code_000192() undefined Wlo:1 <RETURN> undefined1[256] Stack[-0x100] buffer FUN_code_000192 code:000192 cf 93 push Ylo code:000193 df 93 push Yhi code:000194 1f 92 push R1 code:000195 cd b7 in Ylo,SPL code:000196 de b7 in Yhi,SPH code:000197 19 82 std Y+0x1,R1
This could be the winning ticket for what happens next with our string. Just to keep track, we are at offset 0x302
( 0x192
in Ghidra). I started to step through this function one instruction at a time, and it looks like it wasn’t actually comparing anything of interest once this function returned. Something interesting started to show once we got a bit further down the code base:
code:0001c7 82 0f add Wlo,R18 code:0001c8 93 1f adc Whi,R19 code:0001c9 83 3d cpi Wlo,0xd3 code:0001ca 91 05 cpc Whi,R1 code:0001cb 51 f4 brbc LAB_code_0001d6,Zflg
Looks like two registers are being added then compared to a value of 0xd3
. Let’s set a breakpoint here in gdb at offset 0x36c
, and see what values of our input are possibly being added together.
/* GDB Window */ (gdb) break *($pc + 0x0000036c) Breakpoint 1 at 0x36c (gdb) c Continuing. Note: automatically using hardware breakpoints for read-only addresses./* Pico Terminal */ Terminal ready Input:helloworld923/* Breakpoint hit in GDB Window after hitting enter */ Breakpoint 1, 0x0000036c in ?? () (gdb) x/3i $pc => 0x36c: add r24, r18 0x36e: adc r25, r19 0x370: cpi r24, 0xD3 ; 211 (gdb) i r $r24 r24 0x6c 108 (gdb) i r $r18 r18 0x72 114
We can see index 7 ( ‘r’
) and index 8 ( ‘l’
) added; this will give us the hexadecimal value of 0xde
. Now that we know this routine begins at 0x1bb
in Ghidra, we can rename this function compare_index_7_8()
. As we move further down the code base, we start to see a lot of similar code. Side note: start labeling all comparative functions to keep track of what indices your input string are being are being compared to.
/* Comparing Index 9 and 10 */ code:00021f 82 0f add Wlo,R18 code:000220 93 1f adc Whi,R19 code:000221 8f 38 cpi Wlo,0x8f/* Comparing Index 11 and 12 */ code:0002fa 82 0f add Wlo,R18 code:0002fb 93 1f adc Whi,R19 code:0002fc 80 3a cpi Wlo,0xa0/* Comparing Index 6 and 7 */ code:0002b5 52 9f mul R21,R18 code:0002b6 90 0d add Whi,R0 code:0002b7 11 24 eor R1,R1 code:0002b8 8c 30 cpi Wlo,0xc code:0002b9 9b 42 sbci Whi,0x2b code:0002ba 51 f4 brbc LAB_code_0002c5,Zflg
Over at address 0x2b5
, there is an operation that looks a bit different from the other comparisons. Index 6 and 7 of our input string gets their ordinals multiplied by each other, then gets compared to a lower WORD byte value of the result to 0xc
and higher byte value of that WORD to 0x2b
. In our example, we can take the two indices of ‘o’
and ‘r’
from helloworld923
and multiply the two integers to obtain the value 0x316e
; the lower byte of this WORD ( LOWORD
) in this equation is 0x6e
and the higher byte ( HIWORD
) value is 0x31
. This is how the comparison will look like in pseudo-code:
(ord(‘o’) * ord(‘r’)) >> 8 == 0x2b
&&
(ord(‘o’) * ord(‘r’)) & 0xFF == 0x0c
The sbci
instruction subtracts with the carry immediate command, and our final value that index 6 and 7 will need to equal is 0x2b0c
. Static analysis will really take care of the rest from here on out because we know what specific mnemonics to keep an eye out for. We can build up our constraints table and use the Z3 theorem to solve the rest. The size of our string has to be 13 characters as seen in this snippet of decompiled code:
if (var_37 == 0xd) { W = (char *)(CONCAT11(DAT_mem_013f,DAT_mem_013e) | 1); input_string = (char)W; }
Inside your Ghidra Functions tab, you should have labeled all 13 subroutines responsible for comparing each byte of your input.
Z3 Theorem Solution
To put it simply, Z3 Prover was created for Satisfiability Modulo Theories ( SMT ) problems. These are decision problems for logical formulas with combinations that include arithmetics, bit-vectors, arrays, and unresolved functions. Z3 is an effective SMT solver with specialized algorithms to solve background theories as mentioned above. You can download the python library here .
At this point we can formulate the conditionals on how this firmware code checks for the right password and what constraints we can enter into our bit-vector that can be used as shorthands for sub-terms.
if string[7] + string[8] == 0xd3 ⇒ if string[8] * string[9] == 0x15c0 ⇒ if string[0] * string[1] == 0x13b7 ⇒ if string[2] * string[3] == 0x1782 ⇒ if string[3] + string[4] == 0x92 ⇒ if string[6] * string[7] == 0x2b0c ⇒ if string[5] + string[6] == 0xa5 ⇒ if string[9] + string[10] == 0x8f ⇒ if string[1] + string[2] == 0xa7 ⇒ if string[10] * string[11] == 0x2873 ⇒ if string[12] * 13 == 0x297 ⇒ ... -> decrypt_flag()
Side Note:The flag string gets loaded at Ghidra’s offset 0x55
. It will use the instruction lpm
to load a data byte from the FLASH program memory into the register file. The Z-register in the register file is then used to access the program memory and place the data in register R0
. A final check function will be called at offset 0x3a7
to confirm how many times the variable isCompared
was set to be true | 1
. If equal to (1<<14)–1
, then the code will proceed to print the flag.
LAB_code_000051: code:000051 lpm R0,Z=>s__FLAG:_code_0006e4+ = "\r\nFLAG:" ... code:0003af 8f 3f cpi Wlo,0xff code:0003b0 9f 43 lsl Whi,0x0e code:0003b1 99 f4 brbc LAB_0003c5,Zflg ; not equal code:0003b2 80 e0 ldi Wlo,0x0 code:0003b3 91 e0 ldi Whi,0x1 code:0003b4 0e 94 aa 00 call usart_print ... LAB_0003c5: // if landed here, you'll get the "wrong password" msg code:0003c5 8b e0 ldi Wlo,0xb code:0003c6 91 e0 ldi Whi,0x1 code:0003c7 0e 94 aa 00 call usart_print
After all values are added to your constraints, you can use Z3’s check()
function to examine whether the assertions in the given solver plus the optional assumptions are consistent or not. If the assertion returns true, the next step would be to return a model()
for the last check()
. This function raises an exception if a model is not available (e.g., last check()
returned unsat
; your assignment doesn’t satisfy the quantified axiom). A simple test to see how the library works would be to use a three unknown linear equation from our challenge like x+y = 0xd3 && y*z = 0x15c0
:
>>> from z3 import * >>> x,y,z = Int('x'), Int('y'), Int('z')# Test with no solutions >>> s = SimpleSolver() >>> s.add(2**x == 4) >>> s.check() unknown >>> s.add(2**x == 16) >>> s.check() unsat# Reset general purpose solver with limited amount of preprocessing >>> s = SimpleSolver() >>> s.add(x + y == 0xd3) >>> s.add(y * z == 0x15c0) >>> s.check() sat >>> s.model() [z = 348, y = 16, x = 195]
We can apply the same logic to our unknowns mentioned above. But as you can see, if Z3 is unable to satisfy the requirements needed to model the solution, it will return an unknown and everything added after will be considered unsatisfiable.
After running our solver , we finally get the password that is needed to decrypt the flag:
ryancor@ryancor-VirtualBox:~$ python solve.py Found Password: g1v3_1t_t0_m3
The next step would be to program a module that takes this password, automatically sends it to the ATMega’s USART serial port, and retrieves the flag. The source code for this script can be found here . In order to do this, make sure to have your board plugged into the USB port and connected to your linux VM USB ports. To ensure your VM recognizes this device, check your kernel message logs right after its plugged in and you should see that the usb-core registered a new interface driver called ch341
:
ryancor@ryancor-VirtualBox:~$ dmesg [ 231.342712] VBOXGUEST_IOCTL_HGCM_CALL: 64 Failed. rc=-54 [ 1054.110148] usb 2-2: new full-speed USB device number 3 using [ 1054.447882] usb 2-2: New USB device found, idVendor=1a86, [ 1054.447885] usb 2-2: New USB device strings: Mfr=0, Product=2, [ 1054.447886] usb 2-2: Product: USB2.0-Serial [ 1054.477153] usbcore: registered new interface driver [ 1054.477430] usbserial: USB Serial support registered for generic [ 1054.479961] usbcore: registered new interface driver ch341 [ 1054.480080] usbserial: USB Serial support registered for ch341- [ 1054.480141] ch341 2-2:1.0: ch341-uart converter detected [ 1054.502669] usb 2-2: ch341-uart converter now attached to ttyUSB0
In my case, the port I’m going to connect to in my python script will be located at /dev/ttyUSB0
. After running our script, we get a successful decrypted flag from the microcontroller.
ryancor@ryancor-VirtualBox:~$ python solve.py [+] Found Password: g1v3_1t_t0_m3 [!] Sending input to driver...FLAG:D0_you_3ven_ROP?
Thank you for following along! I hope you enjoyed it as much as I did. If you have any questions on this article or where to find the challenge, please DM me at my Instagram: @hackersclub or Twitter: @ringoware
Happy Hunting :)
P.S. If you missed the link to my Z3 solver program above, you can find the source code here .
以上所述就是小编给大家介绍的《Using Z3 Theorem on AVR Firmware》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
信息学奥林匹克竞赛指导--组合数学的算法与程序设计PASCAL版/信息学奥林匹克竞赛指导丛书
林 生编 / 清华大学出版社 / 2002-8 / 19.00元
一起来看看 《信息学奥林匹克竞赛指导--组合数学的算法与程序设计PASCAL版/信息学奥林匹克竞赛指导丛书》 这本书的介绍吧!