reverse-ex Challenge from Coursera's Malicious Software Course
█▇▅▆▃▅▆█▇▅▄▃▂▁▃▆▂█▂▇▂▆▁▄▄▆▁▆▇▁▇▅▂▄▅▁▇▄▆▆██▇▅█▁▇▆▂▃▅▇█▂▂▅▁▂▁▁▂█▇▅
« 📅 published on 24/Aug/2013
»
Introduction
The recently concluded Malicious Software and its Undergound Economy course on Coursera had an interesting reverse engineering challenge as part of its coursework: reverse-ex. I found it interesting since it will be equally entertaining for someone who has never tried such challenges or one who has mastered many of them. This post details the steps I took to complete this challenge.
Program Analysis and Testing
The first thing I normally do after obtaining such challenges is to test them with file
command. This provides some insight about various file attributes like its type (ELF, PE, etc.), processor architecture compatibility, symbol table inclusion, etc:
$ file reverse-ex
reverse-ex: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[ sha1]=0x8dac70a14b40f115fc4a27041a3ae29227a55afb, not stripped
$
$ file -i reverse-ex
reverse-ex: application/x-executable; charset=binary
The above output tells us that this is a 32bit ELF binary, dynamically linked, and has symbols included. This means that debugging and symbol resolution information has been preserved and packed inside the binary itself. This is a very important piece of information as it tells us that a successful disassembly of this binary will expose functions/variable that were used in its source, providing some pointers on how to solve this challenge. Before we move towards disassembly, let's run strings
over this file as well:
Now this output is even more interesting since it exposes quite significant details about the challenge. First, it didn't show any commonly used buffer overflow prone libc functions like gets
or strcpy
, but just puts
, printf
, read
, etc. There could be a format-string issue in the program (printf
-family functions are likely vulnerable if incorrectly used) but that can only be confirmed during static analysis of program disassembly or via dynamic analysis with multiple format-string prone inputs. But there's a bigger hint here: strings at the bottom following function names look interesting. Especially the KFFSE_XHKYOKXOHOFEDM^E_Y
string since it looks like some kind of a key. May be the program mutates user-supplied input and performs a comparison with static flags like this string. If it does, we need to look at its disassembly and figure out the logic behind the mutation function and try to reverse it. Let's use objdump
to disassemble this binary and dump its disassembly in Intel syntax:
$ objdump -d -M intel reverse-ex
reverse-ex: file format elf32-i386
Disassembly of section .text:
080484c4 <checkkey>:
80484c4: 55 push ebp
80484c5: 89 e5 mov ebp,esp
80484c7: 57 push edi
80484c8: 56 push esi
80484c9: 83 ec 10 sub esp,0x10
80484cc: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
80484cf: 89 45 f4 mov DWORD PTR [ebp-0xc],eax
80484d2: eb 14 jmp 80484e8 <checkkey+0x24>
80484d4: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc]
80484d7: 0f b6 00 movzx eax,BYTE PTR [eax]
80484da: 89 c2 mov edx,eax
80484dc: 83 f2 2a xor edx,0x2a
80484df: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc]
80484e2: 88 10 mov BYTE PTR [eax],dl
80484e4: 83 45 f4 01 add DWORD PTR [ebp-0xc],0x1
80484e8: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc]
80484eb: 0f b6 00 movzx eax,BYTE PTR [eax]
80484ee: 84 c0 test al,al
80484f0: 75 e2 jne 80484d4 <checkkey+0x10>
80484f2: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
80484f5: 89 c2 mov edx,eax
80484f7: b8 70 87 04 08 mov eax,0x8048770
80484fc: b9 19 00 00 00 mov ecx,0x19
8048501: 89 d6 mov esi,edx
8048503: 89 c7 mov edi,eax
8048505: f3 a6 repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi]
8048507: 0f 97 c2 seta dl
804850a: 0f 92 c0 setb al
804850d: 89 d1 mov ecx,edx
804850f: 28 c1 sub cl,al
8048511: 89 c8 mov eax,ecx
8048513: 0f be c0 movsx eax,al
8048516: 85 c0 test eax,eax
8048518: 75 07 jne 8048521 <checkkey+0x5d>
804851a: b8 01 00 00 00 mov eax,0x1
804851f: eb 05 jmp 8048526 <checkkey+0x62>
8048521: b8 00 00 00 00 mov eax,0x0
8048526: 83 c4 10 add esp,0x10
8048529: 5e pop esi
804852a: 5f pop edi
804852b: 5d pop ebp
804852c: c3 ret
0804852d <foo>:
804852d: 55 push ebp
804852e: 89 e5 mov ebp,esp
8048530: 83 ec 18 sub esp,0x18
8048533: c7 04 24 89 87 04 08 mov DWORD PTR [esp],0x8048789
804853a: e8 91 fe ff ff call 80483d0 <puts@plt>
804853f: b8 00 00 00 00 mov eax,0x0
8048544: c9 leave
8048545: c3 ret
08048546 <bar>:
8048546: 55 push ebp
8048547: 89 e5 mov ebp,esp
8048549: 83 ec 18 sub esp,0x18
804854c: c7 04 24 8d 87 04 08 mov DWORD PTR [esp],0x804878d
8048553: e8 78 fe ff ff call 80483d0 <puts@plt>
8048558: b8 00 00 00 00 mov eax,0x0
804855d: c9 leave
804855e: c3 ret
0804855f <foobar>:
804855f: 55 push ebp
8048560: 89 e5 mov ebp,esp
8048562: 83 ec 18 sub esp,0x18
8048565: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
8048568: 89 04 24 mov DWORD PTR [esp],eax
804856b: e8 bd ff ff ff call 804852d <foo>
8048570: b8 00 00 00 00 mov eax,0x0
8048575: c9 leave
8048576: c3 ret
08048577 <main>:
8048577: 55 push ebp
8048578: 89 e5 mov ebp,esp
804857a: 57 push edi
804857b: 83 e4 f0 and esp,0xfffffff0
804857e: 81 ec 30 02 00 00 sub esp,0x230
8048584: b8 91 87 04 08 mov eax,0x8048791
8048589: 89 04 24 mov DWORD PTR [esp],eax
804858c: e8 1f fe ff ff call 80483b0 <printf@plt>
8048591: a1 40 a0 04 08 mov eax,ds:0x804a040
8048596: 89 04 24 mov DWORD PTR [esp],eax
8048599: e8 22 fe ff ff call 80483c0 <fflush@plt>
804859e: c7 44 24 08 ff 01 00 mov DWORD PTR [esp+0x8],0x1ff
80485a5: 00
80485a6: 8d 44 24 24 lea eax,[esp+0x24]
80485aa: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
80485ae: c7 04 24 00 00 00 00 mov DWORD PTR [esp],0x0
80485b5: e8 e6 fd ff ff call 80483a0 <read@plt>
80485ba: 8d 44 24 24 lea eax,[esp+0x24]
80485be: c7 44 24 1c ff ff ff mov DWORD PTR [esp+0x1c],0xffffffff
80485c5: ff
80485c6: 89 c2 mov edx,eax
80485c8: b8 00 00 00 00 mov eax,0x0
80485cd: 8b 4c 24 1c mov ecx,DWORD PTR [esp+0x1c]
80485d1: 89 d7 mov edi,edx
80485d3: f2 ae repnz scas al,BYTE PTR es:[edi]
80485d5: 89 c8 mov eax,ecx
80485d7: f7 d0 not eax
80485d9: 83 e8 01 sub eax,0x1
80485dc: 89 84 24 28 02 00 00 mov DWORD PTR [esp+0x228],eax
80485e3: 8b 84 24 28 02 00 00 mov eax,DWORD PTR [esp+0x228]
80485ea: 83 e8 01 sub eax,0x1
80485ed: 0f b6 44 04 24 movzx eax,BYTE PTR [esp+eax*1+0x24]
80485f2: 3c 0a cmp al,0xa
80485f4: 75 0f jne 8048605 <main+0x8e>
80485f6: 8b 84 24 28 02 00 00 mov eax,DWORD PTR [esp+0x228]
80485fd: 83 e8 01 sub eax,0x1
8048600: c6 44 04 24 00 mov BYTE PTR [esp+eax*1+0x24],0x0
8048605: 0f b6 44 24 24 movzx eax,BYTE PTR [esp+0x24]
804860a: 0f be c0 movsx eax,al
804860d: 83 f8 42 cmp eax,0x42
8048610: 74 17 je 8048629 <main+0xb2>
8048612: 83 f8 43 cmp eax,0x43
8048615: 74 1f je 8048636 <main+0xbf>
8048617: 83 f8 41 cmp eax,0x41
804861a: 75 27 jne 8048643 <main+0xcc>
804861c: c7 84 24 2c 02 00 00 mov DWORD PTR [esp+0x22c],0x804852d
8048623: 2d 85 04 08
8048627: eb 1c jmp 8048645 <main+0xce>
8048629: c7 84 24 2c 02 00 00 mov DWORD PTR [esp+0x22c],0x80484c4
8048630: c4 84 04 08
8048634: eb 0f jmp 8048645 <main+0xce>
8048636: c7 84 24 2c 02 00 00 mov DWORD PTR [esp+0x22c],0x804855f
804863d: 5f 85 04 08
8048641: eb 02 jmp 8048645 <main+0xce>
8048643: eb fe jmp 8048643 <main+0xcc>
8048645: 8d 44 24 24 lea eax,[esp+0x24]
8048649: 83 c0 01 add eax,0x1
804864c: 89 04 24 mov DWORD PTR [esp],eax
804864f: 8b 84 24 2c 02 00 00 mov eax,DWORD PTR [esp+0x22c]
8048656: ff d0 call eax
8048658: 89 84 24 24 02 00 00 mov DWORD PTR [esp+0x224],eax
804865f: 83 bc 24 24 02 00 00 cmp DWORD PTR [esp+0x224],0x0
8048666: 00
8048667: 74 1a je 8048683 <main+0x10c>
8048669: b8 af 87 04 08 mov eax,0x80487af
804866e: 8d 54 24 24 lea edx,[esp+0x24]
8048672: 83 c2 01 add edx,0x1
8048675: 89 54 24 04 mov DWORD PTR [esp+0x4],edx
8048679: 89 04 24 mov DWORD PTR [esp],eax
804867c: e8 2f fd ff ff call 80483b0 <printf@plt>
8048681: eb 0c jmp 804868f <main+0x118>
8048683: c7 04 24 be 87 04 08 mov DWORD PTR [esp],0x80487be
804868a: e8 41 fd ff ff call 80483d0 <puts@plt>
804868f: 8b 84 24 24 02 00 00 mov eax,DWORD PTR [esp+0x224]
8048696: 89 04 24 mov DWORD PTR [esp],eax
8048699: e8 52 fd ff ff call 80483f0 <exit@plt>
804869e: 90 nop
804869f: 90 nop
I've removed certain non-important sections from the above output. Apart from main
, there are 4 functions, namely checkkey
, foo
, bar
, foobar
. The main
function includes certain anti-disassembly and obfuscation code and it's difficult to make proper sense of what's happening here. But let's focus on the important instructions. The cmp
set of instructions starting at 0x804860d are interesting, especially the first cmp
itself since it will cause a jump to 0x8048629 if the input buffer starts with 0x42 (B in ascii) and 0x8048629 is from where the checkkey
function starts. Of all the available functions, checkkey
is of obvious interest and this tells us that the program will only transfer control to checkkey
function if the input starts with ascii B
and not otherwise. Once the control reaches checkkey
it will try to mutate input buffer and compare it against a static key. Let's conclude this analysis by having a closer look at checkkey
's disassembly:
080484c4 <checkkey>:
80484c4: 55 push ebp
80484c5: 89 e5 mov ebp,esp
80484c7: 57 push edi
80484c8: 56 push esi
80484c9: 83 ec 10 sub esp,0x10
80484cc: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
80484cf: 89 45 f4 mov DWORD PTR [ebp-0xc],eax
80484d2: eb 14 jmp 80484e8 <checkkey+0x24>
80484d4: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc]
80484d7: 0f b6 00 movzx eax,BYTE PTR [eax]
80484da: 89 c2 mov edx,eax
80484dc: 83 f2 2a xor edx,0x2a
80484df: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc]
80484e2: 88 10 mov BYTE PTR [eax],dl
80484e4: 83 45 f4 01 add DWORD PTR [ebp-0xc],0x1
80484e8: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc]
80484eb: 0f b6 00 movzx eax,BYTE PTR [eax]
80484ee: 84 c0 test al,al
80484f0: 75 e2 jne 80484d4 <checkkey+0x10>
80484f2: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
80484f5: 89 c2 mov edx,eax
80484f7: b8 70 87 04 08 mov eax,0x8048770
80484fc: b9 19 00 00 00 mov ecx,0x19
8048501: 89 d6 mov esi,edx
8048503: 89 c7 mov edi,eax
8048505: f3 a6 repz cmps BYTE PTR ds:[esi],BYTE PTR es:[edi]
8048507: 0f 97 c2 seta dl
804850a: 0f 92 c0 setb al
804850d: 89 d1 mov ecx,edx
804850f: 28 c1 sub cl,al
8048511: 89 c8 mov eax,ecx
8048513: 0f be c0 movsx eax,al
8048516: 85 c0 test eax,eax
8048518: 75 07 jne 8048521 <checkkey+0x5d>
804851a: b8 01 00 00 00 mov eax,0x1
804851f: eb 05 jmp 8048526 <checkkey+0x62>
8048521: b8 00 00 00 00 mov eax,0x0
8048526: 83 c4 10 add esp,0x10
8048529: 5e pop esi
804852a: 5f pop edi
804852b: 5d pop ebp
804852c: c3 ret
After the function prologue, the checkkey
function is reading the input buffer (starting address at DWORD PTR [ebp+0x8]
) and performs a per-byte XOR using 0x2a as the key. Once the XOR operation completes, it is indeed comparing the resultant string with a pre-defined flag of size 0x19 - 1
bytes stored at 0x8048770. The mov
instructions starting at address 0x80484f2 and the repz cmps
instruction at address 0x8048505 confirm this analysis.
So we now know that the input will be XOR'ed with a static per-byte key of 0x2a and the result will be compared with a flag defined at 0x80484f2. To reverse this mutation logic the only remaining piece of information is the flag. If we obtain the flag and reverse the XOR operation, we'll have a string that will be accepted by the program for the challenge to complete. From the strings
output earlier, we found a static key-like string, KFFSE_XHKYOKXOHOFEDM^E_Y
, which looks like a very good candidate. Let's try to reverse the XOR operation using this string as the key and 0x2a as XOR key:
#!/usr//bin/env python
import sys
if len(sys.argv) != 3:
print "USAGE: %s <KFFSE_XHKYOKXOHOFEDM^E_Y> <0x2a>" % (sys.argv[0])
sys.exit(1)
flag = sys.argv[1]
key = int(sys.argv[2], 16)
decoded = []
for c in flag:
decoded.append(chr(ord(c) ^ key))
decoded = "".join(decoded)
print "%s ^ 0x%x => %s" % (flag, key, decoded)
$ ./xor.py KFFSE_XHKYOKXOHOFEDM^E_Y 0x2a
KFFSE_XHKYOKXOHOFEDM^E_Y ^ 0x2a => allyourbasearebelongtous
$
$ ./reverse-ex
Are you feeling lucky today? Ballyourbasearebelongtous
[+] WooT!: KFFSE_XHKYOKXOHOFEDM^E_Y
Awesome! First run of the program and we get a WooT! message confirming successful completion of this challenge. It can't get better than this.
Every assumption we made (B
at the start of input, mutation logic based on a static key 0x2a, etc.) about the program was validated via static analysis except for the flag string used above. We assumed it to be the XOR result comparison flag stored at address 0x8048505 but didn't validate it in anyway except for trying it out with the binary itself. Authors of such challenges often add such strings to confuse the analyst and delay the process. However, validating this assumption is fairly trivial and let's see how to do it. Since, the value stored at this address could be static or dynamically-computed at runtime, let's use rely on the debugging process to provide us the correct result. We'll debug the program using GDB and check the contents of this address to validate our assumption:
$ gdb ./reverse-ex -q
Reading symbols from /home/ankur/toolbox/challenges/misc/reverse-ex...(no debugging symbols found)...done.
(gdb)
(gdb) break checkkey
Breakpoint 1 at 0x80484c9
(gdb)
(gdb) r
Starting program: /home/ankur/toolbox/challenges/misc/reverse-ex
Are you feeling lucky today? BABCD
Breakpoint 1, 0x080484c9 in checkkey ()
(gdb)
(gdb) disas checkkey
Dump of assembler code for function checkkey:
0x080484c4 <+0>: push %ebp
0x080484c5 <+1>: mov %esp,%ebp
0x080484c7 <+3>: push %edi
0x080484c8 <+4>: push %esi
=> 0x080484c9 <+5>: sub $0x10,%esp
0x080484cc <+8>: mov 0x8(%ebp),%eax
0x080484cf <+11>: mov %eax,-0xc(%ebp)
0x080484d2 <+14>: jmp 0x80484e8 <checkkey+36>
0x080484d4 <+16>: mov -0xc(%ebp),%eax
0x080484d7 <+19>: movzbl (%eax),%eax
0x080484da <+22>: mov %eax,%edx
0x080484dc <+24>: xor $0x2a,%edx
0x080484df <+27>: mov -0xc(%ebp),%eax
0x080484e2 <+30>: mov %dl,(%eax)
0x080484e4 <+32>: addl $0x1,-0xc(%ebp)
0x080484e8 <+36>: mov -0xc(%ebp),%eax
0x080484eb <+39>: movzbl (%eax),%eax
0x080484ee <+42>: test %al,%al
0x080484f0 <+44>: jne 0x80484d4 <checkkey+16>
0x080484f2 <+46>: mov 0x8(%ebp),%eax
0x080484f5 <+49>: mov %eax,%edx
0x080484f7 <+51>: mov $0x8048770,%eax
0x080484fc <+56>: mov $0x19,%ecx
0x08048501 <+61>: mov %edx,%esi
0x08048503 <+63>: mov %eax,%edi
0x08048505 <+65>: repz cmpsb %es:(%edi),%ds:(%esi)
0x08048507 <+67>: seta %dl
0x0804850a <+70>: setb %al
0x0804850d <+73>: mov %edx,%ecx
0x0804850f <+75>: sub %al,%cl
0x08048511 <+77>: mov %ecx,%eax
0x08048513 <+79>: movsbl %al,%eax
0x08048516 <+82>: test %eax,%eax
0x08048518 <+84>: jne 0x8048521 <checkkey+93>
0x0804851a <+86>: mov $0x1,%eax
0x0804851f <+91>: jmp 0x8048526 <checkkey+98>
0x08048521 <+93>: mov $0x0,%eax
0x08048526 <+98>: add $0x10,%esp
0x08048529 <+101>: pop %esi
0x0804852a <+102>: pop %edi
0x0804852b <+103>: pop %ebp
0x0804852c <+104>: ret
End of assembler dump.
(gdb) x /s 0x8048770
0x8048770: "KFFSE_XHKYOKXOHOFEDM^E_Y"
(gdb) q
A debugging session is active.
Inferior 1 [process 13283] will be killed.
Quit anyway? (y or n) y
Conclusion
We add a breakpoint at checkkey
function and manually inspect the content of the 0x8048770 address (it might change on our system) and it indeed turns out to be KFFSE_XHKYOKXOHOFEDM^E_Y
. This concludes this writeup and in an upcoming post I'll write about the second challenge from this course.