Unicode exploits are basically the same as traditional stack buffer overflow exploits, but they come with a bigger challenge. The idea behind both exploit types is the same: overwrite EIP (or SEH) with a useful address that executes an instruction that jumps back into the buffer holding our code. Although the goals are the same, how you go about achieving them differs.
The difference you will notice between a traditional ASCII exploit and a Unicode one is that every byte is followed by a null (0x00) byte. For example, the string "DOG" is "44 4F 47" in ASCII bytes, while its Unicode representation is "44 00 4F 00 47 00". So when you overwrite a buffer with a whole lot of A's in a Unicode exploit, each A is followed by a null, and your buffer looks like this in memory: "41 00 41 00 41 00 41 00 41 00 ...". (Strictly speaking, this holds for bytes in the 0x01 to 0x7F range; bytes above 0x7F may be converted to other values depending on the target's code page.)
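A quick way to see the conversion is Python's UTF-16LE codec, which for plain ASCII text produces exactly the byte-then-null pattern described above. The helper `to_unicode` is a hypothetical name used for illustration; it simply inserts a 0x00 after every byte, so it only models the conversion for bytes 0x01 to 0x7F and ignores code page translation of higher bytes.

```python
def to_unicode(data: bytes) -> bytes:
    """Insert a 0x00 after every byte (models ASCII-range input only)."""
    return bytes(b for x in data for b in (x, 0))

print(b"DOG".hex(" "))                     # 44 4f 47
print(to_unicode(b"DOG").hex(" "))         # 44 00 4f 00 47 00
print("DOG".encode("utf-16-le").hex(" "))  # 44 00 4f 00 47 00
print(to_unicode(b"A" * 5).hex(" "))       # 41 00 41 00 41 00 41 00 41 00
```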
With Unicode exploits you are usually quite limited in which addresses you can overwrite EIP or SEH with, and you also have a limited instruction set to work with. You must accept that every byte in your supplied buffer will be followed by a null byte and work from there. This also means that your buffer must contain code that is Unicode compatible. For example, in SEH exploits it is common to put a short jump in the next SEH field so that execution hops over the SEH handler address towards your shellcode: a jump 6 bytes forward, which is "EB 06" in bytes. If we send this in our buffer, a null is inserted after each byte before the code runs, giving "EB 00 06 00". Disassembled, those bytes are no longer the jump you intended: "EB 00" jumps zero bytes ahead (to the very next instruction) and "06" becomes PUSH ES. This is a major point that must be kept in mind when dealing with Unicode exploits.
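One way to state the constraint precisely: after conversion you control only the even-offset bytes in memory, and every odd-offset byte is forced to 0x00. The sketch below checks whether a wanted byte sequence can survive that, and if so returns the bytes you would actually send. The function names are hypothetical, and the model again assumes only ASCII-range bytes (no code page translation).

```python
def is_unicode_compatible(wanted: bytes) -> bool:
    # Every odd offset will hold 0x00 after conversion, so the wanted
    # sequence is reachable only if its odd-offset bytes are already 0x00.
    return all(b == 0 for b in wanted[1::2])

def bytes_to_send(wanted: bytes) -> bytes:
    if not is_unicode_compatible(wanted):
        raise ValueError(f"{wanted.hex(' ')} cannot survive the conversion")
    # Keep only the even-offset bytes; the conversion supplies the nulls.
    return wanted[0::2]

short_jump = bytes.fromhex("eb 06")
print(is_unicode_compatible(short_jump))  # False: 06 sits at an odd offset
print(bytes(b for x in short_jump for b in (x, 0)).hex(" "))  # eb 00 06 00
print(bytes_to_send(bytes.fromhex("50 00 6d 00 4c")).hex(" "))  # 50 6d 4c
```

Note that the conversion also appends a 0x00 after the last byte you send, which becomes the first byte of whatever instruction follows.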
You must be wondering how we overcome the limitations discussed above. Basically, you use Unicode-compatible instructions to accomplish the same thing. These include single-byte instructions such as push, pop, inc, dec and ret, to name a few. Because every single-byte instruction is followed by a null after conversion, consecutive instructions must be separated by some NOP-equivalent code of the form "00 nn 00", where nn is an opcode byte you supply: the null after the previous instruction, your nn byte, and the null inserted after nn together form one harmless instruction. There are not many bytes that work here; some of them are 0x62, 0x6D, 0x6E, 0x6F, 0x70, 0x71, 0x72 and 0x73. In the "00 nn 00" form, 0x6D for example produces "add byte ptr [ebp], ch", and the other bytes produce similar instructions. For this to work, however, the register being dereferenced (ebp in our example) must contain a writable address, or else an exception will occur. Each byte generally dereferences a different register, so you can pick whichever one holds a writable address at that point. Because these instructions usually do not affect our buffer (or shellcode), they can be used as filler, or NOP-like code, between single-byte instructions and other code pieces. If you need further elaboration on this, please read the Unicode exploit tutorial over at corelan.be. They did a great job explaining this and, most importantly, they also walk you through developing an exploit using the techniques mentioned above.
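To see which register each filler byte dereferences, the sketch below decodes "00 nn 00" by hand using the standard x86 ModRM layout (2-bit mod, 3-bit reg, 3-bit rm) for opcode 0x00, ADD r/m8, r8. Only ModRM bytes with mod = 01 and no SIB byte decode to exactly three bytes, with the trailing 00 consumed as an 8-bit displacement of zero. The function name `decode_filler` is hypothetical; this is a hand decoder for this one form, not a general disassembler.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_filler(nn: int) -> str:
    """Decode the bytes 00 nn 00 as ADD r/m8, r8 with a zero disp8."""
    mod, reg, rm = nn >> 6, (nn >> 3) & 7, nn & 7
    # mod == 01 means [register + disp8]; rm == 100 would need a SIB byte,
    # which would make the instruction four bytes long instead of three.
    if mod != 0b01 or rm == 0b100:
        raise ValueError(f"{nn:#04x} does not give a 3-byte [reg+disp8] form")
    return f"add byte ptr [{REG32[rm]}], {REG8[reg]}"

for nn in (0x62, 0x6D, 0x6E, 0x6F, 0x70, 0x71, 0x72, 0x73):
    print(f"00 {nn:02X} 00 -> {decode_filler(nn)}")
```

The output shows, for instance, that 0x6D writes through ebp, 0x6E through esi and 0x70 through eax, which is why the choice of filler depends on which register holds a writable address.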
Some things to keep in mind.
- Once you have found that you can overwrite EIP or SEH, you need a usable Unicode-compatible address, i.e. one of the form 0x00nn00nn. So in the case of an SEH exploit, you will need to find the address of a pop pop ret (as in a typical SEH exploit), but that address must have the form 0x00nn00nn. The pvefindaddr plugin for Immunity Debugger can automate this process.
- Make use of single-byte instructions like push, pop, inc, dec and ret, and separate each one with one of the NOP-like opcodes I mentioned earlier (0x6D, 0x6E, 0x6F, 0x70, 0x71, etc.). This keeps the code aligned so that the inserted null bytes are absorbed by the filler instructions instead of corrupting the instructions you actually want to run.
- Shellcode must be encoded with a Unicode-compatible encoder. You can use Metasploit for this: # msfpayload windows/exec CMD=calc.exe R | msfencode -e x86/alpha_mixed -t raw | msfencode -e x86/unicode_upper -t raw BufferRegister=EAX
- Unicode encoders usually need at least one register pointing to the beginning of the shellcode (the BufferRegister=EAX option above tells the encoder which register that is). Here is an example of how this can be accomplished. Suppose we want to get the address 0x00401030 into EAX and then jump to it. We can do it like this:
B8 00110011 MOV EAX,11001100
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
2D 00010011 SUB EAX,11000100
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
50 PUSH EAX
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
4C DEC ESP
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
58 POP EAX
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
05 00300040 ADD EAX,40003000
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
50 PUSH EAX
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
44 INC ESP
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
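58 POP EAX
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
50 PUSH EAX //Push the target so RETN jumps to it
006D 00 ADD BYTE PTR SS:[EBP],CH //Filler / Nop-like code
C3 RETN
A direct MOV EAX,00401030 would encode as "B8 30 10 40 00", which needs non-null bytes at odd offsets, so the value is built in steps. The MOV/SUB pair leaves 0x00001000 in EAX. PUSH EAX, DEC ESP, POP EAX shifts it one byte to the left, giving 0x001000xx (the low byte xx is whatever junk was on the stack). ADD EAX,40003000 turns that into 0x401030xx. PUSH EAX, INC ESP, POP EAX shifts it one byte back to the right, dropping the junk byte and pulling in the 00 left on the stack by the first PUSH, so EAX = 0x00401030 and ESP is back where it started. Finally, because RETN jumps to the value on top of the stack rather than to EAX, EAX is pushed once more before the RETN.
If we were to see this as a stream of bytes, it would look like "B8 00 11 00 11 00 6D 00 2D 00 01 00 11 00 6D 00 50 00 6D 00 4C 00 6D 00 58 00 6D 00 05 00 30 00 40 00 6D 00 50 00 6D 00 44 00 6D 00 58 00 6D 00 50 00 6D 00 C3"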
This is Unicode-compatible code, often called venetian code. Remember that when you are writing your exploit, you do not include the null bytes yourself: they are inserted automatically when the application converts your buffer to Unicode during the overflow.
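For the first point in the list above, the address test itself is simple bit masking. The sketch below checks whether a 32-bit address has the form 0x00nn00nn and returns the two bytes you would place in the buffer so that, once nulls are inserted, they form that address in little-endian memory. The helper names and the address 0x00450015 are hypothetical and only illustrate the arithmetic; whether a real pop pop ret exists at such an address is what a tool like pvefindaddr searches for. The model again assumes bytes 0x01 to 0x7F, so code page translation is not considered.

```python
def is_unicode_address(addr: int) -> bool:
    """True if addr is a 32-bit value of the form 0x00nn00nn."""
    return 0 <= addr <= 0xFFFFFFFF and (addr & 0xFF00FF00) == 0

def address_bytes_to_send(addr: int) -> bytes:
    """Two bytes that become addr (little-endian) after 0x00 insertion."""
    if not is_unicode_address(addr):
        raise ValueError(f"{addr:#010x} is not of the form 0x00nn00nn")
    return bytes([addr & 0xFF, (addr >> 16) & 0xFF])

candidate = 0x00450015  # hypothetical address, for illustration only
sent = address_bytes_to_send(candidate)
in_memory = bytes(b for x in sent for b in (x, 0))
print(sent.hex(" "), "->", in_memory.hex(" "))   # 15 45 -> 15 00 45 00
print(hex(int.from_bytes(in_memory, "little")))  # 0x450015
print(is_unicode_address(0x7C9032BC))            # False (arbitrary address)
```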
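To check the venetian arithmetic, here is a minimal Python sketch that models only the handful of opcodes used in the listing (B8, 2D and 05 with a 32-bit immediate, 50, 58, 4C, 44, the "00 6D 00" filler and C3) on a fake 64-byte stack. It is not a real x86 emulator: EBP and CH are not modelled, so the filler is treated as a no-op on the assumption that EBP points to writable memory, and stale stack bytes are set to 0xCC as a stand-in for whatever the real stack holds. The names `expand` and `run` are hypothetical helpers. It starts from the bytes you would send (no nulls) and lets `expand` insert them.

```python
MASK = 0xFFFFFFFF

def expand(sent: bytes) -> bytes:
    """Model the ASCII-to-Unicode conversion: a 0x00 after every byte."""
    return bytes(b for x in sent for b in (x, 0))

def run(code: bytes) -> tuple:
    """Run the tiny opcode subset; return (eax, esp_delta, retn_target)."""
    stack = bytearray(b"\xcc" * 64)  # 0xCC stands in for stale stack data
    start = esp = 32
    eax, i = 0, 0
    while i < len(code):
        op = code[i]
        if op in (0xB8, 0x2D, 0x05):  # mov / sub / add eax, imm32
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            if op == 0xB8:
                eax = imm
            elif op == 0x2D:
                eax = (eax - imm) & MASK
            else:
                eax = (eax + imm) & MASK
            i += 5
        elif op == 0x50:  # push eax
            esp -= 4
            stack[esp:esp + 4] = eax.to_bytes(4, "little")
            i += 1
        elif op == 0x58:  # pop eax
            eax = int.from_bytes(stack[esp:esp + 4], "little")
            esp += 4
            i += 1
        elif op == 0x4C:  # dec esp
            esp -= 1
            i += 1
        elif op == 0x44:  # inc esp
            esp += 1
            i += 1
        elif code[i:i + 3] == b"\x00\x6d\x00":  # add byte ptr [ebp], ch
            i += 3  # filler: its memory write is not modelled
        elif op == 0xC3:  # retn pops its target off the stack
            target = int.from_bytes(stack[esp:esp + 4], "little")
            return eax, esp + 4 - start, target
        else:
            raise ValueError(f"unhandled byte {op:#04x} at offset {i}")
    raise ValueError("no RETN reached")

# Bytes as sent for the listing up to the last POP EAX and its filler.
venetian = bytes.fromhex(
    "B8 11 11 6D 2D 01 11 6D 50 6D 4C 6D 58 6D "
    "05 30 40 6D 50 6D 44 6D 58 6D"
)
with_push = run(expand(venetian + bytes.fromhex("50 6D C3")))
without_push = run(expand(venetian + b"\xc3"))
print([hex(v) for v in with_push])     # ['0x401030', '0x0', '0x401030']
print([hex(v) for v in without_push])  # ['0x401030', '0x4', '0xcccccccc']
```

Running it with and without a PUSH EAX before the RETN shows that EAX ends up as 0x00401030 either way, but RETN only lands on that address when EAX has been pushed first, since RETN takes its target from the top of the stack.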