
Reverse-Engineering in OS X on x86

The other day at work I had a task: to figure out how to change the displayed title of a minimized window in the Dock without actually changing the window's title. (Please trust that I had a very good reason for wanting to do this.)

While there are some excellent articles about how to reverse-engineer under OS X, they're all PowerPC-based. And even though the future of the Mac is x86, it seems like people have lots of anxiety about having to work with it.

I think the problem is not a lack of documentation on x86 assembly, but a surfeit of it. Most of it is Windows- or DOS-centric, usually with the wrong syntax (Intel syntax vs the AT&T syntax that GCC uses), and with the aim of teaching how to write it. But reading x86 assembly really isn't that hard. If all you want to do is learn how to read the code generated by GCC, it's probably just as easy as PowerPC.

The other day I took notes of my discoveries. Let's touch on two functions, both in PowerPC and x86 flavors. For those of you who only know PowerPC assembly, I hope you'll be pleasantly surprised.

Before we begin, I'm going to assume that you're comfortable with assembly in general (though not necessarily with any particular one). If you have the latest developer tools, launch Shark (in /Developer/Applications/Performance Tools) and in the Help menu you can access various ISA references. In addition, Apple has ABI documentation for both the PowerPC and x86. I'm going to go over each function twice (once for PowerPC and once for x86); feel free to skim the PowerPC version if you're accustomed to it. And finally, this is only for the 32-bit version of each platform; things change even more with 64 bits.

SetWindowTitleWithCFString

The trail always begins with a public call that uses the SPI that you want to figure out. In this case, I chose SetWindowTitleWithCFString because it has to somehow set the title of a window even if it's minimized. I went with Carbon because the dynamic nature of Objective-C in Cocoa sometimes makes tracing code harder.

PowerPC

<+0>:	mflr    r0          // save linkage
<+4>:	stmw    r30,-8(r1)  // stash r30, r31
<+8>:	mr      r30,r4      // save r4 (new title)
<+12>:	stw     r0,8(r1)    // save old LR in caller's frame
<+16>:	stwu    r1,-80(r1)  // make stack frame

This is the prologue of the function. The PowerPC doesn't have a dedicated stack pointer (convention is to use r1 for that) or push and pop instructions, so the common way of implementing calls by pushing the PC onto the stack doesn't work. Instead, the PowerPC has a link register and an instruction, bl, that branches and puts the return address (the address of the next instruction) into the link register. Thus, almost every function that calls another starts with mflr r0, to pull the return address into a usable register. Then in <+4> we save off some registers that we're going to smash. Every function needs registers to hold local variables, and usually the high-numbered, callee-saved registers are used. The stmw (store multiple words) instruction is useful for ditching many high registers on the stack. Then in <+12> we drop the return address onto the stack, and in <+16> we allocate 80 bytes of stack.

A note on parameter passing. Integer-sized parameters (the only kind we'll be dealing with today) are passed into a function starting with r3 and going up through the registers. Return values are returned in r3. So we see that in <+8> we stick away the pointer to the new name in r30 (whose previous value was stored on the stack earlier).

<+20>:	bl      0x92881384 <_Z13GetWindowDataP15OpaqueWindowPtr>
<+24>:	li      r0,-5600    // errInvalidWindowRef
<+28>:	cmpwi   cr7,r3,0    // if no window data, bail
<+32>:	beq-    cr7,0x928d2ae0 <+60>
<+36>:	cmpwi   cr7,r30,0   // if no string to set, bail
<+40>:	li      r0,-50      // paramErr
<+44>:	beq-    cr7,0x928d2ae0 <+60>
<+48>:	mr      r4,r30

This is where we must start making inferences as to what the code is doing. Fortunately, we have the symbols so it's not too hard. We see that we use the WindowRef as a parameter to a C++ function GetWindowData(OpaqueWindowPtr*), as the WindowRef was passed in as r3 and r3 wasn't altered before the call. In addition, note that the function return value, being in r3, will overwrite the WindowRef value, which wasn't saved in a high register. That's fine, as the WindowRef was just an index into a table and won't be needed further.

At this point we run some checks. We compare both r3 and r30 to zero, and if either one is zero we jump to the end with r0 set to the appropriate error code. (The end of the function will move r0 into r3 for return.)

The PowerPC condition register is split into eight 4-bit fields, cr0 through cr7. Why are we using cr7 here? Probably because cr7 is volatile and we can get away with not saving/restoring it.

<+52>:	bl      0x928d2af8 <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+56>:	li      r0,0        // return noErr
<+60>:	addi    r1,r1,80    // tear down stack frame and return
<+64>:	mr      r3,r0
<+68>:	lwz     r0,8(r1)
<+72>:	lmw     r30,-8(r1)
<+76>:	mtlr    r0
<+80>:	blr

The rest is pretty simple. We call a member function WindowData::SetTitleCommon(const __CFString*), and then do common tear down. We restore the stack pointer, put the return value into r3, restore the registers, move the return address back into the link register, and branch to the link register (blr), returning us to our caller.
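Read back as a whole, the three chunks amount to two guard checks and one call. Here is a minimal Python sketch of that control flow. The WindowData class, its title attribute, the dictionary used as a window table, and get_window_data are all hypothetical stand-ins for private routines whose bodies we haven't looked at; only the error codes and the order of the checks come from the listing.

```python
ERR_INVALID_WINDOW_REF = -5600   # li r0,-5600 at <+24>
PARAM_ERR = -50                  # li r0,-50 at <+40>
NO_ERR = 0                       # li r0,0 at <+56>


class WindowData:
    """Hypothetical stand-in for the private C++ WindowData class."""

    def __init__(self):
        self.title = None

    def set_title_common(self, title):
        # Stand-in for WindowData::SetTitleCommon; the real body is unknown here.
        self.title = title


def get_window_data(window_table, window_ref):
    # Stand-in for GetWindowData: returns None (a null pointer) if unknown.
    return window_table.get(window_ref)


def set_window_title_with_cfstring(window_table, window_ref, title):
    data = get_window_data(window_table, window_ref)   # bl at <+20>
    if data is None:                                   # cmpwi/beq- at <+28>, <+32>
        return ERR_INVALID_WINDOW_REF
    if title is None:                                  # cmpwi/beq- at <+36>, <+44>
        return PARAM_ERR
    data.set_title_common(title)                       # bl at <+52>
    return NO_ERR
```

Note that the window check comes first, so a bad WindowRef with a null title reports errInvalidWindowRef rather than paramErr.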

x86

The PowerPC register file is really easy: r0, r1, r2 ... r31. x86 has fewer registers and they've historically had different roles (accumulator, base, source index, destination index, and so on). Seriously, forget about that. There are eight registers you care about. eax, ebx, ecx, edx, esi, and edi are all general-purpose registers. esp is the stack pointer. ebp is the frame pointer. That's it.

PowerPC assembly reads right-to-left (the destination comes first, except for stores). x86 AT&T syntax in general reads left to right (source first, destination last).

<+0>:	push   %ebp             // make stack frame
<+1>:	mov    %esp,%ebp        // make stack frame
<+3>:	push   %esi             // stash %esi
<+4>:	sub    $0x14,%esp       // make stack frame

x86 is stack-based. Parameters to a function are put at the top of the stack, with the rightmost parameters at the highest addresses. To execute the function, the call instruction is used. It pushes the return address onto the stack, so by the time we hit <+0> the parameters start four bytes above the stack pointer. In <+0> we save off the old frame pointer and in <+1> we establish our own stack frame. At this point ebp is fixed for the entire function. In <+3> we save the old values of registers we're going to use, and in <+4> we allocate space on the stack.

This is a perfect example of an ideal stack frame. ebp is the frame pointer. It points at the stack slot holding the caller's frame pointer. ebp+4 is the return address into the function that called us. ebp+8 is the first parameter passed in, ebp+12 is the second, etc. Immediately below ebp are the values saved from the registers, which will be restored before the return. And below that is a bunch of stack space used for either register spillage or calling subsequent functions. One interesting note is that parameters are rarely pushed onto the stack for a call. The stack pointer doesn't move once we make it past the prologue. We just set the memory right above esp (the stack pointer) and make the call.
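To make the offsets concrete, here is a small sketch that replays the caller's setup, the call, and the four prologue instructions on a toy stack. The dictionary standing in for memory, the starting address 0x1000, and the placeholder strings are assumptions made for illustration; only the instruction sequence and the 0x14 come from the listing.

```python
def build_frame(esp, args, return_address, old_ebp, saved_esi, locals_size):
    """Replay caller setup, call, and prologue on a toy 32-bit stack."""
    memory = {}

    # Caller: store arguments just above esp, leftmost at the lowest address.
    for index, value in enumerate(args):
        memory[esp + 4 * index] = value

    # call: push the return address.
    esp -= 4
    memory[esp] = return_address

    # <+0> push %ebp
    esp -= 4
    memory[esp] = old_ebp
    # <+1> mov %esp,%ebp
    ebp = esp
    # <+3> push %esi
    esp -= 4
    memory[esp] = saved_esi
    # <+4> sub $0x14,%esp
    esp -= locals_size

    return memory, ebp, esp


memory, ebp, esp = build_frame(
    0x1000, ["WindowRef", "new title"], "return address",
    "caller's ebp", "caller's esi", 0x14)
```

After this runs, memory[ebp + 8] holds "WindowRef", memory[ebp + 12] holds "new title", memory[ebp - 4] holds the saved esi, and esp sits 0x18 bytes below ebp. The return address, the two pushes, and the 0x14 add up to 0x20 bytes, so if the caller's esp was 16-byte aligned at the call, ours still is after the prologue.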

<+7>:	mov    0x8(%ebp),%eax   // get WindowRef in %eax
<+10>:	mov    0xc(%ebp),%esi   // get new title in %esi

The parameters are passed on the stack. Since fiddling in memory is slow, we pull the values into registers. It's actually pretty analogous to how things go in PowerPC. There, lower registers like r3 are reused for parameter passing so important values are kept in the high registers. On x86 the parameters go on the stack and values are kept in registers (while they can). Why eax and esi? Why not?

<+13>:	mov    %eax,(%esp)      // put WindowRef on the stack
<+16>:	call   0x92dfb8f6 <_Z13GetWindowDataP15OpaqueWindowPtr>

With the PowerPC, you can tell how many parameters a function has by how many registers starting with r3 are loaded. Here, just count the stores made through esp before the call: (%esp) is the first parameter, 0x4(%esp) the second, and so on.

<+21>:	mov    %eax,%edx        // stick WindowData into %edx
<+23>:	mov    $0xffffea20,%eax // errInvalidWindowRef
<+28>:	test   %edx,%edx        // if no window data, bail
<+30>:	je     0x92e4bb04 <+54>
<+32>:	test   %esi,%esi        // if no string to set, bail
<+34>:	mov    $0xffce,%ax      // paramErr
<+38>:	je     0x92e4bb04 <+54>

Return values come back from functions in eax, but otherwise this is pretty much the same. The only thing of interest to note is the clever use of the peculiar register structure. In <+23> the constant 0xffffea20 is loaded into eax. But in <+34> the constant 0xffce is loaded into ax. Since ax is just an alias for the lower 16 bits of eax, the upper half of the word is left as 0xffff and we get the full constant 0xffffffce in eax. Why do this? Because loading a 32-bit constant takes 5 bytes (an opcode plus a four-byte immediate) while loading a 16-bit constant only takes 4 (an operand-size prefix, the opcode, and a two-byte immediate).
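The arithmetic is easy to check. This sketch models eax as a plain 32-bit integer; the helper names to_signed32 and write_ax are made up for the example.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def write_ax(eax, imm16):
    """Model mov $imm16,%ax: replace the low 16 bits, keep the high 16."""
    return (eax & 0xFFFF0000) | (imm16 & 0xFFFF)


eax = 0xFFFFEA20                 # <+23> mov $0xffffea20,%eax
first_error = to_signed32(eax)   # errInvalidWindowRef
eax = write_ax(eax, 0xFFCE)      # <+34> mov $0xffce,%ax
second_error = to_signed32(eax)  # paramErr
```

first_error comes out as -5600 and second_error as -50, matching the li r0,-5600 and li r0,-50 in the PowerPC version. The trick only works because the upper half of eax is already 0xffff from the first load.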

<+40>:	mov    %esi,0x4(%esp)   // load new title as param 2
<+44>:	mov    %edx,(%esp)      // load WindowData as param 1
<+47>:	call   0x92e4bb0c <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+52>:	xor    %eax,%eax        // return noErr

Same stuff as before. The one note is the zeroing of eax with an xor. It's a standard trick: the instruction is two bytes instead of the five needed for the equivalent mov $0x0,%eax, and it's no slower.

<+54>:	add    $0x14,%esp       // tear down stack frame and return
<+57>:	pop    %esi
<+58>:	leave  
<+59>:	ret    
<+60>:	nop    
<+61>:	nop    

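Mirror image of the stack frame creation. leave copies ebp back into esp and pops the old ebp, and ret pops the return address. The trailing nops are just padding before the next function.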

UpdateDockTitle

That wasn't so hard, was it? Whether stack- or register-based, it's basically the same.

At this point I'd like to talk about UpdateDockTitle. There are a few tricks in here, so I'll focus the commentary on those.

PowerPC

<+0>:	mflr    r0               // save linkage
<+4>:	stmw    r28,-16(r1)      // stash r28, r29, r30, r31
<+8>:	mr      r30,r3           // save r3 (WindowData)
<+12>:	bcl-    20,4*cr7+so,0x928d2bd4 <+16>
<+16>:	mflr    r31            // get ip in r31

Whoa... what?

Short story: <+12> is an unconditional branch-and-link.

Long story: On the PowerPC, instructions like bge, etc. are just aliases to a more primitive branch instruction, bc (branch conditional). In this case, the first parameter is 20 (0b10100), which indicates “branch always”. Since it's always going to branch, the second parameter doesn't matter, so it was set to all 1 bits (31, which the disassembler prints as 4*cr7+so).
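Both operands can be decoded mechanically. The sketch below uses made-up helper names; the bit layout is the standard one, where the second operand selects one of the 32 condition-register bits (four per field, in the order lt, gt, eq, so) and a first operand with both the 16 bit and the 4 bit set ignores the condition and the count register.

```python
CR_BIT_NAMES = ("lt", "gt", "eq", "so")


def decode_bi(bi):
    """Name the condition-register bit selected by a 5-bit BI field."""
    field, bit = divmod(bi, 4)
    return f"4*cr{field}+{CR_BIT_NAMES[bit]}"


def is_branch_always(bo):
    """True when BO ignores both the CTR and the condition (pattern 1z1zz)."""
    return (bo & 0b10100) == 0b10100


operand_text = decode_bi(31)
always = is_branch_always(20)
```

operand_text is "4*cr7+so" and always is True, which is exactly what the disassembler printed for <+12>.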

Why do this? Because we're going to need to access some PC-relative data, and the PowerPC has no PC-relative addressing mode for loads and stores. And the register move instructions can't access the PC register. Therefore we cheat in a way by taking an unconditional jump to the next address. Since it's a branch and link, the link register is filled with the next address (which in this case equals the address just jumped to), which can be moved to a normal register.

Why branch-conditional with a condition of “branch always” rather than a plain bl? Both forms can branch relative to the current instruction, so reach isn't the reason. The exact form bcl 20,31,$+4 is the one the architecture sets aside for fetching the PC: processors that predict returns by tracking bl/blr pairs know to ignore it, so the trick doesn't throw off return prediction for the rest of the function.

<+20>:	stw     r0,8(r1)
<+24>:	stwu    r1,-80(r1)     // make stack frame
<+28>:	addis   r28,r31,3533 
<+32>:	bl      0x928d2c50 <_Z15GetTitleForDockP10WindowData>
<+36>:	lbz     r0,-3364(r28)  // haul initialization boolean into r0

This is where intuition comes in. We're hauling in some byte from a PC-relative address. (lbz is load byte and zero, which loads one byte from memory and clears the high bits of the register.) What's byte-sized? A Boolean. (Not a C++ bool, which gcc makes four bytes on 32-bit PowerPC by default.) Why a Boolean? Booleans are flags. And with the value of the byte gating the call to RegisterAsDockClientPriv, it's a safe bet that it's an initialization flag.
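The address of that byte can be worked out from the listing alone. This sketch assumes the addresses gdb printed are the ones in effect at run time; addis and displaced are made-up helper names that model the two address computations.

```python
def addis(base, immediate):
    """Model addis: add a 16-bit immediate shifted left 16 bits."""
    return (base + (immediate << 16)) & 0xFFFFFFFF


def displaced(base, displacement):
    """Model a D-form effective address: base plus a signed 16-bit offset."""
    return (base + displacement) & 0xFFFFFFFF


pc = 0x928D2BD4                        # value mflr leaves in r31 at <+16>
r28 = addis(pc, 3533)                  # <+28>
flag_address = displaced(r28, -3364)   # used by lbz at <+36> and stb at <+60>
offset_from_pc = flag_address - pc     # 3533 * 65536 - 3364
```

flag_address works out to 0xA05A1EB0, about 220 MB past the code. A single load can only reach 32 KB either side of its base register, which is why the offset is split between an addis for the high half and the displacement for the low half.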

<+40>:	mr      r29,r3         // stash new title into r29
<+44>:	cmpwi   cr7,r0,0       // was initialized?
<+48>:	bne-    cr7,0x928d2c04 <+64> // if so, skip
<+52>:	bl      0x9287f864 <_Z24RegisterAsDockClientPrivv> // else initialize
<+56>:	li      r0,1           // and set flag
<+60>:	stb     r0,-3364(r28)  //   as being initialized
<+64>:	mr      r3,r30
<+68>:	mr      r4,r29
<+72>:	bl      0x928d2c68 <SyncPlatformWindowTitle> // call with (WindowData, new title)
<+76>:	lwz     r0,344(r30)    // pull (WindowData + 344)
<+80>:	andis.  r2,r0,64       // and pull a flag bit out of it (minimized?)

More intuition here. r30 contains a pointer to the WindowData class instance, and we're loading the word 344 bytes in. andis. ANDs it with the immediate shifted left 16 bits, so 64 becomes the mask 0x00400000. We don't care about the destination register (we don't touch r2 again in this function), but don't miss the name of the opcode: “andis.” with the trailing period, which means to update cr0. That's what the beq- at <+84> tests.

Once again, this is obviously a flag (bit-sized this time). But what does it mean? Context tells us that we only call CoreDockSetItemTitle when it's set. Thus, guessing that it's the is-minimized flag is safe.

<+84>:	beq-    0x928d2c38 <+116> // if not minimized, skip this step
<+88>:	addi    r1,r1,80
<+92>:	lwz     r3,196(r30) // load WID

How do I know that WindowData+196 is the CoreGraphics WID? I used Quartz Debug to look at the window list for a sample app. The app only had one window, and the listed WID matched.

<+96>:	mr      r4,r29 // load new title
<+100>:	lwz     r0,8(r1)
<+104>:	lmw     r28,-16(r1) // tear down stack frame
<+108>:	mtlr    r0
<+112>:	b       0x92b58ce4 <dyld_stub_CoreDockSetItemTitle>

Note that the stack frame teardown appears twice in this function: once here and once in the normal epilogue at the end. Here we're tail calling CoreDockSetItemTitle, so that it's as if our caller had called it directly. This is equivalent to the code return CoreDockSetItemTitle(wid, newTitle). Note from the setup of r3 and r4 that we can deduce the parameter types. Can we figure out the return type, though? Not really. The calling code ignores it, so we can ignore it too.

<+116>:	addi    r1,r1,80
<+120>:	li      r3,0
<+124>:	lwz     r0,8(r1)
<+128>:	lmw     r28,-16(r1)
<+132>:	mtlr    r0
<+136>:	blr
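The last chunk is the ordinary epilogue, returning 0 when the window isn't minimized. Put together, the function's logic looks roughly like the following sketch. The dictionaries for the window, the state, and the Dock are hypothetical stand-ins, as is the body of core_dock_set_item_title; only the order of operations and the two return paths are taken from the listing.

```python
def update_dock_title(window, state, dock_titles):
    """Toy model of UpdateDockTitle; all three arguments are hypothetical."""
    title = window["dock_title"]          # GetTitleForDock, <+32>
    if not state["registered"]:           # lbz/cmpwi/bne- at <+36>..<+48>
        state["register_calls"] += 1      # RegisterAsDockClientPriv, <+52>
        state["registered"] = True        # li/stb at <+56>, <+60>
    window["platform_title"] = title      # SyncPlatformWindowTitle, <+72>
    if window["minimized"]:               # andis./beq- at <+80>, <+84>
        # Tail call at <+112>: whatever CoreDockSetItemTitle returns is returned.
        return core_dock_set_item_title(dock_titles, window["wid"], title)
    return 0                              # li r3,0 at <+120>


def core_dock_set_item_title(dock_titles, wid, title):
    """Stand-in for CoreDockSetItemTitle; its real return type is unknown."""
    dock_titles[wid] = title
    return 0
```

The registration happens at most once however many times the function is called, and the Dock is only touched for minimized windows.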

x86

<+0>:	push   %ebp                   // make stack frame
<+1>:	mov    %esp,%ebp
<+3>:	sub    $0x28,%esp
<+6>:	mov    %ebx,-0xc(%ebp)        // save %ebx
<+9>:	call   0x92e4bbe4 <+14>
<+14>:	pop    %ebx                 // IP > %ebx

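We're doing the same trick here to get the PC into a register, and at first I was a bit stumped as to why. In 32-bit mode, though, the x86 only has PC-relative addressing for jumps and calls, not for data (that arrives with the 64-bit instruction set), and no mov can read eip. So a call to the next instruction followed by a pop is the standard way to get the PC into a normal register.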
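The call/pop pair is simple enough to model directly. This sketch assumes the addresses gdb printed are the run-time ones; call_pop is a made-up helper, and the displacement is the one that shows up on the cmpb at <+31> further down.

```python
def call_pop(call_address, call_length=5):
    """Model 'call next; pop reg': return the address the pop leaves in reg."""
    stack = []
    return_address = call_address + call_length  # address after the call
    stack.append(return_address)                 # call pushes it...
    # ...and jumps to its target, which here is that same address.
    return stack.pop()                           # pop %ebx


entry = 0x92E4BBE4 - 14          # function start, since the call targets <+14>
ebx = call_pop(entry + 9)        # call sits at <+9>
flag_address = (ebx + 0x0D51A36C) & 0xFFFFFFFF   # operand of cmpb at <+31>
```

ebx ends up as 0x92E4BBE4, the address of the pop itself, and flag_address as 0xA0365F50. From then on ebx serves as the base for every PC-relative data access in the function.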

<+15>:	mov    %esi,-0x8(%ebp)      // save %esi
<+18>:	mov    0x8(%ebp),%esi       // WindowData > %esi
<+21>:	mov    %edi,-0x4(%ebp)      // save %edi

This almost looks like it was compiled by a different compiler. In the previous function, esi is pushed and then the stack pointer is dropped. Here, we create the stack space first and then move the contents of three registers (ebx, esi, and edi) into it. I suspect that things change once we also have to save ebx, though I don't know why.

<+24>:	mov    %esi,%eax            // %esi (WindowData) > %eax
<+26>:	call   0x92e4bc40 <_Z15GetTitleForDockP10WindowData>

Whoa. If we're calling a function we need to set the parameter via stack-relative addressing off esp. What's going on here?

The point of an ABI is that it's a documented way for functions to call each other. But if a function, say GetTitleForDock(WindowData*), is a short one that's not public and is only used under controlled circumstances, why worry about setting up the stack? In this particular case, GetTitleForDock happens to be a nine-instruction routine. Not worth the hassle of a stack frame, so it's reasonable to pass in the one parameter in eax.

<+31>:	cmpb   $0x0,0xd51a36c(%ebx) // test initialization boolean
<+38>:	mov    %eax,%edi            // window title > %edi
<+40>:	jne    0x92e4bc0c <+54> // if initialized, skip
<+42>:	call   0x92df9fe0 <_Z24RegisterAsDockClientPrivv> // else initialize
<+47>:	movb   $0x1,0xd51a36c(%ebx) // and set flag as being initialized
<+54>:	mov    %edi,0x4(%esp)       // new title (param 2)
<+58>:	mov    %esi,(%esp)          // WindowData (param 1)
<+61>:	call   0x92e4bc52 <SyncPlatformWindowTitle>
<+66>:	xor    %eax,%eax            // clear %eax (noErr?)
<+68>:	testb  $0x2,0x159(%esi)     // test flag (WindowData + 0x159) (minimized?)
<+75>:	je     0x92e4bc35 <+95> // if not minimized, skip this step
<+77>:	mov    %edi,0x4(%esp)       // new title (param 2)
<+81>:	mov    0xc4(%esi),%eax      // (WindowData + 0xC4) WID
<+87>:	mov    %eax,(%esp)          // (param 1)
<+90>:	call   0xa0a52ad1 <dyld_stub_CoreDockSetItemTitle>
<+95>:	mov    -0xc(%ebp),%ebx
<+98>:	mov    -0x8(%ebp),%esi
<+101>:	mov    -0x4(%ebp),%edi
<+104>:	leave  
<+105>:	ret    
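One loose end: the minimized flag seems to be tested at different bits on the two platforms. PowerPC masks the word at offset 344 with 0x00400000, while x86 tests 0x2 in the byte at 0x159 (345). The sketch below assumes, and this is only an inference, that the flags are declared as a C bitfield. gcc fills bitfields from the most significant bit on big-endian PowerPC and from the least significant bit on little-endian x86, so the two masks should name the same member if they sit at the same index counted from opposite ends. The helper names are made up.

```python
def ppc_word_mask(andis_immediate):
    """Mask tested by andis.: the 16-bit immediate shifted into the high half."""
    return andis_immediate << 16


def x86_word_mask(byte_offset_in_word, byte_mask):
    """Mask within a little-endian 32-bit word for a single-byte test."""
    return byte_mask << (8 * byte_offset_in_word)


def index_from_msb(mask):
    """Position of a single set bit counted from the most significant bit."""
    return 31 - (mask.bit_length() - 1)


def index_from_lsb(mask):
    """Position of a single set bit counted from the least significant bit."""
    return mask.bit_length() - 1


ppc_mask = ppc_word_mask(64)                 # andis. r2,r0,64 on word at 344
x86_mask = x86_word_mask(0x159 - 344, 0x2)   # testb $0x2,0x159(%esi)
ppc_index = index_from_msb(ppc_mask)
x86_index = index_from_lsb(x86_mask)
```

Both indexes come out as 9, so under that assumption the two listings are testing the same tenth one-bit flag in the word. The WID offset needs no such care: 196 on PowerPC is 0xc4 on x86.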

Conclusion

Yes, x86 assembly sucks. Having only two operands rather than three per instruction is a pain. Having only six general-purpose registers for use instead of twenty or so is a real pain.

But really, come on. You're not writing it. You're reading it.

It's compiler-generated. Nothing fancy.

Hold your horror. x86 isn't that bad.


Comments

"Endian little hate we."

On the plus side, the 64-bit ABI is much better, having a few more real registers to work with and the mess that is x87 is often avoided too.

Note that with gcc a bool on PPC is four bytes by default; the lbz is more likely dealing with a Boolean.

Ned: excellent point. Fixed for the Google Mac Blog copy of the article. Thanks!

Hello Avi, could you help me with some CoreDock hacking? Is there an email address that I can contact you at? Thanks!
