This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] ARM: Add optimized ARMv7 strcmp implementation
- From: Richard Henderson <rth at twiddle dot net>
- To: Will Newton <will dot newton at linaro dot org>, libc-alpha at sourceware dot org
- Cc: Richard Earnshaw <rearnsha at arm dot com>
- Date: Wed, 23 Apr 2014 09:13:24 -0700
- Subject: Re: [PATCH] ARM: Add optimized ARMv7 strcmp implementation
- References: <1398256338-24784-1-git-send-email-will dot newton at linaro dot org>
On 04/23/2014 05:32 AM, Will Newton wrote:
> + clz tmp1, \synd
> + lsl r1, \d2, tmp1
> + .if \restore_r6
> + ldrd r6, r7, [sp, #8]
> + .endif
> + cfi_restore (r6)
> + cfi_restore (r7)
> + lsl \d1, \d1, tmp1
> + cfi_remember_state
> + lsr result, \d1, #24
> + ldrd r4, r5, [sp], #16
> + cfi_restore (r4)
> + cfi_restore (r5)
> + sub result, result, r1, lsr #24
You can minimize the size of the unwind info (avoiding DW_CFA_advance opcodes)
by delaying restores until the previous storage may be clobbered, or until some
other adjustment must be made. Here the storage is clobbered, obviously, by
the SP post-increment that deallocates it.
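For instance, a sketch of what that batching looks like (illustrative, not the
patch's exact sequence; assumes the 16-byte frame established at entry):

    ldrd r6, r7, [sp, #8]     @ slots still valid; no CFI needed yet
    ldrd r4, r5, [sp], #16    @ SP post-increment deallocates the frame
    cfi_def_cfa_offset (0)    @ one advance covers everything below
    cfi_restore (r4)
    cfi_restore (r5)
    cfi_restore (r6)
    cfi_restore (r7)

Each point in the instruction stream where directives are emitted needs its own
DW_CFA_advance; grouped after the final SP adjustment they share a single one.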
> +#else
> + /* To use the big-endian trick we'd have to reverse all three words.
> + that's slower than this approach. */
> + rev \synd, \synd
> + clz tmp1, \synd
> + bic tmp1, tmp1, #7
> + lsr r1, \d2, tmp1
> + cfi_remember_state
> + .if \restore_r6
> + ldrd r6, r7, [sp, #8]
> + .endif
> + cfi_restore (r6)
> + cfi_restore (r7)
Also, it would appear that you're remembering two different states between the
big and little endian versions of this code. The LE version remembers before
restoring r6/r7; the BE version remembers afterward.
I suspect you want
ldrd r4, r5, [sp], #16
cfi_remember_state
cfi_def_cfa_offset(0)
cfi_restore (r4)
cfi_restore (r5)
cfi_restore (r6)
cfi_restore (r7)
which has a single advance within this code sequence, keeps r6/r7 remembered as
saved, and actually describes the stack adjustment.
> + strd r4, r5, [sp, #-16]!
> + cfi_def_cfa_offset (16)
> + cfi_offset (r4, -16)
> + cfi_offset (r5, -12)
> + orr tmp1, src1, src2
> + strd r6, r7, [sp, #8]
> + cfi_offset (r6, -8)
> + cfi_offset (r7, -4)
FWIW, I find using cfi_rel_offset much easier to reason with, since you get to
match up numbers in the cfi data with numbers in the assembly. E.g.
cfi_rel_offset(r6, 8)
cfi_rel_offset(r7, 12)
but perhaps not that helpful in this case.
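For reference, cfi_rel_offset takes the save slot's offset from the current CFA
register (here sp), so with the CFA at sp+16 the two spellings describe the
same slots (offsets as in the hunk above):

    strd r6, r7, [sp, #8]
    cfi_rel_offset (r6, 8)    @ same as cfi_offset (r6, 8 - 16)
    cfi_rel_offset (r7, 12)   @ same as cfi_offset (r7, 12 - 16)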
> +.Lmisaligned_exit:
> + cfi_remember_state
> + mov result, tmp1
> + ldr r4, [sp], #16
> + cfi_restore (r4)
> + bx lr
Even though only r4 was modified and reloaded, r5-r7 are still described as
saved. After popping all of the storage, that's not correct -- you need to
restore all 4 registers. Again, you're not describing the stack adjustment.
Again, no point in doing the remember 2 insns before the restores.
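I.e. something like this (a sketch only; it keeps the remember adjacent to the
restores, on the assumption that later code still wants the saved state):

    .Lmisaligned_exit:
        mov result, tmp1
        ldr r4, [sp], #16
        cfi_remember_state
        cfi_def_cfa_offset (0)
        cfi_restore (r4)
        cfi_restore (r5)
        cfi_restore (r6)
        cfi_restore (r7)
        bx lr

Since r5-r7 were never clobbered, cfi_restore merely reverts them to the
initial rule (value lives in the register), which is exactly right once the
frame is gone.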
Alternatively, since this unwind info isn't for correctness of exception
handling, you could simply drop all this restore business and let the unwind
info be incorrect for the 1-2 insns before the return.
r~