This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] ARM: Add optimized ARMv7 strcmp implementation
- From: Will Newton <will dot newton at linaro dot org>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: libc-alpha <libc-alpha at sourceware dot org>, Richard Earnshaw <rearnsha at arm dot com>
- Date: Fri, 25 Apr 2014 11:15:46 +0100
- Subject: Re: [PATCH] ARM: Add optimized ARMv7 strcmp implementation
- Authentication-results: sourceware.org; auth=none
- References: <1398256338-24784-1-git-send-email-will dot newton at linaro dot org> <5357E6A4 dot 1070109 at twiddle dot net>
On 23 April 2014 17:13, Richard Henderson <rth@twiddle.net> wrote:
Thanks for the review, particularly the attention to CFI which I
always find is easy to get wrong.
> On 04/23/2014 05:32 AM, Will Newton wrote:
>> + clz tmp1, \synd
>> + lsl r1, \d2, tmp1
>> + .if \restore_r6
>> + ldrd r6, r7, [sp, #8]
>> + .endif
>> + cfi_restore (r6)
>> + cfi_restore (r7)
>> + lsl \d1, \d1, tmp1
>> + cfi_remember_state
>> + lsr result, \d1, #24
>> + ldrd r4, r5, [sp], #16
>> + cfi_restore (r4)
>> + cfi_restore (r5)
>> + sub result, result, r1, lsr #24
>
> You can minimize the size of the unwind info (avoiding DW_CFA_advance opcodes)
> by delaying restores until the previous storage may be clobbered, or until some
> other adjustment must be made. The storage is clobbered, obviously, by the
> post-increment of SP deallocating the storage.
>
>> +#else
>> + /* To use the big-endian trick we'd have to reverse all three words.
>> + that's slower than this approach. */
>> + rev \synd, \synd
>> + clz tmp1, \synd
>> + bic tmp1, tmp1, #7
>> + lsr r1, \d2, tmp1
>> + cfi_remember_state
>> + .if \restore_r6
>> + ldrd r6, r7, [sp, #8]
>> + .endif
>> + cfi_restore (r6)
>> + cfi_restore (r7)
>
> Also, it would appear that you're remembering two different states between the
> big and little endian versions of this code. The LE version remembers before
> restoring r6/r7; the BE version remembers afterward.
>
> I suspect you want
>
> ldrd r4, r5, [sp], #16
> cfi_remember_state
> cfi_def_cfa_offset(0)
> cfi_restore (r4)
> cfi_restore (r5)
> cfi_restore (r6)
> cfi_restore (r7)
>
> which has a single advance within this code sequence, and r6/r7 remembered as
> saved, and actually describes the stack adjustment.
Ok, I have attempted to fix this in all cases.
>> + strd r4, r5, [sp, #-16]!
>> + cfi_def_cfa_offset (16)
>> + cfi_offset (r4, -16)
>> + cfi_offset (r5, -12)
>> + orr tmp1, src1, src2
>> + strd r6, r7, [sp, #8]
>> + cfi_offset (r6, -8)
>> + cfi_offset (r7, -4)
>
> FWIW, I find using cfi_rel_offset much easier to reason with, since you get to
> match up numbers in the cfi data with numbers in the assembly. E.g.
>
> cfi_rel_offset(r6, 8)
> cfi_rel_offset(r7, 12)
>
> but perhaps not that helpful in this case.
Yes, I tend to find that simpler too but the code came to me using the
absolute offsets and as you suggest in this case there's not much in
it.
>> +.Lmisaligned_exit:
>> + cfi_remember_state
>> + mov result, tmp1
>> + ldr r4, [sp], #16
>> + cfi_restore (r4)
>> + bx lr
>
> Even though only r4 was modified and reloaded, r5-r7 are still described as
> saved. After popping all of the storage, that's not correct -- you need to
> restore all 4 registers. Again, you're not describing the stack adjustment.
> Again, no point in doing the remember 2 insns before the restores.
>
> Alternately, since this unwind info isn't for correctness of exception
> handling, you could simply drop all this restore business and let the unwind
> info be incorrect for the 1-2 insns before the return.
Hopefully the CFI is all correct now, I will post a v2.
--
Will Newton
Toolchain Working Group, Linaro