This is the mail archive of the mailing list for the glibc project.


Re: [PATCH] aarch64: optimize the unaligned case of memcmp

Sebastian Pop wrote:

> If I remove all the alignment code, I get less performance on the hikey 
> A53 board.
> With this patch:

@@ -142,9 +143,23 @@ ENTRY(memcmp)

         .p2align 6
+       cmp     limit, #8
+       b.lo    .LmisalignedLt8
+       .p2align 5
+       ldr     data1, [src1], #8
+       ldr     data2, [src2], #8
+       subs    limit_wd, limit_wd, #1
+       eor     diff, data1, data2      /* Non-zero if differences found. */
+       cbnz    diff, .Lnot_limit
+    .Lloop_part_aligned
         sub     limit, limit, #1
-       /* Perhaps we can do better than this.  */
         ldrb    data1w, [src1], #1
         ldrb    data2w, [src2], #1
         subs    limit, limit, #1

Where is the setup of limit_wd and limit?

I would expect the small cases to be faster, since you avoid around 10 cycles of mostly
ALU ops that make very little progress. So it should take several iterations with an extra
unaligned access before you're worse off. In memcpy (which is similar, with 2 streams)
I align after 96 bytes.
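
To make the trade-off concrete, here is a rough C sketch of the idea (not the
actual glibc or bionic code): compare with unaligned 8-byte loads right away and
only align one source once the input has proven to be long. The names
memcmp_sketch, load64, bytewise and ALIGN_THRESHOLD are made up for this
example, and the value 96 is simply borrowed from the memcpy remark above; it is
an assumption, not a tuned number for memcmp.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold, borrowed from the memcpy remark; not tuned. */
#define ALIGN_THRESHOLD 96

/* Unaligned 8-byte load; compilers typically turn this into a single ldr. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-by-byte compare, used for tails and to locate the first difference. */
static int bytewise(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;
    size_t done = 0;

    /* Small sizes: no alignment setup, just unaligned word compares. */
    while (n >= 8 && done < ALIGN_THRESHOLD) {
        if (load64(a) != load64(b))
            return bytewise(a, b, 8);
        a += 8; b += 8; n -= 8; done += 8;
    }

    /* Long input: align the first source once. Stepping back by the skew
       is safe here because at least ALIGN_THRESHOLD bytes before this
       point were already compared and found equal. */
    if (n >= 8) {
        size_t skew = (uintptr_t)a & 7;
        a -= skew; b -= skew; n += skew;
        while (n >= 8) {
            if (load64(a) != load64(b))
                return bytewise(a, b, 8);
            a += 8; b += 8; n -= 8;
        }
    }

    /* Fewer than 8 bytes left. */
    return bytewise(a, b, n);
}
```

Whether the threshold pays off on a given core is something only benchmarking
can tell; the sketch only shows where the alignment cost moves to.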

> With the extra patch:

--- a/libc/arch-arm64/generic/bionic/memcmp.S
+++ b/libc/arch-arm64/generic/bionic/memcmp.S
@@ -159,7 +159,7 @@ ENTRY(memcmp)
         /* Sources are not aligned align one of the sources find max offset
            from aligned boundary. */

-       and     tmp1, src1, #0x7
+       and     tmp1, src2, #0x7
         orr     tmp3, xzr, #0x8
         sub     pos, tmp3, tmp1

Note that it's more readable to write mov tmp3, 8. However, it's even better to use a
writeback of 8 in the unaligned loads and then subtract tmp1 from src1 and src2;
this saves 2 instructions.
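
The pointer arithmetic behind that suggestion can be modelled in C as follows.
This is only an illustration: first_word_then_align and load_u64 are
hypothetical names, and the comments map each step to the instruction it is
meant to stand for. The net advance is 8 - tmp1, the same value the patch
computes as pos, so the constant 8 and the separate subtraction are presumably
the two instructions that go away.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 8-byte load, standing in for ldr on an unaligned address. */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Compare the first 8 bytes with unaligned loads, then leave *src2 on an
   8-byte boundary. Returns nonzero if the two words differ.
   Assumes the caller guarantees at least 8 readable bytes at each source. */
static int first_word_then_align(const unsigned char **src1,
                                 const unsigned char **src2)
{
    size_t tmp1 = (uintptr_t)*src2 & 7;   /* and tmp1, src2, #0x7        */

    uint64_t data1 = load_u64(*src1);     /* ldr data1, [src1], #8       */
    *src1 += 8;                           /*   (post-index writeback)    */
    uint64_t data2 = load_u64(*src2);     /* ldr data2, [src2], #8       */
    *src2 += 8;

    *src1 -= tmp1;                        /* sub src1, src1, tmp1        */
    *src2 -= tmp1;                        /* sub src2, src2, tmp1        */

    return data1 != data2;
}
```

After the call both pointers have moved forward by the same amount (between 1
and 8 bytes), so any bytes covered twice were already compared by the first
unaligned load.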

