This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] aarch64: optimize the unaligned case of memcmp


On Mon, Jun 26, 2017 at 2:00 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Sebastian Pop wrote:
>> On 06/23/2017 04:28 PM, Wilco Dijkstra wrote:
>>
>> > Where is the setup of limit_wd and limit???
>>
>> You are right, my patch was not quite correct: I was missing the
>> initialization of limit_wd, like so:
>>
>> lsr     limit_wd, limit, #3
>>
>> limit is the number of bytes to be compared passed in as a parameter to
>> memcmp.
>
> You're still missing the setting of limit. Your current version will do the
> words up to limit - (limit & 7), and then do byte by byte using the original
> value of limit, so it's going well outside its bounds...

You are right, I was missing the "and     limit, limit, #7". With that added,
the performance looks different from the byte-by-byte memcmp.
Still, the performance is lower than with aligning src1:

Benchmark                              Time           CPU Iterations
--------------------------------------------------------------------
BM_string_memcmp_unaligned/8        1288 ns       1288 ns     540945   5.92208MB/s
BM_string_memcmp_unaligned/16       1303 ns       1303 ns     537143   11.7123MB/s
BM_string_memcmp_unaligned/20        341 ns        341 ns    2064228   55.9994MB/s
BM_string_memcmp_unaligned/30        405 ns        405 ns    1726750   70.5799MB/s
BM_string_memcmp_unaligned/42        405 ns        405 ns    1728170   98.8833MB/s
BM_string_memcmp_unaligned/55        563 ns        562 ns    1239350   93.2568MB/s
BM_string_memcmp_unaligned/60        539 ns        539 ns    1298194   106.109MB/s
BM_string_memcmp_unaligned/64       2378 ns       2378 ns     359461   25.6695MB/s

And for larger data sets the performance is still lower than when aligning src1:

Benchmark                                Time           CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp_unaligned/8          1288 ns       1288 ns     543230    5.9221MB/s
BM_string_memcmp_unaligned/64         2377 ns       2377 ns     359351   25.6742MB/s
BM_string_memcmp_unaligned/512        6444 ns       6444 ns     184103   75.7774MB/s
BM_string_memcmp_unaligned/1024       4869 ns       4868 ns     143785   200.599MB/s
BM_string_memcmp_unaligned/8k        33090 ns      33089 ns      21279   236.107MB/s
BM_string_memcmp_unaligned/16k       66748 ns      66738 ns      10436   234.123MB/s
BM_string_memcmp_unaligned/32k      131781 ns     131775 ns       5106   237.147MB/s
BM_string_memcmp_unaligned/64k      291907 ns     291860 ns       2334   214.143MB/s
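
For reference, the limit_wd / limit split discussed above can be sketched
in C. This is only an illustration of the bounds logic, not the patch itself
and not glibc code: the function name sketch_memcmp is hypothetical, and the
8-byte loads go through memcpy so the sketch stays valid for unaligned
pointers. The word loop covers limit >> 3 words and the byte loop covers only
the remaining limit & 7 bytes, which is what keeps the tail inside the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration (not glibc code): compare 8 bytes at a time,
   then finish the remaining tail byte by byte. */
int sketch_memcmp(const void *s1, const void *s2, size_t limit)
{
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;

    size_t limit_wd = limit >> 3;   /* lsr limit_wd, limit, #3 */
    limit &= 7;                     /* and limit, limit, #7 */

    for (; limit_wd > 0; limit_wd--) {
        uint64_t a, b;
        memcpy(&a, p1, sizeof a);   /* unaligned-safe 8-byte loads */
        memcpy(&b, p2, sizeof b);
        if (a != b) {
            /* The mismatch lies within these 8 bytes; let the byte
               loop below locate it. */
            limit = 8;
            break;
        }
        p1 += 8;
        p2 += 8;
    }

    /* Byte loop: at most limit bytes, never past the original bound. */
    for (; limit > 0; limit--, p1++, p2++) {
        if (*p1 != *p2)
            return (int)*p1 - (int)*p2;
    }
    return 0;
}
```

Without the "limit &= 7" step, the byte loop would run for the full original
byte count after the words had already been consumed, which is the
out-of-bounds read pointed out above.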

