This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] aarch64: optimize the unaligned case of memcmp

From: Sebastian Pop <sebpop at gmail dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
Cc: Sebastian Pop <s dot pop at samsung dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, "maxim dot kuvyrkov at linaro dot org" <maxim dot kuvyrkov at linaro dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, "ryan dot arnold at linaro dot org" <ryan dot arnold at linaro dot org>, "adhemerval dot zanella at linaro dot org" <adhemerval dot zanella at linaro dot org>, nd <nd at arm dot com>
Date: Mon, 26 Jun 2017 14:47:02 -0500
Subject: Re: [PATCH] aarch64: optimize the unaligned case of memcmp
Authentication-results: sourceware.org; auth=none
References: <CGME20170622233226uscas1p213aefedba5fe47e520aac1226a731162@uscas1p2.samsung.com> <1498174226-16525-1-git-send-email-s.pop@samsung.com> <637cf51c-160d-172f-6520-bba51058f85e@samsung.com> <AM5PR0802MB26106339AAEF3DABB5ACE56F83D80@AM5PR0802MB2610.eurprd08.prod.outlook.com> <19ed586c-9724-cdc4-177f-174f880864a4@samsung.com> <AM5PR0802MB2610E38DEE75A9457B824C7C83DF0@AM5PR0802MB2610.eurprd08.prod.outlook.com>

On Mon, Jun 26, 2017 at 2:00 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Sebastian Pop wrote:
>> On 06/23/2017 04:28 PM, Wilco Dijkstra wrote:
>>
>> > Where is the setup of limit_wd and limit???
>>
>> You are right, my patch was not quite correct: I was missing the
>> initialization of limit_wd, like so:
>>
>> lsr     limit_wd, limit, #3
>>
>> limit is the number of bytes to be compared passed in as a parameter to
>> memcmp.
>
> You're still missing the setting of limit. Your current version will do the
> words up to limit - (limit & 7), and then do byte by byte using the original
> value of limit, so it's going well outside its bounds...

You are right, I was missing the "and     limit, limit, #7". With that added,
the performance now differs from that of the byte-by-byte memcmp, but it is
still lower than with aligning src1:

Benchmark                              Time           CPU Iterations
----------------------------------------------------------------------------------
BM_string_memcmp_unaligned/8        1288 ns       1288 ns     540945   5.92208MB/s
BM_string_memcmp_unaligned/16       1303 ns       1303 ns     537143   11.7123MB/s
BM_string_memcmp_unaligned/20        341 ns        341 ns    2064228   55.9994MB/s
BM_string_memcmp_unaligned/30        405 ns        405 ns    1726750   70.5799MB/s
BM_string_memcmp_unaligned/42        405 ns        405 ns    1728170   98.8833MB/s
BM_string_memcmp_unaligned/55        563 ns        562 ns    1239350   93.2568MB/s
BM_string_memcmp_unaligned/60        539 ns        539 ns    1298194   106.109MB/s
BM_string_memcmp_unaligned/64       2378 ns       2378 ns     359461   25.6695MB/s
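
For reading the last column: the figures appear to be the benchmark's
byte count (the number after the slash) divided by the CPU time, with
"MB" meaning 2^20 bytes. This is an inference from the numbers, not
something stated in the benchmark output; the helper name below is
hypothetical.

```c
/* Hypothetical helper: throughput in units of 2^20 bytes per second,
   assuming the size after the slash is the number of bytes compared
   per iteration and cpu_ns is the reported CPU time in nanoseconds.  */
double
mib_per_second (double bytes_per_iteration, double cpu_ns)
{
  double bytes_per_second = bytes_per_iteration / (cpu_ns * 1e-9);
  return bytes_per_second / (1024.0 * 1024.0);
}
```

Because the reported times are rounded to whole nanoseconds, the result
only approximates the printed figure.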

And for larger data sets the performance is still lower than when aligning src1:

Benchmark                                Time           CPU Iterations
-----------------------------------------------------------------------------------
BM_string_memcmp_unaligned/8          1288 ns       1288 ns     543230   5.9221MB/s
BM_string_memcmp_unaligned/64         2377 ns       2377 ns     359351  25.6742MB/s
BM_string_memcmp_unaligned/512        6444 ns       6444 ns     184103  75.7774MB/s
BM_string_memcmp_unaligned/1024       4869 ns       4868 ns     143785  200.599MB/s
BM_string_memcmp_unaligned/8k        33090 ns      33089 ns      21279  236.107MB/s
BM_string_memcmp_unaligned/16k       66748 ns      66738 ns      10436  234.123MB/s
BM_string_memcmp_unaligned/32k      131781 ns     131775 ns       5106  237.147MB/s
BM_string_memcmp_unaligned/64k      291907 ns     291860 ns       2334  214.143MB/s
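
For reference, the limit_wd / limit split discussed at the top of this
message can be sketched in C. This is only an illustration of the two
setup instructions ("lsr limit_wd, limit, #3" and "and limit, limit, #7"),
not the patch itself; the function name sketch_memcmp is hypothetical,
and the handling of a mismatching word is my own simplification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not glibc code: compare 8-byte words first,
   then finish the remaining bytes one at a time.  */
int
sketch_memcmp (const void *s1, const void *s2, size_t limit)
{
  const unsigned char *p1 = s1;
  const unsigned char *p2 = s2;

  size_t limit_wd = limit >> 3;   /* lsr limit_wd, limit, #3 */
  limit &= 7;                     /* and limit, limit, #7 */

  while (limit_wd > 0)
    {
      uint64_t w1, w2;
      memcpy (&w1, p1, sizeof w1);   /* alignment-safe 8-byte loads */
      memcpy (&w2, p2, sizeof w2);
      if (w1 != w2)
        {
          limit = 8;   /* the mismatch lies inside this word */
          break;
        }
      p1 += 8;
      p2 += 8;
      limit_wd--;
    }

  /* Byte loop: at most 7 tail bytes, or the 8 bytes of a differing word.  */
  while (limit-- > 0)
    {
      if (*p1 != *p2)
        return *p1 < *p2 ? -1 : 1;
      p1++;
      p2++;
    }
  return 0;
}
```

Without the "limit &= 7" step, the byte loop would run for the original
byte count starting from the already advanced pointers, which is the
out-of-bounds read pointed out in the quoted review.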

Follow-Ups:
- Re: [PATCH] aarch64: optimize the unaligned case of memcmp
  - From: Wilco Dijkstra

References:
- [PATCH] aarch64: optimize the unaligned case of memcmp
  - From: Sebastian Pop
- Re: [PATCH] aarch64: optimize the unaligned case of memcmp
  - From: Sebastian Pop
- Re: [PATCH] aarch64: optimize the unaligned case of memcmp
  - From: Wilco Dijkstra
- Re: [PATCH] aarch64: optimize the unaligned case of memcmp
  - From: Sebastian Pop
- Re: [PATCH] aarch64: optimize the unaligned case of memcmp
  - From: Wilco Dijkstra
