This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] aarch64: optimize the unaligned case of memcmp
Sebastian Pop wrote:
>
> And for larger data sets the performance is still lower than when aligning src1:
Benchmark                             Time        CPU  Iterations  Throughput
------------------------------------------------------------------------------
BM_string_memcmp_unaligned/8       1288 ns    1288 ns      543230  5.9221MB/s
BM_string_memcmp_unaligned/64      2377 ns    2377 ns      359351  25.6742MB/s
BM_string_memcmp_unaligned/512     6444 ns    6444 ns      184103  75.7774MB/s
BM_string_memcmp_unaligned/1024    4869 ns    4868 ns      143785  200.599MB/s
BM_string_memcmp_unaligned/8k     33090 ns   33089 ns       21279  236.107MB/s
BM_string_memcmp_unaligned/16k    66748 ns   66738 ns       10436  234.123MB/s
BM_string_memcmp_unaligned/32k   131781 ns  131775 ns        5106  237.147MB/s
BM_string_memcmp_unaligned/64k   291907 ns  291860 ns        2334  214.143MB/s
These numbers still don't make any sense: the first few results are now many
times slower than the byte-by-byte version (as in your initial mail):
BM_string_memcmp_unaligned/8 339 ns 339 ns 2070998 22.5302MB/s
BM_string_memcmp_unaligned/64 1392 ns 1392 ns 502796 43.8454MB/s
BM_string_memcmp_unaligned/512 9194 ns 9194 ns 76133 53.1104MB/s
BM_string_memcmp_unaligned/1024 18325 ns 18323 ns 38206 53.2963MB/s
BM_string_memcmp_unaligned/8k 148579 ns 148574 ns 4713 52.5831MB/s
BM_string_memcmp_unaligned/16k 298169 ns 298120 ns 2344 52.4118MB/s
BM_string_memcmp_unaligned/32k 598813 ns 598797 ns 1085 52.188MB/s
BM_string_memcmp_unaligned/64k 1196079 ns 1196083 ns 540 52.2539MB/s
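For readers following the thread: the patch itself is not quoted here, but the
"aligning src1" strategy being benchmarked can be sketched roughly as below.
This is a hypothetical C illustration, not the actual AArch64 assembly under
review: compare bytes until src1 reaches a word boundary, then compare a word
at a time, tolerating an unaligned src2 via memcpy (which compilers lower to a
plain unaligned load on AArch64).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the "align src1" memcmp strategy discussed above.
   Not the patch under review; for illustration only. */
static int memcmp_align_src1(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;

    /* Byte loop until p1 reaches an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)p1 & 7) != 0) {
        if (*p1 != *p2)
            return *p1 - *p2;
        p1++; p2++; n--;
    }

    /* Word loop: p1 is now aligned; p2 may not be, so load it with
       memcpy, which becomes an unaligned load on AArch64. */
    while (n >= 8) {
        uint64_t w1, w2;
        memcpy(&w1, p1, 8);
        memcpy(&w2, p2, 8);
        if (w1 != w2)
            break;  /* mismatch somewhere in this word: find it below */
        p1 += 8; p2 += 8; n -= 8;
    }

    /* Tail (and mismatch location): byte-by-byte. */
    while (n > 0) {
        if (*p1 != *p2)
            return *p1 - *p2;
        p1++; p2++; n--;
    }
    return 0;
}
```

The per-call overhead of the alignment prologue is one plausible explanation
for why the small-size results above regress relative to a plain byte loop,
while the large sizes benefit from the wide compares.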
Wilco