This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Faster strlen

From: Dmitrieva Liubov <liubov dot dmitrieva at gmail dot com>
To: libc-alpha at sourceware dot org
Date: Tue, 9 Oct 2012 19:13:06 +0400
Subject: Re: [PATCH] Faster strlen
References: <20121007172752.GA22344@domone.kolej.mff.cuni.cz> <CAMe9rOoSM3KsyU9zih644WN642+ZzRjDtna3dxz1yFcsn-mOOQ@mail.gmail.com>

+  pmovmskb %xmm3, %edx
+  sub %rdi, %rax
+        movq    %rdx, %rcx
+        negq    %rcx
+        andq    %rdx, %rcx

Please use the <tab>instruction<tab> format consistently instead of mixing
different styles on different lines.

I also suggest using the L macro for new labels, to improve readability and
to match the style of the other assembler files in glibc.

+  add $16, %rax
+  .p2align 4
+  .align64_loop:

L(align64_loop):

--
Liubov Dmitrieva

2012/10/9 H.J. Lu <hjl.tools@gmail.com>:
> On Sun, Oct 7, 2012 at 10:27 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
>> Hello, I investigated strlen a bit more and improved the pminub variant.
>>
>> I got up to a 10% speedup by unrolling the main loop. I did not measure
>> any difference when I unrolled the loop further.
>>
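For readers following the unrolled pminub idea, here is a rough C sketch of a 64-byte main loop using SSE2 intrinsics. It assumes x86-64 and GCC or Clang (for `__builtin_ctz`, which compiles to bsf/tzcnt); the name `strlen_pminub_sketch` is hypothetical and this is not the code from the patch. The point it shows is that `pminub` (`_mm_min_epu8`) folds four 16-byte blocks into one, so a single `pcmpeqb` plus `pmovmskb` test covers 64 bytes per iteration.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of an unrolled "pminub" strlen; not the patch's code. */
size_t strlen_pminub_sketch(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Simplified prologue: scan byte by byte until p is 64-byte aligned,
       so the aligned loads below never cross a page boundary. */
    while (((uintptr_t)p & 63) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Main loop: 4 x 16 bytes per iteration. A byte of m is zero iff that
       byte position is zero in at least one of a, b, c, d. */
    for (;;) {
        __m128i a = _mm_load_si128((const __m128i *)p);
        __m128i b = _mm_load_si128((const __m128i *)(p + 16));
        __m128i c = _mm_load_si128((const __m128i *)(p + 32));
        __m128i d = _mm_load_si128((const __m128i *)(p + 48));
        __m128i m = _mm_min_epu8(_mm_min_epu8(a, b), _mm_min_epu8(c, d));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(m, zero)) != 0)
            break;
        p += 64;
    }

    /* Exit path: find which 16-byte block holds the first zero byte,
       then its position inside that block. */
    for (int i = 0; i < 64; i += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(p + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
        if (mask != 0)
            return (size_t)(p + i - s) + (size_t)__builtin_ctz((unsigned)mask);
    }
    return 0; /* not reached: the main loop only exits on a zero byte */
}
```

Like the assembly version, this reads past the terminator within the aligned 64-byte block; that cannot fault on x86, but tools such as AddressSanitizer will report it.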
>> I also benchmarked Atom and added a variant which is identical to
>> strlen-sse2-pminub except that bsf is replaced by a table lookup.
>>
>> The last addition is an attempt to generate a VEX-encoded strlen. I only
>> need to pass the -mavx flag when compiling strlen_avx.S, but I do not know how.
>>
>> Benchmarks are at the usual place. To fit all functions, I consider only
>> random alignment. I also increased the granularity of sampling.
>>
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/
>>
>> Results for this patch are
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/benchmark_strlen_7_10_2012.tar.bz2
>>
>> On Sandy Bridge
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/i7_sandy_bridge/strlen/html/test_r.html
>> there is a phase change around sizes 1500-2000. Do you know what causes it?
>>
>> Another optimization is prefetching. Most of the time the prefetching variant
>> is slower than the non-prefetching one (as large strings are rare).
>> On Sandy Bridge prefetching is free. I need an additional ifunc flag to
>> indicate that.
>>
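To show where such a hint would sit, here is a C sketch that adds one software prefetch per 64-byte cache line to a plain 16-byte SSE2 loop. `PREFETCH_DISTANCE` and `strlen_prefetch_sketch` are hypothetical names, the distance is an untuned assumption, and the patch itself ships with prefetching disabled.

```c
#include <emmintrin.h> /* SSE2 compare/movemask */
#include <xmmintrin.h> /* _mm_prefetch */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance in bytes; a real value needs per-CPU tuning. */
#define PREFETCH_DISTANCE 512

size_t strlen_prefetch_sketch(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Scan byte by byte until p is 16-byte aligned. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    for (;;) {
        if (((uintptr_t)p & 63) == 0) {
            /* One hint per cache line. A prefetch never faults, even past the
               end of the string; the address is formed as an integer to avoid
               out-of-bounds pointer arithmetic in C. */
            _mm_prefetch((const char *)((uintptr_t)p + PREFETCH_DISTANCE), _MM_HINT_T0);
        }
        __m128i v = _mm_load_si128((const __m128i *)p);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz((unsigned)mask);
        p += 16;
    }
}
```

For short strings the extra branch and the hint are pure overhead, which is consistent with the observation that the prefetching variant usually loses when large strings are rare.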
>> I disabled prefetching in my patch.
>>
>> On Atom, ironically, strlen-sse2-no-bsf was slower than the pminub variant
>> except for strings less than 16 bytes long.
>>
>> For the exit from the main loop of the no-bsf variant, using bsfq instead of
>> a binary search saves 10 cycles. Multiplication plus table lookup is also
>> slow on Atom because 64-bit multiplication is slow.
>>
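The two bsf alternatives mentioned above can be sketched in C: a branchy binary search for the lowest set bit of a nonzero 64-bit mask, and the multiply-plus-table-lookup method built on a de Bruijn constant. The constant 0x03f79d71b4cb0a89 is a standard 64-bit de Bruijn sequence and is an assumption here, not necessarily what the no-bsf variant uses; the function names are hypothetical. The 64-bit multiply in `ctz64_debruijn` is the operation described as slow on Atom.

```c
#include <stdint.h>

/* Binary search: up to six test-and-shift steps, each a data-dependent branch. */
static unsigned ctz64_binary(uint64_t x) /* x must be nonzero */
{
    unsigned n = 0;
    if ((x & 0xffffffffu) == 0) { n += 32; x >>= 32; }
    if ((x & 0xffffu) == 0)     { n += 16; x >>= 16; }
    if ((x & 0xffu) == 0)       { n += 8;  x >>= 8; }
    if ((x & 0xfu) == 0)        { n += 4;  x >>= 4; }
    if ((x & 0x3u) == 0)        { n += 2;  x >>= 2; }
    if ((x & 0x1u) == 0)        { n += 1; }
    return n;
}

/* Assumed 64-bit de Bruijn constant: every 6-bit window of it is distinct. */
#define DEBRUIJN64 0x03f79d71b4cb0a89ULL
static uint8_t debruijn_table[64];

/* Fill the table: multiplying by 2^i equals shifting the constant left by i,
   and the top 6 bits of that product identify i. */
static void init_debruijn_table(void)
{
    for (unsigned i = 0; i < 64; i++)
        debruijn_table[(DEBRUIJN64 << i) >> 58] = (uint8_t)i;
}

/* Multiply + table lookup: x & -x isolates the lowest set bit (2^k). */
static unsigned ctz64_debruijn(uint64_t x) /* x must be nonzero */
{
    return debruijn_table[((x & (0 - x)) * DEBRUIJN64) >> 58];
}
```

Both are branch-heavy or multiply-bound compared with a single `bsfq`, which matches the cycle counts reported above.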
>> I used the pminub variant with the bsf instruction replaced by my table
>> lookup. This is about 8 cycles faster on Atom.
>>
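The table that replaces bsf is not shown in this message. One plausible shape, purely as an assumption, is a 256-entry table giving the lowest set bit of each byte value, applied to the 16-bit `pmovmskb` mask in two steps. The names `low_bit`, `init_low_bit` and `first_set_bit16` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical table: index of the lowest set bit for each nonzero byte
   value (entry 0 is unused). */
static uint8_t low_bit[256];

static void init_low_bit(void)
{
    for (int v = 1; v < 256; v++) {
        int i = 0;
        while (((v >> i) & 1) == 0)
            i++;
        low_bit[v] = (uint8_t)i;
    }
}

/* Position of the first zero byte, given a nonzero 16-bit pmovmskb mask,
   computed with one branch and one load instead of bsf. */
static unsigned first_set_bit16(unsigned mask)
{
    unsigned lo = mask & 0xffu;
    if (lo != 0)
        return low_bit[lo];
    return 8u + low_bit[(mask >> 8) & 0xffu];
}
```

A small byte-indexed table keeps the lookup within a few cache lines, which is one reason this style can beat a slow bsf.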
>> I did not reschedule instructions for Atom, to make review easier.
>>
>> The sse2, pminub, no-bsf and sse4 variants are slower than my patch
>> everywhere, so I removed them. pminub and no-bsf are used in strcat and
>> will be removed in a separate patch.
>>
>> 2012-10-07  Ondrej Bilka  <neleai@seznam.cz>
>>         * sysdeps/x86_64/strlen.S:
>>           Use unrolled pminub variant by default.
>>         * sysdeps/x86_64/multiarch/strlen_avx.S:
>>           Recode default variant using VEX prefix.
>>         * sysdeps/x86_64/multiarch/strlen_atom.S:
>>           New variant tailored to atom.
>>         * sysdeps/x86_64/strlen.S: Updated function selection.
>>         * sysdeps/x86_64/multiarch/strlen-sse4.S: deleted
>>         * sysdeps/x86_64/multiarch/Makefile: updated
>>
>
> Please rename strlen_atom.S to strlen-no-bsf.S since it
> depends on bit_Slow_BSF, not Atom.
>
> Thanks.
>
> --
> H.J.

References:
- [PATCH] Faster strlen
  - From: Ondřej Bílka
- Re: [PATCH] Faster strlen
  - From: H.J. Lu
