<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 29, 2024 at 11:34 AM Noah Goldstein <<a href="mailto:goldstein.w.n@gmail.com">goldstein.w.n@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, May 29, 2024 at 1:26 PM Sunil Pandey <<a href="mailto:skpgkp2@gmail.com" target="_blank">skpgkp2@gmail.com</a>> wrote:<br>

><br>

><br>

><br>

> On Wed, May 29, 2024 at 11:06 AM Noah Goldstein <<a href="mailto:goldstein.w.n@gmail.com" target="_blank">goldstein.w.n@gmail.com</a>> wrote:<br>

>><br>

>> On Wed, May 29, 2024 at 11:37 AM Sunil K Pandey <<a href="mailto:skpgkp2@gmail.com" target="_blank">skpgkp2@gmail.com</a>> wrote:<br>

>> ><br>

>> > This patch aligns the memmove unaligned routines to 64 bytes.  The default<br>

>> > 16-byte alignment may cause up to 15% random perf regression for memmoves<br>

>> > smaller than the vector size.<br>

>> > ---<br>

>> >  sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S | 2 +-<br>

>> >  1 file changed, 1 insertion(+), 1 deletion(-)<br>

>> ><br>

>> > diff --git a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S<br>

>> > index 838f8f8bff..85c0efd9e3 100644<br>

>> > --- a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S<br>

>> > +++ b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S<br>

>> > @@ -207,7 +207,7 @@ ENTRY (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))<br>

>> >  END (MEMMOVE_CHK_SYMBOL (__memmove_chk, unaligned))<br>

>> >  #endif<br>

>> ><br>

>> > -ENTRY (MEMMOVE_SYMBOL (__memmove, unaligned))<br>

>> > +ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned), 6)<br>

>> >         movq    %rdi, %rax<br>

>> >  L(start):<br>

>> >  # ifdef __ILP32__<br>

>> > --<br>

>> > 2.44.0<br>

>> ><br>

>><br>

>> Isn't the erms nearly always used?<br>

><br>

><br>

> It's used in the libMicro benchmark.<br>

> <a href="https://github.com/redhat-performance/libMicro" rel="noreferrer" target="_blank">https://github.com/redhat-performance/libMicro</a><br>

<br>

I'm not contending that if you microbenchmark this it won't be faster<br>

if it's 64 byte aligned.<br>

<br>

The question is: outside of microbenchmarks, is this impl important<br>

enough to optimize?<br></blockquote><div><br></div><div>There are people who rely on this benchmark data for their hardware/software</div><div>decisions.</div><div><br></div><div>So, if this implementation is there and we can easily optimize it, we should.</div><div>If nothing else, it can reduce the time they spend troubleshooting random regressions.</div><div><br></div></div></div>