Bug 25933 - Off by one error in __strncmp_avx2 when length=VEC_SIZE*4 and strings are at page boundaries can cause a segfault
Summary: Off by one error in __strncmp_avx2 when length=VEC_SIZE*4 and strings are at ...
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: string (show other bugs)
Version: 2.28
: P2 normal
Target Milestone: 2.32
Assignee: Sunil Pandey
URL: https://sourceware.org/pipermail/libc...
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-06 23:49 UTC by David Mendenhall
Modified: 2020-07-04 17:22 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2020-05-07 00:00:00


Attachments
report (143.19 KB, application/pdf)
2020-05-06 23:49 UTC, David Mendenhall
Details
test case (296 bytes, text/x-csrc)
2020-05-07 02:23 UTC, David Mendenhall
Details
strncmp_avx2 patch for pr25933 (346 bytes, patch)
2020-06-07 18:15 UTC, Sunil Pandey
Details | Diff
strncmp_avx2 patch.1 for pr25933 (1.18 KB, patch)
2020-06-10 08:21 UTC, Sunil Pandey
Details | Diff
strcmp_avx2 micro benchmark comparison. (1.82 KB, text/plain)
2020-06-11 02:03 UTC, Sunil Pandey
Details
strncmp_avx2 micro benchmark comparison. (2.58 KB, text/plain)
2020-06-11 02:04 UTC, Sunil Pandey
Details
wcscmp_avx2 micro benchmark comparison. (2.57 KB, text/plain)
2020-06-11 02:06 UTC, Sunil Pandey
Details
wcsncmp_avx2 micro benchmark comparison. (3.04 KB, text/plain)
2020-06-11 02:08 UTC, Sunil Pandey
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Mendenhall 2020-05-06 23:49:56 UTC
Created attachment 12507 [details]
report

When the two strings being compared are at the end of their pages, __strncmp_avx2 will fall back to a one-byte-at-a-time loop named "cross_page_loop". This loop is incorrect if the length of the comparison exactly matches VEC_SIZE*4, which is 128 on my machine.

Full report in attached pdf.
Comment 1 David Mendenhall 2020-05-07 02:23:34 UTC
Created attachment 12508 [details]
test case

I reduced the bug to a stand-alone test case, now attached.
Comment 2 Adhemerval Zanella 2020-05-07 13:07:12 UTC
By extending your testing to check for more alignments and sizes:

  for (size_t s = 99; s <= 4 * VEC_SIZE; s++)
    for (size_t s1a = 31; s1a < 32; s1a++)
      for (size_t s2a = 30; s2a < 32; s2a++)
        { 
          ret = strncmp (s1 + PAGE_SIZE - s - s1a,
                         s1 + PAGE_SIZE - s - s2a,
                         s);
          assert (ret == 0);
        }

It seems that another page cross also requires fixing:

580         xorl    %r8d, %r8d
581         /* If ECX > VEC_SIZE * 2, skip ECX - (VEC_SIZE * 2) bytes.  */
582         subl    $(VEC_SIZE * 2), %ecx
583         jle     1f
584         /* Skip ECX bytes.  */
585         shrq    %cl, %rdi
586         /* R8 has number of bytes skipped.  */
587         movl    %ecx, %r8d
588 1:
589         /* Before jumping back to the loop, set ESI to the number of
590            VEC_SIZE * 4 blocks before page crossing.  */
591         movl    $(PAGE_SIZE / (VEC_SIZE * 4) - 1), %esi
592 
593         testq   %rdi, %rdi
594         je      L(back_to_loop)
595         tzcntq  %rdi, %rcx
596         addq    %r10, %rcx
597         /* Adjust for number of bytes skipped.  */

It should not jump back to loop if the ecx is negative (as some cases).
Comment 3 H.J. Lu 2020-05-19 17:39:34 UTC
There is a test case at

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr25933/master
Comment 4 Sunil Pandey 2020-06-07 18:15:30 UTC
Created attachment 12601 [details]
strncmp_avx2 patch for pr25933

Tested attached patch on

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr25933/master
Comment 5 H.J. Lu 2020-06-07 18:30:45 UTC
(In reply to Sunil Pandey from comment #4)
> Created attachment 12601 [details]
> strncmp_avx2 patch for pr25933
> 
> Tested attached patch on
> 
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr25933/master

Looks good.  Please try this

diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
index 48d03a9f46..dabc3e7590 100644
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
@@ -256,6 +256,11 @@ L(next_3_vectors):
   vpmovmskb %ymm0, %ecx
   testl %ecx, %ecx
   jne   L(return_3_vec_size)
+# ifdef USE_AS_STRNCMP
+  /* Check if VEC_SIZE * 4 already exceeded max compare count %r11 */
+  cmpq  $(VEC_SIZE * 4), %r11
+  jbe   L(zero)
+# endif
 L(main_loop_header):
   leaq  (VEC_SIZE * 4)(%rdi), %rdx
   movl  $PAGE_SIZE, %ecx
Comment 6 Sunil Pandey 2020-06-07 20:03:36 UTC
(In reply to H.J. Lu from comment #5)
> (In reply to Sunil Pandey from comment #4)
> > Created attachment 12601 [details]
> > strncmp_avx2 patch for pr25933
> > 
> > Tested attached patch on
> > 
> > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr25933/master
> 
> Looks good.  Please try this
> 
> diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> index 48d03a9f46..dabc3e7590 100644
> --- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> +++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> @@ -256,6 +256,11 @@ L(next_3_vectors):
>    vpmovmskb %ymm0, %ecx
>    testl %ecx, %ecx
>    jne   L(return_3_vec_size)
> +# ifdef USE_AS_STRNCMP
> +  /* Check if VEC_SIZE * 4 already exceeded max compare count %r11 */
> +  cmpq  $(VEC_SIZE * 4), %r11
> +  jbe   L(zero)
> +# endif
>  L(main_loop_header):
>    leaq  (VEC_SIZE * 4)(%rdi), %rdx
>    movl  $PAGE_SIZE, %ecx

It fixes the issue on my setup as expected.

$ ./test-strncmp
                       	simple_strncmp	stupid_strncmp	__strncmp_avx2	__strncmp_sse42	__strncmp_ssse3	__strncmp_sse2
$ echo $?
0

$ git diff
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
index 48d03a9f46..84ffe2cd5c 100644
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
@@ -256,6 +256,11 @@ L(next_3_vectors):
        vpmovmskb %ymm0, %ecx
        testl   %ecx, %ecx
        jne     L(return_3_vec_size)
+# ifdef USE_AS_STRNCMP
+        /* Check if VEC_SIZE * 4 already exceeded max compare count %r11 */
+       cmpq  $(VEC_SIZE * 4), %r11
+       jbe   L(zero)
+# endif
 L(main_loop_header):
        leaq    (VEC_SIZE * 4)(%rdi), %rdx
        movl    $PAGE_SIZE, %ecx
Comment 7 H.J. Lu 2020-06-07 20:43:03 UTC
(In reply to Sunil Pandey from comment #6)
> (In reply to H.J. Lu from comment #5)
> > (In reply to Sunil Pandey from comment #4)
> > > Created attachment 12601 [details]
> > > strncmp_avx2 patch for pr25933
> > > 
> > > Tested attached patch on
> > > 
> > > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr25933/master
> > 
> > Looks good.  Please try this
> > 
> > diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > index 48d03a9f46..dabc3e7590 100644
> > --- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > +++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > @@ -256,6 +256,11 @@ L(next_3_vectors):
> >    vpmovmskb %ymm0, %ecx
> >    testl %ecx, %ecx
> >    jne   L(return_3_vec_size)
> > +# ifdef USE_AS_STRNCMP
> > +  /* Check if VEC_SIZE * 4 already exceeded max compare count %r11 */
> > +  cmpq  $(VEC_SIZE * 4), %r11
> > +  jbe   L(zero)
> > +# endif
> >  L(main_loop_header):
> >    leaq  (VEC_SIZE * 4)(%rdi), %rdx
> >    movl  $PAGE_SIZE, %ecx
> 
> It fixes the issue on my setup as expected.
> 
> $ ./test-strncmp
>                        	simple_strncmp	stupid_strncmp	__strncmp_avx2
> __strncmp_sse42	__strncmp_ssse3	__strncmp_sse2
> $ echo $?
> 0
> 

Did you run "make check"?
Comment 8 Sunil Pandey 2020-06-07 21:46:29 UTC
(In reply to H.J. Lu from comment #7)
> (In reply to Sunil Pandey from comment #6)
> > (In reply to H.J. Lu from comment #5)
> > > (In reply to Sunil Pandey from comment #4)
> > > > Created attachment 12601 [details]
> > > > strncmp_avx2 patch for pr25933
> > > > 
> > > > Tested attached patch on
> > > > 
> > > > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr25933/master
> > > 
> > > Looks good.  Please try this
> > > 
> > > diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > > b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > > index 48d03a9f46..dabc3e7590 100644
> > > --- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > > +++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
> > > @@ -256,6 +256,11 @@ L(next_3_vectors):
> > >    vpmovmskb %ymm0, %ecx
> > >    testl %ecx, %ecx
> > >    jne   L(return_3_vec_size)
> > > +# ifdef USE_AS_STRNCMP
> > > +  /* Check if VEC_SIZE * 4 already exceeded max compare count %r11 */
> > > +  cmpq  $(VEC_SIZE * 4), %r11
> > > +  jbe   L(zero)
> > > +# endif
> > >  L(main_loop_header):
> > >    leaq  (VEC_SIZE * 4)(%rdi), %rdx
> > >    movl  $PAGE_SIZE, %ecx
> > 
> > It fixes the issue on my setup as expected.
> > 
> > $ ./test-strncmp
> >                        	simple_strncmp	stupid_strncmp	__strncmp_avx2
> > __strncmp_sse42	__strncmp_ssse3	__strncmp_sse2
> > $ echo $?
> > 0
> > 
> 
> Did you run "make check"?

yes. it fixes strncmp and no new failure.
Comment 9 H.J. Lu 2020-06-07 21:58:40 UTC
(In reply to Sunil Pandey from comment #8)
> 
> yes. it fixes strncmp and no new failure.

I got

FAIL: wcsmbs/test-wcsncmp
Comment 10 H.J. Lu 2020-06-07 22:01:18 UTC
(In reply to H.J. Lu from comment #9)
> (In reply to Sunil Pandey from comment #8)
> > 
> > yes. it fixes strncmp and no new failure.
> 
> I got
> 
> FAIL: wcsmbs/test-wcsncmp

Please rebase users/hjl/pr25933/master branch.
Comment 11 H.J. Lu 2020-06-07 23:05:17 UTC
I think L(loop_cross_page) block is incorrect.  Please compare it against
L(loop_cross_page) block in strcmp-sse2-unaligned.S.
Comment 12 H.J. Lu 2020-06-08 01:26:23 UTC
The bug is around

593         testq   %rdi, %rdi
594         je      L(back_to_loop)

At this point, there may be less than 4 vector length remaining:

Breakpoint 1, __strncmp_avx2 ()
    at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:594
594		je	L(back_to_loop)
(gdb) p $r11
$2 = 97
(gdb) next
303		vmovdqa	(%rax), %ymm0
(gdb) 
304		vmovdqa	VEC_SIZE(%rax), %ymm3
(gdb) 
305		VPCMPEQ	(%rdx), %ymm0, %ymm4
(gdb) 
306		VPCMPEQ	VEC_SIZE(%rdx), %ymm3, %ymm1
(gdb) 
307		VPMINU	%ymm0, %ymm4, %ymm4
(gdb) 
308		VPMINU	%ymm3, %ymm1, %ymm1
(gdb) 
309		vmovdqa	(VEC_SIZE * 2)(%rax), %ymm2
(gdb) 
310		VPMINU	%ymm1, %ymm4, %ymm0
(gdb) 
311		vmovdqa	(VEC_SIZE * 3)(%rax), %ymm3
(gdb) 
312		VPCMPEQ	(VEC_SIZE * 2)(%rdx), %ymm2, %ymm5
(gdb) 
313		VPCMPEQ	(VEC_SIZE * 3)(%rdx), %ymm3, %ymm6
(gdb) 

Program received signal SIGSEGV, Segmentation fault.
__strncmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:313
313		VPCMPEQ	(VEC_SIZE * 3)(%rdx), %ymm3, %ymm6
(gdb)
Comment 13 H.J. Lu 2020-06-09 02:30:03 UTC
There are

L(loop_cross_page_2_vec):
        /* The first VEC_SIZE * 2 bytes match or are ignored.  */
        vmovdqu (VEC_SIZE * 2)(%rax, %r10), %ymm2
        vmovdqu (VEC_SIZE * 3)(%rax, %r10), %ymm3
        VPCMPEQ (VEC_SIZE * 2)(%rdx, %r10), %ymm2, %ymm5
        VPMINU  %ymm2, %ymm5, %ymm5
        VPCMPEQ (VEC_SIZE * 3)(%rdx, %r10), %ymm3, %ymm6
        VPCMPEQ %ymm7, %ymm5, %ymm5
        VPMINU  %ymm3, %ymm6, %ymm6
        VPCMPEQ %ymm7, %ymm6, %ymm6

        vpmovmskb %ymm5, %edi
        vpmovmskb %ymm6, %esi

        salq    $32, %rsi
        xorq    %rsi, %rdi

        xorl    %r8d, %r8d
        /* If ECX > VEC_SIZE * 2, skip ECX - (VEC_SIZE * 2) bytes.  */
        subl    $(VEC_SIZE * 2), %ecx
        jle     1f  
        /* Skip ECX bytes.  */
        shrq    %cl, %rdi
        /* R8 has number of bytes skipped.  */
        movl    %ecx, %r8d
1:
        /* Before jumping back to the loop, set ESI to the number of
           VEC_SIZE * 4 blocks before page crossing.  */
        movl    $(PAGE_SIZE / (VEC_SIZE * 4) - 1), %esi

        testq   %rdi, %rdi
        je      L(back_to_loop)

When this branch is taken, there are (VEC_SIZE * 4) + %r10 matching bytes
starting at %rax, which may be >= the maximum offset.
Comment 14 Sunil Pandey 2020-06-10 08:21:11 UTC
Created attachment 12610 [details]
strncmp_avx2 patch.1 for pr25933

I ran glibc make check and all test pass with this patch.
Comment 15 H.J. Lu 2020-06-10 13:11:29 UTC
(In reply to Sunil Pandey from comment #14)
> Created attachment 12610 [details]
> strncmp_avx2 patch.1 for pr25933
> 
> I ran glibc make check and all test pass with this patch.

You removed loop unrolling.  Please provide all relevant glibc micro
benchmarks data before and after your change.
Comment 16 Sunil Pandey 2020-06-11 02:03:21 UTC
Created attachment 12612 [details]
strcmp_avx2 micro benchmark comparison.

strcmp_avx2 micro benchmark before and after test with RDTSCP and CLOCK time.
Comment 17 Sunil Pandey 2020-06-11 02:04:58 UTC
Created attachment 12613 [details]
strncmp_avx2 micro benchmark comparison.

strncmp_avx2 micro benchmark before and after test with RDTSCP and CLOCK time.
Comment 18 Sunil Pandey 2020-06-11 02:06:59 UTC
Created attachment 12614 [details]
wcscmp_avx2 micro benchmark comparison.

wcscmp_avx2 micro benchmark before and after test with RDTSCP and CLOCK time.
Comment 19 Sunil Pandey 2020-06-11 02:08:02 UTC
Created attachment 12615 [details]
wcsncmp_avx2 micro benchmark comparison.

wcsncmp_avx2 micro benchmark before and after test with RDTSCP and CLOCK time.
Comment 20 Sunil Pandey 2020-06-11 02:11:13 UTC
(In reply to H.J. Lu from comment #15)
> (In reply to Sunil Pandey from comment #14)
> > Created attachment 12610 [details]
> > strncmp_avx2 patch.1 for pr25933
> > 
> > I ran glibc make check and all test pass with this patch.
> 
> You removed loop unrolling.  Please provide all relevant glibc micro
> benchmarks data before and after your change.

I beleive strncmp_avx2 changes affects

strcmp
strncmp
wcscmp
wcsncmp

Let me know if I miss any other relevant micro benchmark corresponding to this change.
Comment 21 H.J. Lu 2020-06-11 16:05:46 UTC
(In reply to Sunil Pandey from comment #20)
> (In reply to H.J. Lu from comment #15)
> > (In reply to Sunil Pandey from comment #14)
> > > Created attachment 12610 [details]
> > > strncmp_avx2 patch.1 for pr25933
> > > 
> > > I ran glibc make check and all test pass with this patch.
> > 
> > You removed loop unrolling.  Please provide all relevant glibc micro
> > benchmarks data before and after your change.
> 
> I beleive strncmp_avx2 changes affects
> 
> strcmp
> strncmp
> wcscmp
> wcsncmp
> 
> Let me know if I miss any other relevant micro benchmark corresponding to
> this change.

I added more bench tests to users/hjl/pr25933/master branch.  Please re-collect
numbers.
Comment 22 H.J. Lu 2020-06-13 11:27:18 UTC
A patch is posted at

https://sourceware.org/pipermail/libc-alpha/2020-June/115012.html
Comment 23 Sourceware Commits 2020-07-04 16:47:33 UTC
The release/2.31/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4e8a33a9590edc5c3a2cc5e726a3f2a73b66cdc0

commit 4e8a33a9590edc5c3a2cc5e726a3f2a73b66cdc0
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jul 4 09:45:21 2020 -0700

    NEWS: Mention BZ 25933 fix
Comment 24 Sourceware Commits 2020-07-04 16:52:40 UTC
The release/2.30/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=94abcef26ebbe89861128a9a62741e410104a342

commit 94abcef26ebbe89861128a9a62741e410104a342
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jul 4 09:45:21 2020 -0700

    NEWS: Mention BZ 25933 fix
Comment 25 Sourceware Commits 2020-07-04 17:02:07 UTC
The release/2.29/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=83aaa1714428ba3b29315c8c5d14b1766b2ca3aa

commit 83aaa1714428ba3b29315c8c5d14b1766b2ca3aa
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jul 4 09:45:21 2020 -0700

    NEWS: Mention BZ 25933 fix
Comment 26 Sourceware Commits 2020-07-04 17:21:43 UTC
The release/2.28/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f82072183ad5b328f6a7cb91868cb1709e85d96c

commit f82072183ad5b328f6a7cb91868cb1709e85d96c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jul 4 09:45:21 2020 -0700

    NEWS: Mention BZ 25933 fix
Comment 27 H.J. Lu 2020-07-04 17:22:03 UTC
Fixed for 2.32 and on 2.31/2.30/2.29/2.28 branches.