Bug 23709 - glibc 2.25 lacks sse2 optimized strstr()
Summary: glibc 2.25 lacks sse2 optimized strstr()
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: string (show other bugs)
Version: 2.25
: P2 normal
Target Milestone: 2.29
Assignee: Adhemerval Zanella
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-25 12:14 UTC by paul.borile
Modified: 2018-11-02 10:40 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2018-09-25 00:00:00
fweimer: security-


Attachments
strstr.c (691 bytes, text/x-csrc)
2018-10-11 09:51 UTC, paul.borile
Details
strstr executable (3.05 KB, application/x-sharedlib)
2018-10-11 15:54 UTC, paul.borile
Details

Note You need to log in before you can comment on or make changes to this bug.
Description paul.borile 2018-09-25 12:14:59 UTC
Hi,
looks like code using strstr() and libc-2.27 is running 4 times slower than was with libc-2.23. Having a look at profiled code executed in various moments show different versions of strstr :

- libc-2.27.so: strstr.c, str-two-way.h - this is current version ubuntu 18.04.1

Same code, profiled March 13, 2018 :

- libc-2.23.so: strstr-sse2-unaligned.S - this version was 4 times faster

Was the sse2 optimized code removed from strstr() in 2.27 ? Any plans to reintroduce it ?

-Paul
Comment 1 Florian Weimer 2018-09-25 12:25:47 UTC
glibc 2.27 on Fedora 28 x86_64 selects __strstr_sse2.

What's your architecture, and which CPU do you use?
Comment 2 paul.borile 2018-09-25 12:34:00 UTC
ubuntu 18.04.1 LTS, Intel Core i7-4710HQ
Comment 3 Andreas Schwab 2018-09-25 12:37:10 UTC
How did you compile it?
Comment 4 paul.borile 2018-09-25 12:44:14 UTC
Just using the standard libc installed on ubuntu 18.04
Comment 5 Andreas Schwab 2018-09-25 12:48:54 UTC
You need to report that to ubuntu then.
Comment 6 paul.borile 2018-09-25 12:51:54 UTC
already submitted but no answer. Thanks for clarifying that strstr() _should_ have sse 2 optimizations.
Comment 7 Florian Weimer 2018-09-25 12:56:16 UTC
(In reply to paul.borile from comment #2)
> ubuntu 18.04.1 LTS, Intel Core i7-4710HQ

i386 or amd64?
Comment 8 paul.borile 2018-09-25 13:01:34 UTC
amd64
Comment 9 Florian Weimer 2018-09-25 13:05:07 UTC
Okay, then Ubuntu must have a peculiar build.  You need to report it to them.

As I said, on a Core i7-4810MQ, other builds of glibc 2.27 select an SSE2-optimized version of strstr.
Comment 10 Florian Weimer 2018-10-01 09:35:38 UTC
I looked at the Ubuntu patches, but couldn't find anything obvious.
Comment 11 paul.borile 2018-10-11 09:51:17 UTC
Created attachment 11315 [details]
strstr.c
Comment 12 paul.borile 2018-10-11 09:54:41 UTC
I made a test compiling for amd64 the attached program and running it on different platforms. The test run in 0.5 seconds on a sse2 optimized strstr() while 8 times slower on a non sse2 optimized strstr(). Example of a fast run :

$ docker run -i ubuntu:16.04 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'ubuntu:16.04' locally
16.04: Pulling from library/ubuntu
3b37166ec614: Pull complete 
504facff238f: Pull complete 
ebbcacd28e10: Pull complete 
c7fb3351ecad: Pull complete 
2e3debadcbf7: Pull complete 
Digest: sha256:45ddfa61744947b0b8f7f20b8de70cbcdd441a6a0532f791fd4c09f5e491a8eb
Status: Downloaded newer image for ubuntu:16.04

real	0m0.475s
user	0m0.474s
sys	0m0.000s
matches = 3000000

Example of a slow run :

docker run -i ubuntu:18.04 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'ubuntu:18.04' locally
18.04: Pulling from library/ubuntu
Digest: sha256:de774a3145f7ca4f0bd144c7d4ffb2931e06634f11529653b23eba85aef8e378
Status: Downloaded newer image for ubuntu:18.04
matches = 3000000

real	0m4.796s
user	0m4.794s
sys	0m0.000s

debian buster is showing the same problem : 

docker run -i debian:buster /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'debian:buster' locally
buster: Pulling from library/debian
1064a561889d: Pull complete 
Digest: sha256:b17f0ca6c2c0c5ad88b3a292e50ad657c87fc165ef1bde4dba57d9616651a9b9
Status: Downloaded newer image for debian:buster
matches = 3000000

real	0m4.871s
user	0m4.871s
sys	0m0.000s

So it is not an ubuntu 18 only problem.
Comment 13 paul.borile 2018-10-11 09:58:53 UTC
debian buster (libc-2.27) has the same problem, debian 9 (libc-2.24) does not
Comment 14 paul.borile 2018-10-11 10:08:48 UTC
fedora 27 and 26 are showing the same problema, fedora 25 is ok :


$ docker run -i fedora:27 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'fedora:27' locally
27: Pulling from library/fedora
c8ae8b35783e: Pull complete 
Digest: sha256:f4fb457f7cd858e0f6efa1a78390622823e83c393981b0bf91a2b6e8da3b6f50
Status: Downloaded newer image for fedora:27
matches = 3000000

real	0m4.939s
user	0m4.937s
sys	0m0.000s

$ docker run -i fedora:26 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'fedora:26' locally
26: Pulling from library/fedora
df75b105d67a: Pull complete 
Digest: sha256:b05007bacf64401663ab5d9a97d73b143a561be74a36a0f31219e5acb87fc7ec
Status: Downloaded newer image for fedora:26
matches = 3000000

real	0m4.426s
user	0m4.426s
sys	0m0.000s

$ docker run -i fedora:25 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'fedora:25' locally
25: Pulling from library/fedora
8ab017634520: Pull complete 
Digest: sha256:322cb01bbca26972c98051bacd3ab8555cec059496d64d35ee78b15de9ea0d06
Status: Downloaded newer image for fedora:25
matches = 3000000

real	0m0.468s
user	0m0.468s
sys	0m0.000s
Comment 15 Andreas Schwab 2018-10-11 10:16:54 UTC
Cannot reproduce on openSUSE Tumbleweed.
Comment 16 paul.borile 2018-10-11 10:24:24 UTC
opensuse:42.3 (libc-2.22) does not have the problem. Seems that it was introduced in libc-2.25 (fedora26 has it).


$ docker run -i opensuse:42.3 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'opensuse:42.3' locally
42.3: Pulling from library/opensuse
47aa660240a8: Pull complete 
Digest: sha256:569e6ee7a622838b9fa1111c3bfa99a50fdb34b7503f945b7d18ce66bb94a369
Status: Downloaded newer image for opensuse:42.3
matches = 3000000

real	0m0.450s
user	0m0.450s
sys	0m0.000s
Comment 17 paul.borile 2018-10-11 10:25:39 UTC
opensuse:42.3 (libc-2.22) does not have the problem. Seems that it was introduced in libc-2.25 (fedora26 has it).


$ docker run -i opensuse:42.3 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'opensuse:42.3' locally
42.3: Pulling from library/opensuse
47aa660240a8: Pull complete 
Digest: sha256:569e6ee7a622838b9fa1111c3bfa99a50fdb34b7503f945b7d18ce66bb94a369
Status: Downloaded newer image for opensuse:42.3
matches = 3000000

real	0m0.450s
user	0m0.450s
sys	0m0.000s
Comment 18 paul.borile 2018-10-11 10:28:59 UTC
(In reply to Florian Weimer from comment #1)
> glibc 2.27 on Fedora 28 x86_64 selects __strstr_sse2.
> 
> What's your architecture, and which CPU do you use?

fedora28 seems to have the problem :

$ docker run -i fedora:28 /bin/bash -c "cat > strstr ; chmod +x strstr ; time ./strstr aa" < ./strstr
Unable to find image 'fedora:28' locally
28: Pulling from library/fedora
565884f490d9: Pull complete 
Digest: sha256:b41cd083421dd7aa46d619e958b75a026a5d5733f08f14ba6d53943d6106ea6d
Status: Downloaded newer image for fedora:28
matches = 3000000

real	0m4.510s
user	0m4.510s
sys	0m0.000s
Comment 19 paul.borile 2018-10-11 15:54:39 UTC
Created attachment 11318 [details]
strstr executable
Comment 20 Adhemerval Zanella 2018-10-11 17:34:48 UTC
The issue was introduced by commit 'Disable TSX on some Haswell processors.' (2702856bf45c82cf8e69f2064f5aa15c0ceb6359) which change the some default flags for unknown models.

Previously, new models were handled by the default switch option which assumed a Core i3/i5/i7 if AVX is available. After the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) does not set the flags Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and Prefer_PMINUB_for_stringop for newer processors.

The most straightforward fix is to not mix the memory access flags set with TSX, rather handle it on its own switch:

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 122372862c..ecc82fc6af 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -297,7 +297,13 @@ init_cpu_features (struct cpu_features *cpu_features)
                    | bit_arch_Fast_Unaligned_Copy
                    | bit_arch_Prefer_PMINUB_for_stringop);
              break;
+           }
 
+        /* Disable TSX on some Haswell processors to avoid TSX on kernels that
+           weren't updated with the latest microcode package (which disables
+           broken feature by default).  */
+        switch (model)
+           {
            case 0x3f:
              /* Xeon E7 v3 with stepping >= 4 has working TSX.  */
              if (stepping >= 4)
azanella@birita:~/Projects/glibc/glibc-git-work$
Comment 21 paul.borile 2018-10-12 06:55:39 UTC
So if cpu has the AVX set we loose all the SSE2 optimizations but we keep the avx ones, correct ?
Comment 22 Adhemerval Zanella 2018-10-12 13:02:49 UTC
(In reply to paul.borile from comment #21)
> So if cpu has the AVX set we loose all the SSE2 optimizations but we keep
> the avx ones, correct ?

Not really, only Haswell chips in fact will have this behavior [1]:

Haswell (Client)	GT3E		0	0x6	0x4	0x6	Family 6 Model 70
			ULT		0	0x6	0x4	0x5	Family 6 Model 69
			S		0	0x6	0x3	0xC	Family 6 Model 60

Haswell (Server)	E, EP, EX	0	0x6	0x3	0xF	Family 6 Model 63

On these chips the internal glibc flags bit_arch_Fast_Rep_String, bit_arch_Fast_Unaligned_Load, bit_arch_Fast_Unaligned_Copy, and bit_arch_Prefer_PMINUB_for_stringop won't be set:

The bit_arch_Fast_Rep_String is only used on ifunc selection on i686 (32-bits) and it selects the *ssse3_rep* memcpy, memmove, bcopy, and mempcpy.  It it is not set the *ssse3* variant is used instead (not sure which is the performance difference between them).

The bit_arch_Fast_Unaligned_Load influences both i686 and x86_64. For x86_64 it influences the selections of the SSE2 unaligned optimization variants for stpncpy, strcpy, strncpy, stpcpy, strncat, strcat, and strstr.  For all but strstr an ssse3 or sse2 variant is used instead (not sure either which is the performance difference between them).

The bit_arch_Fast_Unaligned_Copy influences mempcpy, memmove, and memcpy. If chip has not SSE3 the bit will select either a RMS or a unaligned variant. For Haswell the *_avx_unaligned_erms variants will be selected, so this bits won't interfere with best selections.

The bit_arch_PMINUB_for_stringop is not used on ifunc selection.

[1] https://en.wikichip.org/wiki/intel/cpuid
Comment 23 Sourceware Commits 2018-10-23 18:28:52 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  18ad0de6513bf8a8e4ba757c069e6806d07920f8 (commit)
       via  c3d8dc45c9df199b8334599a6cbd98c9950dba62 (commit)
      from  f1034472e21d77b978464b73adbb0f9f1f032c91 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=18ad0de6513bf8a8e4ba757c069e6806d07920f8

commit 18ad0de6513bf8a8e4ba757c069e6806d07920f8
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Oct 23 14:53:12 2018 -0300

    Fix tst-preadvwritev2 build failure on HURD
    
    Commit 7a16bdbb9ff41 uses IOV_MAX, which is not defined on hurd.
    
    Checked on a build for i686-gnu.
    
    	* misc/tst-preadvwritev2-common.c (IOV_MAX): Define if not
    	defined.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c3d8dc45c9df199b8334599a6cbd98c9950dba62

commit c3d8dc45c9df199b8334599a6cbd98c9950dba62
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Oct 11 15:18:40 2018 -0300

    x86: Fix Haswell strong flags (BZ#23709)
    
    Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
    default flags for Haswell models.  Previously, new models were handled by the
    default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
    the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
    Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
    Prefer_PMINUB_for_stringop (only the TSX one).
    
    This patch fixes it by disentangle the TSX flag handling from the memory
    optimization ones.  The strstr case cited on patch now selects the
    __strstr_sse2_unaligned as expected for the Haswell cpu.
    
    Checked on x86_64-linux-gnu.
    
    	[BZ #23709]
    	* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
    	independently of other flags.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                       |    9 +++++++++
 misc/tst-preadvwritev2-common.c |    5 +++++
 sysdeps/x86/cpu-features.c      |    6 ++++++
 3 files changed, 20 insertions(+), 0 deletions(-)
Comment 24 Adhemerval Zanella 2018-10-23 18:29:17 UTC
Fixed on 2.29 (c3d8dc45c9df199b8334599a6cbd98c9950dba62).
Comment 25 Sourceware Commits 2018-11-02 10:13:25 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.28/master has been updated
       via  65010329f2c596bdd1204c1c9c9baac0193637af (commit)
      from  e1af1df694603c0dcd5118c30eea2cdeb01a1a0b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=65010329f2c596bdd1204c1c9c9baac0193637af

commit 65010329f2c596bdd1204c1c9c9baac0193637af
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Fri Nov 2 10:56:00 2018 +0100

    x86: Fix Haswell CPU string flags (BZ#23709)
    
    Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
    default flags for Haswell models.  Previously, new models were handled by the
    default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
    the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
    Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
    Prefer_PMINUB_for_stringop (only the TSX one).
    
    This patch fixes it by disentangle the TSX flag handling from the memory
    optimization ones.  The strstr case cited on patch now selects the
    __strstr_sse2_unaligned as expected for the Haswell cpu.
    
    Checked on x86_64-linux-gnu.
    
    	[BZ #23709]
    	* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
    	independently of other flags.
    
    (cherry picked from commit c3d8dc45c9df199b8334599a6cbd98c9950dba62)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                  |    6 ++++++
 NEWS                       |    1 +
 sysdeps/x86/cpu-features.c |    6 ++++++
 3 files changed, 13 insertions(+), 0 deletions(-)
Comment 26 Sourceware Commits 2018-11-02 10:31:54 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.27/master has been updated
       via  d8eee5ef553350364eff1dce9d143fb845c60615 (commit)
      from  5cd5309d910b0fcdeb5d7e734afb535d0231ba2c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d8eee5ef553350364eff1dce9d143fb845c60615

commit d8eee5ef553350364eff1dce9d143fb845c60615
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Fri Nov 2 11:14:57 2018 +0100

    x86: Fix Haswell CPU string flags (BZ#23709)
    
    Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
    default flags for Haswell models.  Previously, new models were handled by the
    default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
    the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
    Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
    Prefer_PMINUB_for_stringop (only the TSX one).
    
    This patch fixes it by disentangle the TSX flag handling from the memory
    optimization ones.  The strstr case cited on patch now selects the
    __strstr_sse2_unaligned as expected for the Haswell cpu.
    
    Checked on x86_64-linux-gnu.
    
    	[BZ #23709]
    	* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
    	independently of other flags.
    
    (cherry picked from commit c3d8dc45c9df199b8334599a6cbd98c9950dba62)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                  |    6 ++++++
 NEWS                       |    1 +
 sysdeps/x86/cpu-features.c |    6 ++++++
 3 files changed, 13 insertions(+), 0 deletions(-)
Comment 27 Sourceware Commits 2018-11-02 10:40:32 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.26/master has been updated
       via  dc40423dba7208ded2ec293c9a2938269f944ee8 (commit)
      from  33f5de7a79b27b9dce30a46d6681974653a85004 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=dc40423dba7208ded2ec293c9a2938269f944ee8

commit dc40423dba7208ded2ec293c9a2938269f944ee8
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Fri Nov 2 11:14:05 2018 +0100

    x86: Fix Haswell CPU string flags (BZ#23709)
    
    Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
    default flags for Haswell models.  Previously, new models were handled by the
    default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
    the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
    Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
    Prefer_PMINUB_for_stringop (only the TSX one).
    
    This patch fixes it by disentangle the TSX flag handling from the memory
    optimization ones.  The strstr case cited on patch now selects the
    __strstr_sse2_unaligned as expected for the Haswell cpu.
    
    Checked on x86_64-linux-gnu.
    
    	[BZ #23709]
    	* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
    	independently of other flags.
    
    (cherry picked from commit c3d8dc45c9df199b8334599a6cbd98c9950dba62)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                  |    6 ++++++
 NEWS                       |    1 +
 sysdeps/x86/cpu-features.c |    6 ++++++
 3 files changed, 13 insertions(+), 0 deletions(-)