Bug 21182 - __memchr_sse2: regression in glibc-2.25 on i686
Summary: __memchr_sse2: regression in glibc-2.25 on i686
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.25
Importance: P2 normal
Target Milestone: 2.26
Assignee: Adhemerval Zanella
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-18 10:42 UTC by blog
Modified: 2017-05-01 13:56 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Last reconfirmed: 2017-02-27 00:00:00
fweimer: security-


Attachments
Test case (270 bytes, text/x-c)
2017-02-22 21:18 UTC, Leah Neukirchen
Test output on affected system (90 bytes, text/plain)
2017-02-22 21:19 UTC, Leah Neukirchen

Description blog 2017-02-18 10:42:52 UTC
After Void Linux (http://www.voidlinux.eu/) updated their glibc package to 2.25, several users reported segmentation faults on the i686 platform. I experienced them myself on a physical computer with an Intel Atom processor (cpuinfo: https://owncloud.helmut-pozimski.de/index.php/s/3cOwICY2qNemETN ). Specifically, I saw segfaults while using ps, grep (on bigger files), or Python-based software such as salt-minion, and in some instances gdb or kodi.

The segfaults seem to trace back to __memchr_sse2; this is a backtrace produced with gdb while running grep on a 32M file:

https://owncloud.helmut-pozimski.de/index.php/s/03jB43CoEQtEXay

I tried to gather some more pieces of information together with one of the Void developers, but was not able to produce a more detailed backtrace. The last change in this function happened in git commit https://sourceware.org/git/?p=glibc.git;a=commit;h=23d27709a423aec32821e9a5198a10267107bae2 . I tried reverting that commit and recompiling glibc, which seems to resolve the issue, at least on my machine.

It might also be relevant that no one has managed to reproduce the issue inside a VM yet; it is seemingly only reproducible on physical hardware, and I don't know whether it affects all i686-era systems or only a certain range of Intel processors.

To reproduce the issue, run grep with any pattern on a larger file (in my tests 32K was not enough, several hundred KB produced a segfault, and my /var/log/messages, which is 32M, also did). Simply running "ps aux" on my system produced a segfault, as did starting either kodi or salt-minion. With gdb I do not know under which circumstances it occurs; it wasn't reliably reproducible and just happened sometimes.
Comment 1 Florian Weimer 2017-02-20 11:34:19 UTC
(In reply to blog from comment #0)

> I tried to gather some more pieces of information together with one of the
> Void developers, but was not able to produce a more detailed backtrace. The
> last change in this function happened in git commit
> https://sourceware.org/git/?p=glibc.git;a=commit;h=23d27709a423aec32821e9a5198a10267107bae2 .
> I tried reverting that commit and recompiling glibc, which seems to resolve
> the issue, at least on my machine.

I wonder if this is a known CPU erratum.  The new code sequence does not look problematic to me; I don't think it invokes implementation-defined instruction behavior.
Comment 2 Leah Neukirchen 2017-02-22 21:18:47 UTC
Created attachment 9847
Test case
Comment 3 Leah Neukirchen 2017-02-22 21:19:22 UTC
Created attachment 9848
Test output on affected system
Comment 4 Leah Neukirchen 2017-02-22 21:30:45 UTC
I could reproduce this on Linux 4.4.6 with glibc 2.25 and gcc 6.3.0 directly running on i686 "Intel(R) Atom(TM) CPU N270   @ 1.60GHz" (no PAE, 1G RAM).

Please consider the test case I just added.  As you can see, start addresses less than 16 bytes from the end of the buffer cause the search to run into the pages beyond the buffer and make memchr return pointers outside of the area.

In a few cases, even different addresses are reported. (This was running inside gdb.)
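
(Sketch of the attached test case, reconstructed from the output below; the mmap-based setup and the search byte 0x23 are assumptions, and the 270-byte attachment remains the authoritative version. The page contains only zero bytes, and each call searches a size of 1, so a correct memchr must return NULL for every offset.)

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main (void)
{
  /* One zero-filled anonymous page; the byte searched for never
     occurs in it, so every memchr call below should return NULL.  */
  char *src = mmap (NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (src == MAP_FAILED)
    return 1;
  printf ("src = %p  src+4096 = %p\n", src, src + 4096);

  /* Single-byte searches; start offsets within 16 bytes of the end
     of the page are the ones that trigger the bug.  */
  size_t offsets[] = { 0, 1, 2, 3, 15, 16, 17,
                       4080, 4081, 4082, 4083, 4084, 4094 };
  for (size_t i = 0; i < sizeof offsets / sizeof offsets[0]; i++)
    printf ("%zu %p\n", offsets[i], memchr (src + offsets[i], 0x23, 1));
  return 0;
}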

src = 0xb7fd5000  src+4096 = 0xb7fd6000
0 (nil)
1 (nil)
2 (nil)
3 (nil)
15 (nil)
16 (nil)
17 (nil)
4080 (nil)
4081 0xb7fda6ed
4082 0xb7fda6ed
4083 0xb7fda6ed
4084 0xb7fd90ff
4094 0xb7fda6ed

Interestingly, valgrind takes no offense and returns (nil) for all inputs.

Merely reverting to glibc 2.24 fixes the issue, making it return (nil) for all inputs, as expected.

This is a serious regression that makes many programs crash and/or corrupt memory.

For reference, here is /proc/cpuinfo for the affected machine:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 28
model name	: Intel(R) Atom(TM) CPU N270   @ 1.60GHz
stepping	: 2
microcode	: 0x212
cpu MHz		: 800.000
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 xtpr pdcm movbe lahf_lm dtherm
bugs		:
bogomips	: 3191.89
clflush size	: 64
cache_alignment	: 64
address sizes	: 32 bits physical, 32 bits virtual
power management:
Comment 5 Florian Weimer 2017-02-27 07:09:47 UTC
(In reply to Leah Neukirchen from comment #4)
> I could reproduce this on Linux 4.4.6 with glibc 2.25 and gcc 6.3.0 directly
> running on i686 "Intel(R) Atom(TM) CPU N270   @ 1.60GHz" (no PAE, 1G RAM).
> 
> Please consider the test case I just added.  As you can see, start addresses
> less than 16 bytes from the end of the buffer cause the search to run into
> the pages beyond the buffer and make memchr return pointers outside of the
> area.

I haven't been able to get access to an Atom N270 machine so far.  We need to single-step through the new code and see what happens to the register contents in order to find the root cause of this bug.  So unless you can do that (comparing it with a working i386 machine), or someone can provide remote shell access to such a machine, we are stuck.
Comment 6 Leah Neukirchen 2017-02-27 13:07:09 UTC
(In reply to Florian Weimer from comment #5)
> (In reply to Leah Neukirchen from comment #4)
> > I could reproduce this on Linux 4.4.6 with glibc 2.25 and gcc 6.3.0 directly
> > running on i686 "Intel(R) Atom(TM) CPU N270   @ 1.60GHz" (no PAE, 1G RAM).
> > 
> > Please consider the test case I just added.  As you can see, start addresses
> > less than 16 bytes from the end of the buffer cause the search to run into
> > the pages beyond the buffer and make memchr return pointers outside of the
> > area.
> 
> I haven't been able to get access to an Atom N270 machine so far.  We need
> to single-step through the new code and see what happens to the register
> contents in order to find the root cause of this bug.  So unless you can do
> that (comparing it with a working i386 machine), or someone can provide
> remote shell access to such a machine, we are stuck.

I realized that the machines where it works all use __memchr_sse2_bsf, but
I haven't yet figured out how to force them to use __memchr_sse2 instead
to produce such a comparison trace.
Is there a trick to override the ifunc selection?

It would be nice if you could update the bug summary to
"__memchr_sse2: regression in glibc-2.25 on i686"
or something similar; perhaps other people will then find it more easily.
Comment 7 Florian Weimer 2017-02-27 13:31:27 UTC
(In reply to Leah Neukirchen from comment #6)
> I realized that the machines where it works all use __memchr_sse2_bsf, but
> I haven't yet figured out how to force them to use __memchr_sse2 instead
> to produce such a comparison trace.
> Is there a trick to override the ifunc selection?

Good observation.

Currently, recompiling glibc is the only reliable option I know of.  I used this patch:

diff --git a/sysdeps/i386/i686/multiarch/memchr.S b/sysdeps/i386/i686/multiarch/memchr.S
index bd0dace..4791d10 100644
--- a/sysdeps/i386/i686/multiarch/memchr.S
+++ b/sysdeps/i386/i686/multiarch/memchr.S
@@ -37,7 +37,7 @@ ENTRY(__memchr)
 2:	LOAD_FUNC_GOT_EAX (__memchr_ia32)
 	ret
 
-3:	LOAD_FUNC_GOT_EAX (__memchr_sse2_bsf)
+3:	LOAD_FUNC_GOT_EAX (__memchr_sse2)
 	ret
 END(__memchr)
 
With this patch, I get this output from your test case:

0 (nil)
1 (nil)
2 (nil)
3 (nil)
15 (nil)
16 (nil)
17 (nil)
4080 (nil)
4081 0xf7e4128a
4082 0xf7e4128a
4083 0xf7e4128a
4084 0xf7e4128a
4094 0xf7e4128a

So it doesn't look like an Atom-specific CPU bug, but rather a generic issue in the __memchr_sse2 function.
Comment 8 Florian Weimer 2017-02-27 13:47:18 UTC
The change is broken.  The subtraction of $16 happens in the wrong place:

diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
index f1a11b5..910679c 100644
--- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
+++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
@@ -118,8 +118,14 @@ L(crosscache):
 # ifndef USE_AS_RAWMEMCHR
        jnz     L(match_case2_prolog1)
        lea     -16(%edx), %edx
+        /* Calculate the last acceptable address and check for possible
+           addition overflow by using saturated math:
+           edx = ecx + edx
+           edx |= -(edx < ecx)  */
        add     %ecx, %edx
-       jle     L(return_null)
+       sbb     %eax, %eax
+       or      %eax, %edx
+       jbe     L(return_null)
        lea     16(%edi), %edi
 # else
        jnz     L(match_case1_prolog1)
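
In C terms, the add/sbb/or sequence is the standard saturating unsigned addition idiom (a sketch of the idiom only, not glibc source):

#include <stdint.h>

/* add %ecx, %edx ; sbb %eax, %eax ; or %eax, %edx
   SBB of a register with itself yields -1 exactly when the preceding
   ADD carried, so OR-ing the mask in clamps an overflowed sum to
   UINT32_MAX instead of letting it wrap around.  */
static uint32_t
saturating_add (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  uint32_t mask = -(uint32_t) (sum < a); /* all-ones iff the add carried */
  return sum | mask;
}

This keeps the "last acceptable address" calculation from wrapping past the top of the address space and producing a bogus, small value.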
Comment 9 Adhemerval Zanella 2017-03-14 13:40:58 UTC
Indeed the subtraction is in the wrong place, and for this specific case we can't use LEA (since it won't set the CF flag used by the following jbe).  I think following the logic already in place for the x86_64 version is the correct approach:

diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
index 910679c..e41f324 100644
--- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
+++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
@@ -117,7 +117,6 @@ L(crosscache):
 
 # ifndef USE_AS_RAWMEMCHR
        jnz     L(match_case2_prolog1)
-       lea     -16(%edx), %edx
         /* Calculate the last acceptable address and check for possible
           addition overflow by using saturated math:
            edx = ecx + edx
@@ -125,6 +124,7 @@ L(crosscache):
        add     %ecx, %edx
        sbb     %eax, %eax
        or      %eax, %edx
+       sub     $16, %edx
        jbe     L(return_null)
        lea     16(%edi), %edi
 # else

If the subtraction result is negative (as for the test case, where the size is 1), sub will set the CF flag after the saturated addition, and the function then returns NULL as expected.  I will prepare the patch, together with a test case.
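
In C-like terms, the corrected control flow implied by the diffs above is (a sketch, not the actual glibc source; saturating_add stands for the add/sbb/or sequence from comment 8):

/* Per the in-source comment: compute the last acceptable address
   (edx = ecx + edx) with a saturating add so a buffer ending near
   the top of the address space cannot wrap, then subtract the 16
   bytes the prolog already examined.  SUB (unlike LEA) sets CF/ZF,
   which the following JBE consumes.  */
uint32_t end = saturating_add (ecx, edx);  /* add %ecx, %edx ; sbb ; or */
if (end <= 16)                             /* sub $16, %edx ; jbe       */
  return NULL;                             /* L(return_null)            */
end -= 16;
/* ... continue the 16-bytes-per-iteration SSE2 loop up to end ... */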
Comment 10 Adhemerval Zanella 2017-03-29 12:56:54 UTC
Fixed by 3abeeec5f46f.
Comment 11 Sergey Kvachonok 2017-05-01 08:43:18 UTC
Can we get this into the 2.25 branch as well?

As it is, 2.25 is completely broken on my Atom CPU, and I don't want to wait until 2.26.
Comment 12 Adhemerval Zanella 2017-05-01 13:56:02 UTC
I applied it on release/2.25/master as well.