Bug 12123 - SIGBUS on strstr_sse42 due to bad alignment
Summary: SIGBUS on strstr_sse42 due to bad alignment
Status: RESOLVED WORKSFORME
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: 2.12
: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-14 20:41 UTC by Lluís Batlle
Modified: 2014-06-30 07:49 UTC (History)
3 users (show)

See Also:
Host: i686-*-linux*
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Fix for the segfault loading pthread (303 bytes, patch)
2010-10-26 17:24 UTC, Lluís Batlle
Details | Diff
Second proposal after Jakub advice (236 bytes, patch)
2010-10-26 17:53 UTC, Lluís Batlle
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Lluís Batlle 2010-10-14 20:41:51 UTC
I've been having a trouble in the dynamic loader, at some code that ends up calling strstr(p, "SMP") (the is_system_smp() call, at __pthread_initialize_minimal_internal), before main(), in a system capable of sse4.2.

I am using gcc 4.5.1 with glibc 2.12.1 on GNU/Linux i686.

Here is my debugging session, starting the program 'xz' compiled with these above:

Program received signal SIGSEGV, Segmentation fault.
0xf7f5141e in __strstr_sse42 ()
   from /nix/store/mdjakl9ywywrrz9149cs6n7yfzvb12v7-glibc-2.12.1/lib/libc.so.6
(gdb) bt
#0  0xf7f5141e in __strstr_sse42 ()
   from /nix/store/mdjakl9ywywrrz9149cs6n7yfzvb12v7-glibc-2.12.1/lib/libc.so.6
#1  0xf7fa3af0 in __pthread_initialize_minimal_internal ()
   from /nix/store/mdjakl9ywywrrz9149cs6n7yfzvb12v7-glibc-2.12.1/lib/libpthread.so.0
#2  0xf7fa3148 in _init ()
   from /nix/store/mdjakl9ywywrrz9149cs6n7yfzvb12v7-glibc-2.12.1/lib/libpthread.so.0
#3  0xf7fec4d4 in call_init ()
   from /nix/store/mdjakl9ywywrrz9149cs6n7yfzvb12v7-glibc-2.12.1/lib/ld-linux.so.2
#4  0xf7fec666 in _dl_init_internal ()
   from /nix/store/mdjakl9ywywrrz9149cs6n7yfzvb12v7-glibc-2.12.1/lib/ld-linux.so.2
#5  0xf7fdd85f in _dl_start_user ()
   from /nix/store/mdjakl9ywywrrz9149cs6n7yfzvb12v7-glibc-2.12.1/lib/ld-linux.so.2
(gdb) disassemble 
........
0xf7f51416 <__strstr_sse42+70>: call   0xf7f51270 <__m128i_strloadu>
0xf7f5141b <__strstr_sse42+75>: mov    0xc(%ebp),%ecx
0xf7f5141e <__strstr_sse42+78>: movdqa %xmm0,-0x34(%ebp)
0xf7f51423 <__strstr_sse42+83>: cmpb   $0x0,0x1(%ecx)
0xf7f51427 <__strstr_sse42+87>: je     0xf7f51600 <__strstr_sse42+560>
0xf7f5142d <__strstr_sse42+93>: mov    %ecx,%eax
.......


Notice the segfault (SIGBUS in fact) at:
0xf7f5141e <__strstr_sse42+78>: movdqa %xmm0,-0x34(%ebp)

(gdb) print $ebp - 0x34
$4 = (void *) 0xffffce4c

See that the address is not aligned.

The strstr_sse42 code is in ./sysdeps/x86_64/multiarch/strstr.c, and I bet it is this line (the first __m128i_strloadu call):
  __m128i frag1 = strloadu (p1);

This is a simple assignment. From http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838 I understand that gcc supposes that the stack is aligned for every function, and the $ebp here shows that it is not aligned.

I think the dynamic loader first functions should guarantee an aligned stack, and it is not the case I think.

If you need more information, the bug is easy to reproduce here.
Comment 1 Lluís Batlle 2010-10-14 20:43:16 UTC
Sorry, I forgot to mention that in x86_64 this problem does not happen.
Comment 2 H.J. Lu 2010-10-26 06:46:10 UTC
There are

static inline __m128i
__m128i_strloadu (const unsigned char * p)
{
  int offset = ((size_t) p & (16 - 1));

  if (offset && (int) ((size_t) p & 0xfff) > 0xff0)
    {   
      __m128i a = _mm_load_si128 ((__m128i *) (p - offset));
      __m128i zero = _mm_setzero_si128 (); 
      int bmsk = _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, zero));
      if ((bmsk >> offset) != 0)
        return __m128i_shift_right (a, offset);
    }   
  return _mm_loadu_si128 ((__m128i *) p); 
}

It can load from unaligned address. I have no problems
with gcc 4.5.1 and glibc mainline on 32bit Intel Core i7.
What failures did you see with "make check" in glibc?
Comment 3 Lluís Batlle 2010-10-26 07:22:17 UTC
The problem is not inside __m128i_strloadu, but in the assignment from the result of  __m128i_strloadu to the automatic variable variable "frag1", in the body of STRSTR_SSE42.
Comment 4 H.J. Lu 2010-10-26 07:38:53 UTC
Please show me the command line you used to compile
strstr.os and upload your strstr.os.
Comment 5 Lluís Batlle 2010-10-26 07:50:57 UTC
Sorry, but I don't think the problem is in the strstr file.

As the traces show, the code in strstr is properly compiled and looks good. Simply gcc assumed at the start of the function the code will be stack-aligned.

Before main() in C programs, the libc assures that the stack is aligned. And gcc assures that no function misaligns the stack.

But the problem I mention happens in a strstr() call *BEFORE* main(), in the dynamic loader, where the stack may not be aligned. And then, any SIMD code compiled by gcc will fail, because the assumption of aligned stack looks like not being true at that moment.

I think this problem is more of the dynamic loader, which should assure an aligned stack before calling anything like strstr().
Comment 6 Lluís Batlle 2010-10-26 07:52:03 UTC
Did you try to build and run xz ? I don't know what does 'xz' do, that triggers the strstr() call in the dynamic loader. It may be a good way to reproduce the problem.

http://tukaani.org/xz/
Comment 7 Andreas Schwab 2010-10-26 15:12:33 UTC
How did you configure glibc?  Normally glibc maintains at most 8 byte stack alignment internally.  But even if the stack is 16 byte aligned -0x34(%ebp) cannot be correctly aligned since ebp points 8 bytes into the stack frame.
Comment 8 Lluís Batlle 2010-10-26 16:17:25 UTC
I did not add any special flag configuring glibc. It's a usual personal computer, where I expect glibc to find all the proper options automatically, and not cross compiling.

Maintaining the proper alignment between calls depends on gcc, not glibc. And I imagine glibc calls also maintain the alignment.

I am pointing at the alignment at the dynamic loader; I imagine the dynamic loader runs that strstr() call without having passed the stage of aligning the stack.

Maybe in the initial report I forgot to say that looking at $ebp for every frame of the backtrace reported a misaligned %ebp, since _dl_start_user.
Comment 9 Lluís Batlle 2010-10-26 16:31:11 UTC
(In reply to comment #7)
> How did you configure glibc?  Normally glibc maintains at most 8 byte stack
> alignment internally.  But even if the stack is 16 byte aligned -0x34(%ebp)
> cannot be correctly aligned since ebp points 8 bytes into the stack frame.
If what you say is correct about "at most 8 byte stack alignment", that is not enough for SSE instructions, which require 16 byte alignment.

I'll reproduce again the bug and report more information about the %ebp at every point. I read in the source that in i386 _dl_init_internal has to be called with a 16-byte alignment; I'll check where that gets lost.
Comment 10 Lluís Batlle 2010-10-26 16:49:17 UTC
I've found that the _init section of glibc 2.12.1 libpthread.so.0 breaks the stack alignment. See the objdump, and see how the 16-bit alignment comming at _init is broken calling the functions there listed (push + call = 8 bytes instead of 16).

Disassembly of section .init:

00004140 <_init>:
    4140:       55                      push   %ebp
    4141:       89 e5                   mov    %esp,%ebp
    4143:       e8 f8 05 00 00          call   4740 <__pthread_initialize_minimal>
    4148:       e8 43 05 00 00          call   4690 <frame_dummy>
    414d:       e8 1e d6 00 00          call   11770 <__do_global_ctors_aux>
    4152:       5d                      pop    %ebp
    4153:       c3                      ret    


But I still don't know what writes that _init there.
Comment 11 H.J. Lu 2010-10-26 17:22:06 UTC
(In reply to comment #10)
> I've found that the _init section of glibc 2.12.1 libpthread.so.0 breaks the
> stack alignment. See the objdump, and see how the 16-bit alignment comming at
> _init is broken calling the functions there listed (push + call = 8 bytes
> instead of 16).
> 
> Disassembly of section .init:
> 
> 00004140 <_init>:
>     4140:       55                      push   %ebp
>     4141:       89 e5                   mov    %esp,%ebp
>     4143:       e8 f8 05 00 00          call   4740
> <__pthread_initialize_minimal>
>     4148:       e8 43 05 00 00          call   4690 <frame_dummy>
>     414d:       e8 1e d6 00 00          call   11770 <__do_global_ctors_aux>
>     4152:       5d                      pop    %ebp
>     4153:       c3                      ret    
> 
> 
> But I still don't know what writes that _init there.

It comes from nptl/sysdeps/pthread/pt-initfini.c, which is
compiled with -mpreferred-stack-boundary=2.
Comment 12 Lluís Batlle 2010-10-26 17:24:10 UTC
Created attachment 5086 [details]
Fix for the segfault loading pthread

I found the problem. I attach a patch.

The comment of the section I patch says:
# Most of the glibc routines don't ever call user defined callbacks
# nor use any FPU or SSE* and as such don't need bigger %esp alignment
# than 4 bytes.
# Lots of routines in math will use FPU, so make math subdir an exception
# here.

So, the problem is clear. The pthread library should be built with a 4word stack alignment instead of 2.

Introducing more SSE into glibc should result in a constant review of this Makefile I patch, regarding the stack alignment, although who would have known that libpthread calls strstr()?
Comment 13 Jakub Jelinek 2010-10-26 17:37:31 UTC
This patch is wrong.  You don't need to force it for all of NPTL, it needs to be done only for the source files that actually have callbacks.  For most of those the Makefiles do it already.  See e.g.
ifeq ($(subdir),nptl)                                   
CFLAGS-pthread_create.c += -mpreferred-stack-boundary=4                  
CFLAGS-tst-align.c += -mpreferred-stack-boundary=4                       
CFLAGS-tst-align2.c += -mpreferred-stack-boundary=4                      
endif

in libc/nptl/sysdeps/i386/Makefile.  Just handle pt-initfini.c the same way.
Comment 14 Lluís Batlle 2010-10-26 17:53:00 UTC
Created attachment 5087 [details]
Second proposal after Jakub advice

Thank you Jakub.

Here I propose a patch in the way you described. Two files need the 4 word alignment. One with the _init, and the other with the pthread_initialize_minimal.
Comment 15 Andreas Schwab 2010-10-27 08:25:36 UTC
strstr.os was compiled with -mpreferred-stack-boundary=2 which implies -mincoming-stack-boundary=2, so it is a compiler bug if it assumes bigger alignment.
Comment 16 Lluís Batlle 2010-10-27 12:37:47 UTC
(In reply to comment #15)
> strstr.os was compiled with -mpreferred-stack-boundary=2 which implies
> -mincoming-stack-boundary=2, so it is a compiler bug if it assumes bigger
> alignment.

So you think that, if the user asked "-mpreferred-stack-boundary=2", but the function body has SSE vectors (that have to be 4w aligned), gcc should decide to align them properly to 4w instead of blindly setting a 2w stack alignment, right?
Comment 17 throctukes 2010-10-28 15:42:10 UTC
This looks very similar to a problem I've been seeing which manifests in the Mono soft debugger - although interestingly I've seen this on 64bit architecture. Discussion of the problem has been going on here: http://ubuntuforums.org/showthread.php?t=1602295 - Here's a summary:

On machines using the sse4.2 instruction set there seems to be an issue with the use of the optimised strstr function in glibc which causes a segfault whenever the mono soft debugger is used. A VMWare image run on a machine without an sse4.2 capable processor will behave normally (i.e., no segfault), but if transferred onto a machine with an sse4.2 capable processor will segfault in the same way as the host would (i.e., will segfault during debugging), which suggests that when this issue occurs it's not due to a misconfigured system.

If glibc is built without the sse4.2 optimised version of strstr enabled, this issue is not apparent.

Inside strstr, the segfault traces to line 286 in x86_64/multiarch/strstr.c inside glibc:

280  /* p1 > 1 byte long.  Load up to 16 bytes of fragment.  */
281  __m128i frag1 = strloadu (p1);
282
283  __m128i frag2;
284  if (p2[1] != '\0')
285    /* p2 is > 1 byte long.  */
286    frag2 = strloadu (p2); 
287  else
288    frag2 = _mm_insert_epi8 (_mm_setzero_si128 (), LOADBYTE (p2[0]), 0);

Checking the disassembly at this point:

Program received signal SIGSEGV, Segmentation fault.
0x00007f85b165ee6b in __strstr_sse42 (s1=0x6d5280 "mono_create_corlib_exception_1", s2=0x6d3f05 "ves_array_new_va_") at ../sysdeps/x86_64/multiarch/strstr.c:286

...

(gdb) disassemble

...

   0x00007f85b165ee60 <+880>:   mov    %rbx,%rax
   0x00007f85b165ee63 <+883>:   jmpq   0x7f85b165eb22 <__strstr_sse42+50>
   0x00007f85b165ee68 <+888>:   mov    %r13,%rdi
=> 0x00007f85b165ee6b <+891>:   movdqa %xmm0,(%rsp)
   0x00007f85b165ee70 <+896>:   callq  0x7f85b165e9b0 <__m128i_strloadu>
   0x00007f85b165ee75 <+901>:   movdqa %xmm0,0x10(%rsp)
   0x00007f85b165ee7b <+907>:   movdqa (%rsp),%xmm1
   0x00007f85b165ee80 <+912>:   jmpq   0x7f85b165eb98 <__strstr_sse42+168>

...

(gdb) p $rsp
$1 = (void *) 0x7fff829769e8

Bug report on the Mono bugzilla: 
https://bugzilla.novell.com/show_bug.cgi?id=647464

A similar problem to this has been reported on the NVidia Developer Zone forum affecting the NVPerfKit tool (again a debugging tool as far as I can tell) http://developer.nvidia.com/forums/index.php?showtopic=4926
Comment 18 Lluís Batlle 2010-10-28 19:29:14 UTC
(In reply to comment #15)
> strstr.os was compiled with -mpreferred-stack-boundary=2 which implies
> -mincoming-stack-boundary=2, so it is a compiler bug if it assumes bigger
> alignment.

I see '-mincoming-stack-boundary' appears only in gcc 4.4. Is gcc 4.4 a requirement for building glibc?

(In reply to comment #17)
We use gcc 4.3 to build glibc in i686-linux, and we use gcc 4.5.1 to build x86_64-linux, so that may make a difference, and maybe that's why we only saw the problem in i686-linux. What compiler are you using?
With gcc 4.5.1, strstr_sse42 starts this way:
 130:   48 89 5c 24 d0          mov    %rbx,-0x30(%rsp)
 135:   4c 89 6c 24 e8          mov    %r13,-0x18(%rsp)
 13a:   48 89 fb                mov    %rdi,%rbx
 13d:   48 89 6c 24 d8          mov    %rbp,-0x28(%rsp)
 142:   4c 89 64 24 e0          mov    %r12,-0x20(%rsp)
 147:   49 89 f5                mov    %rsi,%r13
 14a:   4c 89 74 24 f0          mov    %r14,-0x10(%rsp)
 14f:   4c 89 7c 24 f8          mov    %r15,-0x8(%rsp)
 154:   48 83 ec 58             sub    $0x58,%rsp

So it expects stacks 4w aligned, the default in gcc. If you have problems in x86_64 about bad alignment on strstr_sse42, you should revise that your code maintains the stack aligned 4w.
Comment 19 H.J. Lu 2010-10-28 19:32:05 UTC
(In reply to comment #18)
> (In reply to comment #15)
> > strstr.os was compiled with -mpreferred-stack-boundary=2 which implies
> > -mincoming-stack-boundary=2, so it is a compiler bug if it assumes bigger
> > alignment.
> 
> I see '-mincoming-stack-boundary' appears only in gcc 4.4. Is gcc 4.4 a
> requirement for building glibc?
> 
> (In reply to comment #17)
> We use gcc 4.3 to build glibc in i686-linux, and we use gcc 4.5.1 to build

You need gcc 4.4 or above to properly align the stack in 32bit.
Comment 20 Lluís Batlle 2010-10-28 20:10:04 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #15)
> > > strstr.os was compiled with -mpreferred-stack-boundary=2 which implies
> > > -mincoming-stack-boundary=2, so it is a compiler bug if it assumes bigger
> > > alignment.
> > 
> > I see '-mincoming-stack-boundary' appears only in gcc 4.4. Is gcc 4.4 a
> > requirement for building glibc?
> > 
> > (In reply to comment #17)
> > We use gcc 4.3 to build glibc in i686-linux, and we use gcc 4.5.1 to build
> 
> You need gcc 4.4 or above to properly align the stack in 32bit.

Ok, thank you. I see.
I tried building glibc with gcc 4.5.1, and strstr_sse42 starts properly:
  1121d0:       55                      push   %ebp
  1121d1:       89 e5                   mov    %esp,%ebp
  1121d3:       83 e4 f0                and    $0xfffffff0,%esp

Couldn't the configure script check the gcc version? Otherwise gcc 4.3 silently builds a broken glibc.
Comment 21 throctukes 2010-10-29 15:43:28 UTC
(In reply to comment #18)
> We use gcc 4.3 to build glibc in i686-linux, and we use gcc 4.5.1 to build
> x86_64-linux, so that may make a difference, and maybe that's why we only saw
> the problem in i686-linux. What compiler are you using?
> With gcc 4.5.1, strstr_sse42 starts this way:
>  130:   48 89 5c 24 d0          mov    %rbx,-0x30(%rsp)
>  135:   4c 89 6c 24 e8          mov    %r13,-0x18(%rsp)
>  13a:   48 89 fb                mov    %rdi,%rbx
>  13d:   48 89 6c 24 d8          mov    %rbp,-0x28(%rsp)
>  142:   4c 89 64 24 e0          mov    %r12,-0x20(%rsp)
>  147:   49 89 f5                mov    %rsi,%r13
>  14a:   4c 89 74 24 f0          mov    %r14,-0x10(%rsp)
>  14f:   4c 89 7c 24 f8          mov    %r15,-0x8(%rsp)
>  154:   48 83 ec 58             sub    $0x58,%rsp
> 
> So it expects stacks 4w aligned, the default in gcc. If you have problems in
> x86_64 about bad alignment on strstr_sse42, you should revise that your code
> maintains the stack aligned 4w.

I've been using the libc6 from the Ubuntu repositories, but even recompiling glibc from sources with gcc 4.4 and 4.5 makes no difference.
Comment 22 Ulrich Drepper 2010-11-01 18:41:34 UTC
This is a compiler issue.  Use a fixed compiler.  If someone can identify a change in the compiler which fixes the problem an appropriate check can be added.  But perhaps not.  My strstr.os seems not to need any compiler changes.   It must be local changes of some sort and then there should also be a local gcc check.  In any case, nothing to be done to the code, it's fine.
Comment 23 Lluís Batlle 2010-11-01 18:51:14 UTC
(In reply to comment #22)
> This is a compiler issue.  Use a fixed compiler.  If someone can identify a
> change in the compiler which fixes the problem an appropriate check can be
> added.  But perhaps not.  My strstr.os seems not to need any compiler changes. 
>  It must be local changes of some sort and then there should also be a local
> gcc check.  In any case, nothing to be done to the code, it's fine.

I think that all gccs previous to gcc 4.4 lack the configurable feature of "expecting a stack alignment of X", that in gcc 4.4 due to the flags set in glibc end up in "expecting a stack alignment of 2 words" - the behaviour needed.
As other people in the bug report mentioned, that code (plus gcc flags) is meant to be built only with gcc 4.4 or above.
You may name gcc 4.4 and above "a fixed compiler", but I would not say that relates to any "local changes".