Bug 206

Summary: malloc does not align memory correctly for sse capable systems
Product: glibc Reporter: Florian Schanda <ma1flfs>
Component: malloc Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED DUPLICATE    
Severity: critical CC: bangerth, fweimer, glibc-bugs
Priority: P2 Flags: fweimer: security-
Version: 2.3.3   
Target Milestone: ---   
Host: i686-pc-linux-gnu Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu Last reconfirmed:

Description Florian Schanda 2004-06-05 14:22:39 UTC
From what I gather, malloc is supposed to return memory aligned for the
strictest alignment requirement of any type on a system. Currently malloc
aligns to 8 bytes on i686 systems. This should be 16 bytes because of the
__m128 type used by the SSE intrinsics.
 
See also this gcc "bug" for more details: 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795 
 
Basically, this program segfaults: 
------------------------------------------ 
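#include <xmmintrin.h>

int main() {
  // new -> operator new -> malloc(), which on i686 only guarantees 8-byte alignment.
  __m128 * foo = new __m128;
  // This store is typically emitted as movaps, which requires a 16-byte
  // aligned address and faults otherwise.
  *foo = _mm_setzero_ps();
  delete foo;
}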
------------------------------------------
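A quick way to observe the problem without taking the fault is to check the pointer against the type's alignment before any SSE store. This is a minimal sketch assuming C++11; `is_aligned` and `plain_new_is_aligned` are hypothetical helpers, and `Vec4` is a hypothetical stand-in for `__m128` so that the code also builds on non-x86 hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for __m128: 16 bytes with a 16-byte alignment requirement.
struct alignas(16) Vec4 {
  float lanes[4];
};

// True if p is a multiple of `alignment`, which must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Allocates one Vec4 with plain `new` and reports whether the result honours
// alignof(Vec4). With an allocator that only guarantees 8-byte alignment this
// can return false, which is the case in which an aligned SSE store faults.
inline bool plain_new_is_aligned() {
  Vec4* v = new Vec4;
  const bool ok = is_aligned(v, alignof(Vec4));
  delete v;
  return ok;
}
```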
Comment 1 GOTO Masanori 2004-06-05 19:07:31 UTC
Wolfram,

The alignment size is given by MALLOC_ALIGNMENT, which is defined as
(2*sizeof(INTERNAL_SIZE_T)). INTERNAL_SIZE_T is size_t (4 bytes),
so MALLOC_ALIGNMENT is 8 bytes. Bug#206 says this is not suitable
for SSE instructions, which need 16-byte alignment:
   http://sources.redhat.com/bugzilla/show_bug.cgi?id=206
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795 
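The arithmetic above can be written out as a compile-time check. This is a simplified model rather than glibc's actual macro: `malloc_alignment` is a hypothetical helper parameterised on the width of INTERNAL_SIZE_T (which defaults to size_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model of MALLOC_ALIGNMENT = 2 * sizeof(INTERNAL_SIZE_T).
template <typename InternalSizeT>
constexpr std::size_t malloc_alignment() {
  return 2 * sizeof(InternalSizeT);
}

// i686: a 4-byte size_t gives 8-byte alignment, below the 16 bytes __m128 needs.
static_assert(malloc_alignment<std::uint32_t>() == 8, "i686-style INTERNAL_SIZE_T");
// x86_64: an 8-byte size_t gives 16-byte alignment, which happens to suffice.
static_assert(malloc_alignment<std::uint64_t>() == 16, "x86_64-style INTERNAL_SIZE_T");
```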

I am not sure that changing the malloc alignment size is acceptable
on i386. IMHO, SSE instructions are special, so I think using
posix_memalign() is the safe approach in this case. Could you please check this bug?
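The posix_memalign() workaround might look like this for an array of SIMD values. A sketch under stated assumptions: `Vec4` is a hypothetical stand-in for `__m128`, and `allocate_vec4_array`/`free_vec4_array` are illustrative names, not glibc functions. Memory obtained from posix_memalign() is released with plain free().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>   // SIZE_MAX, std::uintptr_t
#include <new>       // placement new
#include <stdlib.h>  // posix_memalign, free

// Hypothetical stand-in for __m128.
struct alignas(16) Vec4 {
  float lanes[4];
};

// Allocates `count` Vec4 objects on a 16-byte boundary and value-initialises
// them (all lanes zero). Returns nullptr on failure or for count == 0.
Vec4* allocate_vec4_array(std::size_t count) {
  if (count == 0 || count > SIZE_MAX / sizeof(Vec4)) return nullptr;
  void* mem = nullptr;
  // posix_memalign returns 0 on success or an error number; it does not set errno.
  if (posix_memalign(&mem, alignof(Vec4), count * sizeof(Vec4)) != 0) return nullptr;
  Vec4* arr = static_cast<Vec4*>(mem);
  for (std::size_t i = 0; i < count; ++i) new (arr + i) Vec4();
  return arr;
}

// Vec4 is trivially destructible, so releasing the block is enough.
void free_vec4_array(Vec4* arr) { free(arr); }
```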
Comment 2 Florian Schanda 2004-06-05 19:44:22 UTC
Using posix_memalign as a workaround does work. However, IMHO the code posted
should "just work": it is perfectly valid C++, and __m128 is a perfectly valid
type on machines supporting SSE.
Comment 3 wg@malloc.de 2004-06-05 22:24:37 UTC
Subject: Re:  malloc does not align memory correctly for sse capable systems

Hello,

> The alignment size is defined as MALLOC_ALIGNMENT defined as
> (2*sizeof(INTERNAL_SIZE_T)). INTERNAL_SIZE_T is size_t (4 byte),
> so MALLOC_ALIGNMENT is 8 bytes.

Yes. I believe setting MALLOC_ALIGNMENT to 16 would work in the
source, but _should not_ be done in the general case, because of the
significant performance drop (many more objects, about half, would need
to be rounded up to a multiple of 16 in size).
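To make the rounding cost concrete, here is a simplified model of how glibc turns a request into a chunk size, after its request2size macro: add one size_t of header (SIZE_SZ), round up to MALLOC_ALIGNMENT, and never go below a minimum chunk of four size_t words. `chunk_size` and `average_chunk` are illustrative names, and the model ignores further details of real glibc (for example how the user pointer is placed inside the chunk).

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of request2size(): chunk size for a request of `req` bytes,
// given the header word size `size_sz` and the malloc alignment `align`.
constexpr std::size_t chunk_size(std::size_t req, std::size_t size_sz,
                                 std::size_t align) {
  const std::size_t mask = align - 1;
  // Minimum chunk: prev_size, size, fd, bk, rounded up to the alignment.
  const std::size_t min_size = (4 * size_sz + mask) & ~mask;
  const std::size_t padded = req + size_sz + mask;
  return padded < min_size ? min_size : (padded & ~mask);
}

// Average chunk size over request sizes 1..max_req, to compare alignments.
double average_chunk(std::size_t max_req, std::size_t size_sz, std::size_t align) {
  std::size_t total = 0;
  for (std::size_t r = 1; r <= max_req; ++r) total += chunk_size(r, size_sz, align);
  return static_cast<double>(total) / static_cast<double>(max_req);
}
```

For example, with SIZE_SZ = 4 (i686) a 20-byte request becomes a 24-byte chunk at 8-byte alignment but a 32-byte chunk at 16-byte alignment, while a 28-byte request costs 32 bytes either way.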

>  Bug#206 says it's not suitable
> for SSE instruction that needs 16 byte alignment as follows:
>    http://sources.redhat.com/bugzilla/show_bug.cgi?id=206
>    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795 
> 
> I don't know that changing memory alignment size is acceptable
> on i386.  IMHO, SSE is special instruction, so I think in this
> case using posix_memalign() is safe.  Please check this bug?

I am well aware that the C standard says that alignment needs to be
"suitable for any type of object".  However, SSE instructions are IMHO
clearly out of the scope of the C standard -- at least it is not a
clear case of non-conformity to the standard.

Best would be to use posix_memalign() in the applications only for the
allocations where the alignment is really required, because that would
give optimal performance and least memory waste.

Second best (if you cannot or do not want to change the apps _at all_) would
be to have an additional shared library (libmalloc16?) where malloc is
compiled with MALLOC_ALIGNMENT=16, and link that library only into the
SSE-using applications. This is a bit tricky because of the interdependency
with libpthread, but most probably doable.

Worst and IMHO unacceptable would be to make MALLOC_ALIGNMENT dynamic;
malloc would become much slower.

What do you think?  How do other systems handle this?

Regards,
Wolfram.

Comment 4 Wolfgang Bangerth 2004-06-06 18:27:35 UTC
For a history of this bug: this used to be PR 15795 in gcc's bugzilla, see  
  http://gcc.gnu.org/ml/gcc-bugs/2004-06/msg00552.html  
  
As for the implications: the C standard says that the pointer returned by
malloc needs to be sufficiently aligned for _all_ data types. This alignment
is specified in the ABI of a system, and for systems that allow the creation
of SSE data types, this means that malloc needs to return 16-byte aligned
pointers. I know that this is unfortunate, but there really is no way
around fixing malloc; for example, consider a program that uses
  std::vector<__m128> x(13);
Here, std::vector has to call 'operator new', which itself has to call
malloc(). If malloc doesn't return sufficiently aligned pointers, the
resulting std::vector is unusable. Note that here the memory allocation
has to happen inside gcc's libstdc++, and is thus out of the user's control,
so one can't use posix_memalign here.
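For reference, C++17 later addressed this std::vector case at the language level: a new-expression for a type whose alignment exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__ calls an alignment-aware operator new, and std::allocator<T> must return storage suitably aligned for T. A minimal check, assuming a C++17 compiler and library; `Wide` (alignas(64), above the default new alignment on common targets) and `vector_elements_aligned` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type whose alignment exceeds what malloc guarantees.
struct alignas(64) Wide {
  float data[16];
};

// Returns true if every element of a freshly built vector<T> honours alignof(T).
template <typename T>
bool vector_elements_aligned(std::size_t n) {
  std::vector<T> v(n);
  for (const T& elem : v) {
    if (reinterpret_cast<std::uintptr_t>(&elem) % alignof(T) != 0) return false;
  }
  return true;
}
```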
  
W.  
Comment 5 wg@malloc.de 2004-06-06 19:17:35 UTC
Subject: Re:  malloc does not align memory correctly for sse capable systems

Hello,

> As for the implications: the C standard says that the pointer returned by  
> malloc needs to be sufficiently aligned for _all_ data types. This alignment  
> is specified in the ABI of a system, and for systems that allow the creation  
> of SSE data types, this means that malloc needs to return 16-byte aligned  
> pointers.

Ok, surely _the_ C standard doesn't specify "SSE data types". However,
some extension of the C standard for an SSE system should of course specify
this exactly as you say, so I am not against supplying an
additional libmalloc16 to be used with SSE applications. I'll look
into creating one within glibc.

> I know that this is unfortunate, but there really is no other  
> way of fixing malloc; for example, consider a program that uses  
>   std::vector<__m128> x(13);  
> Here, std::vector has to call 'operator new' which itself has to call  
> malloc(). If malloc doesn't return sufficiently aligned pointers, the  
> resulting std::vector is unusable. Note that here the memory allocation  
> has to happen inside gcc's libstdc++, and is this out of user's control  
> -- so one can't use posix_memalign here.  

Couldn't you use a non-default allocator?
(I know it is not as convenient.)

Regards,
Wolfram.
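The non-default allocator suggested here could be sketched as a minimal C++11 allocator that takes its storage from posix_memalign(). `PosixAlignedAllocator` and `Vec4` are hypothetical names; the requested alignment is alignof(T), raised to sizeof(void*) because posix_memalign() requires at least that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign, free
#include <vector>

// Minimal C++11 allocator returning storage aligned to at least alignof(T).
template <typename T>
struct PosixAlignedAllocator {
  using value_type = T;

  PosixAlignedAllocator() noexcept = default;
  template <typename U>
  PosixAlignedAllocator(const PosixAlignedAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    // posix_memalign needs a power of two that is a multiple of sizeof(void*).
    constexpr std::size_t align =
        alignof(T) < sizeof(void*) ? sizeof(void*) : alignof(T);
    if (n > SIZE_MAX / sizeof(T)) throw std::bad_alloc();
    void* p = nullptr;
    if (posix_memalign(&p, align, n * sizeof(T)) != 0) throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { free(p); }
};

// All instances are interchangeable: memory from one can be freed by another.
template <typename T, typename U>
bool operator==(const PosixAlignedAllocator<T>&, const PosixAlignedAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const PosixAlignedAllocator<T>&, const PosixAlignedAllocator<U>&) noexcept {
  return false;
}

// Hypothetical stand-in for __m128.
struct alignas(16) Vec4 {
  float lanes[4];
};
```

Usage would be `std::vector<Vec4, PosixAlignedAllocator<Vec4>> x(13);`. The drawback, raised in the next reply, is that every container that might hold an SSE type has to name the allocator explicitly.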

Comment 6 GOTO Masanori 2004-06-07 14:04:52 UTC
Subject: Re:  malloc does not align memory correctly for sse capable systems

Hi,

I agree that __m128 is NOT a standard C type, so, as Wolfram pointed out,
malloc() does NOT need to align to 16 bytes for SSE instructions.

SSE types are special vector-typed arrays, so __m128 is sometimes harder to
handle than other types. C++ new() takes no alignment designator, so
"std::vector<__m128> x(13)" does not specify any alignment either.

One idea to fix it is to give C++ new special handling for the aligned
attribute. The __m128 typedef carries an additional __attribute__ ((aligned (16))).
AFAIK there is no way to pass the alignment to the standard malloc() interface.
So, if g++ could distinguish types with the aligned attribute and
switch the new() code accordingly, then we could provide another overloaded
new() which takes an "alignment" argument (I hope the new operator used for
vector would do the same).
Such an overloaded new() with an alignment parameter can use
posix_memalign() instead of malloc(). As you may know, there is a
note in build_new() in gcc/cp/init.c:

   Note that build_new does nothing to assure that any special
   alignment requirements of the type are met.  Rather, it leaves
   it up to malloc to do the right thing.  Otherwise, folding to
   the right alignment can cause problems if the user tries to later
   free the memory returned by `new'.

But if a class or struct has __attribute__ ((aligned (16))), why do
we ignore it? So I think handling the aligned attribute is the right way.
Is this applicable to g++?
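The alignment carried by __attribute__ ((aligned (N))) is visible to templates through alignof (standard since C++11), so the switch proposed above can at least be sketched as a library helper. `aligned_new`, `aligned_delete` and the struct `Tagged` are hypothetical names; types within malloc's guarantee go through ordinary new, while stricter ones go through posix_memalign().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free
#include <utility>

// Hypothetical type using the GCC attribute discussed above.
struct Tagged {
  float lanes[4];
} __attribute__((aligned(64)));

// Allocates and constructs a T, choosing the allocation path from alignof(T).
template <typename T, typename... Args>
T* aligned_new(Args&&... args) {
  if (alignof(T) <= alignof(std::max_align_t)) {
    return new T(std::forward<Args>(args)...);  // malloc's guarantee suffices
  }
  void* mem = nullptr;
  if (posix_memalign(&mem, alignof(T), sizeof(T)) != 0) throw std::bad_alloc();
  try {
    return ::new (mem) T(std::forward<Args>(args)...);
  } catch (...) {
    free(mem);
    throw;
  }
}

// Must be paired with aligned_new<T>: destroys and frees via the matching path.
template <typename T>
void aligned_delete(T* p) {
  if (alignof(T) <= alignof(std::max_align_t)) {
    delete p;
    return;
  }
  p->~T();
  free(p);
}
```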

BTW, I am not sure whether libmalloc16 is a safe approach or not.
Comment 7 bangerth@ices.utexas.edu 2004-06-07 14:23:47 UTC
Subject: Re:  malloc does not align memory correctly for sse
 capable systems


> Ok, surely _the_ C standard doesn't specify "SSE data types".

While that may be true, it is a quality of implementation issue.

Also, note footnote 34 in the C99 standard. It says:
  An implementation may define new keywords that provide ways to designate
  a basic (*or any other*) type; [...]
(emphasis mine). This would seem to imply that we are quite allowed to use 
an SSE type as another basic type, and if we do so the same provisions of 
the standard should apply to it.


> > I know that this is unfortunate, but there really is no other  
> > way of fixing malloc; for example, consider a program that uses  
> >   std::vector<__m128> x(13);  
> > Here, std::vector has to call 'operator new' which itself has to call  
> > malloc(). If malloc doesn't return sufficiently aligned pointers, the  
> > resulting std::vector is unusable. Note that here the memory allocation  
> > has to happen inside gcc's libstdc++, and is this out of user's control  
> > -- so one can't use posix_memalign here.  
> 
> Couldn't you use a non-default allocator?
> (I know it is not as convenient.)

That may be very hard to do. Consider this case:
  template <typename T> class MyClass {
    std::vector<T> member;
  };
One would only need to use the special allocator if T is some SSE type, but
not otherwise. That can be done via some really awkward template hacking,
but it isn't very nice. In addition, MyClass may be in a third-party (not
compiler, not application) library, over which I have no control. So there
definitely are cases where we can't do this.
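With C++11's alignof and std::conditional, the template selection described here reduces to one alias, although it still has to be spelled out wherever the container is declared and does not help with third-party code. A sketch: `needs_aligned_allocator` and `vector_for` are hypothetical names, and the aligned allocator is passed as a template template parameter (here `SomeAlignedAllocator`, a declared-only placeholder, since only the type selection is shown).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// True when T asks for more alignment than malloc is required to provide.
template <typename T>
constexpr bool needs_aligned_allocator() {
  return alignof(T) > alignof(std::max_align_t);
}

// Selects an aligned allocator only for over-aligned element types.
template <typename T, template <typename> class AlignedAlloc>
using vector_for = typename std::conditional<needs_aligned_allocator<T>(),
                                             std::vector<T, AlignedAlloc<T>>,
                                             std::vector<T>>::type;

// Placeholder for a real aligned allocator (declared only, never instantiated here).
template <typename T>
struct SomeAlignedAllocator;

// Hypothetical over-aligned element type.
struct alignas(64) Wide {
  float lanes[16];
};
```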

W.

-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/


Comment 8 bangerth@ices.utexas.edu 2004-06-07 14:30:23 UTC
Subject: Re:  malloc does not align memory correctly for sse
 capable systems


> One idea to fix it is to use special handling aligned() for C++ new.  

Yes, but the C++ maintainers have already said that they don't want to do 
that.

As an additional problem, one can overload operator new in an application 
program, so you force writers of such overloads to reinvent the same 
kludge again each time they write such functions. Think of cases like
  template <typename T> class MyClass {
    void * operator new (const size_t sz) {
      // what to do here? we need to figure out the alignment requirements
      // of T, but there is no standard conforming way to do this
    }
  };

Note also that in C++ you can only overload based on types, not on 
alignment requirements, so figuring out the alignment really has to happen 
inside above operator function.
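Since C++11, alignof does give a standard-conforming answer to the question in the comment above, so a class-specific operator new can at least request the right alignment. A sketch, assuming POSIX posix_memalign(); `MyClass` follows the example above with a hypothetical `value` member added, and `Wide` is a hypothetical over-aligned type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign, free

template <typename T>
class MyClass {
 public:
  T value{};

  // alignof(MyClass) already accounts for the alignment requirement of T.
  static void* operator new(std::size_t sz) {
    constexpr std::size_t a =
        alignof(MyClass) < sizeof(void*) ? sizeof(void*) : alignof(MyClass);
    void* p = nullptr;
    if (posix_memalign(&p, a, sz) != 0) throw std::bad_alloc();
    return p;
  }

  static void operator delete(void* p) noexcept { free(p); }
};

// Hypothetical SIMD-like type with a 32-byte requirement.
struct alignas(32) Wide {
  float lanes[8];
};
```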

W.



Comment 9 Jakub Jelinek 2004-06-08 15:02:48 UTC
MALLOC_ALIGNMENT in glibc is really not going to change. It is a quality of
implementation issue, sure, but making it bigger would make the quality of
implementation drop a lot for most of the programs out there.
SSE types are certainly out of the scope of the C standard. GCC allows you to
create objects with arbitrary alignment, not just __m128; you can use, say,
__attribute__((aligned (256))). With the same argumentation, you could request
that all malloc memory be 256-byte aligned (or 4K, or whatever you choose).
If you want C++ new to work on these types, the compiler will simply
need to do its part and call some (non-standard) new operator with an additional
alignment argument, which would in turn call posix_memalign, perhaps guarded by
some compiler option; without that option, using new on types with too large an
alignment requirement would result in a compile-time error.
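C++17 eventually adopted a design along these lines: for a type whose alignment exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__, the compiler passes the alignment to operator new as a std::align_val_t argument. A sketch, assuming C++17 and POSIX; `Block` and the counter `g_aligned_new_calls` are hypothetical, and the class-specific overload only exists to make the call observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>       // std::align_val_t, std::bad_alloc
#include <stdlib.h>  // posix_memalign, free

// Counts how often the alignment-aware overload below is selected.
inline int g_aligned_new_calls = 0;

struct alignas(32) Block {
  float lanes[8];

  // For over-aligned types, `new Block` calls this overload and the compiler
  // supplies alignof(Block) as the second argument.
  static void* operator new(std::size_t size, std::align_val_t al) {
    ++g_aligned_new_calls;
    void* p = nullptr;
    if (posix_memalign(&p, static_cast<std::size_t>(al), size) != 0) throw std::bad_alloc();
    return p;
  }

  static void operator delete(void* p, std::align_val_t) noexcept { free(p); }
};
```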

Comment 10 bangerth@ices.utexas.edu 2004-06-08 15:41:52 UTC
Subject: Re:  malloc does not align memory correctly for sse capable systems


> MALLOC_ALIGNMENT in glibc is really not going to change, it is a
> quality of implementation, sure, by making it bigger the quality
> implementation would drop a lot for most of the programs out there.

Is it possible to quantify this somehow?


> SSE types are certainly out of the scope of the C standard.  GCC allows to
> create objects with arbitrary alignment, not just __m128, but you can use
> say: __attribute__((aligned (256))).  With the same argumentation, you
> could request that all malloc memory is 256 bytes aligned (or 4K or
> whatever you choose).

That's a reasonable argument indeed. In fact, we would presently simply get
code that ignores this alignment requirement; we only noticed it for SSE types
because the processor traps when this happens.


> If you want to make C++ new working on these types,

OK, I will go back to the gcc guys with this. Thanks
  Wolfgang


Comment 11 Florian Weimer 2019-04-10 11:35:13 UTC
This was actually fixed as bug 21120.

*** This bug has been marked as a duplicate of bug 21120 ***