From what I gather, malloc is supposed to return memory aligned to the strictest alignment requirement of any type on the system. Currently this is 8 bytes on i686 systems. It should be 16 bytes because of the __m128 type that the SSE intrinsics use. See also this gcc "bug" for more details: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795
Basically, this program segfaults:
------------------------------------------
#include <xmmintrin.h>
int main()
{
  __m128 * foo = new __m128;
  *foo = _mm_setzero_ps();
}
------------------------------------------
Wolfram,
The alignment is set by MALLOC_ALIGNMENT, which is defined as (2*sizeof(INTERNAL_SIZE_T)). INTERNAL_SIZE_T is size_t (4 bytes), so MALLOC_ALIGNMENT is 8 bytes. Bug #206 says this is not suitable for SSE instructions, which need 16-byte alignment:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=206
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795
I don't know whether changing the memory alignment is acceptable on i386. IMHO, SSE is a special case, so I think using posix_memalign() here is safe. Could you please check this bug?
Using posix_memalign as a workaround does work. However, IMHO the code posted should "just work", as it is perfectly valid C++, just as the __m128 type is a perfectly valid type on machines supporting sse.
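The posix_memalign workaround mentioned here can be sketched roughly as follows. This is only an illustration, not code from the report: `Vec4` is a hypothetical 16-byte-aligned stand-in for `__m128` (so the sketch does not depend on `<xmmintrin.h>`), and `make_vec4`, `destroy_vec4` and `is_aligned` are made-up helper names. It assumes a POSIX system and a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical stand-in for __m128: 16 bytes, 16-byte aligned.
struct alignas(16) Vec4 { float v[4]; };

// True if p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Allocate one Vec4 in storage obtained from posix_memalign.
// Returns nullptr if the allocation fails.
inline Vec4* make_vec4() {
    void* raw = nullptr;
    if (posix_memalign(&raw, alignof(Vec4), sizeof(Vec4)) != 0)
        return nullptr;
    return new (raw) Vec4();  // placement new, value-initialised to zero
}

// Destroy a Vec4 obtained from make_vec4 and release its storage.
inline void destroy_vec4(Vec4* p) {
    if (p) {
        p->~Vec4();
        free(p);
    }
}
```

The cost of this approach is visible in the sketch: every allocation site has to pair the aligned allocation with a manual destructor call and `free`, which is exactly what a plain `new __m128` would avoid.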
Subject: Re: malloc does not align memory correctly for sse capable systems

Hello,

> The alignment is set by MALLOC_ALIGNMENT, which is defined as
> (2*sizeof(INTERNAL_SIZE_T)). INTERNAL_SIZE_T is size_t (4 bytes),
> so MALLOC_ALIGNMENT is 8 bytes.

Yes. I believe setting MALLOC_ALIGNMENT to 16 in the source would work, but it _should not_ be done in the general case because of the significant performance drop (many more objects, about half, would need to be rounded up to a multiple of 16 in size).

> Bug #206 says this is not suitable for SSE instructions, which need
> 16-byte alignment:
> http://sources.redhat.com/bugzilla/show_bug.cgi?id=206
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795
>
> I don't know whether changing the memory alignment is acceptable
> on i386. IMHO, SSE is a special case, so I think using
> posix_memalign() here is safe. Could you please check this bug?

I am well aware that the C standard says that alignment needs to be "suitable for any type of object". However, SSE instructions are IMHO clearly out of the scope of the C standard -- at least it is not a clear case of non-conformity to the standard.

Best would be to use posix_memalign() in the applications, and only for the allocations where the alignment is really required, because that would give optimal performance and the least memory waste.

Second best (if you don't want to or can't change the apps _at all_) would be to have an additional shared library (libmalloc16?) where malloc is compiled with MALLOC_ALIGNMENT=16, and to link that library only into the SSE-using applications. A bit tricky because of the interdependency with libpthread, but most probably doable.

Worst, and IMHO unacceptable, would be to make MALLOC_ALIGNMENT dynamic; malloc would become much slower.

What do you think? How do other systems handle this?

Regards,
Wolfram.
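The "about half" estimate can be illustrated with a little arithmetic. The sketch below is a simplified model, not glibc's actual size computation: it ignores the per-chunk header and minimum chunk size and just rounds a request up to a multiple of the alignment. `round_up` and `count_grown` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of `alignment` (a power of two).
inline std::size_t round_up(std::size_t n, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

// Count the request sizes in [1, limit] whose rounded size grows
// when the alignment is raised from 8 to 16 bytes.
inline std::size_t count_grown(std::size_t limit) {
    std::size_t grown = 0;
    for (std::size_t n = 1; n <= limit; ++n) {
        if (round_up(n, 16) > round_up(n, 8))
            ++grown;
    }
    return grown;
}
```

In this model a request grows exactly when its 8-byte rounding is an odd multiple of 8 (8, 24, 40, ...), which covers half of all request sizes; for example, 20 bytes rounds to 24 with 8-byte alignment but to 32 with 16-byte alignment.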
For a history of this bug: this used to be PR 15795 in gcc's bugzilla, see http://gcc.gnu.org/ml/gcc-bugs/2004-06/msg00552.html

As for the implications: the C standard says that the pointer returned by malloc needs to be sufficiently aligned for _all_ data types. This alignment is specified in the ABI of a system, and for systems that allow the creation of SSE data types, this means that malloc needs to return 16-byte aligned pointers.

I know that this is unfortunate, but there really is no other way than fixing malloc; for example, consider a program that uses
  std::vector<__m128> x(13);
Here, std::vector has to call 'operator new', which itself has to call malloc(). If malloc doesn't return sufficiently aligned pointers, the resulting std::vector is unusable. Note that here the memory allocation has to happen inside gcc's libstdc++, and is thus out of the user's control -- so one can't use posix_memalign here.

W.
Subject: Re: malloc does not align memory correctly for sse capable systems

Hello,

> As for the implications: the C standard says that the pointer returned by
> malloc needs to be sufficiently aligned for _all_ data types. This alignment
> is specified in the ABI of a system, and for systems that allow the creation
> of SSE data types, this means that malloc needs to return 16-byte aligned
> pointers.

Ok, surely _the_ C standard doesn't specify "SSE data types". Some extension of the C standard for an SSE system should of course specify this exactly as you say, however, so I am not against supplying an additional libmalloc16 to be used with SSE applications. I'll look into creating one within glibc.

> I know that this is unfortunate, but there really is no other
> way than fixing malloc; for example, consider a program that uses
>   std::vector<__m128> x(13);
> Here, std::vector has to call 'operator new', which itself has to call
> malloc(). If malloc doesn't return sufficiently aligned pointers, the
> resulting std::vector is unusable. Note that here the memory allocation
> has to happen inside gcc's libstdc++, and is thus out of the user's control
> -- so one can't use posix_memalign here.

Couldn't you use a non-default allocator?
(I know it is not as convenient.)

Regards,
Wolfram.
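For reference, a non-default allocator along these lines might look like the sketch below. It is an assumption-laden illustration rather than anything proposed in the thread: `AlignedAllocator`, `Vec4Vector` and `data_is_aligned` are hypothetical names, `Vec4` again stands in for `__m128`, and the code relies on C++11 and POSIX `posix_memalign`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX)
#include <vector>

// Hypothetical stand-in for __m128: 16 bytes, 16-byte aligned.
struct alignas(16) Vec4 { float v[4]; };

// Minimal allocator that takes its storage from posix_memalign.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    static_assert(Alignment >= sizeof(void*) &&
                  (Alignment & (Alignment - 1)) == 0,
                  "posix_memalign needs a power of two >= sizeof(void*)");

    using value_type = T;

    // Explicit rebind is required because of the non-type parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept {}
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_alloc();  // n * sizeof(T) would overflow
        void* raw = nullptr;
        if (posix_memalign(&raw, Alignment, n * sizeof(T)) != 0)
            throw std::bad_alloc();
        return static_cast<T*>(raw);
    }

    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) {
    return false;
}

using Vec4Vector = std::vector<Vec4, AlignedAllocator<Vec4, 16>>;

// True if the vector's element storage starts on a 16-byte boundary.
inline bool data_is_aligned(const Vec4Vector& v) {
    assert(!v.empty());
    return reinterpret_cast<std::uintptr_t>(v.data()) % 16 == 0;
}
```

With this in place, `Vec4Vector x(13);` plays the role of `std::vector<__m128> x(13);`. The inconvenience is that the allocator becomes part of the vector's type, so every declaration that mentions the container has to change with it.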
Subject: Re: malloc does not align memory correctly for sse capable systems

Hi,

I agree that __m128 is NOT a standard C type, so malloc() does NOT need to align to 16 bytes for SSE instructions, as Wolfram pointed out. SSE types are special vector types, so __m128 is sometimes hard to handle like other types.

C++ new has no way to designate an alignment, so "std::vector<__m128> x(13)" does not specify any alignment either.

One idea to fix it is to add special handling of aligned() to C++ new. The __m128 typedef carries an additional __attribute__ ((aligned (16))). AFAIK there is no way to pass the alignment through the standard malloc() interface. So, if g++ could detect the aligned attribute and switch the code it generates for new, we could provide another overload of new that takes an alignment argument (I hope the new operator used by vector could do the same). Such an overloaded new with an alignment parameter can use posix_memalign() instead of malloc().

As you may know, there is a note in build_new() in gcc/cp/init.c:

   Note that build_new does nothing to assure that any special
   alignment requirements of the type are met. Rather, it leaves
   it up to malloc to do the right thing. Otherwise, folding to
   the right alignment can cause problems if the user tries to later
   free the memory returned by `new'.

But if a class or struct has __attribute__ ((aligned (16))), why do we ignore it? So I think handling the aligned attribute is the right way. Is this applicable to g++?

BTW, I don't know whether libmalloc16 is a safe approach or not.
Subject: Re: malloc does not align memory correctly for sse capable systems

> Ok, surely _the_ C standard doesn't specify "SSE data types".

While that may be true, it is a quality of implementation issue. Also, note footnote 34 in the C99 standard. It says:
  An implementation may define new keywords that provide ways to
  designate a basic (*or any other*) type; [...]
(emphasis mine). This would seem to imply that we are quite allowed to use an SSE type as another basic type, and if we do so the same provisions of the standard should apply to it.

> > I know that this is unfortunate, but there really is no other
> > way than fixing malloc; for example, consider a program that uses
> >   std::vector<__m128> x(13);
> > Here, std::vector has to call 'operator new', which itself has to call
> > malloc(). If malloc doesn't return sufficiently aligned pointers, the
> > resulting std::vector is unusable. Note that here the memory allocation
> > has to happen inside gcc's libstdc++, and is thus out of the user's control
> > -- so one can't use posix_memalign here.
>
> Couldn't you use a non-default allocator?
> (I know it is not as convenient.)

That may be very hard to do. Consider this case:
  template <typename T>
  class MyClass {
    std::vector<T> member;
  };
One would only need to use the special allocator if T is some SSE type, but not otherwise. That can be done via some really awkward template hacking, but isn't very nice. In addition, MyClass may be in a third-party (not compiler, not application) library, over which I have no control. So there definitely are cases where we can't do this.

W.
-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/
Subject: Re: malloc does not align memory correctly for sse capable systems

> One idea to fix it is to add special handling of aligned() to C++ new.

Yes, but the C++ maintainers have already said that they don't want to do that. As an additional problem, one can overload operator new in an application program, so you force writers of such overloads to reinvent the same kludge each time they write such functions. Think of cases like
  template <typename T>
  class MyClass {
    void * operator new (const size_t sz) {
      // what to do here? we need to figure out the alignment requirements
      // of T, but there is no standard conforming way to do this
    }
  };
Note also that in C++ you can only overload based on types, not on alignment requirements, so figuring out the alignment really has to happen inside the operator function above.

W.
-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/
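At the time of this thread there was indeed no standard way to query alignment; C++11 later added `alignof`. Assuming a C++11 compiler and POSIX `posix_memalign`, a class-level overload along the lines of the example above could be sketched like this. The public `member` field, the `aligned()` helper and the `Vec4` stand-in for `__m128` are hypothetical additions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical stand-in for __m128: 16 bytes, 16-byte aligned.
struct alignas(16) Vec4 { float v[4]; };

template <typename T>
class MyClass {
public:
    static void* operator new(std::size_t sz) {
        // alignof(MyClass) is at least alignof(T) because `member` is a T.
        // posix_memalign additionally requires a multiple of sizeof(void*).
        const std::size_t align =
            alignof(MyClass) < sizeof(void*) ? sizeof(void*) : alignof(MyClass);
        assert((align & (align - 1)) == 0);  // must be a power of two
        void* raw = nullptr;
        if (posix_memalign(&raw, align, sz) != 0)
            throw std::bad_alloc();
        return raw;
    }

    static void operator delete(void* p) noexcept { free(p); }

    // True if this object sits on a boundary suitable for MyClass.
    bool aligned() const {
        return reinterpret_cast<std::uintptr_t>(this) % alignof(MyClass) == 0;
    }

    T member;
};
```

This still leaves the objection raised in the comment standing: every author of such an overload has to repeat the same alignment logic by hand.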
MALLOC_ALIGNMENT in glibc is really not going to change. It is a quality of implementation issue, sure, but making it bigger would lower the quality of implementation a lot for most of the programs out there. SSE types are certainly out of the scope of the C standard. GCC allows creating objects with arbitrary alignment, not just __m128; you can use, say, __attribute__((aligned (256))). By the same argument, you could request that all malloc memory be 256-byte aligned (or 4K or whatever you choose).
If you want to make C++ new work on these types, the compiler will simply need to do its part and call some (non-standard) new operator with an additional alignment argument, which would in turn call posix_memalign. This could perhaps be guarded by a compiler option, without which using new on types with too large an alignment requirement would result in a compile-time error.
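The kind of "new operator with additional alignment argument" described here can be approximated by hand with a placement form of operator new. The sketch below is only an illustration under assumptions: `AlignArg`, `new_aligned` and `delete_aligned` are hypothetical names, `Vec4` stands in for `__m128`, and in this version the caller, not the compiler, supplies the alignment. (C++17 later standardised compiler-invoked overloads taking `std::align_val_t`; the sketch deliberately sticks to C++11.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical stand-in for __m128: 16 bytes, 16-byte aligned.
struct alignas(16) Vec4 { float v[4]; };

// Hypothetical tag type carrying the requested alignment.
struct AlignArg { std::size_t value; };

// Placement form of operator new that forwards to posix_memalign.
void* operator new(std::size_t size, AlignArg a) {
    assert(a.value != 0 && (a.value & (a.value - 1)) == 0);
    // posix_memalign requires a multiple of sizeof(void*).
    const std::size_t align = a.value < sizeof(void*) ? sizeof(void*) : a.value;
    void* raw = nullptr;
    if (posix_memalign(&raw, align, size) != 0)
        throw std::bad_alloc();
    return raw;
}

// Matching placement delete; called automatically only if the
// constructor throws during `new (AlignArg{...}) T()`.
void operator delete(void* p, AlignArg) noexcept { free(p); }

// Create a value-initialised T at its natural alignment.
template <typename T>
T* new_aligned() {
    return new (AlignArg{alignof(T)}) T();
}

// Destroy and release an object created by new_aligned.
template <typename T>
void delete_aligned(T* p) {
    if (p) {
        p->~T();
        free(p);
    }
}
```

Objects created this way must not be passed to a plain `delete` expression, which is one reason the comment suggests having the compiler select the aligned path itself.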
Subject: Re: malloc does not align memory correctly for sse capable systems

> MALLOC_ALIGNMENT in glibc is really not going to change. It is a
> quality of implementation issue, sure, but making it bigger would lower
> the quality of implementation a lot for most of the programs out there.

Is it possible to quantify this somehow?

> SSE types are certainly out of the scope of the C standard. GCC allows
> creating objects with arbitrary alignment, not just __m128; you can use,
> say, __attribute__((aligned (256))). By the same argument, you
> could request that all malloc memory be 256-byte aligned (or 4K or
> whatever you choose).

That's a reasonable argument indeed. In fact, we would presently simply get code that ignores this alignment requirement; we only noticed for SSE types because the processor traps when this happens.

> If you want to make C++ new work on these types,

OK, I will go back to the gcc guys with this.

Thanks
  Wolfgang
-------------------------------------------------------------------------
Wolfgang Bangerth              email:            bangerth@ices.utexas.edu
                               www: http://www.ices.utexas.edu/~bangerth/
This was actually fixed as bug 21120. *** This bug has been marked as a duplicate of bug 21120 ***