This is the mail archive of the libc-alpha@sourceware.org
mailing list for the glibc project.
Re: New optimized string routines for Intel and alignment of stack.
- From: Florian Weimer <fweimer at redhat dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 7 Jun 2016 14:40:52 +0200
- Subject: Re: New optimized string routines for Intel and alignment of stack.
- Authentication-results: sourceware.org; auth=none
- References: <57566200 dot 2040203 at redhat dot com> <dea8c68f-cc02-9427-4e54-acd795a930cf at redhat dot com> <5756B542 dot 4060608 at linaro dot org> <ba80400c-053a-977f-4524-f5817cc17fab at redhat dot com> <CAKCAbMgG4zo_rN-bVgQ0tzYdGKM4q7xJOzcxM8KXbXYtV77PXA at mail dot gmail dot com>
On 06/07/2016 02:25 PM, Zack Weinberg wrote:
> On Tue, Jun 7, 2016 at 8:23 AM, Florian Weimer <fweimer at redhat dot com> wrote:
>> On 06/07/2016 01:51 PM, Adhemerval Zanella wrote:
>>> Also, is there any performance issue with current unaligned version
>>
>> I think there is a performance penalty from not using vectorized copies for
>> small structs. Even unaligned SSE loads/stores would be a win for the
>> example I posted, I assume.
>
> Is the gain from using vectorized copies large enough that manually
> aligning the stack in functions that want to use those instructions
> would be worth it?
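
For readers following the thread, a minimal sketch of what a vectorized copy
of a small struct with unaligned SSE2 loads/stores could look like. The
struct layout and the name copy_small are hypothetical and are not the
example posted earlier in the thread; the sketch falls back to memcpy when
SSE2 is not available.

```c
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical 32-byte struct, chosen only for illustration. */
struct small {
    char bytes[32];
};

/* Copy the struct with two 16-byte unaligned loads and stores.
   Neither pointer needs 16-byte alignment, so the caller does not
   have to realign its stack frame. */
static void copy_small(struct small *dst, const struct small *src)
{
#if defined(__SSE2__)
    __m128i lo = _mm_loadu_si128((const __m128i *)src->bytes);
    __m128i hi = _mm_loadu_si128((const __m128i *)(src->bytes + 16));
    _mm_storeu_si128((__m128i *)dst->bytes, lo);
    _mm_storeu_si128((__m128i *)(dst->bytes + 16), hi);
#else
    /* Assumption: without SSE2, leave the copy to the library routine. */
    memcpy(dst, src, sizeof *dst);
#endif
}
```

The aligned variants (_mm_load_si128/_mm_store_si128) would instead require
both addresses to be 16-byte aligned, which is where the stack-alignment
question comes from.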
Probably not; using unaligned loads/stores would likely be cheaper than
stack alignment (unless the function already has a frame pointer for