This is the mail archive of the
mailing list for the glibc project.
Re: [RFC] nptl: change default stack guard size of threads
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Rich Felker <dalias at libc dot org>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, "GNU C Library" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, Jeff Law <law at redhat dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Date: Tue, 5 Dec 2017 10:55:31 +0000
- Subject: Re: [RFC] nptl: change default stack guard size of threads
- References: <5A1ECB40.email@example.com> <firstname.lastname@example.org> <5A1EFF28.email@example.com> <firstname.lastname@example.org> <20171129205148.GG1627@brightrain.aerifal.cx> <email@example.com>
On Wed, Nov 29, 2017 at 09:02:48PM +0000, Florian Weimer wrote:
> On 11/29/2017 09:51 PM, Rich Felker wrote:
> > I'm not sure I follow, but from the standpoint of virtual address
> > space and what is an acceptable cost in wasted address space, any
> > ILP32-on-64 ABIs should be considered the same as 32-bit archs. As
> > such, I think GCC really needs to do the stack probe every 4k, not
> > 64k, and the default (and certainly minimum-supported) guard size
> > should be kept at 4k, not 64k or anything larger.
> Yes, and I expect that we will keep using 4 KiB probing on i386 (and
> s390/s390x). That's what Jeff gave me for testing. I do hope the final
> upstream version isn't going to be different in this regard.
> But in the end, this is up to the machine maintainers (for gcc and glibc).
> >> We can throw new code at this problem and solve it for 64-bit. For
> >> 32-bit, we simply do not have a universally applicable solution. My
> >> understanding was that everywhere except on ARM, GCC was compatible
> >> with the pioneering glibc/Linux work in this area (the guard page we
> >> added to thread stacks, and the guard page added by the kernel). If
> >> this isn't the case, then I'm really disappointed in the disregard
> >> of existing practice on the GCC side.
> > Hm? What are you thinking of that GCC might have gotten wrong?
> Use 64 KiB probe intervals (almost) everywhere as an optimization. I
> assumed the original RFC patch was motivated by that.
> I knew that ARM would be broken because that's what the gcc ARM
> maintainers want. I assumed that it was restricted to that, but now I'm
> worried that it's not.
To be clear here, I'm coming in to the review of the probing support in GCC
late, and with very little context on the design of the feature. I certainly
wouldn't want to cause you worry - I've got no intention of pushing for
optimization to a larger guard page size if it would leave things broken.
Likewise, I have no real desire for us to emit a bunch of extra operations
if we're not required to for glibc.
If I'm reopening earlier conversations, it is only because I wasn't involved
in them. I have no interest in us doing something "Very Wrong Indeed".
If assuming that 64k probes are sufficient on AArch64 is not going to allow
us a correct implementation, then we can't assume 64k probes on AArch64. My
understanding was that we were safe in this as the kernel was giving us a
generous 1MB to play with, and we could modify glibc to also give us 64k
(I admit, I had not considered ILP32, where you've rightly pointed out we
will eat lots of address space if we make this decision).
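
For reference, the per-thread guard size is already something an application
can request through the POSIX thread attribute API. The following is a minimal
sketch only: `request_guard_size` is a hypothetical helper name, and 64 KiB is
simply the figure under discussion, not a value glibc is known to adopt.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: ask for a guard region of `want` bytes on a
 * thread attribute object and return the size the attribute reports
 * back. Returns 0 if any step fails. The attribute is not used to
 * create a thread here; the point is only to show where the guard
 * size is configured. */
static size_t request_guard_size(size_t want)
{
    pthread_attr_t attr;
    size_t got = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;

    /* The implementation may round the guard up to a page multiple
     * when the thread stack is actually allocated. */
    if (pthread_attr_setguardsize(&attr, want) == 0)
        pthread_attr_getguardsize(&attr, &got);

    pthread_attr_destroy(&attr);
    return got;
}
```

A caller would pass the same attribute object to `pthread_create`; an
application that sets a smaller guard this way is exactly the case where a
compiler-assumed 64k guard would not hold.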
> > GCC needs to emit probe intervals for the smallest supported page size
> > on the target architecture. If it does not do that, we end up in
> > trouble on the glibc side.
This is where I may have a misunderstanding: why would it require probing
at the smallest page size, rather than probing at a multiple of the guard
size? It is very likely I'm missing something here as I don't know the glibc
side of this at all.
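
The arithmetic behind this question can be sketched with a toy model. It
assumes a downward-growing stack, a single guard region directly below the
mapped stack, and probes issued at fixed intervals below the incoming stack
pointer. The names `simulate_probes`, `headroom`, and the result enum are
hypothetical, and the model ignores details of the real GCC probing scheme
(such as what the caller is assumed to have probed already).

```c
#include <stddef.h>

enum probe_result { NO_OVERFLOW, CAUGHT_BY_GUARD, JUMPED_PAST_GUARD };

/* Toy model, not GCC's actual algorithm.
 *   headroom: mapped stack bytes left below SP before the guard starts
 *   frame:    bytes the function allocates
 *   interval: distance between successive probes
 *   guard:    size of the guard region
 * Probes touch SP - interval, SP - 2*interval, ... down to SP - frame.
 * Any tail of the frame shorter than one interval is not probed here.
 * Sizes are assumed small enough that `off` does not wrap around. */
static enum probe_result simulate_probes(size_t headroom, size_t frame,
                                         size_t interval, size_t guard)
{
    if (interval == 0)
        return NO_OVERFLOW;              /* no probes are emitted */

    for (size_t off = interval; off <= frame; off += interval) {
        if (off <= headroom)
            continue;                    /* still inside the mapped stack */
        if (off <= headroom + guard)
            return CAUGHT_BY_GUARD;      /* first stray probe faults */
        return JUMPED_PAST_GUARD;        /* probe landed below the guard */
    }
    return NO_OVERFLOW;
}
```

In this model a probe interval no larger than the guard always lands the first
out-of-bounds probe inside the guard, whereas a 64k interval against a 4k
guard can step straight over it. Whether the guard can really be smaller than
the compiler assumes is the glibc-side question.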
Thanks for your advice so far. To reiterate, I'm not pushing any particular
optimization agenda in GCC, but I would like to understand the trade-off