This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- From: Andy Lutomirski <luto at amacapital dot net>
- To: Linus Torvalds <torvalds at linux-foundation dot org>
- Cc: Ben Maurer <bmaurer at fb dot com>, Ingo Molnar <mingo at redhat dot com>, libc-alpha <libc-alpha at sourceware dot org>, Andrew Morton <akpm at linux-foundation dot org>, linux-api <linux-api at vger dot kernel dot org>, Ondřej Bílka <neleai at seznam dot cz>, rostedt <rostedt at goodmis dot org>, Mathieu Desnoyers <mathieu dot desnoyers at efficios dot com>, "Paul E. McKenney" <paulmck at linux dot vnet dot ibm dot com>, Florian Weimer <fweimer at redhat dot com>, Josh Triplett <josh at joshtriplett dot org>, Lai Jiangshan <laijs at cn dot fujitsu dot com>, Paul Turner <pjt at google dot com>, Andrew Hunter <ahh at google dot com>, Peter Zijlstra <peterz at infradead dot org>
- Date: Mon, 20 Jul 2015 14:09:14 -0700
- Subject: Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- Authentication-results: sourceware.org; auth=none
- References: <1436724386-30909-1-git-send-email-mathieu dot desnoyers at efficios dot com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F39C6 at PRN-MBX02-1 dot TheFacebook dot com> <587954201 dot 31 dot 1436808992876 dot JavaMail dot zimbra at efficios dot com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F5DA0 at PRN-MBX02-1 dot TheFacebook dot com> <549319255 dot 383 dot 1437070088597 dot JavaMail dot zimbra at efficios dot com> <CALCETrWEKE=mow3vVh7C4r8CuGy_d5VOEz7KkpijuR5cpBfFtg at mail dot gmail dot com> <CA+55aFz-VBnEKh0SPKgu8xV5=Zb+=6odybVUDoOYOknshbcFJA at mail dot gmail dot com> <20150717232836 dot GA13604 at domone> <CALCETrVY=kjeA_4pazy3BL+ekfcV6WHKw8e3z-LBxx_uP1bw2Q at mail dot gmail dot com> <55ACB2DC dot 5010503 at redhat dot com> <CALCETrV9Vp5UUOb3e_R5tphyE-urBgTwQR2pFWUOOFnHqWXHKQ at mail dot gmail dot com> <55AD14A4 dot 6030101 at redhat dot com> <CALCETrUx6wFxmz+9TyW5bNgaMN0q180G8y9YOyq_D41sdhFaRQ at mail dot gmail dot com> <CA+55aFzMJkzydXb7uVv1iSUnp=539d43ghQaonGdzMoF7QLZBA at mail dot gmail dot com>
On Mon, Jul 20, 2015 at 1:50 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Jul 20, 2015 10:41 AM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>>
>> glibc will have to expose a way to turn it off, I guess. (ELF flag?)
>
> Ugh. That just sounds nasty.
>
> I'm on mobile, so can't check right now, but don't we already have a per-cpu
> gdt? We could just make a very simple rule:
>
> - create a single gdt entry with a segment that is per-cpu and points to one
> single read-only page in kernel space that contains the virtual address of
> that segment in vmalloc space (and maybe we can have the CPU number there
> somewhere, and extend it to something else later)
Annoying problem one: the segment base field is only 32 bits in the GDT.
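
[For illustration, a minimal sketch of what that read-only per-cpu page might hold under this scheme -- the struct and field names are purely hypothetical, not from any patch:

#include <stdint.h>

/*
 * Hypothetical layout of the single read-only per-cpu page the proposed
 * GDT segment would point at.  Illustrative only; nothing like this
 * exists in the kernel.
 */
struct percpu_seg_page {
	/*
	 * Virtual address at which this page is mapped, i.e. the segment
	 * base, so user space can compute gs-relative offsets.  Because
	 * the base field in a GDT descriptor is only 32 bits, that address
	 * would have to fit in the low 4 GiB.
	 */
	uint64_t self_vaddr;

	/* The CPU this page (and this CPU's GDT entry) belongs to. */
	uint32_t cpu_number;
};
]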
>
> - make the rule be that if you hold that segment in %fs or %gs in your user
> space state, it gets cleared when the thread is scheduled out.
That sounds a bit evil, but okay.
>
> What does this get you?
>
> It basically means that:
>
> - user space can just load the segment selector in %gs
>
IIRC this is very expensive -- 40 cycles or so. At this point
userspace might as well just use a real lock cmpxchg.
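
[The fallback alluded to here is just an ordinary locked compare-and-swap on shared data; a minimal sketch (the function is illustrative, not part of any proposed API):

#include <stdatomic.h>
#include <stdbool.h>

/*
 * If reloading %gs costs ~40 cycles anyway, user space can pay roughly
 * the same price by skipping the per-cpu trick entirely: this compiles
 * to a lock cmpxchg on x86.
 */
static bool bump_counter(_Atomic unsigned long *counter)
{
	unsigned long old = atomic_load_explicit(counter, memory_order_relaxed);

	return atomic_compare_exchange_strong(counter, &old, old + 1);
}
]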
> - user space can load the virtual address of the segment base into a
> register, and use that to calculate a pointer to regular data structures.
>
> - user space can use that "reverse offset" to access any data it wants, and
> access that data with a gs override.
>
> - if the user space thread is scheduled, that access will fault with a GP
> fault, because %gs became NUL.
Cute.
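
[Putting the pieces above together, a rough sketch of the access pattern being described -- PERCPU_SELECTOR is a made-up selector value for the new GDT entry, and the per-cpu page is assumed to hold its own virtual address in its first eight bytes; none of this is an existing interface:

#include <stdint.h>

/* Hypothetical selector for the proposed per-cpu GDT entry. */
#define PERCPU_SELECTOR 0x5b

static inline void load_percpu_segment(void)
{
	/* The ~40-cycle segment load; must be redone after a fault. */
	asm volatile("mov %0, %%gs" : : "r" ((unsigned short)PERCPU_SELECTOR));
}

/*
 * Read a word through an ordinary pointer, but via a %gs override, so the
 * load takes a #GP instead of returning stale per-cpu data if the thread
 * was scheduled away (the kernel would have cleared %gs on the switch).
 */
static inline uint32_t percpu_read(const uint32_t *ptr)
{
	uint64_t base;
	uint32_t val;

	/* %gs:0 holds the segment's own virtual address in this scheme. */
	asm volatile("movq %%gs:0, %0" : "=r" (base));

	/* The "reverse offset": linear address = gs base + offset. */
	asm volatile("movl %%gs:(%1), %0"
		     : "=r" (val)
		     : "r" ((uintptr_t)ptr - (uintptr_t)base));

	return val;
}
]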
>
> So basically you can do any memory access you want, and you'll be guaranteed
> that it will be done "atomically" on the same CPU you did the segment load
> on, or it will fault because you got scheduled away.
>
> And it's very cheap for both kernel and user space. One extra gdt entry (not
> per process or anything like that - it's system global, although different
> cpus all end up with different entries), and for each cpu one virtually
> mapped page. And all user space needs to do is to do a segment load.
>
> No system calls, no nothing.
>
> Would that be useful?
>
Does it solve the Wine problem? If Wine uses gs for something and
calls a function that does this, Wine still goes boom, right?
Could Wine just save and restore gs on calls into and out of Windows
code? That would solve all the problems, right?
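
[If Wine did go that route, the save/restore itself would be tiny -- a sketch that assumes nothing about Wine's actual internals:

/*
 * Save and restore the %gs selector around a call into Windows code, as
 * suggested above.  The restore is the same segment load whose cost was
 * estimated at ~40 cycles earlier in the thread.
 */
static inline unsigned short save_gs(void)
{
	unsigned short sel;

	asm volatile("mov %%gs, %0" : "=r" (sel));
	return sel;
}

static inline void restore_gs(unsigned short sel)
{
	asm volatile("mov %0, %%gs" : : "r" (sel));
}
]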
--Andy