This is the mail archive of the
mailing list for the glibc project.
Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- From: Andy Lutomirski <luto at amacapital dot net>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Linus Torvalds <torvalds at linux-foundation dot org>, Mathieu Desnoyers <mathieu dot desnoyers at efficios dot com>, Ben Maurer <bmaurer at fb dot com>, Paul Turner <pjt at google dot com>, Andrew Hunter <ahh at google dot com>, Peter Zijlstra <peterz at infradead dot org>, Ingo Molnar <mingo at redhat dot com>, rostedt <rostedt at goodmis dot org>, "Paul E. McKenney" <paulmck at linux dot vnet dot ibm dot com>, Josh Triplett <josh at joshtriplett dot org>, Lai Jiangshan <laijs at cn dot fujitsu dot com>, Andrew Morton <akpm at linux-foundation dot org>, linux-api <linux-api at vger dot kernel dot org>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Fri, 17 Jul 2015 16:33:42 -0700
- References: <1436724386-30909-1-git-send-email-mathieu dot desnoyers at efficios dot com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F39C6 at PRN-MBX02-1 dot TheFacebook dot com> <587954201 dot 31 dot 1436808992876 dot JavaMail dot zimbra at efficios dot com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F5DA0 at PRN-MBX02-1 dot TheFacebook dot com> <549319255 dot 383 dot 1437070088597 dot JavaMail dot zimbra at efficios dot com> <CALCETrWEKE=mow3vVh7C4r8CuGy_d5VOEz7KkpijuR5cpBfFtg at mail dot gmail dot com> <CA+55aFz-VBnEKh0SPKgu8xV5=Zb+=6odybVUDoOYOknshbcFJA at mail dot gmail dot com> <20150717232836 dot GA13604 at domone>
On Fri, Jul 17, 2015 at 4:28 PM, Ondřej Bílka <email@example.com> wrote:
> On Fri, Jul 17, 2015 at 11:48:14AM -0700, Linus Torvalds wrote:
>> On x86, if you want per-cpu memory areas, you should basically plan on
>> using segment registers instead (although other odd state has been
>> used: some people use segment limits etc. rather than the *pointer*
>> itself, preferring to use "lsl" to get percpu data. You could also
>> imagine hiding things in the vector state somewhere if you control
>> your environment well enough).
> That's correct. The problem is that you need some sort of hack like this
> on archs that would otherwise need a syscall to get the tid or to access
> a tls variable. On x64 and other archs that have a register for tls this
> could be implemented relatively easily.
> The kernel needs to allocate a table indexed by tid:
>   int running_cpu_for_tid[32768];
> On context switch it atomically writes to this table:
>   running_cpu_for_tid[tid] = cpu;
> This table is mapped read-only into userspace as an mmapped file.
> Userspace then only needs three indirections to reach its cache:
>   __thread int tid;
>   char caches[CPU_MAX];
>   #define getcpu_cache caches[tid >= 32768 ? get_cpu() : running_cpu_for_tid[tid]]
> With a more complicated kernel interface you could eliminate one
> indirection: the table would be a void * array instead, and a thread
> could make a syscall to register which values should be used for each
> thread.
Or we implement per-cpu segment registers so you can point gs directly
at percpu data. This is conceptually easy and has no weird ABI
issues. All it needs is an implementation and some good tests.
I think the API should be "set gsbase to x + y*(cpu number)". On
x86_64, userspace just allocates a big swath of virtual space and
populates it as needed.