- From: Linus Torvalds <torvalds@linux-foundation.org>
- To: Andy Lutomirski <luto@amacapital.net>
- Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Ben Maurer <bmaurer@fb.com>, Paul Turner <pjt@google.com>, Andrew Hunter <ahh@google.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, rostedt <rostedt@goodmis.org>, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, Josh Triplett <josh@joshtriplett.org>, Lai Jiangshan <laijs@cn.fujitsu.com>, Andrew Morton <akpm@linux-foundation.org>, linux-api <linux-api@vger.kernel.org>, libc-alpha <libc-alpha@sourceware.org>
- Date: Fri, 17 Jul 2015 11:48:14 -0700
- Subject: Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- References: <1436724386-30909-1-git-send-email-mathieu.desnoyers@efficios.com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F39C6@PRN-MBX02-1.TheFacebook.com> <587954201.31.1436808992876.JavaMail.zimbra@efficios.com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F5DA0@PRN-MBX02-1.TheFacebook.com> <549319255.383.1437070088597.JavaMail.zimbra@efficios.com> <CALCETrWEKE=mow3vVh7C4r8CuGy_d5VOEz7KkpijuR5cpBfFtg@mail.gmail.com>
On Thu, Jul 16, 2015 at 12:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> If we actually bit the bullet and implemented per-cpu mappings
That's not ever going to happen.
Per-cpu page tables are a complete disaster. It's a recipe for crazy
race conditions, when you have CPUs that update things like
dirty/accessed bits atomically etc, and you have fundamental races
when multiple CPUs allocate page tables at the same time (remember:
we have concurrent page faults, and the locking is not per-vm, it's at
a finer granularity).
It's also a big memory management problem when you have lots and lots of CPUs.
So don't go there. The only way to do per-cpu virtual mappings is
hardware-specific, ie if you have hardware that explicitly allows
inserting percpu TLB entries (while still sharing the page tables),
then that would be ok. And we don't have that on x86. MIPS has
explicit support for these kinds of TLB entries, and obviously on other
architectures you might be able to play games with the SW-fill TLB,
but on x86 there's no hardware support for per-CPU TLB filling.
And this is not just theory. We've seen what happens when people try
to do per-thread page tables. It's happened several times, and it's a
fundamental mistake. Plan-9 had "private mappings" because that's how
they did stacks (ie the stack mappings were thread-local), and it
means that thread switching is fundamentally broken. I think Mach did
too. And per-cpu page tables are less broken from a scheduling
standpoint than per-thread page tables, but still do share a lot of
the synchronization problems, and have some allocation issues all
their own.
The Linux VM model of "one page table per VM" is the right one.
Anything else sucks, and makes threading a disaster.
So you can try to prove me wrong, but seriously, I doubt you'll succeed.
On x86, if you want per-cpu memory areas, you should basically plan on
using segment registers instead (although other odd state has been
used - there have been people who use segment limits etc rather than
the *pointer* itself, preferring to use "lsl" to get percpu data. You
could also imagine hiding things in the vector state somewhere if you
control your environment well enough).
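As a rough illustration of the "lsl" trick, here is a minimal user-space
sketch. The GDT selector and the cpu/node bit layout below are assumptions
borrowed from what the Linux vDSO getcpu code has historically used; they
are kernel implementation details rather than guaranteed ABI, which is why
the sched_getcpu() comparison is there as a sanity check.

/* cpu_lsl_demo.c -- build with: gcc -O2 -o cpu_lsl_demo cpu_lsl_demo.c
 * x86-64 Linux only.  Illustrative sketch; the selector and mask below
 * are assumed from the historical Linux vDSO getcpu implementation. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

#define PER_CPU_SELECTOR ((15 * 8) + 3)   /* assumed: GDT entry 15, RPL 3 */
#define CPU_MASK         0xfff            /* assumed: cpu in low 12 bits  */

static inline unsigned int cpu_via_lsl(void)
{
        unsigned int limit;

        /* LSL is unprivileged: it reads the limit field of the segment
         * descriptor, which the kernel has stuffed with the cpu (and
         * NUMA node) number.  No memory load, no syscall. */
        asm volatile("lsl %1, %0" : "=r" (limit) : "r" (PER_CPU_SELECTOR));
        return limit & CPU_MASK;
}

int main(void)
{
        printf("lsl says cpu %u, sched_getcpu() says %d\n",
               cpu_via_lsl(), sched_getcpu());
        return 0;
}

The point of the trick is that LSL only reads a segment limit the kernel has
already set up in the GDT, so it needs neither a memory access nor any
per-cpu page tables.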
Linus