This is the mail archive of the
mailing list for the glibc project.
Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Mathieu Desnoyers <mathieu dot desnoyers at efficios dot com>
- Cc: Ben Maurer <bmaurer at fb dot com>, Paul Turner <pjt at google dot com>, Andrew Hunter <ahh at google dot com>, Peter Zijlstra <peterz at infradead dot org>, Ingo Molnar <mingo at redhat dot com>, rostedt <rostedt at goodmis dot org>, "Paul E. McKenney" <paulmck at linux dot vnet dot ibm dot com>, Josh Triplett <josh at joshtriplett dot org>, Lai Jiangshan <laijs at cn dot fujitsu dot com>, Linus Torvalds <torvalds at linux-foundation dot org>, Andrew Morton <akpm at linux-foundation dot org>, linux-api <linux-api at vger dot kernel dot org>, libc-alpha at sourceware dot org
- Date: Fri, 17 Jul 2015 12:58:59 +0200
- Subject: Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- Authentication-results: sourceware.org; auth=none
- References: <1436724386-30909-1-git-send-email-mathieu dot desnoyers at efficios dot com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F39C6 at PRN-MBX02-1 dot TheFacebook dot com> <587954201 dot 31 dot 1436808992876 dot JavaMail dot zimbra at efficios dot com>
On Mon, Jul 13, 2015 at 05:36:32PM +0000, Mathieu Desnoyers wrote:
> ----- On Jul 13, 2015, at 7:17 AM, Ben Maurer email@example.com wrote:
> > At Facebook we already use getcpu in folly, our base C++ library, to provide
> > high performance concurrency algorithms. Folly includes an abstraction called
> > AccessSpreader which helps engineers write abstractions which shard themselves
> > across different cores to prevent cache contention
> > (https://github.com/facebook/folly/blob/master/folly/detail/CacheLocality.cpp).
Could you contribute your improvements/tips to libc? If these help the
C++ mutex, then they would also improve the C mutex.
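
For readers unfamiliar with the sharding idea quoted above, here is a
minimal sketch of a counter split across cache-line-sized shards and
indexed by the current CPU. It is not folly's AccessSpreader; the names
NSHARDS, pick_shard, sharded_add and sharded_read are made up for
illustration, and the 64-byte cache line is an assumption. The update
stays atomic because the thread may migrate between the sched_getcpu()
call and the add.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdatomic.h>

#define NSHARDS 64              /* assumed upper bound, not from the thread */

/* One shard per cache line (64 bytes assumed) to avoid false sharing. */
struct shard {
    _Alignas(64) atomic_long value;
};

static struct shard shards[NSHARDS];

/* Map the current CPU to a shard; fall back to shard 0 on error. */
static unsigned pick_shard(void)
{
    int cpu = sched_getcpu();
    return cpu < 0 ? 0u : (unsigned)cpu % NSHARDS;
}

/* Add to whichever shard belongs to the CPU we are (probably) on. */
void sharded_add(long delta)
{
    atomic_fetch_add_explicit(&shards[pick_shard()].value, delta,
                              memory_order_relaxed);
}

/* Sum all shards; the result is only a snapshot under concurrency. */
long sharded_read(void)
{
    long sum = 0;
    for (unsigned i = 0; i < NSHARDS; i++)
        sum += atomic_load_explicit(&shards[i].value,
                                    memory_order_relaxed);
    return sum;
}
```

The cost of sched_getcpu() on the fast path is exactly what a cached
CPU number would reduce.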
> > We have used this primitive to create faster reader-writer locks
> > (https://github.com/facebook/folly/blob/master/folly/SharedMutex.h), as well as
> > in an abstraction that powers workqueues
> > (https://github.com/facebook/folly/blob/master/folly/IndexedMemPool.h). This
> > would be a great perf improvement for these types of abstractions and probably
> > encourage us to use the idea more widely.
Since libc rwlocks are currently slow, they would get a speedup from
that. The main problem is that lock elision will give you bigger
speedups than that. Also, from the description you have the wrong rwlock
use case: the main application is to avoid blocking. When two readers
hold the lock for a long time, having one wait for the other would be
terrible.
> > One quick comment on the approach -- it'd be really great if we had a method
> > that didn't require users to register each thread. This can often lead to
> > requiring an additional branch in critical code to check if the appropriate
> > caches have been initialized. Also, one of the most interesting potential
> > applications of the restartable sequences concept is in malloc. Having a brief
> > period at the beginning of the life of a thread where malloc didn't work would
> > be pretty tricky to program around.
> If we invoke this per-thread registration directly in the glibc NPTL implementation,
> in start_thread, do you think it would fit your requirements ?
A generic solution would be to add eager initialization of thread_local
variables, which would fix more performance problems than just this one.
A second would be to write a libc patch adding a function,
pthread_create_add_hook_np, to register a function that would be run
after each thread creation.
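
To make the hook idea concrete, here is a rough user-space sketch of
what such an interface could look like. glibc has no
pthread_create_add_hook_np; every name below (hooked_add_hook,
hooked_pthread_create, init_thread_state, report_ready) is hypothetical,
and a real libc implementation would run the hooks from start_thread
rather than through a wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout: glibc provides no such hook API. */
#define MAX_HOOKS 8
typedef void (*thread_hook_fn)(void);

static thread_hook_fn hooks[MAX_HOOKS];
static int n_hooks;
static pthread_mutex_t hooks_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register fn to run at the start of every thread created later. */
int hooked_add_hook(thread_hook_fn fn)
{
    int ret = ENOMEM;           /* returned when the table is full */
    assert(fn != NULL);
    pthread_mutex_lock(&hooks_lock);
    if (n_hooks < MAX_HOOKS) {
        hooks[n_hooks++] = fn;
        ret = 0;
    }
    pthread_mutex_unlock(&hooks_lock);
    return ret;
}

struct start_info {
    void *(*start)(void *);
    void *arg;
};

/* Runs in the new thread: hooks first, then the user's start routine. */
static void *trampoline(void *p)
{
    struct start_info info = *(struct start_info *)p;
    free(p);

    pthread_mutex_lock(&hooks_lock);
    int n = n_hooks;            /* slots below n are never modified again */
    pthread_mutex_unlock(&hooks_lock);

    for (int i = 0; i < n; i++)
        hooks[i]();
    return info.start(info.arg);
}

/* Wrapper around pthread_create that applies the registered hooks. */
int hooked_pthread_create(pthread_t *tid, const pthread_attr_t *attr,
                          void *(*start)(void *), void *arg)
{
    struct start_info *info = malloc(sizeof *info);
    if (info == NULL)
        return EAGAIN;
    info->start = start;
    info->arg = arg;
    int err = pthread_create(tid, attr, trampoline, info);
    if (err != 0)
        free(info);
    return err;
}

/* Example hook: eagerly set up per-thread state. */
static __thread int thread_state_ready;

void init_thread_state(void)
{
    thread_state_ready = 1;
}

/* Example thread body: reports whether the hook already ran. */
void *report_ready(void *out)
{
    *(int *)out = thread_state_ready;
    return NULL;
}
```

With init_thread_state registered, a thread started through
hooked_pthread_create sees thread_state_ready already set when its start
routine begins, so the hot path needs no "is it initialized?" branch.
The main thread is not covered by this wrapper and would have to call
the hook itself.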