This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- From: Mathieu Desnoyers <mathieu dot desnoyers at efficios dot com>
- To: Ben Maurer <bmaurer at fb dot com>
- Cc: Paul Turner <pjt at google dot com>, Andrew Hunter <ahh at google dot com>, Peter Zijlstra <peterz at infradead dot org>, Ingo Molnar <mingo at redhat dot com>, rostedt <rostedt at goodmis dot org>, "Paul E. McKenney" <paulmck at linux dot vnet dot ibm dot com>, Josh Triplett <josh at joshtriplett dot org>, Lai Jiangshan <laijs at cn dot fujitsu dot com>, Linus Torvalds <torvalds at linux-foundation dot org>, Andrew Morton <akpm at linux-foundation dot org>, linux-api <linux-api at vger dot kernel dot org>, libc-alpha at sourceware dot org
- Date: Mon, 13 Jul 2015 17:36:32 +0000 (UTC)
- Subject: Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
- References: <1436724386-30909-1-git-send-email-mathieu dot desnoyers at efficios dot com> <5CDDBDF2D36D9F43B9F5E99003F6A0D48D5F39C6 at PRN-MBX02-1 dot TheFacebook dot com>
----- On Jul 13, 2015, at 7:17 AM, Ben Maurer bmaurer@fb.com wrote:
> At Facebook we already use getcpu in folly, our base C++ library, to provide
> high performance concurrency algorithms. Folly includes an abstraction called
> AccessSpreader which helps engineers write abstractions which shard themselves
> across different cores to prevent cache contention
> (https://github.com/facebook/folly/blob/master/folly/detail/CacheLocality.cpp).
> We have used this primitive to create faster reader-writer locks
> (https://github.com/facebook/folly/blob/master/folly/SharedMutex.h), as well as
> in an abstraction that powers workqueues
> (https://github.com/facebook/folly/blob/master/folly/IndexedMemPool.h). This
> would be a great perf improvement for these types of abstractions and probably
> encourage us to use the idea more widely.
>
> One quick comment on the approach -- it'd be really great if we had a method
> that didn't require users to register each thread. This can often lead to
> requiring an additional branch in critical code to check if the appropriate
> caches have been initialized. Also, one of the most interesting potential
> applications of the restartable sequences concept is in malloc. Having a brief
> period at the beginning of the life of a thread where malloc didn't work would
> be pretty tricky to program around.
If we invoke this per-thread registration directly in the glibc NPTL implementation,
in start_thread, do you think it would fit your requirements?
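
To make the trade-off concrete, here is a minimal C sketch of the two
options. It is an illustration under stated assumptions, not the interface
from the patch: sched_getcpu() stands in for the proposed registration, and
cpu_cache, register_cpu_cache, read_cpu_lazy, read_cpu_registered,
create_registered_thread and report_cpu are hypothetical names. A trampoline
around pthread_create() approximates what registering from start_thread
inside glibc would do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread cache slot; -1 means "not registered yet". */
static _Thread_local int cpu_cache = -1;

/* Stand-in for the registration step (assumption). With the proposed
 * system call the kernel would keep this slot current; here we only take
 * a snapshot, which goes stale if the thread migrates. */
static void register_cpu_cache(void)
{
    cpu_cache = sched_getcpu();
}

/* Lazy registration: every read pays for the "initialized?" branch
 * that the quoted message is concerned about. */
static inline int read_cpu_lazy(void)
{
    if (cpu_cache < 0)
        register_cpu_cache();
    return cpu_cache;
}

/* Registration at thread start: the read is a plain thread-local load. */
static inline int read_cpu_registered(void)
{
    return cpu_cache;
}

struct start_args {
    void *(*fn)(void *);
    void *arg;
};

/* Runs first in the new thread, approximating a hook in start_thread. */
static void *start_trampoline(void *p)
{
    struct start_args args = *(struct start_args *)p;
    free(p);
    register_cpu_cache();   /* done before any user code runs */
    return args.fn(args.arg);
}

/* Hypothetical wrapper: create a thread that is registered on entry.
 * Returns 0 on success or an error number, like pthread_create(). */
static int create_registered_thread(pthread_t *t, void *(*fn)(void *),
                                    void *arg)
{
    assert(fn != NULL);
    struct start_args *p = malloc(sizeof *p);
    if (p == NULL)
        return ENOMEM;
    p->fn = fn;
    p->arg = arg;
    int rc = pthread_create(t, NULL, start_trampoline, p);
    if (rc != 0)
        free(p);
    return rc;
}

/* Example thread body: returns the cached CPU number without a check. */
static void *report_cpu(void *unused)
{
    (void)unused;
    return (void *)(intptr_t)read_cpu_registered();
}
```

In this sketch the main thread is not covered by the trampoline, so it would
still need the lazy path or its own registration at program start.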
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com