Andi Kleen [Thu, 27 Jun 2013 18:15:06 +0000 (11:15 -0700)]
Disable elision for any pthread_mutexattr_settype call
PTHREAD_MUTEX_NORMAL requires deadlock for nesting, while DEFAULT
does not. Since glibc uses the same value (0) for both, disable elision
for any call to pthread_mutexattr_settype() with a 0 value.
This implies that a program can disable elision by doing
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL)
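For example, a mutex can be opted out of elision explicitly (a minimal
sketch of the pattern named above):

    #include <pthread.h>

    static pthread_mutex_t m;

    static void
    init_no_elision (void)
    {
      pthread_mutexattr_t attr;

      pthread_mutexattr_init (&attr);
      /* Type 0 (PTHREAD_MUTEX_NORMAL) now disables elision for
         mutexes initialized with this attribute.  */
      pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_NORMAL);
      pthread_mutex_init (&m, &attr);
      pthread_mutexattr_destroy (&attr);
    }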
Andi Kleen [Sat, 22 Dec 2012 09:03:04 +0000 (01:03 -0800)]
Add elision to pthread_mutex_{try,timed,un}lock
Add elision paths to the basic mutex locks.
The normal path has a check for RTM and upgrades the lock
to RTM when available. Trylocks cannot automatically upgrade,
so they check for elision every time.
We use a 4-byte value in the mutex to store the lock
elision adaptation state. This is separate from the adaptive
spin state and lives in its own field.
Condition variables currently do not support elision.
Recursive mutexes and condition variables may be supported at some point,
but are not in the current implementation. Also "trylock" will
not automatically enable elision unless some other lock call
has already been called on the lock.
This version does not use IFUNC, so every lock has one
additional check for elision. Benchmarking showed the overhead
to be negligible.
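Illustratively, the mutex data then looks something like this (a sketch
only, not the actual glibc layout; the field names here are made up):

    struct mutex_data_sketch
    {
      int lock;              /* futex word */
      unsigned int count;    /* recursion count */
      int owner;
      int kind;
      int spins;             /* adaptive spin state */
      int elision;           /* 4-byte lock elision adaptation state */
    };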
Andi Kleen [Fri, 28 Jun 2013 12:19:37 +0000 (05:19 -0700)]
Add minimal test suite changes for elision enabled kernels
tst-mutex5 and tst-mutex8 test some behaviour that is not required by
POSIX and that elision changes. Change these tests to skip those
checks when elision is enabled at configure time.
Andi Kleen [Sat, 10 Nov 2012 08:51:26 +0000 (00:51 -0800)]
Add the low level infrastructure for pthreads lock elision with TSX
Lock elision using TSX is a technique to optimize lock scaling.
It allows locks to run in parallel using hardware support for
a transactional execution mode in 4th generation Intel Core CPUs.
See http://www.intel.com/software/tsx for more information.
This patch implements a simple adaptive lock elision algorithm based
on RTM. It enables elision for the pthread mutexes and rwlocks.
The algorithm keeps track of whether a mutex successfully elides or not,
and stops eliding for some time when it does not.
When the CPU supports RTM the elision path is automatically tried,
otherwise any elision is disabled.
The adaptation algorithm and its tuning are currently preliminary.
The code adds some checks to the lock fast paths. Micro-benchmarks
show little to no difference without RTM.
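In outline, the adaptation works roughly like the sketch below. This is
illustrative only, not the actual lll_ code: the field and constant
names are made up, and it uses the RTM intrinsics from <immintrin.h>
(compile with gcc -mrtm):

    #include <immintrin.h>

    struct elided_lock
    {
      volatile int futex;    /* 0 = free, 1 = taken */
      int adapt_count;       /* skip elision while > 0 */
    };

    #define ABORT_BACKOFF 3  /* lock attempts to skip after an abort */
    #define RETRIES 3

    static void
    lock_with_elision (struct elided_lock *l)
    {
      if (l->adapt_count <= 0)
        {
          for (int i = 0; i < RETRIES; i++)
            {
              unsigned int status = _xbegin ();
              if (status == _XBEGIN_STARTED)
                {
                  if (l->futex == 0)
                    return;          /* elided; the lock stays free */
                  _xabort (0xff);    /* already locked: abort */
                }
              if ((status & _XABORT_RETRY) == 0)
                {
                  /* Persistent abort: stop eliding for a while.  */
                  l->adapt_count = ABORT_BACKOFF;
                  break;
                }
            }
        }
      else
        l->adapt_count--;
      while (__sync_lock_test_and_set (&l->futex, 1))
        ;                            /* fall back to the real lock */
    }

    static void
    unlock_with_elision (struct elided_lock *l)
    {
      if (l->futex == 0)
        _xend ();                    /* commit the elided section */
      else
        __sync_lock_release (&l->futex);
    }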
This patch implements the low level "lll_" code for lock elision.
Follow-on patches hook this into the pthread implementation.
Changes with the RTM mutexes:
-----------------------------
Lock elision in pthreads is generally compatible with existing programs.
There are some obscure exceptions, which are expected to be uncommon.
See the manual for more details.
- A broken program that unlocks a free lock will crash.
There are ways around this with some tradeoffs (more code in hot paths);
I'm still undecided on what approach to take here and have to wait for testing reports.
- pthread_mutex_destroy of a locked mutex will not return EBUSY but 0.
- There's also a similar situation with trylock outside the mutex,
"knowing" that the mutex must be held due to some other condition.
In this case an assert failure cannot be recovered from. This situation is
usually an existing bug in the program.
- The same applies to the rwlocks. Some of the return values change
(for example there is no EDEADLK for an elided lock, unless it aborts;
however, when elided it will also never deadlock, of course).
- Timing changes, so broken programs that make assumptions about specific timing
may expose already existing latent problems. Note that these broken programs will
break in other situations too (loaded systems, newer and faster hardware, compiler
optimizations, etc.).
- Programs with non-recursive mutexes that take them recursively in a thread and
which would always deadlock without elision may not always see a deadlock.
The deadlock will only happen on an early or delayed abort (which typically
happens at some point).
This only happens for mutexes not explicitly set to PTHREAD_MUTEX_NORMAL
or PTHREAD_MUTEX_ADAPTIVE_NP; PTHREAD_MUTEX_NORMAL mutexes do not elide.
The elision default can be set at configure time.
This patch implements the basic infrastructure for elision.
[BZ #15022] Correct global-scope dlopen issues in static executables.
This change creates a link map in static executables to serve as the
global search list for dlopen. It fixes a problem with the inability
to access the global symbol object and a crash on an attempt to map a
DSO into the global scope. Some code that has become dead after the
addition of this link map is removed too and test cases are provided.
This function is now called from dl_open_worker with the GL(dl_load_lock)
lock held and no longer needs local protection. GL(dl_load_lock) also
correctly protects _dl_lookup_symbol_x called here that relies on the
caller to have serialized access to the data structures it uses.
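With this change, something like the following works in a statically
linked program (an illustrative example only; the DSO name is arbitrary;
build with e.g. gcc -static main.c -ldl):

    #include <dlfcn.h>
    #include <stdio.h>

    int
    main (void)
    {
      /* Mapping a DSO into the global scope used to crash in static
         executables; the global symbol object was inaccessible too.  */
      void *h = dlopen ("libm.so.6", RTLD_NOW | RTLD_GLOBAL);
      if (h == NULL)
        {
          fprintf (stderr, "dlopen: %s\n", dlerror ());
          return 1;
        }
      double (*sq) (double) = (double (*) (double)) dlsym (h, "sqrt");
      printf ("%f\n", sq (2.0));
      dlclose (h);
      return 0;
    }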
Static applications that call pthread_exit on the main
thread segfault. This is because after a thread terminates
__libc_start_main decrements __nptl_nthreads, which is only
defined in pthread_create. Therefore the right solution is
to add a reference to pthread_create from pthread_exit, so that
linking in pthread_exit also pulls in pthread_create.
nptl/
2013-06-24 Vladimir Nikulichev <v.nikulichev@gmail.com>
[BZ #12310]
* pthread_exit.c: Add reference to pthread_create.
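A minimal reproducer, linked with gcc -static -pthread:

    #include <pthread.h>

    int
    main (void)
    {
      /* Before the fix this segfaulted in a static binary: pthread_exit
         did not pull in pthread_create, which defines __nptl_nthreads.  */
      pthread_exit (NULL);
    }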
Check whether the compiler has the option -fno-tree-loop-distribute-patterns
to inhibit the transformation of loops into library calls, and use it on the
default memset and memmove implementations to avoid recursive calls.
This patch introduces two new convenience functions to set the default
thread attributes used for creating threads. This allows a programmer
to set the default thread attributes just once in a process and then
call pthread_create without an additional attribute object.
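Assuming the new interfaces are pthread_setattr_default_np and
pthread_getattr_default_np (the names glibc 2.18 shipped), usage looks
like this minimal sketch (compile with -pthread):

    #define _GNU_SOURCE
    #include <pthread.h>

    static void *
    worker (void *arg)
    {
      return arg;
    }

    int
    main (void)
    {
      pthread_attr_t attr;

      pthread_attr_init (&attr);
      pthread_attr_setstacksize (&attr, 1024 * 1024);
      /* Set the process-wide defaults once...  */
      pthread_setattr_default_np (&attr);
      pthread_attr_destroy (&attr);

      /* ...then create threads with a NULL attribute; they pick up
         the 1 MiB default stack size set above.  */
      pthread_t t;
      pthread_create (&t, NULL, worker, NULL);
      pthread_join (t, NULL);
      return 0;
    }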
Kirk Meyer [Fri, 14 Jun 2013 00:11:02 +0000 (10:11 +1000)]
MicroBlaze: negated errors in lowlevellock.h
The macros in lowlevellock.h are returning positive errors, but the
users of the macros expect negative ones. This causes e.g. sem_wait to
sometimes return an error with errno set to -EWOULDBLOCK.
Signed-off-by: Kirk Meyer <kirk.meyer@sencore.com>
Signed-off-by: David Holsgrove <david.holsgrove@xilinx.com>
Avoid access beyond memory bounds in pthread_attr_getaffinity_np
Resolves BZ #15618.
pthread_attr_getaffinity_np may write beyond the bounds of the input
cpuset buffer if the size of the input buffer is smaller than the
buffer present in the input pthread attributes. The fix is to copy
only up to the minimum of the source and destination sizes.
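In sketch form (illustrative names, not the actual glibc code):

    #include <string.h>

    /* Copy only up to the smaller of the two buffer sizes, so the
       caller's cpuset buffer is never overrun.  */
    static void
    copy_cpuset (void *dst, size_t dstsize,
                 const void *src, size_t srcsize)
    {
      size_t n = dstsize < srcsize ? dstsize : srcsize;
      memcpy (dst, src, n);
    }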
Chris Metcalf [Wed, 12 Jun 2013 20:48:33 +0000 (16:48 -0400)]
tile: default to little-endian in bits/endian.h
This turns out to be helpful when doing a from-scratch cross-compile of
gcc and glibc, since you can then do "make install-headers" in glibc
even before you have a functioning tile gcc.
Johan Heikkila [Thu, 13 Jun 2013 07:49:03 +0000 (09:49 +0200)]
Update sv_FI
[BZ#15431]
* locales/sv_FI: Add LC_MEASUREMENT, use copy in LC_TELEPHONE,
update LC_ADDRESS to use the postal_fmt from the Finnish Post Office
recommendations at
http://www.posti.fi/hinnatjaohjeet/osoitejakuorimerkinnat/osoitemerkinnat.html
and add missing entries.
GCC 4.8 enables -ftree-loop-distribute-patterns at -O3 by default, and
this optimization may transform loops into memset/memmove calls. Without
proper handling this may generate unexpected PLT calls in GLIBC.
This patch fixes that by creating memset/memmove aliases to the internal
GLIBC __GI_memset/__GI_memmove symbols.
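The hazard in sketch form: a straightforward C implementation of memset
contains exactly the loop this optimization recognizes, so at -O3 GCC
can replace the loop with a call to memset itself, giving infinite
recursion through the PLT:

    #include <stddef.h>

    /* At -O3, -ftree-loop-distribute-patterns may turn this loop into
       a call to memset -- which is this very function.  */
    void *
    memset (void *s, int c, size_t n)
    {
      unsigned char *p = s;
      for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char) c;
      return s;
    }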
The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if their set/restore has as much overhead relative to the rest of the
code as the x86 bits do.
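The idea in sketch form, using the standard <fenv.h> interfaces rather
than glibc's internal macros (illustrative only):

    #include <fenv.h>

    struct rm_ctx
    {
      int saved_mode;
      int changed;
    };

    /* Switch to round-to-nearest only if the current rounding mode
       differs; record whether a restore is needed.  */
    static void
    ctx_set_round_to_nearest (struct rm_ctx *ctx)
    {
      ctx->saved_mode = fegetround ();
      ctx->changed = (ctx->saved_mode != FE_TONEAREST);
      if (ctx->changed)
        fesetround (FE_TONEAREST);
    }

    static void
    ctx_restore_round (const struct rm_ctx *ctx)
    {
      if (ctx->changed)
        fesetround (ctx->saved_mode);
    }

In the common case (already rounding to nearest) the set/restore thus
degenerates to a single fegetround call.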
Here's a summary of the performance improvement due to these
changes; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Begin porting string performance tests to benchtests
This is the initial support for string function performance tests,
along with copying tests for memcpy and memcpy-ifunc as proof of
concept. The string function benchmarks perform operations at
different alignments and for different sizes and compare performance
between plain operations and the optimized string operations. Because of
this, their output is incompatible with the function benchmarks, where
we're interested in the fastest time, throughput, etc.
In the future, the correctness checks in the benchmark tests can be
removed. Same goes for the performance measurements in the
string/test-*.