Carlos O'Donell [Tue, 14 May 2013 04:06:35 +0000 (00:06 -0400)]
Add comments to vDSO hwcap loading process.
Loading of the vDSO pseudo-hwcap from the type 2 GNU note is
a rather arcane and poorly documented process. Given that I had
a chance to review this code today, I thought I would add all
of the things I had to look up to verify the validity of the
process.
With a single .note.GNU the vDSO can register up to 64 flags,
though in practice you are limited to 64 - _DL_FIRST_EXTRA
bits, which on x86 is 12 bits.
The only use of this that I know of is in the Xen support
in Linux, which uses bit 1 to indicate "nosegneg". The comment
there reads "We use bit 1 to avoid bugs in some versions of glibc
when bit 0 is used; the choice is otherwise arbitrary.", but I can
find no reference to a glibc bug anywhere. The code as-is should
support bit zero, so we still have that free for future use.
The kernel, glibc, and ld.so.cache must coordinate to ensure
that bit values don't go too high and are used consistently.
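The bit budget described above can be illustrated with a small sketch.
This is not the glibc code: FIRST_EXTRA, usable_vdso_bits and
vdso_bit_to_hwcap are hypothetical names, and the value 52 is an
assumption derived from the statement that x86 leaves
64 - _DL_FIRST_EXTRA = 12 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for _DL_FIRST_EXTRA on x86: 64 - 52 = 12 usable bits.  */
#define FIRST_EXTRA 52

/* Number of pseudo-hwcap bits a vDSO note can use in a 64-bit mask.  */
static unsigned int
usable_vdso_bits (unsigned int first_extra)
{
  assert (first_extra <= 64);
  return 64 - first_extra;
}

/* Map a vDSO note bit number to its position in the 64-bit hwcap mask.
   Returns 0 if the bit does not fit in the available range.  */
static uint64_t
vdso_bit_to_hwcap (unsigned int bit, unsigned int first_extra)
{
  if (bit >= usable_vdso_bits (first_extra))
    return 0;
  return UINT64_C (1) << (first_extra + bit);
}
```

Under these assumptions, bit 1 (the "nosegneg" flag) lands at mask
position 53, and a request for bit 12 is rejected because only bits
0 through 11 fit.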
HP_TIMING uses native timestamping instructions if available, thus
greatly reducing the overhead of recording start and end times for
function calls. For architectures that don't have HP_TIMING
available, we fall back to clock_gettime. One may also force
the fallback by invoking the benchmark as follows:
  make USE_CLOCK_GETTIME=1 bench
to get the benchmark results using clock_gettime. Run
`make bench-clean` first to ensure that the benchmark programs are
rebuilt.
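As a rough illustration of the clock_gettime fallback, the sketch
below times a single call by taking two clock readings. It is not the
benchmark's actual timing code: elapsed_ns, time_call_ns and
sample_work are hypothetical names, and the choice of CLOCK_MONOTONIC
is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed between two timespec readings (END not before START).  */
static int64_t
elapsed_ns (struct timespec start, struct timespec end)
{
  return (int64_t) (end.tv_sec - start.tv_sec) * 1000000000
         + (end.tv_nsec - start.tv_nsec);
}

/* Hypothetical workload standing in for the function under test.  */
static void
sample_work (void)
{
  volatile unsigned int sink = 0;
  for (unsigned int i = 0; i < 1000; i++)
    sink += i;
}

/* Time one call of FUNC with clock_gettime.
   Returns the elapsed nanoseconds, or -1 if the clock cannot be read.  */
static int64_t
time_call_ns (void (*func) (void))
{
  struct timespec start, end;

  if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
    return -1;
  func ();
  if (clock_gettime (CLOCK_MONOTONIC, &end) != 0)
    return -1;
  return elapsed_ns (start, end);
}
```

Each reading here costs a clock_gettime call, which is the overhead
that native timestamping instructions avoid.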
Carlos O'Donell [Thu, 9 May 2013 21:37:15 +0000 (17:37 -0400)]
Add more comments to dlclose() algorithm.
The algorithm for scanning dependencies upon dlclose is
not immediately obvious. This patch adds two comments
that explain why you start the dependency search at
l_initfini[1], and why you need to restart the search.
The PowerPC kernel now provides a vDSO implementation of the time
syscall (commit fcb41a2030abe0eb716ef0798035ef9562097f42). This patch
changes the time syscall wrapper to use the vDSO when available. It
also changes the default non-vDSO time on PowerPC to use
sysdeps/posix/time.c (since gettimeofday is a vDSO call).
Andreas Jaeger [Fri, 3 May 2013 18:51:27 +0000 (20:51 +0200)]
Sync with Linux 3.9
* sysdeps/gnu/netinet/tcp.h (TCP_TIMESTAMP): New value, from
Linux 3.9.
* sysdeps/unix/sysv/linux/bits/socket.h (PF_VSOCK, AF_VSOCK):
Add.
(PF_MAX): Adjust for VSOCK change.
Carlos O'Donell [Fri, 3 May 2013 03:24:21 +0000 (23:24 -0400)]
Add yesstr and nostr to en_CA, es_AR, and es_ES
We add yesstr and nostr to three more locales.
We ignore the issue of capitalization of the first
character in yesstr and nostr. All locales will need
to be revisited to apply a uniform policy.
Richard Smith [Wed, 1 May 2013 10:32:38 +0000 (20:32 +1000)]
Use __gnu_inline__ for __extern_always_inline in g++-4.2
Use the __gnu_inline__ attribute in _FORTIFY_SOURCE's __extern_always_inline
macro whenever the compiler supports it. Previously this macro only included
the __gnu_inline__ attribute in C++ mode for gcc >= 4.3. However,
__gnu_inline__ semantics are always desired for the __extern_always_inline
functions, and are available in g++ 4.2 (and some releases of g++ 4.1, and
also in Clang, which claims to be g++ 4.2).
This change stops g++-4.2 from emitting weak definitions for the fortify
wrapper functions if they can't be inlined, and also improves Clang
compatibility.
Allow multiple input domains to be run in the same benchmark program
Some math functions have distinct performance characteristics in
specific domains of inputs, where some inputs return via a fast path
while other inputs require multiple precision calculations, and those
at different precision levels. Until now, the way to implement
different domains was to have a separate source file and benchmark
definition for each, resulting in separate programs.
This clutters up the benchmark, so this change allows these domains to
be consolidated into the same input file. To do this, the input file
format now allows comments with a preceding # and directives with
two # at the beginning of a line. A directive that
looks like:
tells the benchmark generation script that what follows is a different
domain of inputs. The value of the 'name' directive (in this case,
foo) is used in the output. The two input domains are then executed
sequentially and their results collated separately. With the above
directive, there would be two lines in the result that look like:
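A minimal sketch of how the line classes described in this entry
(comments with a single #, directives with two #, plain inputs) might
be told apart. This is an illustration, not the benchmark generation
script: classify_line and count_domains are hypothetical names, and
treating every directive as the start of a new domain is an
assumption.

```c
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_BLANK, LINE_COMMENT, LINE_DIRECTIVE, LINE_INPUT };

/* Classify one line of a benchmark input file by its prefix.  */
static enum line_kind
classify_line (const char *line)
{
  if (line[0] == '\0' || line[0] == '\n')
    return LINE_BLANK;
  if (strncmp (line, "##", 2) == 0)
    return LINE_DIRECTIVE;      /* two # at the beginning of the line */
  if (line[0] == '#')
    return LINE_COMMENT;        /* a single preceding # */
  return LINE_INPUT;
}

/* Count input domains in LINES.  Assumption: inputs seen before any
   directive form one domain, and each directive starts a new one.  */
static size_t
count_domains (const char *const *lines, size_t n)
{
  size_t domains = 0;
  int in_domain = 0;

  for (size_t i = 0; i < n; i++)
    switch (classify_line (lines[i]))
      {
      case LINE_DIRECTIVE:
        in_domain = 0;          /* the next input opens a new domain */
        break;
      case LINE_INPUT:
        if (!in_domain)
          {
            domains++;
            in_domain = 1;
          }
        break;
      default:
        break;                  /* comments and blank lines are skipped */
      }
  return domains;
}
```

The directive check comes before the comment check because both kinds
of line begin with #.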
Running benchmarks for a constant number of iterations is
problematic. While the benchmarks may run for 10 seconds on x86_64,
they could run for about 30 seconds on powerpc and, worse, over 3
minutes on arm. Besides that, adding a new benchmark is cumbersome
since one needs to find out the number of iterations needed for a
sufficient runtime.
A better idea is to run each benchmark for a specific amount of
time. This patch does just that. The run time defaults to 10 seconds
and is configurable on the command line:
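The time-bounded approach can be sketched as a loop that keeps calling
the function under test until a deadline passes and reports how many
calls completed. This is an illustration under assumptions, not the
patch itself: now_ns, run_for_duration and sample_work are
hypothetical names, and CLOCK_MONOTONIC is an assumed clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds, or -1 if the clock fails.  */
static int64_t
now_ns (void)
{
  struct timespec ts;

  if (clock_gettime (CLOCK_MONOTONIC, &ts) != 0)
    return -1;
  return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical workload standing in for the function under test.  */
static void
sample_work (void)
{
  volatile unsigned int sink = 0;
  for (unsigned int i = 0; i < 1000; i++)
    sink += i;
}

/* Call FUNC repeatedly until DURATION_NS has elapsed.
   Returns the number of completed calls, or 0 if the clock cannot
   be read at the start.  */
static uint64_t
run_for_duration (void (*func) (void), int64_t duration_ns)
{
  int64_t start = now_ns ();
  int64_t now;
  uint64_t iterations = 0;

  if (start < 0)
    return 0;
  do
    {
      func ();
      iterations++;
      now = now_ns ();
    }
  while (now >= 0 && now - start < duration_ns);
  return iterations;
}
```

Reading the clock after every call keeps the sketch simple but adds
overhead to each iteration. A real harness would more likely check the
clock once per batch of calls.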