This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH roland/arm] ARM: Define PI_STATIC_AND_HIDDEN.
- From: Roland McGrath <roland at hack dot frob dot com>
- To: Chris Metcalf <cmetcalf at ezchip dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Wed, 15 Apr 2015 14:11:38 -0700 (PDT)
- Subject: Re: [PATCH roland/arm] ARM: Define PI_STATIC_AND_HIDDEN.
- References: <20150414223916 dot 9B4782C3BDC at topped-with-meat dot com> <552EB1FB dot 8010802 at ezchip dot com>
> Did you do any benchmarking? I enabled this on tilegx out of curiosity,
> and I found it was both larger (ld.so size increased 0.3%) and slower
> (average of 0.1% slower when running fork/exec of a no-op program
> in a loop with 10,000 iterations). The timing results do vary enough
> that I'd want to do more extensive testing to be able to say anything
> definitive, but it's not very encouraging.
I did not. Just now I've looked at the sizes on arm-linux-gnueabihf
(-mthumb), and indeed this increases ld.so's text by 256 bytes (to
94245 from 93989). For -marm the difference is only 192 bytes (to
126581 from 126389).
> I admit I'm not sure why this might be, but it seems like the right
> things were happening, e.g. _dl_start_final is missing and _dl_start
> is bigger in the version with PI_STATIC_AND_HIDDEN defined.
My guess is that there is some extra arithmetic to compute runtime
addresses from PC-relative ones. Conversely, without
PI_STATIC_AND_HIDDEN defined the equivalent accesses are SP-relative
(and thus simpler) while the extra code to copy fields from the stack
struct to the hidden global struct is less than that increase.
But I didn't actually compare the code even for ARM, let alone Tile.
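To make the two code paths concrete, here is a minimal sketch in C. It is not glibc's actual code: `struct toy_map`, `toy_global_map`, and the `toy_*` functions are hypothetical names invented for illustration, and a plain `static` object stands in for the hidden global struct. It only shows the shape of the trade-off guessed at above: one path touches the global throughout, the other works on a stack temporary and pays for a copy at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the dynamic linker's own link map.
   The field names are illustrative only.  */
struct toy_map
{
  uintptr_t l_addr;     /* Load bias.  */
  const void *l_ld;     /* Address of the dynamic section.  */
};

/* Stand-in for the hidden global struct.  In position-independent code
   its address is typically formed relative to the PC.  */
static struct toy_map toy_global_map;

/* PI_STATIC_AND_HIDDEN style (assumed shape): work on the global from
   the very start, so every access goes through the global's address.  */
static const struct toy_map *
toy_start_direct (uintptr_t bias, const void *dyn)
{
  toy_global_map.l_addr = bias;
  toy_global_map.l_ld = dyn;
  return &toy_global_map;
}

/* The other style (assumed shape): fill a stack temporary, where the
   accesses are SP-relative, then copy each field into the global in a
   separate final step.  */
static const struct toy_map *
toy_start_via_stack (uintptr_t bias, const void *dyn)
{
  struct toy_map bootstrap_map;

  bootstrap_map.l_addr = bias;
  bootstrap_map.l_ld = dyn;

  /* The extra copy-from-stack code.  */
  toy_global_map.l_addr = bootstrap_map.l_addr;
  toy_global_map.l_ld = bootstrap_map.l_ld;
  return &toy_global_map;
}

/* Both styles must leave the global map in the same state.  */
static void
toy_selfcheck (void)
{
  static const int fake_dynamic = 0;
  struct toy_map a = *toy_start_direct (0x1000, &fake_dynamic);
  struct toy_map b = *toy_start_via_stack (0x1000, &fake_dynamic);

  assert (a.l_addr == b.l_addr);
  assert (a.l_ld == b.l_ld);
}
```

Which variant compiles to less code depends on how cheaply the target forms the global's address compared with the cost of the copy, which is the open question here.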
> All that said, if it's the more standard way, or is desirable for some
> other reason, I'm happy to enable it for tilegx, but...
It's certainly the more standard way in the sense that most machines,
including the most-used machines (x86), do it that way. The recent
MIPS bug is an indication that the stack bootstrap_map code path is
less tested and perhaps less generally reliable. I'd say that the
PI_STATIC_AND_HIDDEN case is just more straightforward and clean, so
if we can eventually get every machine into that state, that would be
ideal.
For x86_64, PC-relative accesses are entirely free. (Well, you forgo
indexed addressing modes you might sometimes use with absolute or
SP-relative accesses, but there is no PC-relative overhead per se.)
Not surprisingly, turning PI_STATIC_AND_HIDDEN off there adds 256
bytes to ld.so. (Actually it is slightly surprising that it's exactly
+256 while ARM/Thumb is exactly -256. :-) It's just entirely expected
that it's a loser on x86_64.)
For i386, PC-relative accesses are relatively costly. So it's a mild
surprise that turning PI_STATIC_AND_HIDDEN off there adds 188 bytes
(i.e. fewer PC-relative accesses need more code, perhaps mostly the
extra copy-from-stack code). My guess is that the compiler backend
for i386 has had more effort put into optimizing these cases. For
example, perhaps it computes the common base address only once in the
function and reuses it more effectively while the ARM and Tile
backends are repeating the computation more often. Given that i386
has more register pressure than any other machine, the fact that they
can win here suggests that you could too with sufficient work on the
compiler side. (But this is all just guesses.)
(All these size measurements were with GCC 4.8.2 as modified by Ubuntu.
For arm-nacl with GCC 4.9.2 as modified by me, it adds 512 bytes.)
Thanks,
Roland