This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH roland/arm] ARM: Define PI_STATIC_AND_HIDDEN.


> Did you do any benchmarking?  I enabled this on tilegx out of curiosity,
> and I found it was both larger (ld.so size increased 0.3%) and slower
> (average of 0.1% slower when running fork/exec of a no-op program
> in a loop with 10,000 iterations).  The timing results do vary enough
> that I'd want to do more extensive testing to be able to say anything
> definitive, but it's not very encouraging.

I did not.  Just now I've looked at the sizes on arm-linux-gnueabihf
(-mthumb), and indeed this increases ld.so's text by 256 bytes (to
94245 from 93989).  For -marm the difference is only 192 bytes (to
126581 from 126389).

> I admit I'm not sure why this might be, but it seems like the right
> things were happening, e.g. _dl_start_final is missing and _dl_start
> is bigger in the version with PI_STATIC_AND_HIDDEN defined.

My guess is that there is some extra arithmetic to compute runtime
addresses from PC-relative ones.  Conversely, without
PI_STATIC_AND_HIDDEN defined the equivalent accesses are SP-relative
(and thus simpler) while the extra code to copy fields from the stack
struct to the hidden global struct is less than that increase.

But I didn't actually compare the code even for ARM, let alone Tile.
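
To make the two code paths under discussion concrete, here is a minimal
C sketch.  It is not glibc's code: the struct, its fields, and every
function name are hypothetical stand-ins, and it only illustrates the
shape of "fill the hidden global directly" versus "fill a stack
temporary, then copy it into the global".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the dynamic linker's map; not glibc's struct. */
struct map_sketch {
    unsigned long l_addr;   /* load address (illustrative field) */
    unsigned long l_ld;     /* dynamic section address (illustrative field) */
};

/* Stand-in for the hidden global that holds the final map. */
static struct map_sketch global_map_sketch;

/* Path A, as with PI_STATIC_AND_HIDDEN defined: write the global directly.
   Each store addresses the global, typically PC-relative in PIC code. */
static void bootstrap_direct(unsigned long addr, unsigned long ld)
{
    global_map_sketch.l_addr = addr;
    global_map_sketch.l_ld = ld;
}

/* Path B, as without it: fill a stack temporary (SP-relative stores),
   then copy the result into the global in one step. */
static void bootstrap_via_stack(unsigned long addr, unsigned long ld)
{
    struct map_sketch bootstrap_map;
    memset(&bootstrap_map, 0, sizeof bootstrap_map);
    bootstrap_map.l_addr = addr;
    bootstrap_map.l_ld = ld;
    global_map_sketch = bootstrap_map;   /* the extra copy step */
}

/* Both paths must leave the global in the same state. */
static void check_bootstrap_paths_agree(void)
{
    struct map_sketch a;
    bootstrap_direct(0x1000, 0x2000);
    a = global_map_sketch;
    bootstrap_via_stack(0x1000, 0x2000);
    assert(memcmp(&a, &global_map_sketch, sizeof a) == 0);
}
```

Which path ends up smaller depends on how cheaply the target can address
the global compared with the cost of the extra copy, which is the
trade-off guessed at above.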

> All that said, if it's the more standard way, or is desirable for some
> other reason, I'm happy to enable it for tilegx, but...

It's certainly the more standard way in the sense that most machines,
including the most-used machines (x86), do it that way.  The recent
MIPS bug is an indication that the stack bootstrap_map code path is
less tested and perhaps less generally reliable.  I'd say that the
PI_STATIC_AND_HIDDEN case is just more straightforward and clean, so
if we can eventually get every machine into that state, that would be
ideal.

For x86_64, PC-relative accesses are entirely free.  (Well, you forgo
indexed addressing modes you might sometimes use with absolute or
SP-relative accesses, but there is no PC-relative overhead per se.)
Not surprisingly, turning PI_STATIC_AND_HIDDEN off there adds 256
bytes to ld.so.  (Actually it is slightly surprising that it's exactly
+256 while ARM/Thumb is exactly -256. :-) It's just entirely expected
that it's a loser on x86_64.)

For i386, PC-relative accesses are relatively costly.  So it's a mild
surprise that turning PI_STATIC_AND_HIDDEN off there adds 188 bytes
(i.e. fewer PC-relative accesses need more code, perhaps mostly the
extra copy-from-stack code).  My guess is that the compiler backend
for i386 has had more effort put into optimizing these cases.  For
example, perhaps it computes the common base address only once in the
function and reuses it more effectively while the ARM and Tile
backends are repeating the computation more often.  Given that i386
has more register pressure than any other machine, the fact that they
can win here suggests that you could too with sufficient work on the
compiler side.  (But this is all just guesses.)
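
The "compute the common base address only once" idea can be sketched at
the source level as follows.  This is only an illustration under
assumptions: the struct and function names are hypothetical, and an
optimizing compiler may well generate identical code for both functions.
The point is what a backend would ideally do internally, not a
recommended source change.

```c
#include <assert.h>

/* Hypothetical state struct; the names are illustrative only. */
struct rtld_state_sketch {
    long base;
    long phdr;
    long entry;
};

/* Hidden visibility: the symbol is not exported from the shared object,
   so PIC code can address it PC-relative instead of through the GOT. */
__attribute__((visibility("hidden")))
struct rtld_state_sketch rtld_state_sketch_obj = { 1, 2, 3 };

/* Every field is named through the global symbol.  A backend that does
   not share the address computation may rebuild it for each access. */
long sum_fields_repeated(void)
{
    return rtld_state_sketch_obj.base
         + rtld_state_sketch_obj.phdr
         + rtld_state_sketch_obj.entry;
}

/* The base address is taken once; the fields are then plain
   base-plus-offset loads from that single pointer. */
long sum_fields_one_base(void)
{
    const struct rtld_state_sketch *s = &rtld_state_sketch_obj;
    return s->base + s->phdr + s->entry;
}

/* The two forms are semantically identical; only code size may differ. */
void check_sum_sketch(void)
{
    assert(sum_fields_repeated() == sum_fields_one_base());
}
```

Compiling both functions with -fPIC for each target and comparing the
disassembly would show whether a given backend shares the address
computation.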

(All these examples were with GCC 4.8.2 as modified by Ubuntu.
For arm-nacl with GCC 4.9.2 as modified by me, it adds 512 bytes.)


Thanks,
Roland

