Gaps between BSS Variables; How are they Placed?
Ralph Corderoy
ralph@inputplus.co.uk
Thu Feb 4 14:38:36 GMT 2021
Hi,
Alan wrote:
> > When building an ELF executable using the BFD linker from a C file
> > through gcc(1) on amd64/x86_64, what area of code takes care of
> > ordering the common symbols in the .bss section? I'd like to
> > understand its algorithm as I'm surprised by the results.
...
> ldlang.c:lang_common.
Thanks Alan, that's what I was after.
So now my understanding is that lang_common() lays the symbols out in
arbitrary hash-walk order unless ld's --sort-common is used, in which case
the hash is walked repeatedly, placing symbols from a set of alignments
each time.
If descending is chosen then the hash is walked in five steps with the
alignment sets being: >=2⁴ ==2³ ==2² ==2¹ ==2⁰
    for (power = 4; power > 0; power--)
      bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);

    power = 0;
    bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);
This means the symbols in the first set, >=2⁴, aren't sorted but are still
in hash-walk order, and they're the big alignments where the fills will be
significant. I'd say that's a bug compared to the documentation, and it
can waste a lot of space with big alignments, e.g. for DMA. An extra
initial hash walk could find the largest alignment and work down from
there.
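The effect, and the suggested extra walk, can be sketched in a few lines
of C. This is a toy model, not ld's code: struct common_sym, align_up(),
place_pass(), layout_descending() and max_power() are made-up names, array
order stands in for hash-walk order, and each pass is assumed to place
every still-unplaced symbol whose alignment is at least 2^power.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a common symbol; not a BFD type. */
struct common_sym {
    unsigned long size;
    unsigned int power;   /* alignment is 1UL << power */
    int placed;
};

/* Round addr up to a multiple of 2^power. */
static unsigned long align_up(unsigned long addr, unsigned int power)
{
    assert(power < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = (1UL << power) - 1;
    return (addr + mask) & ~mask;
}

/* One walk: place, in array order, each unplaced symbol whose
   alignment power is >= min_power.  Returns the new dot. */
static unsigned long place_pass(struct common_sym *syms, size_t n,
                                unsigned int min_power, unsigned long dot)
{
    for (size_t i = 0; i < n; i++) {
        if (syms[i].placed || syms[i].power < min_power)
            continue;
        dot = align_up(dot, syms[i].power);
        syms[i].placed = 1;
        dot += syms[i].size;
    }
    return dot;
}

/* Descending layout whose first walk uses start_power.
   start_power == 4 models the loop quoted above. */
static unsigned long layout_descending(struct common_sym *syms, size_t n,
                                       unsigned int start_power,
                                       unsigned long dot)
{
    for (size_t i = 0; i < n; i++)
        syms[i].placed = 0;
    for (unsigned int power = start_power; power > 0; power--)
        dot = place_pass(syms, n, power, dot);
    return place_pass(syms, n, 0, dot);
}

/* The suggested extra initial walk: find the largest alignment power. */
static unsigned int max_power(const struct common_sym *syms, size_t n)
{
    unsigned int best = 0;
    for (size_t i = 0; i < n; i++)
        if (syms[i].power > best)
            best = syms[i].power;
    return best;
}
```

With four 0x20-byte symbols aligned to 2⁵, 2⁷, 2⁵, 2⁷ in walk order and dot
starting at 0, layout_descending() ends at 0x120 when started at power 4
but at 0xe0 when started at max_power().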
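I don't see why the loop's condition isn't ‘power >= 0’, which would
remove the need for the last couple of lines. Perhaps it's just for
symmetry with the ascending version, or because power is unsigned, as the
‘(unsigned int) -1’ in the ascending code suggests, in which case
‘power >= 0’ would never be false.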
If ascending then the alignment sets are: ==2⁰ ==2¹ ==2² ==2³ ==2⁴ >=2⁵
So again large alignments aren't sorted.
    for (power = 0; power <= 4; power++)
      bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);

    power = (unsigned int) -1;
    bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);
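To check the alignment sets listed for both directions, here is a small
sketch that mirrors the two loops and reports which walk would place a
symbol aligned to 2^p. It assumes lang_one_common() skips a symbol whose
alignment power is below the walk's power when descending, and above it
when ascending, which is what the sets above imply. descending_walk() and
ascending_walk() are illustrative names, not ld functions.

```c
#include <assert.h>

/* Which descending walk (0 = first) places a symbol aligned to 2^p? */
static int descending_walk(unsigned int p)
{
    int walk = 0;
    for (unsigned int power = 4; power > 0; power--, walk++)
        if (p >= power)
            return walk;
    assert(walk == 4);
    return walk;            /* final walk, power == 0, takes the rest */
}

/* Which ascending walk (0 = first) places a symbol aligned to 2^p? */
static int ascending_walk(unsigned int p)
{
    int walk = 0;
    for (unsigned int power = 0; power <= 4; power++, walk++)
        if (p <= power)
            return walk;
    assert(walk == 5);
    return walk;            /* final walk, power == (unsigned int) -1 */
}
```

Alignments 2⁵ and 2⁷ land in the same walk in both directions (walk 0
descending, walk 5 ascending), so within that walk their relative order is
whatever the hash walk gives.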
Separate from the ‘large alignments are unsorted’ issue above, the
method assumes something with an alignment requirement of 2ⁿ is at least
that large in size, but that need not be the case with a DMA buffer on an
embedded system where SRAM is tight and it's still useful to DMA
20 bytes to a UART with 32-byte alignment.
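The arithmetic behind that 20-byte example, as a minimal sketch.
fill_needed() and fill_after() are hypothetical helpers, and the start
address used in the note below is made up for illustration.

```c
#include <assert.h>

/* Bytes of fill needed to raise dot to a multiple of align,
   where align is a power of two. */
static unsigned long fill_needed(unsigned long dot, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (dot & (align - 1))) & (align - 1);
}

/* Fill left behind a symbol at start of the given size when the
   next symbol requires next_align. */
static unsigned long fill_after(unsigned long start, unsigned long size,
                                unsigned long next_align)
{
    return fill_needed(start + size, next_align);
}
```

A 20-byte buffer at a 32-byte-aligned address such as 0x1000 leaves
fill_after(0x1000, 20, 32) == 12 bytes unused if another 32-byte-aligned
symbol follows it, but none if the next symbol only needs 4-byte
alignment.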
Is that the cause for not seeing ‘*fill*’s in the map file? These
‘a##_s_a’ symbols have size s and alignment a.
 .bss           0x0000000000004081        0x0 /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crtn.o
 *(COMMON)
 *fill*         0x0000000000004081       0x7f
 COMMON         0x0000000000004100      0x27e align.o
                0x0000000000004100                a00_20_80
                0x0000000000004180                a10_20_80
                0x00000000000041a0                a24_20_20
                0x0000000000004200                a30_20_80
                0x0000000000004280                a20_20_80
                0x0000000000004300                a40_20_80
                0x0000000000004320                a22_11_10
                0x0000000000004340                a05_20_20
                0x0000000000004360                a32__e__8
                0x0000000000004370                a07__8__8
                0x0000000000004378                a15__4__4
                0x000000000000437c                a04__1__1
                0x000000000000437d                a34__1__1
                0x0000000000004380                . = ALIGN ((. != 0x0)?0x8:0x1)
 *fill*         0x000000000000437e        0x2
 .lbss
There's a large fill of 0x7f to meet the alignment of the section, but
then a00_20_80 is only using 0x20 from 0x4100 so shouldn't there be a
fill of 0x60 to reach a10_20_80's 0x4180 address?
(BTW, that above list shows how large alignments aren't sorted. 0x20 is
amongst 0x80 and 0x10 comes before 0x20.)
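To put numbers on the unreported gaps, here is a rough sketch that
recomputes them from the COMMON listing. The addresses are copied from the
map; the sizes are taken from the symbol names and assumed to be hex.
struct map_entry, common_map, gap_after() and total_gap() are names
invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct map_entry {
    unsigned long addr;   /* address from the map file */
    unsigned long size;   /* size encoded in the symbol name, read as hex */
};

/* COMMON symbols from the listing, in address order. */
static const struct map_entry common_map[] = {
    {0x4100, 0x20},   /* a00_20_80 */
    {0x4180, 0x20},   /* a10_20_80 */
    {0x41a0, 0x20},   /* a24_20_20 */
    {0x4200, 0x20},   /* a30_20_80 */
    {0x4280, 0x20},   /* a20_20_80 */
    {0x4300, 0x20},   /* a40_20_80 */
    {0x4320, 0x11},   /* a22_11_10 */
    {0x4340, 0x20},   /* a05_20_20 */
    {0x4360, 0x0e},   /* a32__e__8 */
    {0x4370, 0x08},   /* a07__8__8 */
    {0x4378, 0x04},   /* a15__4__4 */
    {0x437c, 0x01},   /* a04__1__1 */
    {0x437d, 0x01},   /* a34__1__1 */
};
#define COMMON_MAP_LEN (sizeof common_map / sizeof common_map[0])

/* Unused bytes between entry i and the entry that follows it. */
static unsigned long gap_after(const struct map_entry *m, size_t n, size_t i)
{
    assert(i + 1 < n);
    return m[i + 1].addr - (m[i].addr + m[i].size);
}

/* Sum of the gaps between consecutive entries. */
static unsigned long total_gap(const struct map_entry *m, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i + 1 < n; i++)
        total += gap_after(m, n, i);
    return total;
}
```

On this data the gap after a00_20_80 comes out as 0x60, and the gaps total
0x171 of the 0x27e bytes that COMMON occupies, none of which appears as a
‘*fill*’ line.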
Lastly, is it possible that the layout algorithm can know the value of
‘dot’ when it starts laying out the section? Then it could make use of
the 0x7f which is filled above because the alignment of the section need
not be the alignment of the largest thing in it.
I've typically seen large alignment on machines with lots of RAM so a
bit of wastage doesn't matter, though it affects cache-line density.
But now with microprocessors with a few KiB of SRAM offering DMA with
alignment requirements, I'm studying the map file to look for savings!
Comments and advice welcome. I'm not subscribed to the list or familiar
with BFD. Thanks.
--
Cheers, Ralph.