Gaps between BSS Variables; How are they Placed?

Thu Feb 4 14:38:36 GMT 2021

Hi,

Alan wrote:
> > When building an ELF executable using the BFD linker from a C file
> > through gcc(1) on amd64/x86_64, what area of code takes care of
> > ordering the common symbols in the .bss section?  I'd like to
> > understand its algorithm as I'm surprised by the results.
...
> ldlang.c:lang_common.

Thanks Alan, that's what I was after.

So my understanding now is that lang_common() lays the symbols out in
arbitrary hash-walk order unless ld's --sort-common is used, in which
case the hash is walked repeatedly, each walk placing the symbols whose
alignment falls in one set.

If descending is chosen then the hash is walked in five steps with the
alignment sets being:  >=2⁴  ==2³  ==2²  ==2¹  ==2⁰

    for (power = 4; power > 0; power--)
      bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);

    power = 0;
    bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);
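
A stand-alone sketch of the descending case as understood above, not BFD
code.  The `sym` struct, the array standing in for the hash table, and
walk()/place_descending() are invented for illustration, and the filter
(skip a symbol whose alignment power is below the pass's power) is an
assumption inferred from the alignment sets, not copied from
lang_one_common().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a common symbol: only what the sort needs. */
struct sym {
    const char *name;
    unsigned int power;   /* alignment is 1u << power */
    int placed;           /* set once a pass has placed the symbol */
};

/* One "hash walk": place every unplaced symbol whose power is at least
   min_power, in table order.  Assumed filter, see the text above. */
static size_t walk(struct sym *tab, size_t n, unsigned int min_power,
                   const char **out, size_t used)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].placed || tab[i].power < min_power)
            continue;
        tab[i].placed = 1;
        out[used++] = tab[i].name;
    }
    return used;
}

/* Same loop shape as the quoted code: passes for 4, 3, 2, 1, then 0. */
static size_t place_descending(struct sym *tab, size_t n, const char **out)
{
    size_t used = 0;
    unsigned int power;

    for (power = 4; power > 0; power--)
        used = walk(tab, n, power, out, used);

    power = 0;
    used = walk(tab, n, power, out, used);
    assert(used == n);    /* every symbol ends up placed exactly once */
    return used;
}
```

Feeding it powers 4, 0, 7, 3, 5 in table order gives 4, 7, 5, 3, 0: the
three >=2⁴ entries keep their table order rather than being sorted.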

This means the first set, >=2⁴, isn't sorted: its symbols are still in
hash-walk order, and they're the big alignments where the fills will be
significant.  I'd say that's a bug compared to the documentation, and it
can waste a lot of space with big alignments, e.g. for DMA.  An extra
initial hash walk could find the largest alignment and work down from
there.
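
A sketch of that suggestion, again stand-alone rather than a patch
against ldlang.c.  The `csym` struct, max_power() and
place_fully_descending() are hypothetical names, and an array stands in
for the hash table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a common symbol. */
struct csym {
    unsigned int power;   /* alignment is 1u << power */
    int placed;
};

/* The extra initial walk: find the largest alignment power present. */
static unsigned int max_power(const struct csym *tab, size_t n)
{
    unsigned int max = 0;

    for (size_t i = 0; i < n; i++)
        if (tab[i].power > max)
            max = tab[i].power;
    return max;
}

/* One walk per power, from the largest found down to 0.  Each walk takes
   exactly one alignment.  Table indices are written in placement order. */
static size_t place_fully_descending(struct csym *tab, size_t n,
                                     size_t *order)
{
    size_t used = 0;
    unsigned int power = max_power(tab, n);

    for (;;) {
        for (size_t i = 0; i < n; i++)
            if (!tab[i].placed && tab[i].power == power) {
                tab[i].placed = 1;
                order[used++] = i;
            }
        if (power == 0)
            break;        /* stop before decrementing an unsigned 0 */
        power--;
    }
    assert(used == n);
    return used;
}
```

With powers 5, 7, 4, 7, 0 the placement order becomes 7, 7, 5, 4, 0.  The
cost is one walk per power between the largest and 0, including powers
that no symbol uses.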

I don't see why the loop's condition isn't ‘power >= 0’ to remove the
need for the last couple of lines?  Perhaps just for symmetry with the
ascending version.
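
One possible reason, assuming `power` is declared `unsigned int` (the
`(unsigned int) -1` assignment in the ascending branch hints at that,
but the declaration isn't quoted here): for an unsigned type
‘power >= 0’ is always true, and decrementing 0 wraps around instead of
going negative, so the loop would never end.  The helper names below are
illustrative only.

```c
#include <assert.h>
#include <limits.h>

/* What "power--" does once an unsigned power has reached 0. */
static int decrement_from_zero_wraps(void)
{
    unsigned int power = 0;

    power--;                      /* well-defined: wraps to UINT_MAX */
    return power == UINT_MAX;
}

/* The quoted form: loop while power > 0, then one explicit pass for 0. */
static unsigned int count_passes_quoted(void)
{
    unsigned int passes = 0;
    unsigned int power;

    for (power = 4; power > 0; power--)
        passes++;
    assert(power == 0);           /* loop exits exactly at 0 */
    passes++;                     /* the separate "power = 0" pass */
    return passes;
}
```

So the separate last pass looks like the simplest way to include 0
without relying on a signed counter.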

If ascending then there are six steps, with the alignment sets being:
==2⁰  ==2¹  ==2²  ==2³  ==2⁴  >=2⁵
So again large alignments aren't sorted.

    for (power = 0; power <= 4; power++)
      bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);

    power = (unsigned int) -1;
    bfd_link_hash_traverse (link_info.hash, lang_one_common, &power);
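
A small sketch of which ascending pass would pick up a symbol of a given
alignment power.  It assumes a pass takes every not-yet-placed symbol
whose power is no larger than the pass's power, which is inferred from
the sets listed above rather than taken from lang_one_common();
ascending_pass() is a made-up name.

```c
#include <assert.h>
#include <limits.h>

/* Returns the 0-based index of the ascending pass that would place a
   symbol whose alignment is 1u << sym_power, under the assumption that
   a pass accepts any unplaced symbol with sym_power <= the pass power. */
static unsigned int ascending_pass(unsigned int sym_power)
{
    unsigned int pass = 0;
    unsigned int power;

    for (power = 0; power <= 4; power++, pass++)
        if (sym_power <= power)
            return pass;

    power = (unsigned int) -1;    /* same value as UINT_MAX */
    assert(power == UINT_MAX && sym_power <= power);
    return pass;                  /* pass 5: the catch-all for >=2⁵ */
}
```

Powers 0 to 4 each get their own pass, while 5, 7 and 12 all land in
pass 5, in whatever order the walk visits them.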

Separate from the ‘large alignments are unsorted’ issue above, the
method assumes something with an alignment requirement of 2ⁿ is at least
that large in size, but that need not be the case with a DMA buffer on an
embedded system where SRAM is tight and it's still useful to DMA
20 bytes to a UART with 32-byte alignment.

Is that the cause for not seeing ‘*fill*’s in the map file?  These
‘a##_s_a’ symbols have size s and alignment a, both in hex.

     .bss       0x0000000000004081    0x0 /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crtn.o
     *(COMMON)
     *fill*     0x0000000000004081   0x7f
     COMMON     0x0000000000004100  0x27e align.o
                0x0000000000004100            a00_20_80
                0x0000000000004180            a10_20_80
                0x00000000000041a0            a24_20_20
                0x0000000000004200            a30_20_80
                0x0000000000004280            a20_20_80
                0x0000000000004300            a40_20_80
                0x0000000000004320            a22_11_10
                0x0000000000004340            a05_20_20
                0x0000000000004360            a32__e__8
                0x0000000000004370            a07__8__8
                0x0000000000004378            a15__4__4
                0x000000000000437c            a04__1__1
                0x000000000000437d            a34__1__1
                0x0000000000004380            . = ALIGN ((. != 0x0)?0x8:0x1)
     *fill*     0x000000000000437e    0x2

    .lbss

There's a large fill of 0x7f to meet the alignment of the section, but
then a00_20_80 is only using 0x20 from 0x4100 so shouldn't there be a
fill of 0x60 to reach a10_20_80's 0x4180 address?
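
The arithmetic behind that expectation, as a small sketch.  align_up()
and fill_before() are illustrative helpers, not functions from ld.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align, a power of two. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Padding between the end of the previous object and the start of the
   next one, given the next object's alignment. */
static uint64_t fill_before(uint64_t prev_addr, uint64_t prev_size,
                            uint64_t next_align)
{
    uint64_t end = prev_addr + prev_size;

    return align_up(end, next_align) - end;
}
```

fill_before(0x4100, 0x20, 0x80) gives 0x60, and align_up(0x4081, 0x80)
gives 0x4100, which agrees with the 0x7f section fill that the map does
show.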

(BTW, the list above shows how large alignments aren't sorted.
a24_20_20's 0x20 is amongst the 0x80s, and a22_11_10's 0x10 comes before
a05_20_20's 0x20.)

Lastly, is it possible for the layout algorithm to know the value of
‘dot’ when it starts laying out the section?  Then it could make use of
the 0x7f bytes of fill above, because the alignment of the section need
not be the alignment of the largest thing in it.
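
A rough sketch of what knowing ‘dot’ could allow: a greedy pass that
drops small objects into the gap before the first big-aligned address.
This is purely hypothetical; `obj`, round_up() and pack_into_gap() are
invented here and nothing like them is claimed to exist in ld.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object description; align must be a power of two. */
struct obj {
    uint64_t size;
    uint64_t align;
};

static uint64_t round_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Try each object in the given order, starting at dot and never going
   past limit.  addrs[i] receives the chosen address, or 0 when object i
   does not fit.  Returns how many objects were placed. */
static size_t pack_into_gap(uint64_t dot, uint64_t limit,
                            const struct obj *objs, size_t n,
                            uint64_t *addrs)
{
    size_t placed = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t start = round_up(dot, objs[i].align);

        if (start + objs[i].size > limit) {
            addrs[i] = 0;         /* does not fit in the gap */
            continue;
        }
        addrs[i] = start;
        dot = start + objs[i].size;
        placed++;
    }
    return placed;
}
```

For the map above, dot would be 0x4081 and limit 0x4100, the first
0x80-aligned address.  Objects that don't fit would still need the
normal placement afterwards.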

I've typically seen large alignment on machines with lots of RAM so a
bit of wastage doesn't matter, though it affects cache-line density.
But now with microprocessors with a few KiB of SRAM offering DMA with
alignment requirements, I'm studying the map file to look for savings!

Comments and advice welcome.  I'm not subscribed to the list or familiar
with BFD.  Thanks.

-- 
Cheers, Ralph.