Bug 24836

Summary: --as-needed can leave unused direct dependencies if combined with --gc-sections
Product: binutils Reporter: crusader.mike
Component: ld Assignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: normal CC: amodra, fweimer
Priority: P2    
Version: 2.30   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description crusader.mike 2019-07-22 16:55:17 UTC
I have a binary that is linked like this:

/opt/rh/devtoolset-8/root/usr/bin/g++
-O3 -DNDEBUG  -Wl,--gc-sections -s -Wl,--as-needed
<few .o files>
-o procmon.e
-Wl,-rpath,/usr/local/lib64
<bunch of my static libs .a>
/usr/local/lib64/libxalan-c.so
/usr/local/lib64/libxerces-c.so
<bunch of static libs(1) built with "vcpkg" intermixed with (2)>

(1) curl z aws-cpp-sdk-s3 aws-cpp-sdk-core ssl crypto aws-c-event-stream aws-c-common aws-checksums
    azurestorage uuid xml2 lzma cpprest boost_log boost_log_setup boost_filesystem boost_thread boost_date_time
    boost_regex boost_chrono boost_atomic

(2) -lcrypt -lrt -lm -ldl -pthread


but (even though --as-needed is present) ldd still reports unused dependencies:

$ ldd -u -r procmon.e
Unused direct dependencies:
        /usr/local/lib64/libxalan-c.so.111
        /lib64/libcrypt.so.1
        /lib64/libm.so.6

Info:
CentOS 7
g++ (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3)
GNU ld version 2.30-47.el7
Comment 1 Alan Modra 2019-07-23 07:36:11 UTC
As a first step, please check your ld command line by passing -Wl,-v to g++.
Comment 2 crusader.mike 2019-07-23 17:56:36 UTC
Here is the output with "-Wl,-v":

collect2 version 8.2.1 20180905 (Red Hat 8.2.1-3)
/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/ld -plugin /opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/liblto_plugin.so -plugin-opt=/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper -plugin-opt=-fresolution=/tmp/cc4U4v6Q.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lpthread -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o procmon.e -s /lib/../lib64/crt1.o /lib/../lib64/crti.o /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/crtbegin.o -L/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8 -L/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/../../.. --as-needed --gc-sections -v CMakeFiles/procmon.e.dir/procmon.cpp.o CMakeFiles/procmon.e.dir/proc.cpp.o -rpath /usr/local/lib64 
<bunch of my static libs (.a)>
/usr/local/lib64/libxalan-c.so /usr/local/lib64/libxerces-c.so /home/user/vcpkg/installed/x64-linux/lib/libcurl.a /home/user/vcpkg/installed/x64-linux/lib/libz.a -lcrypt /home/user/vcpkg/installed/x64-linux/lib/libaws-cpp-sdk-s3.a /home/user/vcpkg/installed/x64-linux/lib/libaws-cpp-sdk-core.a /home/user/vcpkg/installed/x64-linux/lib/libcurl.a /home/user/vcpkg/installed/x64-linux/lib/libssl.a /home/user/vcpkg/installed/x64-linux/lib/libcrypto.a /home/user/vcpkg/installed/x64-linux/lib/libaws-c-event-stream.a /home/user/vcpkg/installed/x64-linux/lib/libaws-c-common.a -lrt /home/user/vcpkg/installed/x64-linux/lib/libaws-checksums.a -lpthread /home/user/vcpkg/installed/x64-linux/lib/libazurestorage.a /home/user/vcpkg/installed/x64-linux/lib/libuuid.a /home/user/vcpkg/installed/x64-linux/lib/libxml2.a /home/user/vcpkg/installed/x64-linux/lib/liblzma.a /home/user/vcpkg/installed/x64-linux/lib/libz.a /home/user/vcpkg/installed/x64-linux/lib/libcpprest.a -lpthread /home/user/vcpkg/installed/x64-linux/lib/libz.a /home/user/vcpkg/installed/x64-linux/lib/libssl.a /home/user/vcpkg/installed/x64-linux/lib/libcrypto.a -ldl /home/user/vcpkg/installed/x64-linux/lib/libboost_log.a /home/user/vcpkg/installed/x64-linux/lib/libboost_log_setup.a /home/user/vcpkg/installed/x64-linux/lib/libboost_filesystem.a /home/user/vcpkg/installed/x64-linux/lib/libboost_thread.a /home/user/vcpkg/installed/x64-linux/lib/libboost_date_time.a /home/user/vcpkg/installed/x64-linux/lib/libboost_regex.a /home/user/vcpkg/installed/x64-linux/lib/libboost_chrono.a /home/user/vcpkg/installed/x64-linux/lib/libboost_atomic.a -lstdc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8/crtend.o /lib/../lib64/crtn.o
GNU ld version 2.30-47.el7
Comment 3 crusader.mike 2019-07-23 17:58:22 UTC
Maybe garbage collection (-Wl,--gc-sections) happens after effect of "-Wl,--as-needed"?
Comment 4 crusader.mike 2019-07-23 18:04:57 UTC
Is this the reason for this behaviour?

$ readelf -s procmon.e | grep xalan
   237: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13 _ZN11xalanc_1_1111XalanVe
   289: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13 _ZN11xalanc_1_1111XalanVe

How can I drill deeper? (i.e. figure out what kind of symbols these are, why my executable references them, etc)
Comment 5 Florian Weimer 2019-07-23 19:33:01 UTC
(In reply to crusader.mike from comment #4)
> Is this the reason for this behaviour?
> 
> $ readelf -s procmon.e | grep xalan
>    237: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13
> _ZN11xalanc_1_1111XalanVe
>    289: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13
> _ZN11xalanc_1_1111XalanVe
> 
> How can I drill deeper? (i.e. figure out what kind of symbols these are, why
> my executable references them, etc)

Try readelf -sW.  The non-truncated symbol should be decodable by c++filt.
Comment 6 crusader.mike 2019-07-23 19:49:56 UTC
Symbol table '.dynsym' contains 361 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
   237: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13 xalanc_1_11::XalanVector<unsigned short, xalanc_1_11::MemoryManagedConstructionTraits<unsigned short> >::~XalanVector()
   289: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13 xalanc_1_11::XalanVector<unsigned short, xalanc_1_11::MemoryManagedConstructionTraits<unsigned short> >::~XalanVector()
...

Symbol table '.symtab' contains 1412 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
   859: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13 xalanc_1_11::XalanVector<unsigned short, xalanc_1_11::MemoryManagedConstructionTraits<unsigned short> >::~XalanVector()
  1374: 0000000000415bb0    33 FUNC    WEAK   DEFAULT   13 xalanc_1_11::XalanVector<unsigned short, xalanc_1_11::MemoryManagedConstructionTraits<unsigned short> >::~XalanVector()
...


Apologies, but my understanding is somewhat limited in this area (I am reading related docs right now, but it'll take time). Can someone explain what these entries mean and how to find why my binary ended up having them?

Another question would be why there are two symbols (_ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED1Ev and _ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED2Ev) that unmangle to the same C++ symbol?
Comment 7 Alan Modra 2019-07-29 06:07:03 UTC
> Maybe garbage collection (-Wl,--gc-sections) happens after
> effect of "-Wl,--as-needed"?

It does, and that might be why you have shared libraries seen as needed before garbage collection runs.  If you link with -Wl,-Map,mapfilename then inspecting mapfilename will show you which symbol caused each shared library to be needed.
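A minimal sketch of inspecting the map file for this, assuming it contains a section headed "As-needed library included to satisfy reference by file (symbol)" and that entries are laid out as in the sample below (either on one line, or wrapped onto an indented second line). The sample text and the names as_needed_reasons and split_reference are made up for illustration; they are not taken from the reporter's map file or from binutils.

```python
HEADER = "As-needed library included to satisfy reference by file (symbol)"

# Illustrative sample only; the exact column layout in a real map file is assumed.
SAMPLE = """\
As-needed library included to satisfy reference by file (symbol)

libm.so.6                     main.o (sin)
/some/long/path/libexample.so
                              libstatic.a(member.o) (example_symbol)

Discarded input sections
"""


def split_reference(text):
    """Split 'file (symbol)' at the first ' (' into (file, symbol), or return None."""
    file_part, sep, sym_part = text.strip().partition(" (")
    if not sep or not sym_part.endswith(")"):
        return None
    return file_part, sym_part[:-1]


def as_needed_reasons(map_text):
    """Return {library: (referencing_file, symbol)} for the as-needed section."""
    lines = map_text.splitlines()
    if HEADER not in lines:
        return {}
    reasons = {}
    pending = None       # library name waiting for its wrapped continuation line
    seen_entry = False
    for line in lines[lines.index(HEADER) + 1:]:
        if not line.strip():
            if seen_entry:
                break    # blank line after the entries ends the section
            continue     # blank line right after the header
        seen_entry = True
        if line[0].isspace():
            # Indented line: reference for the library named on the previous line.
            ref = split_reference(line)
            if pending and ref:
                reasons[pending] = ref
            pending = None
        else:
            parts = line.split(None, 1)
            ref = split_reference(parts[1]) if len(parts) > 1 else None
            if ref:
                reasons[parts[0]] = ref
                pending = None
            else:
                pending = line.strip()
    return reasons


if __name__ == "__main__":
    for lib, (obj, sym) in as_needed_reasons(SAMPLE).items():
        print(f"{lib} needed because {obj} references {sym}")
```

The symbol listed for a library is only the first reference that pulled it in, so a library can still be reported here even if that reference is later garbage collected.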
Comment 8 crusader.mike 2019-07-29 20:29:34 UTC
Alan, you are correct -- looks like garbage collection can remove symbol references to the point that the final binary no longer needs a given DT_NEEDED shared lib. That is precisely what happens in my case. And if you carefully read the --as-needed documentation -- it works precisely as declared (not as expected :)).

Now the questions are:
1. Is there any way to discard DT_NEEDED entries that are no longer needed? (apparently determining this isn't trivial according to my admittedly basic understanding of the dynamic linker's behavior)
2. Should --as-needed behavior be modified to address this? Or is it better to make --gc-sections sensitive to --as-needed presence (and perform additional cleanup)?



Additionally, I've read a lot about [dynamic] linker behavior (big thanks to gold's author for blog posts/etc) and can answer some of my own questions:

3. That weird symbol in my final binary is an inline C++ function (XalanVector<...> destructor) that wasn't inlined.

4. ~XalanVector() was very likely not garbage collected because there is a global variable that uses it. Is there any way to track down that variable?

5. There is no corresponding constructor because it was inlined.

6. I can't explain why that destructor produced two entries in .dynsym table (which end with EEED1Ev and EEED2Ev respectively). Interestingly they both have same address/type/etc. My mapfile mentions only one of them:

    .text           0x00000000004074e0    0x2cd72
    ...
     .text._ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED2Ev
                    0x0000000000415bb0       0x21 ../../CommonLib/libCommon.a(NXMLNodeUnix.cpp.o)
                    0x0000000000415bb0                xalanc_1_11::XalanVector<unsigned short, xalanc_1_11::MemoryManagedConstructionTraits<unsigned short> >::~XalanVector()
                    0x0000000000415bb0                xalanc_1_11::XalanVector<unsigned short, xalanc_1_11::MemoryManagedConstructionTraits<unsigned short> >::~XalanVector()

plus an entry in .gcc_except_table (I assume this is used during stack unwinding):

    .gcc_except_table
    ...
     .gcc_except_table._ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED2Ev
                    0x000000000044137e        0x4 ../../CommonLib/libCommon.a(NXMLNodeUnix.cpp.o)
    

I would appreciate any help explaining the origin and purpose of the EEED1Ev symbol.

7. I find it curious that my final binary contains a huge .dynsym table. Even if a few symbols are actually used by shared libs -- shouldn't the rest of those entries be removed to save space? Is there a way to find which of these symbols are used (and by which shared lib)?
Comment 9 crusader.mike 2019-07-29 21:31:20 UTC
... about #6, running my binary with LD_DEBUG:

LD_DEBUG=bindings LD_BIND_NOW=1 ./procmon.e

produces curious output:

...
     27438:     binding file /usr/local/lib64/libxalan-c.so.111 [0] to /<blah>/procmon.e [0]: normal symbol `_ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED1Ev'
...

i.e. libxalan-c ends up using the EEED1Ev symbol from my executable! EEED2Ev isn't mentioned in this output at all.

Another thing -- libxalan-c.so has both of these symbols in its .dynsym and .symtab tables, both weak, both with the same address. There is only one difference: only EEED1Ev is mentioned in .rela.dyn (table of relocations?)

I am still not sure what is going on here, though...
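A rough sketch of how the LD_DEBUG=bindings output could be used to see which shared libs bind to symbols defined in the executable, assuming the log was captured to a string and that lines follow the format shown above. The first sample line is the one quoted in this comment; the second is a made-up line added for contrast, and the function name symbols_bound_to is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines like:
#   binding file <src> [0] to <dst> [0]: normal symbol `<sym>'
BINDING = re.compile(
    r"binding file (?P<src>\S+) \[\d+\] to (?P<dst>\S+) \[\d+\]: "
    r"normal symbol `(?P<sym>[^']+)'"
)

LOG = """\
     27438:     binding file /usr/local/lib64/libxalan-c.so.111 [0] to /<blah>/procmon.e [0]: normal symbol `_ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED1Ev'
     27438:     binding file /<blah>/procmon.e [0] to /lib64/libc.so.6 [0]: normal symbol `malloc' [GLIBC_2.2.5]
"""


def symbols_bound_to(log_text, target_suffix):
    """Map each symbol defined in the target file to the set of other files bound to it."""
    users = defaultdict(set)
    for m in BINDING.finditer(log_text):
        # Keep only bindings that resolve *into* the target and come from another file.
        if m["dst"].endswith(target_suffix) and m["src"] != m["dst"]:
            users[m["sym"]].add(m["src"])
    return dict(users)


if __name__ == "__main__":
    for sym, libs in symbols_bound_to(LOG, "procmon.e").items():
        print(sym, "<-", ", ".join(sorted(libs)))
```

This only reports bindings that actually happened during one run, so it gives a lower bound on which .dynsym entries of the executable are in use, not a complete answer.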
Comment 10 Alan Modra 2019-07-30 02:10:49 UTC
Regarding the interaction between --gc-sections and --as-needed, yes it would be possible to run a pass over as-needed dynamic objects after garbage collection to check whether their symbols are still needed.  This might be a lot of work for little gain, and to do better than just removing DT_NEEDED entries would basically require iterating the link.  (A dynamic object reference to symbols in the executable or shared library being linked marks the sections of those symbols against garbage collection.) 

Here's a comment from gcc/cp/mangle.c that should help explain the various destructor symbol variations.

/* Handle destructor productions of non-terminal <special-name>.
   DTOR is a destructor FUNCTION_DECL.

     <special-name> ::= D0 # deleting (in-charge) destructor
		    ::= D1 # complete object (in-charge) destructor
		    ::= D2 # base object (not-in-charge) destructor  */
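To connect that comment to the two symbols asked about in comment 6, here is a small heuristic sketch that reads the D0/D1/D2 code just before the trailing "Ev" of a mangled name. It is not a real demangler (c++filt is the proper tool), and the name destructor_kind is made up for illustration. The heuristic can be fooled, for example by a member function whose own name happens to end in "D1".

```python
# Destructor codes as listed in the gcc/cp/mangle.c comment quoted above.
DTOR_KINDS = {
    "D0": "deleting (in-charge) destructor",
    "D1": "complete object (in-charge) destructor",
    "D2": "base object (not-in-charge) destructor",
}


def destructor_kind(mangled):
    """Heuristic: classify a mangled name by the D0/D1/D2 code before a trailing 'Ev'."""
    if mangled.startswith("_Z") and mangled.endswith("Ev"):
        return DTOR_KINDS.get(mangled[-4:-2])
    return None


if __name__ == "__main__":
    for name in (
        "_ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED1Ev",
        "_ZN11xalanc_1_1111XalanVectorItNS_31MemoryManagedConstructionTraitsItEEED2Ev",
    ):
        print(name[-7:], "->", destructor_kind(name))
```

For a class without virtual bases the complete object and base object destructors do the same work, and GCC typically emits one as an alias of the other, which would be consistent with both symbols sharing an address.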
Comment 11 crusader.mike 2019-07-30 17:02:38 UTC
> ... it would be possible ... a lot of work for little gain
Well, in my case (which I believe isn't very rare) the majority of the functionality is locked in a few large static libs (of dubious quality), and all minor tools link them and (due to various effects) end up having dummy dependencies, forcing me to package unnecessary libs in deliverables (plus some waste at runtime). Fixing this is not huge, but nice to have.

In terms of work -- what about associating a counter with every lib, which goes up for every resolved symbol and goes down for every symbol marked for garbage collection? At the end -- remove all DT_NEEDED entries with a counter at 0. Well, (since I am ignorant wrt ld implementation) it is probably a dumb idea, so I'll leave this problem with those who know what they are doing.
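A toy model of the counter idea, only to make the proposal concrete. It is not how ld works internally; the function name needed_after_gc and the library/symbol names in the example are hypothetical. It also ignores the point from comment 10 that references from a dynamic object into the output mark sections against garbage collection.

```python
from collections import Counter


def needed_after_gc(resolved, collected):
    """Return libraries whose reference counter is still above zero.

    resolved:  iterable of (library, symbol) pairs, one per reference resolved to a library
    collected: iterable of (library, symbol) pairs whose referencing section was discarded
    """
    counts = Counter(lib for lib, _ in resolved)      # counter goes up per resolved reference
    counts.subtract(lib for lib, _ in collected)      # and down per garbage-collected one
    return sorted(lib for lib, n in counts.items() if n > 0)


if __name__ == "__main__":
    resolved = [
        ("liba.so", "a_func"),
        ("libb.so", "b_func"),
        ("libb.so", "b_other"),
    ]
    collected = [
        ("liba.so", "a_func"),   # the only user of liba.so was discarded
        ("libb.so", "b_func"),
    ]
    print(needed_after_gc(resolved, collected))
```

In this model liba.so would lose its DT_NEEDED entry because its counter reaches 0, while libb.so stays because one reference survives.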


> ... to do better than just removing DT_NEEDED entries would basically require iterating the link
What do you mean by "to do better"?


> ... that should help explain the various destructor ...
Thank you, Alan. Now it makes sense.


Can you comment on #7? I.e. why does an ELF executable end up having a large .dynsym table? Is there a way to trim it down to only the stuff used by its shared libs? Thank you.