Bug 13557 - Undef. ref. err. when linking with slim LTO obj. in static lib. (mingw32 target)
Summary: Undef. ref. err. when linking with slim LTO obj. in static lib. (mingw32 target)
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.24
: P1 normal
Target Milestone: 2.25
Assignee: Alan Modra
URL:
Keywords:
: 15143 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-01-03 22:38 UTC by Dmitry Gorbachev
Modified: 2014-08-05 01:35 UTC (History)
3 users (show)

See Also:
Host:
Target: mingw32
Build:
Last reconfirmed:


Attachments
Testcase (314 bytes, text/plain)
2012-01-03 22:38 UTC, Dmitry Gorbachev
Details
Another testcase (284 bytes, text/plain)
2012-11-13 18:42 UTC, Dmitry Gorbachev
Details
Verbose gcc and ld output (6.82 KB, text/plain)
2014-04-28 00:13 UTC, LRN
Details
A hack to fix lto linking (950 bytes, patch)
2014-04-29 12:08 UTC, LRN
Details | Diff
A hack to fix lto linking (977 bytes, patch)
2014-04-29 22:29 UTC, LRN
Details | Diff
rewrite coff archive handling (9.24 KB, patch)
2014-08-01 16:50 UTC, Alan Modra
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Gorbachev 2012-01-03 22:38:33 UTC
Created attachment 6143 [details]
Testcase
Comment 1 Dmitry Gorbachev 2012-01-12 10:35:25 UTC
The testcase produces this error:

libfoobar.a(foo.o):foo.c:(.text+0x7): undefined reference to `_bar'
Comment 2 Dmitry Gorbachev 2012-11-13 18:42:31 UTC
Created attachment 6731 [details]
Another testcase

/tmp/cc8ikw6t.ltrans0.ltrans.o:cc8ikw6t.ltrans0.o:(.text+0xc): undefined reference to `foo'
Comment 3 LRN 2014-04-28 00:13:04 UTC
Created attachment 7564 [details]
Verbose gcc and ld output

This is still unfixed as of gcc-4.9.0 and binutils-2.24.51.20140427.

I do this (the source files are the same, from attachment 6731 [details]): 

> gcc --verbose -c -flto -g -O2 foo.c -o foo.o
> gcc-ar -rcsv libfoo.a foo.o
> gcc --verbose -c -flto -g -O2 main.c -o main.o
> gcc --verbose -Wl,--verbose=2,-Map,_map -flto -g -O2 main.o -o main ./libfoo.a

And it can't resolve foo.

I did check, gcc-ar runs ar with the right -plugin=... argument. Substituting "gcc-ar" for "ar" (or for "ar -plugin=...") does not help.

nm ./libfoo.a correctly identifies its contents as:
> foo.o:
> 00000000 T _foo
This is if i have lto_plugin in /mingw/lib/bfd-plugins; if i don't, nm says this:
> foo.o:
> 00000000 b .bss
> 00000000 d .data
> 00000000 r .gnu.lto_.decls.a5a741ee
> 00000000 r .gnu.lto_.inline.a5a741ee
> 00000000 r .gnu.lto_.jmpfuncs.a5a741ee
> 00000000 r .gnu.lto_.opts
> 00000000 r .gnu.lto_.profile.a5a741ee
> 00000000 r .gnu.lto_.pureconst.a5a741ee
> 00000000 r .gnu.lto_.refs.a5a741ee
> 00000000 r .gnu.lto_.symbol_nodes.a5a741ee
> 00000000 r .gnu.lto_.symtab.a5a741ee
> 00000000 r .gnu.lto_foo.a5a741ee
> 00000000 r .rdata$zzz
> 00000000 t .text
> 00000001 C ___gnu_lto_slim
> 00000001 C ___gnu_lto_v1
(gcc-nm, obviously, works regardless of whether a copy of lto plugin is in /mingw/lib/bfd-plugins, that's the point of having a wrapper).

The output of the commands listed above is attached.

Note where it says:
> attempt to open ./libfoo.a succeeded
it doesn't say:
> (./libfoo.a)foo.o
after that. My guess is that ld just does not understand the object file format somehow. When i run exactly the same testcase on my Debian machine (with host gcc), i do get this:
> attempt to open ./libfoo.a succeeded
> (./libfoo.a)foo.o
and everything works.

Same with the map file (not attached, i'll provide it if requrested): i686-mingw-w64-ld mentions libfoo once:
> LOAD ./libfoo.a
Whereas on Debian libfoo is mentioned at the beginning, where archive members included to satisfy reference by file are listed:
> ./libfoo.a(foo.o)             main.o (symbol from plugin) (foo)
(and, of course, there's a "LOAD ./libfoo.a" later on as well).

I have debug symbols for binutils and gcc. If you tell me where to look, i may be able to trace how ld opens archives and reads object files from them.
Comment 4 LRN 2014-04-28 00:16:45 UTC
This bug was independently rediscovered a couple of days ago by two other people - http://comments.gmane.org/gmane.comp.gnu.mingw.w64.general/9699 , and then by me as well.
Comment 5 LRN 2014-04-28 21:33:29 UTC
So far my guess is that the difference is in coff_link_check_ar_symbols():

When a "normal" static library (i've made a version of libfoo.a without -flto) is being loaded, coff_link_check_ar_symbols() lists all its symbols, finds all global or common symbols, gets their names, and resolves them, if undefined.
"Normal" libfoo has a global symbol esym == "_foo" (with name "_foo"), which passes all checks, and eventually is fed to add_archive_element().

LTO libfoo.a does not have that. It has a number of local 'section' symbols (".text", ".bss" and such), a number of local symbols with esym[0] == 0 and two common symbols with esym[0] == 0 and names "___gnu_lto_v1" and "___gnu_lto_slim".
None of them is the right thing, as far as coff_link_check_ar_symbols() is concerned.

My guess is that a plugin should hook up at some point (i have not been able the identify the place where this "some point" is in the code) and handle the symbols, but it doesn't, for some reason.
Comment 6 LRN 2014-04-29 12:08:25 UTC
Created attachment 7566 [details]
A hack to fix lto linking

Built binutils with debuginfo on Debian, compared with what runs on Windows.

elf_link_add_archive_symbols() just loops through all archive symbols and calls
>	  if (!(*info->callbacks
>		->add_archive_element) (info, element, symdef->name, &element))
for each one of them. It only needs element to be non-NULL, and to have the bfd_object format.

_bfd_generic_link_add_archive_symbols(), on the other hand, does the checks detailed above AND calls
>	  if (! (*checkfn) (element, info, &needed))
>	    goto error_return;
*checkfn is coff_link_check_archive_element(), it calls coff_link_check_ar_symbols(), which i've examined earlier.

Here's a hack that forces coff_link_check_ar_symbols() to recognize gnu lto symbol names. My guess is that a non-hacky version should consult the plugin (maybe call claim_file() over element or something?).
Comment 7 LRN 2014-04-29 14:26:08 UTC
No, this does not work. For real-life cases (libcairoboilerplate) ar symbol table consists of multiple copies of ___gnu_lto_v1 and ___gnu_lto_slim or somesuch.
The
>      arh = archive_hash_lookup (&arsym_hash, h->root.string, FALSE, FALSE);
lookup never considers them to be candidates for resolving undefined references, thus 
>	  if (! (*checkfn) (element, info, &needed))
is never called.

Need to look further.
Comment 8 LRN 2014-04-29 22:29:55 UTC
Created attachment 7567 [details]
A hack to fix lto linking

v2:
* fixed a stupid buffer size bug (should have read the code before re-using macros...)
* fixed an error where symbol might get mischopped if it had too many '.' in it (now uses strrchr to find the last dot).
Comment 9 LRN 2014-05-05 15:30:08 UTC
Also, to build a working static library one must use the -s option to `ar', or (preferably) run `ranlib' on the static library (ranlib is preferable, as `ar' from 2.24 may not do the right thing when called with -s, even if you call it via `gcc-ar'; git versions of binutils don't have this problem).
Not doing so results in a static archive that, when slurped, looks like it only has multiple functions, all named either __gnu_lto_slim or __gnu_lto_v1.
Comment 10 Alan Modra 2014-07-28 11:09:43 UTC
I think coff_link_check_archive_element is just plain broken.  It really has no business reading symbols from the archive element to check against undefined symbols in the linker hash table.  That already has been done by _bfd_generic_link_add_archive_symbols when the armap symbols are checked against undefined symbols..  For lto objects, checking the symbols again of course checks the wrong symbols, because the lto symbols are not available until after the plugin has claimed the archive element!
Comment 11 Alan Modra 2014-08-01 16:50:13 UTC
Created attachment 7736 [details]
rewrite coff archive handling

Please test out this patch.  I don't have a native mingw system to properly test it.  (I know I could set one up, but that takes time and disk space.)
Comment 12 LRN 2014-08-02 05:42:59 UTC
The patch applies, binutils builds, the small testcase (attachment 6731 [details]) links successfully. Whether it works as well for real-world cases remains to be seen (i may be able to test it in a few days).
Comment 13 LRN 2014-08-02 05:51:27 UTC
Probably unrelated: initially lto-wrapper (ran by gcc when linking main.exe) bailed with error "lto-wrapper: CreateProcess: No such file or directory".
This is because lto-wrapper tries to spawn a "gcc" process (likely taken from $COLLECT_GCC), and it only searches in PATH (and maybe the directory where lto-wrapper is located, but gcc.exe isn't there).

I fixed this by adding the bin directory (where gcc.exe is) to the PATH. This directory is *usually* in PATH (for obvious reasons) but i don't remember this being a strict requirement, and everything works just fine unless LTO is involved.

This hasn't been mentioned so far, so here it is, for completeness' sake.
Comment 14 LRN 2014-08-04 19:55:40 UTC
Built around 150 packages with this patch, also built GTK with -flto. Everything builds and works, convenience libraries are confirmed to be slim-lto-only, there's a difference in size of the binaries. I'd say everything is in working order. If there are corner cases left, i'm not hitting them.
Comment 15 Dmitry Gorbachev 2014-08-04 21:17:47 UTC
*** Bug 15143 has been marked as a duplicate of this bug. ***
Comment 16 Dmitry Gorbachev 2014-08-04 21:18:32 UTC
Unfortunately, I can't test the patch on a native mingw system. I tried to cross-build (i686-pc-linux-gnu/i686-w64-mingw32) a large project and everything worked well. Thanks!
Comment 17 Sourceware Commits 2014-08-05 01:23:22 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  02eb0a49bceb35e4b0503e6ffc11e85151dbc571 (commit)
       via  13e570f80cbfb299a8858ce6830e91a6cb40ab7b (commit)
      from  241fd515ad94fa11d4608d4fe8108c382792d3be (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=02eb0a49bceb35e4b0503e6ffc11e85151dbc571

commit 02eb0a49bceb35e4b0503e6ffc11e85151dbc571
Author: Alan Modra <amodra@gmail.com>
Date:   Tue Aug 5 10:48:47 2014 +0930

    Fix load of archive element with common def for -u sym
    
    	* linker.c (generic_link_check_archive_element): Move handling
    	of command link -u symbols with a common symbol def to the
    	code handling non-common symbols so that archive element symbols
    	are loaded.  Use generic_link_add_object_symbols.

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=13e570f80cbfb299a8858ce6830e91a6cb40ab7b

commit 13e570f80cbfb299a8858ce6830e91a6cb40ab7b
Author: Alan Modra <amodra@gmail.com>
Date:   Tue Aug 5 10:46:57 2014 +0930

    Fix LTO vs. COFF archives
    
    Avoid scan of symbols on objects in coff archives since we don't need
    to do anything special with common symbols.  The scan is quite useless,
    and breaks LTO due to slim LTO objects not having symbols available
    until after the plugin has claimed them.  Instead we can add objects
    based on their archive symbol map.
    
    Also, rip out the archive symbol hash table used by the generic
    linker.  Using a hash breaks one feature of unix archive linking;
    The first object file in an archive defining any given symbol should
    be the object extracted to satisfy that symbol.  What's more a hash
    isn't much faster except in pathological cases where object file
    ordering causes many scans of the archive.  See the comment which I'm
    removing from elf_link_add_archive_symbols.
    
    Finally, tidy elflink.c archive handling a little.
    
    	PR 13557
    	* linker.c (struct archive_list, struct archive_hash_entry,
    	struct archive_hash_table, archive_hash_newfunc,
    	archive_hash_table_init, archive_hash_lookup, archive_hash_allocate,
    	archive_hash_table_free): Delete.
    	(_bfd_generic_link_add_archive_symbols): Add h and name params to
    	checkfn.  Rewrite using a straight-forward scan over archive map.
    	(generic_link_check_archive_element_no_collect,
    	generic_link_check_archive_element_collect,
    	generic_link_check_archive_element): Add h and name params.
    	* aoutx.h (aout_link_check_archive_element): Likewise.
    	* pdp11.c (aout_link_check_archive_element): Likewise.
    	* xcofflink.c (xcoff_link_check_archive_element): Likewise.
    	* cofflink.c (coff_link_check_archive_element): Likewise.  Don't
    	scan symbols, simply add archive element whenever h is undefined.
    	(coff_link_check_ar_symbols): Delete.
    	* ecoff.c (read_ext_syms_and_strs): Delete.
    	(reread_ext_syms_and_strs): Delete.
    	(ecoff_link_check_archive_element): Add h and name param.  Don't
    	scan symbols, simply add based on h.  Use ecoff_link_add_object_symbols.
    	* elflink.c (elf_link_is_defined_archive_symbol): Don't test
    	archive_pass.
    	(elf_link_add_archive_symbols): Delete "defined" array, merge
    	functionality into "included".  Make "included" a char array.  Don't
    	set or test archive_pass.
    	* libbfd-in.h (_bfd_generic_link_add_archive_symbols): Update.
    	* libbfd.h: Regenerate.

-----------------------------------------------------------------------

Summary of changes:
 bfd/ChangeLog   |   37 +++++
 bfd/aoutx.h     |    2 +
 bfd/cofflink.c  |  116 ++--------------
 bfd/ecoff.c     |  164 ++---------------------
 bfd/elflink.c   |   53 ++------
 bfd/libbfd-in.h |    4 +-
 bfd/libbfd.h    |    4 +-
 bfd/linker.c    |  410 ++++++++++++++++---------------------------------------
 bfd/pdp11.c     |    4 +-
 bfd/xcofflink.c |    4 +-
 10 files changed, 204 insertions(+), 594 deletions(-)
Comment 18 Alan Modra 2014-08-05 01:35:55 UTC
patch applied mainline