Unlike the COFF linker, the ELF linker loads a member of an archive when there is a definition for a common symbol.
Created attachment 744
A testcase

I got

[hjl@gnu-d archive]$ make
gcc -c -o x.o x.c
gcc -c -o foo.o foo.c
ar rv libfoo.a foo.o
ar: creating libfoo.a
a - foo.o
ld -r -o bar.o x.o libfoo.a
libfoo.a(foo.o):(.bss+0x4): multiple definition of `bar'
x.o:(.bss+0x0): first defined here
make: *** [bar.o] Error 1
rm foo.o
The change was introduced by http://sourceware.org/ml/binutils/1999-12/msg00057.html
According to the comment, the change was made to follow Solaris and HPUX.
For the record, I will repost a private mail I had with H.J. about this.

=========
OK, I've spent a while staring at that code and thinking about this, and even though you say it is there on purpose, I believe it is wrong. Here's why.

It is expected behaviour to be able to override variables and objects that appear in libraries; whether shared or archive makes no difference. For example, libraries like dmalloc depend on that behaviour. In this case, so do a lot of GNU programs. Even though libc may provide getopt, the program wants to provide its own getopt, especially for long option handling. An explicitly named object should always take precedence over an object in a library.

The only reason that is *not* happening here is because GNU getopt declares optarg as:

char *optarg;

thus making it a common symbol. If I was to change that to

char *optarg = 0;

then it becomes a normal data symbol and it will be used in preference to the one in the library, which is exactly the intended behaviour.

The only reason the link editor has to pull in the object from the archive is if it provides some *other* symbol that the program needs, and in that case you would legitimately get a warning that the same symbol is defined in two places. However, simply rejecting the explicitly named object in favour of the object in the archive just because the explicit object didn't initialize the variable breaks a very fundamental UNIX paradigm.
========

I read the mail thread pointed to in #2, and Ian asked what SVR4/UnixWare do. UnixWare treats it as I describe above. In fact the current GNU ld is broken on that platform because of this. I spoke to the author of the gABI and he maintains the Solaris linker is broken, and the UnixWare one is correct. With no prompting he cited almost the exact same reasons I outlined above.

The problem is that the gABI doesn't specify the semantic interpretation of COMMON symbols. In the gABI author's words, that was because the behaviour was "older than ELF itself" and simply the way archives were meant to be handled.
Unfortunately it's too simple to allude to the historical handling of common symbols. In a.out linkers when a common symbol appears in an object, and the symbol is defined in an object in an archive, then the object in the archive is pulled into the link (actually this is somewhat target dependent--the SunOS linker would pull in definitions which were in the .data section but not ones which were in the .text section, assuming that a function could never merge with a common symbol). Moreover, if a common symbol appears in an object, and the symbol is a common symbol in an object in an archive, then in an a.out linker the size of the common symbol is changed, but the object is *not* pulled into the link.

This last behaviour is of course pretty crazy. But in general it isn't reasonable for the ELF ABI to claim that they just rely on historical behaviour for the definition of common symbols, because in fact ELF common symbols do not act like historical ones do.

That said, I was never all that happy with this change, and I think the behaviour before the change was more coherent. But, unfortunately, given the way that system files and libraries are written, it is important that we be compatible with system linkers. You say the UnixWare linker acts differently.

That suggests that we need to make this target dependent. There is precedent for this in the a.out linker, and the use of the common_skip_ar_symbols field in struct bfd_link_info.
Created attachment 746
A testcase

I think Solaris linker behavior makes some sense. Kean, can you try this testcase with your linker? I got

bash-3.00$ make
gcc -c -o main.o main.c
gcc -c -o define.o define.c
ar rv libtest.a define.o
ar: creating libtest.a
a - define.o
gcc -o main1 main.o libtest.a
gcc -o main2 main.o define.o
gcc -shared -o libtest.so define.o
gcc -o main3 main.o libtest.so -Wl,-rpath,.
./main1
3

./main2
3

./main3
3

It is very consistent.
(In reply to comment #5)
> Unfortunately it's too simple to allude to the historical handling of common
> symbols. In a.out linkers when a common symbol appears in an object, and the
> symbol is defined in an object in an archive, then the object in the archive is
> pulled into the link (actually this is somewhat target dependent--the SunOS

You sure about that? I tried on OpenServer and UnixWare. On OSR5, I tried in both COFF and ELF modes. In all three cases, the symbol was pulled from the object and NOT the archive. The SunOS behaviour you described is a bit funky :)

> This last behaviour is of course pretty crazy. But in general it isn't
> reasonable for the ELF ABI to claim that they just rely on historical behaviour
> for the definition of common symbols, because in fact ELF common symbols do not
> act like historical ones do.

That's a fair comment. I guess it depends on whose view of "historical" behaviour you take. The author of the gABI is of course a UnixWare-head, so his notion of "historical" may be a wee bit biased. But he has been with AT&T/USL/Novell/SCO/Caldera/SCO-again for an awfully long time, and is a mine of historical info.

> That said, I was never all that happy with this change, and I think the
> behaviour before the change was more coherent. But, unfortunately, given the
> way that system files and libraries are written, it is important that we be
> compatible with system linkers. You say the UnixWare linker acts differently.

And OpenServer, for what that's worth (actually from a historical perspective, it's worth a bit because it's a dual-ABI system supporting both SVR3.2 COFF and SVR4 ELF).

The problem I have with the current implementation is this. Despite what looks like rational behaviour with H.J.'s test cases (I'll respond to his comment next), I don't think the test case proves anything except that the behaviour *looks* rational. But in terms of everyday developer activities, it's not. *Especially* in the case where the symbol in the object is the same as the symbol in a system library.

The particular case that caused me to discover this bug was trying to compile jwhois with a version of gcc that was newly modified to use GNU ld (historically, on OSR5 and UnixWare, the native ld was used, which was why I never saw this problem before). jwhois legitimately wants to use its own getopt() library, to support the GNU style long options. I now get a link failure because optopt is defined in both jwhois and libc.so. (It is worth noting that libc.so is in fact a normal ar archive that has some number of objects in it that are meant to be linked directly into the a.out, as well as a copy of libc.so.1, which is what gets you the dynamic portion - a common trick.) The libc.so has a member opt_Data.a, which defines optopt, optind, optarg etc. optopt is initialized to 0. In jwhois (and indeed anything that uses the GNU getopt), optopt isn't initialized, it's just declared as 'char *optopt;'. Forcing the symbol to come from libc.so simply because the one in there is a normal data symbol and the one in getopt.o is a common is wrong. The linker needs no other symbols from opt_data.o, and is pulling it in only because of the common/global thing.

Extend that to more common cases where I want to, for example, override malloc for a debugging malloc library. If any portion of malloc had a data symbol (like a mallopt structure or some such), I would be unable to override malloc() with my spiffy new malloc-debugging library because GNU ld would be pulling in the object from the library.

The above situation is made even worse when you are using libc.a instead of libc.so, for static links.

> That suggests that we need to make this target dependent. There is precedent for
> this in the a.out linker, and the use of the common_skip_ar_symbols field in
> struct bfd_link_info.

Of course I would be happy with making this behaviour optional, because that would get around my immediate problem and I can go about using GNU ld to my heart's content. But I think that people who think they need the current behaviour are in for some nasty surprises, as described above.

I tested this on Solaris 10, and the native link editor does in fact behave the same way the GNU one does, but that doesn't necessarily make either one correct.

Sorry for the rambling reply :)
> I think Solaris linker behavior makes some sense. Kean, can you try
> this testcase with your linker?

I agree on the surface it makes sense, but it also has very specific broken behaviour. See previous comment.

> ./main1
> 3
>
> ./main2
> 3
>
> ./main3
> 3
>

I get:

./main1
0

./main2
3

./main3
3

On OSR5 in COFF mode (no main3 because no shared libraries):

./main1
0

./main2
3

The above using the native tools of course. Using gcc, I get the same results you do because it's using the same ld you are.
Am I sure about the a.out behaviour? Yes, I am. When I refer to SunOS I do mean SunOS 4, pre Solaris, which used the a.out object file format. The strange behaviour of common symbols increasing size even without linking in the object file was used to make stdin/stdout/stderr work in the traditional a.out libc. A linker which failed to implement it correctly could not link a "hello, world" program.

AT&T went to COFF in SVR3, and they changed the behaviour of common symbols at that time. I've used SVR2, but I don't have a clear recollection of how the linker worked.

I think that on Solaris we have to do what the native linker does. Likewise on UnixWare. So if they have different behaviour, we have to have different defaults. It would of course be reasonable to provide a command line option to control this.