Bug 21076 - (cygwin) Output DLL import lookup/address tables are incorrect
Status: SUSPENDED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld
Version: 2.27
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-23 19:01 UTC by Alex
Modified: 2017-02-16 10:44 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed: 2017-02-16 00:00:00


Description Alex 2017-01-23 19:01:36 UTC
I have a large chunk of code (some of which is large enough that the PE big-obj format must be used) that is being linked into a DLL by binutils-2.27 in a Cygwin 64-bit environment. I built the binutils myself, as the binutils currently distributed by Cygwin is too old and does not support the big-obj format.

The link completes without errors, but the resulting DLL fails to load because of a missing dependency that cannot be resolved. Looking at the import tables using Dependency Walker (or just objdump -p), I see that a dependency that belongs to one DLL is "leaking" into the import list of another. In other words, the import table says that a symbol "myfunc" is in BOTH "a.dll" and "b.dll", when it is supposed to be only in "b.dll". I used hexdump to view the raw linked DLL and, using the PE specification as a guide, found that the empty (NULL) entry that is supposed to mark the end of the import lookup/address table for "a.dll" is being overwritten. The entries for "b.dll" (in this example, a symbol called "myfunc") overwrite the NULL terminating entry of the previous DLL ("a.dll"), so anything parsing the PE format sees "myfunc" in the lookup table and address table for "a.dll", and then also (and correctly) in those of "b.dll".
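
To illustrate the symptom described above, here is a minimal Python sketch of how a reader of the PE format walks a NULL-terminated import lookup table, and what it sees when the terminator for "a.dll" has been overwritten. It works on synthetic bytes rather than a real DLL; the function name read_lookup_table and the RVA values are made up for illustration, and the 8-byte entry size assumes a PE32+ (64-bit) image.

```python
import struct

ENTRY_SIZE = 8  # assumption: PE32+ image, so lookup entries are 64-bit


def read_lookup_table(data, offset):
    """Collect entries from `offset` until the first all-zero entry."""
    entries = []
    while offset + ENTRY_SIZE <= len(data):
        (value,) = struct.unpack_from("<Q", data, offset)
        if value == 0:  # the NULL entry terminates this DLL's table
            break
        entries.append(value)
        offset += ENTRY_SIZE
    return entries


# Hypothetical hint/name RVAs standing in for real symbols.
A_FUNC = 0x3000  # some symbol imported from a.dll
MYFUNC = 0x3010  # "myfunc", imported from b.dll

# Well-formed layout: a.dll entry, NULL, b.dll entry, NULL.
good = struct.pack("<4Q", A_FUNC, 0, MYFUNC, 0)

# Broken layout: b.dll's table starts where a.dll's terminator should be.
bad = struct.pack("<3Q", A_FUNC, MYFUNC, 0)

print([hex(v) for v in read_lookup_table(good, 0)])   # a.dll, well-formed
print([hex(v) for v in read_lookup_table(good, 16)])  # b.dll, well-formed
print([hex(v) for v in read_lookup_table(bad, 0)])    # a.dll, broken
print([hex(v) for v in read_lookup_table(bad, 8)])    # b.dll, broken
```

In the broken layout the walk that starts at a.dll's table does not stop until it reaches b.dll's terminator, so "myfunc" is reported for both DLLs, which matches what Dependency Walker and objdump -p show.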

I cannot share the reproducing code, and this is the first time this has occurred. I have linked many other large code bases into DLLs and executables without this occurring, and other DLL references in the import lookup table of the DLL I'm linking do not exhibit this issue. This issue IS in both the import lookup table (.idata$4) and the import address table (.idata$5).

I have spent a good few hours already trying to find the source of the bug in the ld/bfd source code, but haven't found it yet (part of those hours was getting familiar with the program flow, though).

Any help would be greatly appreciated!
Comment 1 Nick Clifton 2017-01-24 10:19:13 UTC
Hi Alex,

  Without a testcase this is going to be a hard problem to investigate.

  I would guess that the problem is going to be related to the large size of the objects involved.  Possibly there is a 32-bit variable somewhere that ought to be 64-bits, or something like that.

  The place that I suggest you start looking is the ld/pe-dll.c source file.  Maybe tracing things back from the make_singleton_name_imp() function will help.

  Since this sounds like a memory corruption bug, you might also try using some of the memory leak detection tools to help you, eg valgrind, or mallopt.  Or compiling the linker with detection enabled, eg -fsanitize=address or -fsanitize=undefined.

Cheers
  Nick
Comment 2 Alex 2017-01-24 14:11:04 UTC
Nick,

Thanks for the hint. I realize the difficulty not having a test case represents. If I figure out exactly what is triggering it (but can't figure out how to fix it) I'll make a minimal reproduction.

I will look where you suggested. I also found that looking at the various _bfd_XXi_swap_* functions in bfd/peXXigen.c has been helpful. In particular, I see that in _bfd_XXi_swap_sym_in, empty .idata$4 and .idata$5 sections are created, and they are being created at the correct times after pe_ILF_build_a_bfd is called for each DLL's symbols. So at that point at least, I can verify that it's not as if the empty entries are simply being omitted.

Nothing obvious sticks out in _bfd_XXi_swap_sym_out either, which I thought could be a likely place where the bug was residing.
Comment 3 Alex 2017-01-24 14:57:30 UTC
Clarification: these DLL imports are being linked using the import libraries (.lib). So ld/pe-dll.c isn't involved.
Comment 4 Alex 2017-02-09 17:24:22 UTC
I have determined through further testing that the problem was occurring because some object files were created using "ld -r" to combine other object files together, and at one such point a DLL import library was being linked in that way. This was causing chaos later on in the linking process because the linker has no way (right now) of coalescing the various DLL imports into a single import table: the import library was basically being linked in twice, with different functions being used from it.

So this may not be a bug necessarily, just an issue arising from lacking functionality. But I hardly consider this a common, or even uncommon, use case, as this is not how DLL import libraries are intended to be linked whatsoever. It would be cool if it worked, but I imagine that will take a good amount of effort.

Technically, however, there probably should be some sort of error when this case occurs, rather than generating an invalid binary. But from the current state of the code, it looks like it would be difficult to detect it (since the code is just shuffling around various .idata symbols for the most part).
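
As a rough idea of what such a check could look like when run on the finished binary (not inside ld, whose internals this does not model), the Python sketch below flags any lookup table that runs into the start of another DLL's table before reaching its NULL terminator. The function name find_overlapping_tables, the DLL names, and the offsets are all hypothetical; the table start offsets are assumed to have already been extracted from the import directory, and entries are assumed to be 8 bytes (PE32+).

```python
import struct

ENTRY_SIZE = 8  # assumption: PE32+ image, so lookup entries are 64-bit


def find_overlapping_tables(data, tables):
    """Return (dll, other_dll) pairs where dll's table reaches other_dll's
    table start without first hitting a NULL terminator.

    `tables` maps a DLL name to the byte offset of its lookup table in `data`.
    """
    starts = {offset: name for name, offset in tables.items()}
    problems = []
    for name, start in tables.items():
        offset = start
        while offset + ENTRY_SIZE <= len(data):
            (value,) = struct.unpack_from("<Q", data, offset)
            if value == 0:  # properly terminated
                break
            if offset != start and offset in starts:
                # A non-NULL entry sits where another DLL's table begins.
                problems.append((name, starts[offset]))
                break
            offset += ENTRY_SIZE
    return problems


# Synthetic data: 0x3000 and 0x3010 are made-up hint/name RVAs.
good = struct.pack("<4Q", 0x3000, 0, 0x3010, 0)
bad = struct.pack("<3Q", 0x3000, 0x3010, 0)

print(find_overlapping_tables(good, {"a.dll": 0, "b.dll": 16}))
print(find_overlapping_tables(bad, {"a.dll": 0, "b.dll": 8}))
```

A check like this only catches the specific corruption seen in this report; it says nothing about why the terminator was lost, and detecting the condition at link time would still require the kind of work on the .idata handling described above.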

I'll leave it to you guys to determine whether this should be closed out.
Comment 5 Nick Clifton 2017-02-16 10:44:13 UTC
Hi Alex,

  I am going to put this bug in the suspended state for now.

  One fine day when I have lots of free time, I will investigate further.  Or
  maybe you or somebody else will find and fix the problem before then... :-)

Cheers
  Nick