Bug 19541 - Format reader for ILF format (used by MSVC-generated import libraries) does not properly handle data imports
Summary: Format reader for ILF format (used by MSVC-generated import libraries) does n...
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2016-01-31 06:27 UTC by Nathaniel J. Smith
Modified: 2016-02-09 16:41 UTC (History)
1 user (show)

See Also:
Last reconfirmed:

First attempt at a patch (894 bytes, patch)
2016-01-31 06:33 UTC, Nathaniel J. Smith
Details | Diff
Tiny ILF file with DATA symbol for testing (57 bytes, application/x-object)
2016-01-31 20:05 UTC, Nathaniel J. Smith

Note You need to log in before you can comment on or make changes to this bug.
Description Nathaniel J. Smith 2016-01-31 06:27:53 UTC
On Windows, there are two formats for "import libraries" commonly in use: the format generated by dlltool and commonly used with mingw-w64 toolchains, which is an ar archive of COFF objects (conventionally named like "libpython27.a"), and the format generated and commonly used by MSVC toolchains, which is an ar archive of "ILF" objects (and conventionally named like "python27.lib").

Either way, the purpose of such files is to provide DLL imports. DLL imports come in two main flavors. There are "code" symbols, which refer to functions, and which require (a) a __imp___foo relocation symbol, and (b) _foo trampoline that just jumps to __imp___foo. And then there are "data" symbols, which refer to things like global variables exported from a DLL, and only require a __imp___foo symbol, with no trampoline symbol. In the ILF format these are represented identically, except that there is a flag value labeling each import symbol as either IMPORT_CODE or IMPORT_DATA.

Unfortunately, binutils's ILF-reading code currently only supports IMPORT_CODE symbols. Any symbols with the IMPORT_DATA flag set are completely skipped over, as if they didn't exist.

One place where this causes problems is when attempting to use binutils to link programs against the python C API, because python27.dll makes extensive use of DATA symbols. But really it would be nice to support these in general, because it should be simple to do and it would be nice if we didn't have to jump through hoops to use import libraries on Windows. In addition, mingw-w64 is looking to transition to a new 'genlib' tool for generating import libraries, and genlib also generates ILF-format libraries.

I've attached a patch that in theory ought to fix this. Basically, right now all the code for emitting import symbol information from ILF files is hidden behind switch { case IMPORT_CODE: ...}; my patch just takes the parts that are common to IMPORT_CODE and IMPORT_DATA and moves them out of the switch statement so that they are executed unconditionally. The function-specific trampoline generation code remains inside the switch statement, and if we see anything besides IMPORT_CODE or IMPORT_DATA then we abort(), so this should be safe.

The patch compiles, and using it to manually link a simple test program against python27.lib produces correct-looking output (i.e., a binary is successfully produced, and running objdump on it shows that it has a correct-looking import table that successfully references DATA symbols in python27.dll). Unfortunately though I don't have a great way to test this directly (no handy Windows system, and no convenient way to build a whole toolchain including my patched ld), and someone else reported that when they did all that then the compile seemed to work but the resulting binary segfaulted. (Possibly due to unrelated issues though, who knows.)

So, I'm posting this here in hopes that either someone who knows what they're doing will look at it and think "well, this is obviously correct", or else think "well, this is obviously broken, you need to do _____" :-)

Some discussion where we found this bug: https://github.com/mingwpy/mingwpy.github.io/issues/31#issuecomment-175334085
Comment 1 Nathaniel J. Smith 2016-01-31 06:33:28 UTC
Created attachment 8943 [details]
First attempt at a patch
Comment 2 Nathaniel J. Smith 2016-01-31 19:55:48 UTC
Update: on further investigation my helpful tester reports that their segfault problem was an unrelated configuration error that they made; once they sorted that out then all was well.

So, I'm now fairly confident that the attached patch does in fact solve the problem.
Comment 3 Nathaniel J. Smith 2016-01-31 20:05:48 UTC
Created attachment 8946 [details]
Tiny ILF file with DATA symbol for testing

Attaching an example ILF-format import member, extracted from a 32-bit python27.lib (as created by MSVC, and distributed in the python.org official python 2.7 windows downloads).

Without the patch, nm on this file shows no symbols; after the patch, it correctly shows a reference to __imp___Py_NoneStruct. Also, here's a simple test case:

---- nonestruct-test.c ----

__declspec(dllimport) struct PyObject_Struct _Py_NoneStruct;
struct PyObject_Struct *f()
    return &_Py_NoneStruct;

---- end nonestruct-test.c ----

Without the patch, linking this program to export-_Py_NoneStruct.o fails with:

  undefined reference to `_imp___Py_NoneStruct'

but with the patch, linking succeeds and correctly creates an import table referencing _Py_NoneStruct.
Comment 4 Nick Clifton 2016-02-05 11:01:56 UTC
Hi Nathaniel,

  Right - the patch is checked in.

  I have verified that the patched sources do indeed show the data symbols in the ILF object file you uploaded.  Unfortunately I have not been able to work out a way to turn this into a testcase in the binutils testsuite. So I will leave that for now.  In the meantime I am closing this PR, but if the problem resurfaces please do not hesitate to reopen it.

Comment 5 Nathaniel J. Smith 2016-02-05 18:26:41 UTC
Can the test suite include binary files? Two tests I can think of would be:

1) Include the ILF file; run objdump on it; confirm that the appropriate symbols appear.

2) Include the ILF file + a regular .o that references a data member in it; use ld to link them together and then use objdump to confirm that the resulting dll has a correct import table.
Comment 6 Nick Clifton 2016-02-09 16:41:57 UTC
Hi Nathaniel,

> Can the test suite include binary files?

Apparently there is a way, but I am not familiar with it. :-(