[Patch] skipping import libraries for performance reasons - direct auto-import of dll's

Charles Wilson cwilson@ece.gatech.edu
Thu Nov 28 13:24:00 GMT 2002


Okay, I've built and tested with this patch, and looked at the code a 
little more closely than previously.  I've attached a modified version 
of Ralf's patch [but don't commit my version; wait for Ralf to regen].

The modified patch fixes a number of compiler warnings (use "%lx" not 
"%x" for long int variables; avoid use of uninitialized variables, etc). 
I've also fixed up the formatting, and added a few changes to correct 
a misfeature that I discovered.

---------------------------------------------------------------

I've initialized the [data|bss]_[start|end] variables as:

/* Initialization with start > end guarantees that is_data will not be
    set by mistake, and avoids compiler warning */
   unsigned long data_start = 1;
   unsigned long data_end   = 0;
   unsigned long bss_start  = 1;
   unsigned long bss_end    = 0;

so that the statement below doesn't cause a compiler warning...or a run 
time error.  It is possible (I think) for a DLL to not have a .bss 
section at all, in which case bss_[start|end] never get initialized 
without the change above.

         is_data = (func_rva >= data_start && func_rva < data_end )
                   || (func_rva >= bss_start && func_rva < bss_end);
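
To show why the start > end initialization is safe, here is a minimal
standalone sketch.  The struct rva_range type, the EMPTY_RANGE macro and
the helper names are made up for illustration and are not in the patch;
only the comparison itself mirrors the is_data expression above.

```c
#include <stdbool.h>

/* Half-open range [start, end) of RVAs.  Hypothetical helper type,
   not part of the patch.  */
struct rva_range
{
  unsigned long start;
  unsigned long end;
};

/* start > end describes an empty range: no rva can satisfy both
   rva >= 1 and rva < 0, so a section that was never found in the
   DLL can never match.  */
#define EMPTY_RANGE { 1, 0 }

static bool
in_range (const struct rva_range *r, unsigned long rva)
{
  return rva >= r->start && rva < r->end;
}

/* Same test as the is_data expression: a symbol is data if its rva
   falls in either the .data or the .bss range.  */
static bool
is_data_rva (const struct rva_range *data, const struct rva_range *bss,
             unsigned long rva)
{
  return in_range (data, rva) || in_range (bss, rva);
}
```

With data set to { 0x2000, 0x3000 } and bss left at EMPTY_RANGE, only
rvas inside the .data range are reported as data; the missing .bss
section contributes nothing.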

---------------------------------------------------------------

At one point, Ralf uses the following

/* skip unwanted symbols, which are exported in buggy auto-import
    releases */
if (strstr(erva + name_rva,"_nm_") == 0)

What's the real purpose of this?  It disallows my_symbol_nm_foo, as well
as _nm_foo or _imp_nm_foo or whatever it is that you're trying to screen 
out.  Would it be better to use something like this, instead:

if (strncmp(erva+name_rva,"_nm_",4) != 0)

which would screen out only those symbols that *begin* with _nm_?  My 
modified patch does NOT make this change, but I wonder if it should.
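
The difference between the two tests can be seen in a small standalone
sketch.  The function names here are hypothetical; the two comparisons
are the ones quoted above.

```c
#include <stdbool.h>
#include <string.h>

/* Substring test, as in the current patch: keeps a symbol only if
   "_nm_" appears nowhere in its name.  */
static bool
keep_symbol_substring (const char *name)
{
  return strstr (name, "_nm_") == NULL;
}

/* Prefix test, the alternative: drops a symbol only if its name
   begins with "_nm_".  */
static bool
keep_symbol_prefix (const char *name)
{
  return strncmp (name, "_nm_", 4) != 0;
}
```

Both versions drop _nm_foo.  The substring version also drops
my_symbol_nm_foo and _imp_nm_foo, while the prefix version keeps both,
so the prefix test is only right if the unwanted symbols always start
with _nm_.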

---------------------------------------------------------------

For the most part, it works as advertised.  I did run into one problem, 
though.  Suppose I create a file structure like this:

/usr/local/bin/cygfoo.dll
/usr/local/lib/libfoo.dll.a -> /usr/local/bin/cygfoo.dll

This seems like a logical thing to do, given that we're using the DLL 
to "substitute" for a true import lib.  This way, you can do

gcc -o bar.exe bar.o -L/usr/local/lib -lfoo

and ld will use the symlink libfoo.dll.a to satisfy the dependency. 
Unfortunately, this doesn't work, because ld doesn't realize that 
"libfoo.dll.a" is actually a (symlink to) a DLL, and the 
pe_implied_import_dll routine is never called.

I know there are OTHER ways to set up the filesystem so that the gcc 
command above will work, such as:

/usr/local/bin/cygfoo.dll
/usr/local/lib/libfoo.dll -> /usr/local/bin/cygfoo.dll

or even

/usr/local/bin/cygfoo.dll
/usr/local/lib/cygfoo.dll -> /usr/local/bin/cygfoo.dll

But my point is that the original filesystem setup *should* work but 
does not.  The problem is in emultempl/pe.em (line 1395):

   if (bfd_get_format (entry->the_bfd) == bfd_object)
     {
       const char *ext = entry->filename + strlen (entry->filename) - 4;
       if (strcmp (ext, ".dll") == 0 || strcmp (ext, ".DLL") == 0)
         return pe_implied_import_dll (entry->filename);
     }
#endif
   return false;
}

As you can see, pe_implied_import_dll is only called if the filename 
ends in .dll or .DLL.  We know that the DLL itself must have a name that 
ends in .dll(.DLL), but the linker ought to be able to recognize a 
symlink-to-a-dll as well(*).  The stuff above should be replaced by 
something like the following:

   if (bfd_get_format (entry->the_bfd) == bfd_object)
     {
       char fbuf[PATH_MAX];
       const char *ext;
       if (realpath(entry->filename,fbuf) == NULL)
         {
           /* strncpy does not terminate fbuf if the name is too long */
           strncpy(fbuf,entry->filename,PATH_MAX - 1);
           fbuf[PATH_MAX - 1] = '\0';
         }
       ext = fbuf + strlen (fbuf) - 4;
       if (strcmp (ext, ".dll") == 0 || strcmp (ext, ".DLL") == 0)
         return pe_implied_import_dll (entry->filename);
     }
#endif
   return false;
}
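
As a standalone illustration of the same idea (not the code in the
patch), the check can be split into a suffix test and a symlink-aware
wrapper.  The names has_dll_suffix and names_a_dll are hypothetical, the
sketch assumes a POSIX system where realpath() is available, and the
4096 fallback for PATH_MAX is an assumption made only so the example
compiles everywhere.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for the realpath() declaration */
#endif

#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* assumed fallback for this sketch only */
#endif

/* True if NAME ends in ".dll" or ".DLL".  Names shorter than the
   suffix are rejected first, so the pointer arithmetic never steps
   before the start of the string.  */
static bool
has_dll_suffix (const char *name)
{
  size_t len = strlen (name);
  const char *ext;

  if (len < 4)
    return false;
  ext = name + len - 4;
  return strcmp (ext, ".dll") == 0 || strcmp (ext, ".DLL") == 0;
}

/* Resolve symlinks with realpath(); if that fails, fall back to
   testing the name exactly as given.  */
static bool
names_a_dll (const char *filename)
{
  char resolved[PATH_MAX];

  if (realpath (filename, resolved) == NULL)
    return has_dll_suffix (filename);
  return has_dll_suffix (resolved);
}
```

With this split, libfoo.dll.a is accepted when it is a symlink whose
target ends in .dll, and rejected when it is a regular archive.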

Only problem: there's no guarantee that realpath or PATH_MAX is 
available, so we need to jump through some hoops to define LD_PATHMAX as 
PATH_MAX or MAXPATHLEN or whatever, depending on what headers are 
available...

So, we have to play games in ld/sysdep.h, and modify configure.in (and 
run autoconf and autoheader) ...but once that's done, the 
/usr/local/lib/libfoo.dll.a -> /usr/local/bin/cygfoo.dll scenario works.
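
The sysdep.h games might look roughly like the sketch below.  This is
not the text of the patch: LD_PATHMAX and REALPATH are the names used in
this mail, HAVE_REALPATH is assumed to be the macro that a configure
check for realpath would define, the platform test stands in for a
configure-generated header check, and the 1024 last-resort value is an
assumption.

```c
#include <stddef.h>
#include <limits.h>      /* may define PATH_MAX */

/* sys/param.h is not universal.  A real build would guard this
   include with a configure result; the platform test below is a
   stand-in for this sketch.  */
#if defined (__unix__) || defined (__APPLE__) || defined (__CYGWIN__)
#include <sys/param.h>   /* may define MAXPATHLEN */
#endif

#if defined (PATH_MAX)
#define LD_PATHMAX PATH_MAX
#elif defined (MAXPATHLEN)
#define LD_PATHMAX MAXPATHLEN
#else
#define LD_PATHMAX 1024  /* assumed last-resort value */
#endif

/* Use realpath when configure found it; otherwise behave as if the
   lookup failed, so callers fall back to the unresolved filename.  */
#ifdef HAVE_REALPATH
#include <stdlib.h>
#define REALPATH(path, buf) realpath (path, buf)
#else
#define REALPATH(path, buf) NULL
#endif
```

Callers can then declare a buffer of LD_PATHMAX bytes and treat a NULL
result from REALPATH as "no symlink information available".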

- - - - - - - - - - - - - - - - - - - -
(*) symlink-to-a-dll would be INVALID without this change (already in 
Ralf's patch):

+  /* use internal dll name instead of filename
+     to enable symbolic dll linking */
+  dll_name = pe_as32 (expdata + 12) + erva ;

Without it, the symlink's name would get embedded into the target as a 
dependency -- and the Windows Runtime Loader would get really confused 
since it doesn't understand symlinks, and only loads files that DO end 
in .dll.  So that's why this "problem" never came up before; it's only 
worth consideration given Ralf's change...but Ralf's change should be 
accompanied by the configure.in/config.in changes.
- - - - - - - - - - - - - - - - - - - -
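
For readers unfamiliar with the export table layout: offset 12 in the PE
export directory table holds the RVA of the DLL's internal name, stored
as a 32-bit little-endian value.  A simplified standalone sketch of that
lookup follows.  The function names are hypothetical, and the sketch
assumes a buffer in which an RVA can be used directly as an offset,
which is simpler than the erva arithmetic in the real code.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value, as PE headers store them.  */
static uint32_t
read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* Offset of the Name RVA field in the export directory table.  */
#define EXPORT_DIR_NAME_RVA_OFFSET 12

/* Return the DLL's internal name.  IMAGE is a hypothetical buffer
   laid out so that an RVA is a plain offset into it, and
   EXPORT_DIR_RVA locates the export directory table.  No bounds
   checking is done in this sketch.  */
static const char *
internal_dll_name (const unsigned char *image, uint32_t export_dir_rva)
{
  const unsigned char *expdata = image + export_dir_rva;
  uint32_t name_rva = read_le32 (expdata + EXPORT_DIR_NAME_RVA_OFFSET);

  return (const char *) (image + name_rva);
}
```

Because the name comes from inside the DLL, it is the same whether the
file was opened as cygfoo.dll or through a symlink named libfoo.dll.a.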

---------------------------------------------------------------

I've split the patch into two pieces:
   ld-auto-import-dll.patch-csw
      the main changes
   ld-auto-import-dll.patch-csw2.gz
      the configure and config.in changes created by running
      autoconf and autoheader.

Any comments on the revised patch?  Is there a better way to handle the 
realpath()/REALPATH() thing?

2002-11-28  Ralf Habacker  <Ralf.Habacker@freenet.de>
	    Charles Wilson  <cwilson@ece.gatech.edu>

	* ld/config.in: regenerate
	* ld/configure: regenerate
	* ld/configure.in: add check for realpath function
	* ld/deffile.h: add .data field to def_file_import
	structure
	* ld/pe-dll.c (pe_process_import_defs): use .data
	field of def_file_import structure to initialize
	flag_data field of def_file_export structure
	(pe_implied_import_dll): new variables exp_funcbase
	and [data|bss]_[start|end].  Use DLL's internal name
	to set dll_name, not filename (which may be a symlink).
	Scan the sections and initialize [data|bss]_[start|end].
	When scanning the export table, skip _nm_ symbols, and
	mark any symbols whose rva indicates that it is in the
	.bss or .data sections as data.
	* ld/sysdep.h: include limits.h and sys/param.h, and
	define LD_PATHMAX as appropriate.  Also define REALPATH
	as realpath if it exists, NULL otherwise
	* ld/emultempl/pe.em (gld_${EMULATION_NAME}_after_open):
	call pe_process_import_defs before pe_find_data_imports,
	so that auto-import will check the virtual implib as well
	as "real" implibs.
	(gld_${EMULATION_NAME}_recognized_file): use REALPATH to
	follow symlinks to their target; check that the target's
	extension is .dll before calling pe_implied_import_dll(),
	not the filename itself (which may be a symlink).

--Chuck
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ld-auto-import-dll.patch-csw
URL: <https://sourceware.org/pipermail/binutils/attachments/20021128/84ebb047/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ld-auto-import-dll.patch-csw2.gz
Type: application/x-gzip
Size: 4622 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/binutils/attachments/20021128/84ebb047/attachment.bin>

