This is the mail archive of the mailing list for the Cygwin project.


The bug in ld: some results:

Jim was kind enough to send me two executables, named GOOD.EXE and BAD.EXE,
that illustrate the now famous 'ld' bug triggered by 'lkernel32' on the
command line.

I decided to investigate this problem further, and to present the results
to you. Maybe this information will be useful for you and for the poor guy(s)
trying to maintain 'ld'. I really feel sorry for them ! :-)

The main differences between the executables are the following:
1. Good.exe has the import entries from cygwin.dll FIRST, followed by the
   kernel32.dll entries.
2. Bad.exe has the import entries from kernel32.dll first, then cygwin.dll.

This explains the observed behaviour of command line/not command line crashes.
It seems that the order in which ld emits the imports is very significant. Why?

In BAD.EXE the entry for the kernel32.dll imports is WRONG: it is missing
the import for GetCommandLine !!!!! Since that import is wrong, 
the program will crash at startup, when that function is called. 
This explains the behaviour Jim observed under the debugger.

The entry for GetModuleHandle is present, and is the only entry for kernel32.

The entry for GetCommandLine is not entirely missing: only the pointer to
it is missing from the import table. Its ASCII name is there, but the
loader will never find it, because after the entry for 'GetModuleHandle' there
is a NULL.

I think here a little explanation is required:

The import table begins with an array of IMAGE_IMPORT_DESCRIPTORs, one for
each dll used by the program.
Each IMAGE_IMPORT_DESCRIPTOR contains a field that points to the data needed
by the loader for each function in the corresponding dll. This is OK in 
BAD.EXE. So far so good.

The information pointed to by each IMAGE_IMPORT_DESCRIPTOR is an array of
IMAGE_THUNK_DATA: one for each function imported from the corresponding dll. 
Each of these arrays is terminated by a NULL entry.

Each IMAGE_THUNK_DATA structure contains an RVA (relative) pointer to the
name of the function. (The names form another array.)

  | import descriptor |______________ Thunk data for function 1
  |       dllNr1      |                of first dll --------------->Ascii name
  --------------------                Thunk data for function 2
                                       of first dll --------------->Ascii name

  | import descriptor |______________ Thunk data for function 1
  |       dllNr2      |                of second dll -------------->Ascii name
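
The layout above can be sketched in C. This is only an illustration, not
the real winnt.h definitions: ImportDescriptor is a simplified stand-in for
the 32-bit IMAGE_IMPORT_DESCRIPTOR, and read_u32, descriptor_thunks,
count_thunks and thunk_name are hypothetical helper names. It assumes the
image is already mapped in memory, that imports are by name (no ordinals),
and that the host is little-endian like the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the 32-bit IMAGE_IMPORT_DESCRIPTOR (one per dll).
   Field names follow winnt.h; the type itself is illustrative only. */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the thunk array naming the imports */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the dll name string */
    uint32_t FirstThunk;         /* RVA of the array the loader fills in */
} ImportDescriptor;

/* Read a 32-bit value at an RVA (assumes host byte order == file order). */
static uint32_t read_u32(const uint8_t *image, uint32_t rva)
{
    uint32_t v;
    assert(image != NULL);
    memcpy(&v, image + rva, sizeof v);
    return v;
}

/* RVA of the thunk array belonging to the k-th import descriptor. */
uint32_t descriptor_thunks(const uint8_t *image, uint32_t desc_rva, size_t k)
{
    uint32_t rva = desc_rva + (uint32_t)(k * sizeof(ImportDescriptor))
                 + (uint32_t)offsetof(ImportDescriptor, OriginalFirstThunk);
    return read_u32(image, rva);
}

/* Number of imports visible in one thunk array: walk the 32-bit entries
   and stop at the first zero, which marks the end of the array. */
size_t count_thunks(const uint8_t *image, uint32_t thunk_rva)
{
    size_t n = 0;
    while (read_u32(image, thunk_rva + 4u * (uint32_t)n) != 0)
        n++;
    return n;
}

/* Name of the i-th import. Each by-name thunk holds the RVA of a 2-byte
   hint followed by the NUL-terminated function name. */
const char *thunk_name(const uint8_t *image, uint32_t thunk_rva, size_t i)
{
    uint32_t data_rva = read_u32(image, thunk_rva + 4u * (uint32_t)i);
    return (const char *)(image + data_rva + 2);
}
```

Because count_thunks stops at the first zero entry, any name whose thunk
would come after that zero is invisible to this walk, no matter where the
string itself sits in the image.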

In BAD.EXE there is a correct entry for GetModuleHandle, and its 
u1.AddressOfData field points correctly to an ASCII string 'GetModuleHandle'.
The problem is that the next entry in the IMAGE_THUNK_DATA array contains a
NULL instead of an entry for the next function imported from kernel32.dll.

This NULL is interpreted by the loader as the sign for the
end of the table, and it will never get to the ASCII string 'GetCommandLine',
which is there, in the import table, even if it's NOT WHERE IT SHOULD BE.

That ASCII string is at the END of the ASCII strings of the OTHER dll the
program uses: cygwin.dll. 

Instead of immediately following the string for GetModuleHandle, the ASCII
string somehow ended up at the end of the strings of another, completely
unrelated DLL!!!!!

This means that 'ld' mixes up the ASCII
strings for the functions imported from the dlls, which could cause nasty
surprises for users who call one function in their source code, and end up 
calling something completely different at run time!!!

Why does 'ld' crash?
As somebody from Cygnus pointed out in this thread, I do not know much
about ld. But to write a linker I was forced to learn something about this
damned table, the most difficult part of the whole linking process. There
are three possibilities:

1) The import library for kernel32.dll has a bug that confuses 'ld'.
2) 'ld' has a bug independent of the libraries.
3) Both 'ld' and the import libraries are buggy.

Let's examine the pros/cons of each possibility.

1) The import library is buggy.
  If we assume this, we would have to explain why the same import library
  works if it is not the first import library and specified in the command

2) 'ld' has a bug that is library independent.
   This is highly probable since it would explain all observed behaviour.

3) Both DLLTOOL and ld are buggy.
   This is a REAL possibility.

Where to look in 'ld' sources?
'ld' sorts the sections (as my linker does) to accommodate the order of the
subsections (idata$2 should come before idata$3). Is this sorting done before
or after the default libraries are loaded? The mess in ld could be the result
of sorting after some command line libraries were loaded already... I think
this is the most promising avenue of investigation.
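
To make the ordering concrete, here is a minimal sketch of such a sort. It
is NOT taken from the 'ld' sources: by_name and sort_subsections are
hypothetical names, and the sketch assumes that a plain lexical comparison
of the subsection names is enough to put idata$2 before idata$3.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two subsection names lexically. */
static int by_name(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort an array of subsection names in place, e.g. so that "idata$2"
   comes before "idata$3". Hypothetical helper, not ld's own code. */
void sort_subsections(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 1)
        qsort(names, count, sizeof names[0], by_name);
}
```

If contributions are appended to the list after a sort like this has
already run, the list is no longer ordered unless it is sorted again, which
is the kind of mistake I suspect here.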

The 'not owner' bug in 'ld'.

When I studied 'ld', I noticed the following problem. When I linked
an object file generated with MSVC using 'ld', the executable
image contained a HOLE, i.e. it wasn't contiguous. Could somebody who has
an executable with that 'not owner' bug look at the sequence of the
sections in memory? If they are NOT CONTIGUOUS, the NT loader will refuse
to load the program. I would be interested to know if that is the case.
I think that the Windows95 loader is more liberal, and may load the executable
with holes and all... This could explain some things.
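
For anybody who wants to check this, the test I have in mind can be
sketched as follows. It is only a sketch under assumptions: SectionInfo is
an illustrative subset of the section header, first_hole and align_up are
hypothetical names, the sections are assumed to be listed in ascending
address order, and each section is expected to start at the end of the
previous one rounded up to the section alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of a PE section header. */
typedef struct {
    uint32_t VirtualAddress; /* RVA where the section starts in memory */
    uint32_t VirtualSize;    /* size of the section in memory */
} SectionInfo;

/* Round value up to a multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Return the index of the first section that does not start where the
   previous one ends (after alignment), or -1 if there is no hole. */
int first_hole(const SectionInfo *s, size_t count, uint32_t section_alignment)
{
    assert(section_alignment != 0);
    assert((section_alignment & (section_alignment - 1)) == 0);
    for (size_t i = 1; i < count; i++) {
        uint32_t end = s[i - 1].VirtualAddress + s[i - 1].VirtualSize;
        if (s[i].VirtualAddress != align_up(end, section_alignment))
            return (int)i;
    }
    return -1;
}
```

With a section alignment of 0x1000, sections at 0x1000 (size 0x800) and
0x3000 would be reported as a hole at index 1, because the second section
would be expected at 0x2000.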

Sorry for this long message

Jacob Navia	Logiciels/Informatique
41 rue Maurice Ravel			Tel 01
93430 Villetaneuse 			Fax 01
