This is the mail archive of the mailing list for the Cygwin project.

Rebasing dlls - why it is necessary


This is actually a response to a thread on cygwin-apps, but I do not
subscribe to that list.
original subject: Re: rebase problem for cygcurl-2.dll still existing?!

> (BTW2: can anyone explain in layman's terms why it is that in-memory
> relocation upon collision doesn't happen in this case? Is that a
> deficiency of cygwin as a whole, or just related to the way my DLL was
> built?)

I looked into this when working on my gnome port. The problem only
occurs when a program that has dlopen()'ed a dll then does a fork(). The
forked child then has to dlopen() that dll, and have it located at the
same address as the parent had it loaded. The child cannot always
achieve this, because of the interaction of a number of issues.

issue #1: Windows loads a dll in two stages.

First it loads the dll *and all its dependency dlls* into memory. The
order in which the dlls are loaded is: first the target dll, then the
dependent dlls (*top-down*, depth-wise). So for cygwin dlls, all based
at the ld default of 0x10000000, the first will be relocated to the
next free address at or above 0x10000000, the next dll to the next free
address above that, and so on.

Second, Windows calls the entry point function for each loaded dll,
*bottom-up*. So no entry function is called until all dependent dlls are
loaded, and the entry functions are called in a different order from the
one in which the dlls were loaded into memory.
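
The two orderings can be sketched with a toy model. This is only an
illustration: it assumes "top-down" means a pre-order walk of the
dependency graph and "bottom-up" means a post-order walk, and the dll
names and the graph itself are made up.

```python
# Hypothetical dependency graph; the dll names are illustrative only.
DEPS = {
    "cygfoo.dll": ["cygbar.dll", "cygbaz.dll"],
    "cygbar.dll": ["cygqux.dll"],
    "cygbaz.dll": [],
    "cygqux.dll": [],
}


def load_order(root, deps):
    """Pre-order walk: a dll is mapped before its dependencies."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        order.append(name)
        for dep in deps[name]:
            visit(dep)

    visit(root)
    return order


def entry_order(root, deps):
    """Post-order walk: a dll's entry point runs after its dependencies'."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps[name]:
            visit(dep)
        order.append(name)

    visit(root)
    return order


loaded = load_order("cygfoo.dll", DEPS)
entered = entry_order("cygfoo.dll", DEPS)
print("mapped into memory:", loaded)
print("entry points run:  ", entered)
```

In this model the target dll is mapped first but its entry point runs
last, so a list built by the entry points is ordered differently from
the order of mapping.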

As cygwin relies on the entry function to add the dll to its list of
loaded dlls, and after a fork iterates over this list to load them into
the child, we can see that the order in which the child loads them will
not in general be the same as the order in which the parent loaded them.
And as Windows loads to the next-highest-available address, the child
will get different load addresses from the parent.
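
A rough sketch of the consequence, assuming a simplified loader that
places each colliding dll at the next free address above 0x10000000,
rounded to the 64 KiB Windows allocation granularity. The dll names,
sizes and both orderings are hypothetical, and the real loader is more
involved than this.

```python
DEFAULT_BASE = 0x10000000   # ld default base shared by all the dlls
GRANULE = 0x10000           # 64 KiB allocation granularity

# Hypothetical image sizes in bytes.
SIZES = {
    "cygfoo.dll": 0x30000,
    "cygbar.dll": 0x20000,
    "cygqux.dll": 0x10000,
    "cygbaz.dll": 0x40000,
}

# Parent: order in which the dlls were mapped (top-down).
PARENT_ORDER = ["cygfoo.dll", "cygbar.dll", "cygqux.dll", "cygbaz.dll"]
# Child: order of the list built by the entry functions (bottom-up).
CHILD_ORDER = ["cygqux.dll", "cygbar.dll", "cygbaz.dll", "cygfoo.dll"]


def place(order, sizes, base=DEFAULT_BASE):
    """Give each dll the next free address, in the order given."""
    addrs = {}
    next_free = base
    for name in order:
        addrs[name] = next_free
        rounded = (sizes[name] + GRANULE - 1) // GRANULE * GRANULE
        next_free += rounded
    return addrs


parent = place(PARENT_ORDER, SIZES)
child = place(CHILD_ORDER, SIZES)
mismatched = sorted(n for n in SIZES if parent[n] != child[n])

for name in PARENT_ORDER:
    print(f"{name}: parent {parent[name]:#x}, child {child[name]:#x}")
print("dlls at a different address in the child:", mismatched)
```

With these made-up sizes every dll lands at a different address in the
child, which is exactly what fork() cannot tolerate.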

Now cygwin tries to overcome this problem by allocating the memory
blocks that Windows is trying to use, forcing it to load the dll higher.
This might work in most cases if it were not for issue #2.

issue #2: cygwin maintains a small structure of data for each dll, and
allocates this structure contiguous with the loaded dll, or as close to
it as it can get. Because several dlls may have been allocated into
memory before this structure is allocated by the entry function, it is
not always placed next to its associated dll. But when the child loads
the dlls, in a different sequence from the parent as described above, it
generally will get to locate this structure next to its associated dll.
Now the next dll cannot be loaded to this address because it is already
occupied by the cygwin structure.
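
The collision can be reproduced in a toy address-space model. Everything
here is an assumption made for illustration: the dll names and sizes,
the size of the per-dll record (STRUCT_SIZE), the first-fit placement
rule, and the idea that the child loads one dll at a time and runs its
entry function straight away. It is not a description of the actual
cygwin code.

```python
BASE = 0x10000000
UNIT = 0x10000          # assumed placement granularity (64 KiB)
STRUCT_SIZE = UNIT      # hypothetical size of the per-dll record

SIZES = {"cygfoo.dll": 0x30000, "cygbar.dll": 0x20000,
         "cygqux.dll": 0x10000, "cygbaz.dll": 0x40000}
LOAD_ORDER = ["cygfoo.dll", "cygbar.dll", "cygqux.dll", "cygbaz.dll"]
ENTRY_ORDER = ["cygqux.dll", "cygbar.dll", "cygbaz.dll", "cygfoo.dll"]


class AddressSpace:
    def __init__(self):
        self.blocks = []  # (start, end, owner) for every claimed range

    def is_free(self, start, size):
        end = start + size
        return all(end <= s or start >= e for s, e, _ in self.blocks)

    def first_fit(self, start, size):
        """Lowest UNIT-aligned free range at or above start."""
        addr = start
        while not self.is_free(addr, size):
            addr += UNIT
        return addr

    def claim(self, start, size, owner):
        assert self.is_free(start, size)
        self.blocks.append((start, start + size, owner))


def run_parent(load_order, entry_order, sizes):
    """Map every dll first, then let each entry function add its record."""
    space, dll_addr = AddressSpace(), {}
    for name in load_order:
        dll_addr[name] = space.first_fit(BASE, sizes[name])
        space.claim(dll_addr[name], sizes[name], name)
    for name in entry_order:
        rec = space.first_fit(dll_addr[name] + sizes[name], STRUCT_SIZE)
        space.claim(rec, STRUCT_SIZE, name + " record")
    return dll_addr


def run_child(entry_order, sizes, parent_addr):
    """Load dlls one by one at the parent's addresses, each followed by
    its record. Returns the first dll that cannot be placed, or None."""
    space = AddressSpace()
    for name in entry_order:
        target = parent_addr[name]
        if not space.is_free(target, sizes[name]):
            return name
        space.claim(target, sizes[name], name)
        rec = space.first_fit(target + sizes[name], STRUCT_SIZE)
        space.claim(rec, STRUCT_SIZE, name + " record")
    return None


parent_addr = run_parent(LOAD_ORDER, ENTRY_ORDER, SIZES)
blocked = run_child(ENTRY_ORDER, SIZES, parent_addr)
print("first dll the child cannot place:", blocked)
```

In this run the record for cygqux.dll lands directly after cygqux.dll in
the child, which is the address where cygbaz.dll lived in the parent, so
cygbaz.dll can no longer be placed there.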

This whole scenario is complicated by the fact that dlls can be
unloaded, and some do not use the default base address.

So the chances of a child process being able to relocate dlopen()'ed
dlls to the same address as its parent are actually quite slim, unless
the dlls are constructed in such a way that Windows does not need to
relocate. Jason's rebase.exe achieves this by ensuring that dlls have
base addresses that do not overlap, and are separated by enough empty
space to allow the cygwin dll structure to fit in between.
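
The idea behind rebasing can be sketched as follows. This is not the
algorithm used by rebase.exe, only an illustration of the layout it aims
for: the start address, the gap size, and the dll names and sizes are
all hypothetical.

```python
GRANULE = 0x10000        # 64 KiB allocation granularity
START = 0x60000000       # hypothetical first base address
GAP = 0x10000            # hypothetical room left for the cygwin structure

# Hypothetical image sizes in bytes.
SIZES = {
    "cygfoo.dll": 0x30000,
    "cygbar.dll": 0x20000,
    "cygqux.dll": 0x10000,
    "cygbaz.dll": 0x40000,
}


def assign_bases(sizes, start=START, gap=GAP):
    """Give every dll its own base address, with a gap after each image."""
    bases = {}
    addr = start
    for name, size in sizes.items():
        bases[name] = addr
        rounded = (size + GRANULE - 1) // GRANULE * GRANULE
        addr += rounded + gap
    return bases


def ranges_are_disjoint(bases, sizes, gap=GAP):
    """Check that no image, plus its gap, runs into the next image."""
    spans = sorted((bases[n], bases[n] + sizes[n] + gap) for n in bases)
    return all(prev_end <= next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))


bases = assign_bases(SIZES)
for name, base in bases.items():
    print(f"{name}: base {base:#x}")
print("no overlaps:", ranges_are_disjoint(bases, SIZES))
```

Because each dll now has a preferred base that nothing else wants, the
address no longer depends on the order of loading, so parent and child
agree without any relocation.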

This can be seen as a hack, because rebasing should really be for
efficiency, not for functionality. But the only other solution I can
imagine is a complete re-working of the way cygwin handles dynamically
loaded dlls and fork() - and I guess this is not likely in a hurry. As a
long-shot, maybe the work that Egor is doing on runtime re-location for
auto-export of data symbols could be extended so that cygwin no longer
needs dlls to be loaded at absolute addresses?

