CFA: pseudo-reloc v2

Christopher Faylor cgf-use-the-mailinglist-please@cygwin.com
Wed May 5 17:56:00 GMT 2010


On Wed, May 05, 2010 at 05:54:29PM +0100, Dave Korn wrote:
>[ redirected from cygwin-developers. ]
>On 04/10/2009 05:11, Charles Wilson wrote:
>[ thread seriously necro'd! ]
>> Dave Korn wrote:
>>> Charles Wilson wrote:
>>>>   120 void
>>>>   121 _pei386_runtime_relocator ()
>>>>   122 {
>>>>   123   static int was_init = 0;
>>>>   124   if (was_init)
>>>>   125     return;
>>>>   126   ++was_init;
>>>>   127   do_pseudo_reloc (&__RUNTIME_PSEUDO_RELOC_LIST__,&__RUNTIME_PSEUDO_RELOC_LIST_END__,&_image_base__);
>>>>   128 }
>>>   Maybe that static should be NO_COPY?  If everything gets remapped in the
>>> forkee, do the relocs need rerunning?  (I'm not sure about the behaviour of
>>> NtCreateProcess w.r.t modified .text section pages.)
>> 
>> Good guess!  With the following patch, all of these fork tests perform
>> as expected.
>
>  Aha, not so good as all that after all!  We need to re-apply relocs in the
>forkee - but only if they *don't* point to regions covered by the .data/.bss
>section copying at startup.  Argh!
>
>>  One oddity; it turns out that __INSIDE_CYGWIN__ is not
>> defined inside pseudo-reloc.c, so I used __CYGWIN__ as a guard.
>
>  Dunno if we ever went into that, but it's right; pseudo-reloc.o is part of
>the CRT in winsup/cygwin/lib/, and is linked statically into every exe and
>(user) dll, but is not part of the cygwin1 dll.  Hence not "inside Cygwin".
>
>  So, the attached is my proposed fix.  It resolves the problem reported on
>the main list the other day(*) and the supplied testcases all work once it's
>applied.  There are two things that people might want to change: the minor one
>is that I let a couple of the lines get a bit long, but no longer than we
>already have in the definition of NO_COPY at the top of the file, so I didn't
>wrap them for the sake of one trailing word.  The slightly bigger one is that,
>if I remember, the reason for having non-#if-CYGWIN code in the file at all is
>to make any potential future merges from upstream MinGW sources theoretically
>easier, but now that I've had to diverge the internal interfaces anyway, is
>there any reason not to just delete the whole lot?
>
>winsup/cygwin/ChangeLog:
>
>	lib/pseudo-reloc.c (memskip_t): New struct and typedef.
>	(__write_memory): Accept an optional memskip_t argument and avoid
>	writing to any memory ranges mentioned in the linked list.
>	(do_pseudo_reloc): Accept an optional memskip_t argument and pass
>	it through in all calls to __write_memory.
>	(_pei386_runtime_relocator): When reapplying relocs in a forked
>	child process, avoid doubly-relocating the .data and .bss sections
>	that were copied from the parent.
>
>    cheers,
>      DaveK
>-- 
>(*) - http://cygwin.com/ml/cygwin/2010-04/msg00957.html
>

>Index: winsup/cygwin/lib/pseudo-reloc.c
>===================================================================
>RCS file: /cvs/src/src/winsup/cygwin/lib/pseudo-reloc.c,v
>retrieving revision 1.4
>diff -p -u -r1.4 pseudo-reloc.c
>--- winsup/cygwin/lib/pseudo-reloc.c	26 Oct 2009 14:50:09 -0000	1.4
>+++ winsup/cygwin/lib/pseudo-reloc.c	5 May 2010 16:04:46 -0000
>@@ -78,6 +78,20 @@ typedef struct {
>   DWORD version;
> } runtime_pseudo_reloc_v2;
> 
>+/* This trivial struct is passed right down through do_pseudo_reloc
>+   to __write_memory where it is used to avoid re-relocating those
>+   memory areas that we know will have been pre-relocated by copying
>+   from the parent of a forked child process.  Since there will only
>+   ever be two ranges it's not worth worrying hugely about making it
>+   efficient so a simple singly-linked list will do; if we ever start
>+   encountering user applications with more than a few hundred or so
>+   pseudo-relocs, there might come a time to rethink this.  */
>+typedef struct memskip {
>+  DWORD start;
>+  DWORD end;
>+  const struct memskip *next;
>+} memskip_t;
>+
> static void ATTRIBUTE_NORETURN
> __report_error (const char *msg, ...)
> {
>@@ -169,7 +183,7 @@ __report_error (const char *msg, ...)
>  * is folded into the (writable) .data when --enable-auto-import.
>  */
> static void
>-__write_memory (void *addr, const void *src, size_t len)
>+__write_memory (void *addr, const void *src, size_t len, const memskip_t *skipranges)
> {
>   MEMORY_BASIC_INFORMATION b;
>   DWORD oldprot;
>@@ -177,6 +191,13 @@ __write_memory (void *addr, const void *
>   if (!len)
>     return;
> 
>+  while (skipranges)
>+    {
>+      if ((skipranges->start <= (DWORD)addr) && (skipranges->end > (DWORD)addr))
>+	return;
>+      skipranges = skipranges->next;
>+    }
>+
>   if (!VirtualQuery (addr, &b, sizeof(b)))
>     {
>       __report_error ("  VirtualQuery failed for %d bytes at address %p",
>@@ -198,7 +219,7 @@ __write_memory (void *addr, const void *
> #define RP_VERSION_V2 1
> 
> static void
>-do_pseudo_reloc (void * start, void * end, void * base)
>+do_pseudo_reloc (void * start, void * end, void * base, const memskip_t *skipranges)
> {
>   ptrdiff_t addr_imp, reldata;
>   ptrdiff_t reloc_target = (ptrdiff_t) ((char *)end - (char*)start);
>@@ -259,7 +280,7 @@ do_pseudo_reloc (void * start, void * en
> 	  DWORD newval;
> 	  reloc_target = (ptrdiff_t) base + o->target;
> 	  newval = (*((DWORD*) reloc_target)) + o->addend;
>-	  __write_memory ((void *) reloc_target, &newval, sizeof(DWORD));
>+	  __write_memory ((void *) reloc_target, &newval, sizeof(DWORD), skipranges);
> 	}
>       return;
>     }
>@@ -337,17 +358,17 @@ do_pseudo_reloc (void * start, void * en
>       switch ((r->flags & 0xff))
> 	{
>          case 8:
>-           __write_memory ((void *) reloc_target, &reldata, 1);
>+           __write_memory ((void *) reloc_target, &reldata, 1, skipranges);
> 	   break;
> 	 case 16:
>-           __write_memory ((void *) reloc_target, &reldata, 2);
>+           __write_memory ((void *) reloc_target, &reldata, 2, skipranges);
> 	   break;
> 	 case 32:
>-           __write_memory ((void *) reloc_target, &reldata, 4);
>+           __write_memory ((void *) reloc_target, &reldata, 4, skipranges);
> 	   break;
> #ifdef _WIN64
> 	 case 64:
>-           __write_memory ((void *) reloc_target, &reldata, 8);
>+           __write_memory ((void *) reloc_target, &reldata, 8, skipranges);
> 	   break;
> #endif
> 	}
>@@ -357,11 +378,57 @@ do_pseudo_reloc (void * start, void * en
> void
> _pei386_runtime_relocator (void)
> {
>+  /* We only want to apply the pseudo-relocs once, so we use this once-only
>+     guard variable - no need for complex serialisation or synchronisation
>+     here, as we're in early start-up (if an exe) or at process attach time
>+     (if a dll) and we'll be implicitly running single-threaded anyway.
>+
>+     However, when we fork a process, the OS creates fresh mappings of all
>+     the image files, so the pseudo-relocs all get wiped out and we need
>+     to reapply them; hence, the guard variable is NO_COPY, so that it
>+     starts from zero again in the forked child, and we apply the relocs
>+     again.  */
>   static NO_COPY int was_init = 0;
>+  /* But it isn't quite that simple.  During fork startup the parent and
>+     child co-operate to synchronize the memory: code in the Cygwin DLL
>+     copies across all the (non-read-only) data and bss sections of the
>+     exe and loaded dlls, not to mention heap and stack areas; this is
>+     how all the variables in the child end up with the same content as
>+     the parent, but it effectively pre-applies any pseudo-relocs that
>+     point into those regions for us, as their effect has been copied
>+     from the parent.  We need to avoid re-applying them when we fork
>+     or the data will end up doubly-relocated and pointing randomly into
>+     space, which is obviously a problem.  So we also have a once-only
>+     guard variable that does *not* use the NO_COPY attribute; this
>+     guard variable won't be reset on a fork but will remain set from
>+     the parent, letting us infer that we are re-applying pseudo-relocs
>+     in a child process rather than applying them for the first time
>+     in an entirely newly-created process.  */
>+  static char was_forked = 0;
>+  /* In that case, we want to avoid applying any pseudo-relocs that we
>+     know will already have been copied, pre-applied, from the parent's
>+     .data and .bss sections.  See the references to child_copy() in
>+     dcrt0.cc#child_info_fork::handle_fork() and fork.cc#frok::parent()
>+     for the details.  We take advantage here of the fact that this code
>+     is part of the winsup/cygwin/lib/ runtime library startup code,
>+     linked as a static object into each exe and dll rather than being
>+     part of the Cygwin DLL itself; this means we can simply look at the
>+     linker-supplied labels marking the start and end of our own .data
>+     and .bss sections to know which memory areas to avoid re-relocating,
>+     and don't have to worry about any complicated mechanism for the DLL
>+     to inform us which memory areas it copied.  Phew!  */
>+  extern char _data_start__, _data_end__, _bss_start__, _bss_end__;
>+  static const memskip_t skipchain[2] = {
>+    { .start = (DWORD)&_data_start__, .end = (DWORD)&_data_end__, .next = &skipchain[1] },
>+    { .start = (DWORD)&_bss_start__, .end = (DWORD)&_bss_end__, .next = 0 }
>+  };
>+
>   if (was_init)
>     return;
>   ++was_init;
>   do_pseudo_reloc (&__RUNTIME_PSEUDO_RELOC_LIST__,
> 		   &__RUNTIME_PSEUDO_RELOC_LIST_END__,
>-		   &__MINGW_LSYMBOL(_image_base__));
>+		   &__MINGW_LSYMBOL(_image_base__),
>+		   was_forked ? skipchain : NULL);
>+  was_forked = 1;
> }

I like the idea but I have a few problems with this, some stylistic and
some implementation.

Stylistic:

1) It should be "bool" rather than "char" for the variables that are boolean.

2) There are some minor GNU-formatting issues, although there are others in this
   file as well.

3) I don't see the need for a linked list when a null-terminated array would seem
   to be simpler.

4) It seems like you could just have one static variable which held a pointer to
   a "skipchain".  That would be NULL initially and set to the array after that for
   handling in the forked process.

Implementation:

I don't like keeping a list of "places we know we need to ignore" separate from
the Cygwin DLL.  That means that if there is something new added besides data/bss
this function becomes obsolete.

I think this argues for most of this functionality being moved to the
Cygwin DLL itself so that an application gets improvements for free.  I
should have suggested that when this function first made its way into
the libcygwin.a (or maybe I did and someone will enlighten me about why that
won't work).

I'll see I can come up with a proof-of-concept of what I'm talking about soon.

Btw, don't we have to worry about data/bss for DLLs too?  Is that
handled by this change or is that not an issue?

cgf



More information about the Cygwin-patches mailing list