Dave Korn
Tue Mar 7 15:34:00 GMT 2006

On 07 March 2006 14:33, Corinna Vinschen wrote:

> On Mar  7 13:25, Dave Korn wrote:

>>   I got one of those SEGV-on-pressing-ctrl-C bugs, and fortunately
>> error_start jumped in and grabbed it.  

>> ----------------------------------<snip>----------------------------------
>>   Looking at the thread data, it appears well formed, but the callback
>> address in the cb member of the one and only entry on the pthread_child
>> list is plainly wrong. 
>> ----------------------------------<snip>----------------------------------
>> [...] (gdb) print *__cygwin_user_data->threadinterface->pthread_child
>> $5 = {cb = 0x61bab0 <riscos1_wctomb+70640>, next = 0x0}
> It certainly looks like a valid address.  Do you know which callbacks
> are installed by the application?  You could also add some debug_printfs
> to see which values are to be expected in the normal case.

  Ah.  Here are the mappings of the two dlls in question:

  Object file: /usr/bin/cygintl-3.dll
    0x10001000->0x10007314 at 0x00000400: .text ALLOC LOAD READONLY CODE DATA
    0x10008000->0x10008040 at 0x00006800: .data ALLOC LOAD DATA HAS_CONTENTS
    0x10009000->0x100094f0 at 0x00000000: .bss ALLOC
    0x1000a000->0x1000a524 at 0x00006a00: .edata ALLOC LOAD READONLY DATA
    0x1000b000->0x1000b750 at 0x00007000: .idata ALLOC LOAD DATA HAS_CONTENTS
    0x1000c000->0x1000c454 at 0x00007800: .reloc ALLOC LOAD READONLY DATA
  Object file: /usr/bin/cygiconv-2.dll
    0x00541000->0x0062c074 at 0x00000400: .text ALLOC LOAD READONLY CODE DATA
    0x0062d000->0x0062d020 at 0x000eb600: .data ALLOC LOAD DATA HAS_CONTENTS
    0x0062e000->0x0062e3f0 at 0x00000000: .bss ALLOC
    0x0062f000->0x0062f155 at 0x000eb800: .edata ALLOC LOAD READONLY DATA
    0x00630000->0x006303e4 at 0x000eba00: .idata ALLOC LOAD DATA HAS_CONTENTS
    0x00631000->0x00631cc8 at 0x000ebe00: .reloc ALLOC LOAD READONLY DATA
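  To double-check which mapping a given address belongs to, the ranges above can be tabulated and searched. Below is a throwaway C sketch of that. The struct and function names are hypothetical and it is not Cygwin code. It is checked against the two addresses from the gdb session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One loaded section as listed above: [start, end), owning dll, name. */
struct section {
    uint32_t start, end;
    const char *dll, *name;
};

/* Ranges copied from the section listing for the two dlls. */
static const struct section sections[] = {
    { 0x10001000u, 0x10007314u, "cygintl-3.dll",  ".text"  },
    { 0x10008000u, 0x10008040u, "cygintl-3.dll",  ".data"  },
    { 0x10009000u, 0x100094f0u, "cygintl-3.dll",  ".bss"   },
    { 0x1000a000u, 0x1000a524u, "cygintl-3.dll",  ".edata" },
    { 0x1000b000u, 0x1000b750u, "cygintl-3.dll",  ".idata" },
    { 0x1000c000u, 0x1000c454u, "cygintl-3.dll",  ".reloc" },
    { 0x00541000u, 0x0062c074u, "cygiconv-2.dll", ".text"  },
    { 0x0062d000u, 0x0062d020u, "cygiconv-2.dll", ".data"  },
    { 0x0062e000u, 0x0062e3f0u, "cygiconv-2.dll", ".bss"   },
    { 0x0062f000u, 0x0062f155u, "cygiconv-2.dll", ".edata" },
    { 0x00630000u, 0x006303e4u, "cygiconv-2.dll", ".idata" },
    { 0x00631000u, 0x00631cc8u, "cygiconv-2.dll", ".reloc" },
};

/* Return the section containing addr, or NULL if it is in none of them. */
static const struct section *find_section(uint32_t addr)
{
    size_t n = sizeof sections / sizeof sections[0];
    for (size_t i = 0; i < n; i++)
        if (addr >= sections[i].start && addr < sections[i].end)
            return &sections[i];
    return NULL;
}
```

  Looked up this way, 0x0061bab0 falls inside cygiconv-2's .text. 0x100102b0 matches no section at all: it is 0x3e5c bytes past the end of cygintl-3's .reloc.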

  So, the callback struct itself is not inside either dll's image; it sits in memory just after the allocation for cygintl-3:

> (gdb) print __cygwin_user_data->threadinterface->pthread_child
> $4 = (callback *) 0x100102b0

and it points to an address in the .text section of cygiconv-2:

> (gdb) print *__cygwin_user_data->threadinterface->pthread_child
> $5 = {cb = 0x61bab0 <riscos1_wctomb+70640>, next = 0x0}

However, what sits at that address is not the start of any function. Disassembled, it is just nonsense:

 [ ... snip tonnes of similar ... ]
0x0061baa6 <riscos1_wctomb+70630>:      add    %al,(%eax)
0x0061baa8 <riscos1_wctomb+70632>:      push   %esi
0x0061baa9 <riscos1_wctomb+70633>:      add    %al,(%eax)
0x0061baab <riscos1_wctomb+70635>:      add    %cl,0x0(%ecx)
0x0061baae <riscos1_wctomb+70638>:      add    %al,(%eax)
0x0061bab0 <riscos1_wctomb+70640>:      dec    %ecx
0x0061bab1 <riscos1_wctomb+70641>:      add    %al,(%eax)
0x0061bab3 <riscos1_wctomb+70643>:      add    %cl,0x0(%ecx)
0x0061bab6 <riscos1_wctomb+70646>:      add    %al,(%eax)
0x0061bab8 <riscos1_wctomb+70648>:      add    (%eax),%al
0x0061baba <riscos1_wctomb+70650>:      add    %al,(%eax)
0x0061babc <riscos1_wctomb+70652>:      dec    %ecx
0x0061babd <riscos1_wctomb+70653>:      add    %al,(%eax)
0x0061babf <riscos1_wctomb+70655>:      add    %bl,0x0(%eax)
0x0061bac2 <riscos1_wctomb+70658>:      add    %al,(%eax)
0x0061bac4 <riscos1_wctomb+70660>:      add    %eax,(%eax)
0x0061bac6 <riscos1_wctomb+70662>:      add    %al,(%eax)
0x0061bac8 <riscos1_wctomb+70664>:      pop    %eax
0x0061bac9 <riscos1_wctomb+70665>:      add    %al,(%eax)
0x0061bacb <riscos1_wctomb+70667>:      add    %al,(%edx)
0x0061bacd <riscos1_wctomb+70669>:      add    %al,(%eax)
0x0061bacf <riscos1_wctomb+70671>:      add    %bl,0x0(%eax)
0x0061bad2 <riscos1_wctomb+70674>:      add    %al,(%eax)
0x0061bad4 <riscos1_wctomb+70676>:      dec    %ecx
 [ ... snip tonnes of similar ... ]
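  One way to see why this is nonsense is to undo the disassembly. The bytes below were reassembled by hand from the instruction encodings in the listing (0x49 = dec %ecx, 0x56 = push %esi, 0x58 = pop %eax, and 00/01/02 are forms of add). That reconstruction is my own, so it is worth confirming with `x/11xw 0x61baa8` in gdb. Read as little-endian 32-bit words, the bytes form a 4-byte-aligned run of small values such as 0x49 ('I') and 0x58 ('X'). That looks much more like a data table living in .text (whose flags above include DATA) than like code. The names in this C sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes at 0x0061baa8..0x0061bad3, reassembled from the gdb listing. */
static const uint8_t dump[] = {
    0x56, 0, 0, 0,  0x49, 0, 0, 0,  0x49, 0, 0, 0,  0x49, 0, 0, 0,
    0x02, 0, 0, 0,  0x49, 0, 0, 0,  0x58, 0, 0, 0,  0x01, 0, 0, 0,
    0x58, 0, 0, 0,  0x02, 0, 0, 0,  0x58, 0, 0, 0,
};

#define DUMP_BASE 0x0061baa8u

/* Read a little-endian 32-bit word, as an x86 load would. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The 32-bit word stored at absolute address addr within the dump. */
static uint32_t word_at(uint32_t addr)
{
    assert(addr >= DUMP_BASE && addr + 4 <= DUMP_BASE + sizeof dump);
    return le32(dump + (addr - DUMP_BASE));
}
```

  Under this reading, the "function" the callback points at (0x0061bab0) starts with the word 0x49. That is a table entry, not a prologue.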

  Right.  I reckon this means that cygintl-3 set a pthread_atfork callback pointing to a routine in cygiconv-2.  Then, when the process got forked, the dlls got reloaded at a different location in the child than in the parent, so the callback installed in the parent ended up referring to an invalid address in the child.
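  The failure mode comes down to simple pointer arithmetic. The callback list holds absolute addresses, and a handler sits at a fixed offset (RVA) from its dll's base. A pointer copied verbatim from the parent is therefore only right if the dll is mapped at the same base in the child. Below is a minimal C sketch; the helper names and base addresses are made up for illustration, and none of this is Cygwin's fork code.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a code address from the base its dll was loaded at. */
static uint32_t rva_of(uint32_t addr, uint32_t base)
{
    assert(addr >= base);
    return addr - base;
}

/* Where the same routine would be found in a process that mapped
   the dll at new_base instead of old_base. */
static uint32_t translate(uint32_t addr, uint32_t old_base, uint32_t new_base)
{
    return new_base + rva_of(addr, old_base);
}
```

  With hypothetical bases 0x10000000 (parent) and 0x00540000 (child), a handler at 0x10001234 in the parent would live at 0x00541234 in the child. The stale 0x10001234 would point somewhere else entirely. Only when the two bases are equal does the copied pointer stay valid.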

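  Got it.  On inspection, it turns out that I've got old, unrebased versions of both dlls, and they both have the default ImageBase address of 0x10000000.  That explains it even better: only one of them can be loaded at its preferred address, so the loader has to relocate the other, and there's no guarantee it makes the same choice in the child as in the parent.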
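  When both dlls prefer 0x10000000, their preferred address ranges necessarily collide, so the loader cannot honour both. A tiny overlap test makes that explicit. The image sizes are approximations I inferred from the section listing (about 0xd000 bytes for cygintl-3 and 0xf2000 for cygiconv-2, assuming each .text starts at RVA 0x1000). The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A dll image occupying the half-open range [base, base + size). */
struct image {
    uint32_t base, size;
};

/* Nonzero if the two images would share at least one address. */
static int images_overlap(struct image a, struct image b)
{
    assert(a.size > 0 && b.size > 0);
    return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

  At their shared preferred base the two images overlap. Once one of them is moved (e.g. cygiconv-2 to 0x00540000, where it actually landed in the child), they no longer do.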

  That means that (1) it can probably be mitigated with rebase or rebaseall, and (2) the only thing worth doing is to work on the generic problem of dlls not being reloaded in the child at the same address they were loaded at in the parent.  There's no point trying to watch out for and correct the address in the callback, because there are a million other things that were going to go wrong anyway.  The only slightly surprising thing is that it got far enough to run into this problem, rather than failing at the mapviewofsection call or some other earlier point.
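  For reference, the idea behind rebasing (as I understand it) is to give every dll its own non-overlapping preferred base. The loader then never has to relocate anything, in the parent or the child. Below is a rough C sketch of that idea, which packs images downward on 64 KiB boundaries (the Windows allocation granularity). The helper names and the top address are hypothetical, and this is not how rebase or rebaseall are actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALLOC_GRANULARITY 0x10000u /* images are mapped on 64 KiB boundaries */

/* Round an image size up to the allocation granularity. */
static uint32_t round_up(uint32_t size)
{
    return (size + ALLOC_GRANULARITY - 1) & ~(ALLOC_GRANULARITY - 1);
}

/* Give each of the n images a distinct preferred base, packing them
   downward from top so that no two ranges overlap. */
static void assign_bases(const uint32_t *sizes, uint32_t *bases,
                         size_t n, uint32_t top)
{
    uint32_t next = top;
    for (size_t i = 0; i < n; i++) {
        uint32_t span = round_up(sizes[i]);
        assert(next >= span);
        next -= span;
        bases[i] = next;
    }
}
```

  With the approximate sizes for cygintl-3 (0xd000) and cygiconv-2 (0xf2000) and a made-up top of 0x70000000, this gives bases 0x6fff0000 and 0x6fef0000. The two images then sit back to back instead of fighting over 0x10000000.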

  Thanks for helping me think out loud!

Can't think of a witty .sigline today....

More information about the Cygwin-developers mailing list