malloc crash

Mark Geisert mark@maxrnd.com
Tue Oct 26 08:59:36 GMT 2021


Takashi Yano wrote:
> On Tue, 26 Oct 2021 01:30:13 -0700
> Mark Geisert wrote:
>> Replying to myself to correct something I wrote...
>>
>> Mark Geisert wrote:
>>> Takashi Yano wrote:
>>>> On Mon, 25 Oct 2021 16:36:50 -0700
>>>> Mark Geisert wrote:
>>>>> Ken Brown wrote:
>>>>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
>>>>>>> Corinna Vinschen wrote:
>>>>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>>>>>>>> have happened, given the clunky muto guarding malloc calls.  This is
>>>>>>>> really strange.  Why's the muto not working here?
>>>>>>>
>>>>>>> Is it possible both threads have executed malloc_init()?
>>>>>>> If so, the second one would reinit the muto.
>>>>>>
>>>>>> Or does the fifo_reader thread call a malloc function before the main thread has
>>>>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
>>>>>> there's no error check.
>>>>>
>>>>> If there's a global constructor involved, that is known to happen.  Constructors
>>>>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
>>>>> dcrt0.cc for the details.
>>>>
>>>>> So how about moving the malloc_init() call from dll_crt0_1() to dll_crt0_0()
>>>> so that malloc() can be called in fixup_after_fork/exec()?
>>>
>>> It appears simple, but this is a touchy area of code.  The _0 and _1 are two
>>> separate phases of process startup.  I'd want to hear Corinna's thoughts on this.
>>>
>>> I'd also like to verify somehow that this is the scenario Ken is hitting.
>>>
>>> When I was researching different mallocs for Cygwin I hit the constructor snag
>>> repeatedly.  I did try delaying the constructor-running until after malloc_init().
>>> More problems.  I did not try moving malloc_init() to before the constructor run.
>>
>> Apologies; this was many months ago.  What I did try was moving the malloc_init()
>> to before running the constructor chain, as Takashi suggested.  That is what gave
>> me more problems.  I don't recall what they were, but I reverted that attempt.
>>
>> The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as
>> far as I can tell.  It has a specific fix to avoid the scenario I've been talking
>> about here, but I don't want to take us down that path unless we're sure Ken's
>> hitting that same scenario.
> 
> I tried the following patch and confirmed that the issue has
> disappeared. I have not noticed any other problems so far
> with this patch.
> 
> diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
> index 6f4723bb0..0d541ec14 100644
> --- a/winsup/cygwin/dcrt0.cc
> +++ b/winsup/cygwin/dcrt0.cc
> @@ -773,6 +773,10 @@ dll_crt0_0 ()
>     do_global_ctors (&__CTOR_LIST__, 1);

       ^^^^^^^^^^^^^^^

>     cygthread::init ();
>   
> +  /* malloc_init() has been moved from dll_crt0_1() to here so that
> +     malloc() can be called in fixup_after_exec(). */
> +  malloc_init ();
> +
>     if (!child_proc_info)
>       {
>         setup_cygheap ();
> @@ -857,7 +861,7 @@ dll_crt0_1 (void *)
>        on a functioning malloc and it's possible that the user's program may
>        have overridden malloc.  We only know about that at this stage,
>        unfortunately. */
> -  malloc_init ();
> +  /* malloc_init() has been moved to dll_crt0_0(). */
>     user_shared->initialize ();
>   
>   #ifdef CYGHEAP_DEBUG
> 
> 
> Where is the "constructor chain" you mentioned?

See above.  Try moving your new lines above the call to do_global_ctors().  Also
note the comment just above the original location of those lines; you're now
ignoring what it warns about: whether the user's program has overridden malloc
is only known at that stage, in dll_crt0_1().
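For illustration only, here is a minimal C sketch of the ordering hazard being discussed. It assumes POSIX threads and the GCC/Clang `constructor` attribute; it is not Cygwin's muto or dcrt0.cc code, and the names my_malloc_init(), my_malloc() and early_ctor() are hypothetical. The lock is initialized lazily through pthread_once(), so an allocation made from a global constructor (which runs before any explicit init call) still finds a valid lock, a later explicit init call cannot re-initialize a lock another thread may be holding, and a failed lock is reported instead of ignored.

```c
/* Hypothetical sketch, not Cygwin code: a lock-guarded allocator whose
   lock is initialized exactly once, on first use. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t heap_lock;
static int init_count = 0;     /* how many times the real init ran */
static int ctor_alloc_ok = 0;  /* did the constructor's allocation succeed? */

/* Runs at most once, however many threads or startup phases call init. */
static void heap_init_once(void)
{
  int rc = pthread_mutex_init(&heap_lock, NULL);
  assert(rc == 0);
  (void) rc;
  ++init_count;
}

/* Hypothetical stand-in for malloc_init(): safe to call repeatedly. */
void my_malloc_init(void)
{
  pthread_once(&init_once, heap_init_once);
}

/* Hypothetical stand-in for malloc(): initializes lazily and checks
   the lock result instead of assuming it succeeded. */
void *my_malloc(size_t n)
{
  my_malloc_init();
  if (pthread_mutex_lock(&heap_lock) != 0)
    return NULL;
  void *p = malloc(n);
  pthread_mutex_unlock(&heap_lock);
  return p;
}

/* Runs before main(), i.e. before any explicit my_malloc_init() call,
   much like a global constructor run from dll_crt0_0(). */
static void early_ctor(void) __attribute__((constructor));
static void early_ctor(void)
{
  void *p = my_malloc(16);
  ctor_alloc_ok = (p != NULL);
  free(p);
}
```

This only covers the lock-ordering part of the problem; it does not address the concern in the dcrt0.cc comment that a user program's malloc override is only known in dll_crt0_1().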

..mark


More information about the Cygwin-developers mailing list