More (?) steps toward jemalloc within Cygwin DLL

Ford, Brian
Tue Jul 21 22:06:03 GMT 2020

FWIW, we found Intel TBB malloc to be necessary for the performance of our native multi-threaded Windows app up until Windows 10-based OSes, when the native heap became competitive (and maybe even slightly better).

-----Original Message-----
From: Cygwin-developers On Behalf Of Mark Geisert
Sent: Tuesday, July 21, 2020 3:51 AM
Subject: Re: More (?) steps toward jemalloc within Cygwin DLL

Corinna Vinschen wrote:
>>> If you get jemalloc working, it would be nice in itself, but the 
>>> main improvement would be the ability to get rid of these 
>>> __malloc_lock/ __malloc_unlock brackets.
>> Thanks for reminding me of that aspect of Cygwin's current malloc.  
>> The malloc implementation has seemed to be bulletproof for many years 
>> so I guess the function-level locking is the only drawback of note?
> Not quite.  It's bad enough, given how much this slows down 
> multi-threaded executables, but...
> ...the big problem is the dependencies on malloc during Cygwin startup,
> especially in fork/exec, so the real challenge is to get the new
> malloc to still let Cygwin processes start up correctly the first time,
> especially in fork/exec situations, and to make sure the malloc
> bookkeeping survives fork/exec.

O.K., understood.
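
For anyone following along, the function-level locking under discussion can be pictured roughly as below. This is only a sketch under assumptions: a pthread mutex stands in for Cygwin's __malloc_lock/__malloc_unlock, the system malloc stands in for a non-thread-safe allocator, and locked_malloc, locked_free and lock_acquisitions are made-up names for illustration, not anything in the Cygwin sources.

```c
#include <pthread.h>
#include <stdlib.h>

/* One process-wide lock: every allocator entry point takes it, so
   threads serialize here even when they touch unrelated memory. */
static pthread_mutex_t malloc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Counter kept only so the sketch can show how often the lock is taken. */
unsigned long lock_acquisitions;

void *locked_malloc(size_t n)
{
    pthread_mutex_lock(&malloc_mutex);   /* stand-in for __malloc_lock */
    lock_acquisitions++;
    void *p = malloc(n);                 /* stand-in for the unlocked allocator */
    pthread_mutex_unlock(&malloc_mutex); /* stand-in for __malloc_unlock */
    return p;
}

void locked_free(void *p)
{
    pthread_mutex_lock(&malloc_mutex);
    lock_acquisitions++;
    free(p);
    pthread_mutex_unlock(&malloc_mutex);
}
```

A malloc package with its own finer-grained locking (per-arena or per-thread caches) would not need the outer bracket at all, which is the improvement being asked for.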

> These malloc dependencies sometimes crop up in the weirdest 
> situations, so that's something to look out for.  For instance, using 
> pthread functions may call malloc as well.  If a problem can be solved 
> by changing another part of Cygwin, don't hesitate to discuss this!

Yes, a couple of the malloc packages I'm testing want to allocate locks and TLS slots right off the bat so there's nasty recursion possible.
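
One common way to break that kind of recursion, sketched here purely as an illustration, is to serve any allocation made while the package is still initializing from a small static bootstrap arena. Everything below is an assumption for the example: the names (sketch_malloc, sketch_free, boot_arena, package_init, init_scratch), the 4096-byte arena size, and the single-threaded init flag. None of it is taken from Cygwin or from the packages being tested.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { BOOT_ARENA_SIZE = 4096 };   /* arbitrary size for the sketch */

static _Alignas(max_align_t) unsigned char boot_arena[BOOT_ARENA_SIZE];
static size_t boot_used;
static int init_state;             /* 0 = not started, 1 = in progress, 2 = done */

void *init_scratch;                /* what the fake init step allocated */

/* Bump allocator over the static arena; memory is never reclaimed. */
static void *boot_alloc(size_t n)
{
    size_t a = _Alignof(max_align_t);
    if (n > BOOT_ARENA_SIZE)
        return NULL;
    size_t need = (n + a - 1) / a * a;
    if (need > BOOT_ARENA_SIZE - boot_used)
        return NULL;
    void *p = boot_arena + boot_used;
    boot_used += need;
    return p;
}

int from_boot_arena(const void *p)
{
    uintptr_t v = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)boot_arena;
    return v >= lo && v < lo + BOOT_ARENA_SIZE;
}

void *sketch_malloc(size_t n);

/* Stand-in for a package that allocates a lock or TLS slot at startup
   and thereby re-enters the allocator. */
static void package_init(void)
{
    init_scratch = sketch_malloc(64);
}

void *sketch_malloc(size_t n)
{
    if (init_state == 1)           /* re-entered during init: no recursion */
        return boot_alloc(n);
    if (init_state == 0) {
        init_state = 1;
        package_init();
        init_state = 2;
    }
    return malloc(n);              /* stand-in for the real package */
}

void sketch_free(void *p)
{
    if (p != NULL && !from_boot_arena(p))
        free(p);                   /* bootstrap blocks are simply leaked */
}
```

A real version would also have to make the init flag thread-safe and keep the arena state consistent across fork, which this sketch ignores.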

>> I've switched to a
>> plug-in sort of implementation that allows one to choose among 
>> several malloc packages: "original", dlmalloc (w/ internal locking), 
>> ptmalloc[23], nedalloc, jemalloc, and a Windows Heap wrapper.  
>> Perhaps tcmalloc in the future.  One sets an environment variable 
>> CYGMALLOC=<name> before launching a program and that malloc 
>> implementation is used.  This should make testing and benchmarking 
>> the various choices possible.  I don't expect big improvements in 
>> individual programs (unless they are stress testing), but something
>> like a large configure or build should give more useful data.
> In the end, we should settle for a single malloc implementation, though.
> It doesn't really matter if it's jemalloc, ptmalloc, xymalloc.  Almost 
> all other modern mallocs are faster and better suited for 
> multi-threading than dlmalloc, *especially* if the above locks can go away.

For sure; I didn't make it clear this CYGMALLOC setup is just for testing the different malloc packages.  When I stumble across some failing in one of them it's nice to be able to quickly re-run using a different malloc.
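
To give a rough idea of the shape of such a test harness, here is a minimal sketch of picking a package by name from CYGMALLOC. It is not the actual code: the struct malloc_ops table, select_malloc, select_malloc_from_env, the "winheap" name for the Windows Heap wrapper, and the fallback to "original" on an unknown name are all assumptions made for the example, and every entry just forwards to the system malloc so the sketch stays self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-package entry points. */
struct malloc_ops {
    const char *name;
    void *(*alloc)(size_t);
    void (*release)(void *);
};

/* Placeholders: a real table would point at each package's functions. */
static void *stub_alloc(size_t n) { return malloc(n); }
static void stub_release(void *p) { free(p); }

static const struct malloc_ops packages[] = {
    { "original",  stub_alloc, stub_release },
    { "dlmalloc",  stub_alloc, stub_release },
    { "ptmalloc2", stub_alloc, stub_release },
    { "ptmalloc3", stub_alloc, stub_release },
    { "nedalloc",  stub_alloc, stub_release },
    { "jemalloc",  stub_alloc, stub_release },
    { "winheap",   stub_alloc, stub_release },  /* assumed name */
};

/* Look a package up by name; unknown or missing names fall back to
   the first entry ("original"). */
const struct malloc_ops *select_malloc(const char *name)
{
    if (name != NULL) {
        for (size_t i = 0; i < sizeof packages / sizeof packages[0]; i++)
            if (strcmp(name, packages[i].name) == 0)
                return &packages[i];
    }
    return &packages[0];
}

/* Read the choice from the environment, as in CYGMALLOC=<name>. */
const struct malloc_ops *select_malloc_from_env(void)
{
    return select_malloc(getenv("CYGMALLOC"));
}
```

With something like this, a run such as CYGMALLOC=jemalloc ./configure would route every allocation through the jemalloc entry, and rerunning with a different name swaps the package without rebuilding.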

Here's a question I didn't expect to come up: If it turns out a home-grown wrapper on the Win32 HeapXXX functions performs better (hint: it does, 2.5 to 3 times better) than any malloc package derived from dlmalloc, is there any reason why we ought not use it?  Assuming it can be made to work for all those cases you mentioned above, of course.

> The only danger here is this: If you manage to get dlmalloc replaced 
> reliably, you *will* get a pink plush hippo!

Oh, gee, that sounds like a really nice reward... Wow, I'm gonna have to do this project now for sure!

