Thread memory allocation issue
Teepean
stinkf42@yahoo.com
Tue Nov 19 06:59:19 GMT 2024
> Given that the result of the investigation a couple years ago was,
> essentially, no change to Cygwin's malloc*, why has the problem
> manifested again recently? Have you been benchmarking/testing all
> along? Can you be more specific about which recent Cygwin versions?
The original executable with patches was compiled around 2021 using the then-current GCC and Cygwin. That version works without problems. There have been several changes to Cygwin since then, so I decided to compile bwa with the current Cygwin (3.5.4) to see whether it would work without patching, and found that it did not. I then compiled a patched version (GCC versions 12.0 - 15.0) and noticed that it did not work either. Finally, I tested the 2021-compiled version and found that that executable still works.
>> Steps to Reproduce
>>
>> 1. Compile BWA normally
>>
>> https://github.com/lh3/bwa/
>
> What's involved with that? Clone the repo, ./configure, make? Anything else?
git clone https://github.com/lh3/bwa/
cd bwa
make
>> 2. Compile BWA with rpmalloc and the following patch:
>>
>>
>> // In thread worker function:
>> #ifdef __CYGWIN__
>> rpmalloc_thread_initialize();
>> #endif
>>
>>
>> // ... thread work ...
>> #ifdef __CYGWIN__
>> rpmalloc_thread_finalize(1);
>> #endif
> Where does that patch go? Assume I know nothing about BWA.
These two files are patched:
https://github.com/WGSExtract/bwa/blob/cygwin/main.c
https://github.com/WGSExtract/bwa/blob/cygwin/kthread.c
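To show where the two calls sit without reading the bwa sources, here is a minimal sketch of the pattern. It is not the actual kthread.c code: worker(), run_workers() and the USE_RPMALLOC build flag are made-up names for illustration, and the process-wide rpmalloc setup is left out. Without USE_RPMALLOC the hooks compile to nothing, so the sketch builds with the default malloc as well.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* USE_RPMALLOC is a hypothetical build flag for this sketch; the real
 * patch guards the calls with __CYGWIN__ instead. */
#ifdef USE_RPMALLOC
#include "rpmalloc.h"
#define THREAD_ALLOC_INIT() rpmalloc_thread_initialize()
#define THREAD_ALLOC_FINI() rpmalloc_thread_finalize(1)
#else
#define THREAD_ALLOC_INIT() ((void)0)
#define THREAD_ALLOC_FINI() ((void)0)
#endif

/* Hypothetical worker standing in for the thread function. */
static void *worker(void *arg)
{
    int *done = arg;
    assert(done != NULL);

    THREAD_ALLOC_INIT();          /* first thing the thread does */

    void *buf = malloc(1024);     /* ... thread work ... */
    free(buf);

    THREAD_ALLOC_FINI();          /* last thing before the thread exits */
    *done = 1;
    return NULL;
}

/* Start n workers, join them, and return how many ran to completion. */
int run_workers(int n)
{
    if (n <= 0)
        return 0;

    pthread_t *tid = malloc((size_t)n * sizeof *tid);
    int *done = calloc((size_t)n, sizeof *done);
    int started = 0, completed = 0;

    if (tid == NULL || done == NULL) {
        free(tid);
        free(done);
        return 0;
    }
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, worker, &done[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        completed += done[i];
    }
    free(tid);
    free(done);
    return completed;
}
```

Calling run_workers(4) from main() should return 4 if all four threads could be created.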
> Are these examples of runs one would do "in production"? Or are you
> running much longer-lasting processing in the usual case?
Normal production samples are usually gigabytes in size, whereas the test case uses a sample of only around 20 megabytes. Even with a sample this small it is possible to benchmark the problem. A production sample of human DNA might be 60 gigabytes in size: the Linux version of bwa would take about three hours on it, while an unpatched bwa on Cygwin would take 24 hours or more.
>> 1. The default malloc implementation shows extremely high system time (11.265s) compared to the rpmalloc version (0.327s)
>> 2. Total real time is about 4.5x slower with default malloc
>> 3. The dramatic difference in system time suggests heavy contention in the memory allocation subsystem
>> 4. The issue only manifests on Cygwin with bwa; the same code performs normally on native Linux and MacOS
>
> Are you saying there is non-bwa code that runs on Cygwin comparably to
> Linux and Mac?
I think most code runs nearly as fast on Cygwin as it does on Linux, assuming the code does not heavily rely on disk I/O.
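If a test case without bwa would help, something along these lines might do. This is only a sketch and I have not verified that it reproduces the problem: several threads call malloc()/free() in a tight loop, and getrusage() reports the user and system CPU time spent, which is the same split that `time` shows. The names (alloc_loop, measure_malloc) and the iteration count and block size are arbitrary choices, not taken from bwa.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>

#define ITERATIONS 100000   /* arbitrary; raise for longer runs */
#define BLOCK_SIZE 4096     /* arbitrary allocation size in bytes */

/* Each thread repeatedly allocates and frees one block. */
static void *alloc_loop(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        char *p = malloc(BLOCK_SIZE);
        if (p != NULL) {
            /* volatile write so the compiler cannot drop the allocation */
            ((volatile char *)p)[0] = 1;
            free(p);
        }
    }
    return NULL;
}

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Run nthreads allocation loops and store the user and system CPU
 * seconds the process consumed meanwhile. Returns 0 on success, -1 on
 * error. */
int measure_malloc(int nthreads, double *user_s, double *sys_s)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    struct rusage before, after;
    int started = 0;

    assert(user_s != NULL && sys_s != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, alloc_loop, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    *user_s = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    *sys_s = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return started == nthreads ? 0 : -1;
}
```

Comparing the system time for 1 thread against, say, 8 threads on Cygwin and on Linux would show whether the system-time blowup appears outside bwa at all.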
>> 5. The issue manifests with recent versions of Cygwin but does work with older versions
>
> Again, it would really help if you could give Cygwin versions or at
> least dates here...
The patch for the Cygwin version was committed on Mar 28, 2021.
> I'll glance at this stuff when I can but I hope to have some answers to
> my questions above from you to save me some time.
Thank you! I know that there are only a handful of people using bwa on Cygwin but I would guess that this is a good test case to learn something about Cygwin's malloc.
Regards,
Teemu Nätkinniemi