Thread memory allocation issue
Teepean
stinkf42@yahoo.com
Sun Nov 17 19:32:58 GMT 2024
Hello!
I raised this issue a couple of years ago on cygwin-developer, but since the problem has reappeared with recent versions of Cygwin, I decided to post it to the general discussion list.
Steps to Reproduce
1. Compile BWA normally
https://github.com/lh3/bwa/
2. Compile BWA with rpmalloc and the following patch:
// In thread worker function:
#ifdef __CYGWIN__
rpmalloc_thread_initialize();
#endif
// ... thread work ...
#ifdef __CYGWIN__
rpmalloc_thread_finalize(1);
#endif
3. Run both versions with the following command:
time ./bwa mem -t 11 chr19_KI270866v1_alt.fasta test_1.fastq test_2.fastq > testorigsingle.sam
Without Patch (Default malloc):
[M::mem_process_seqs] Processed 120000 reads in 30.296 CPU sec, 3.743 real sec
[main] Real time: 3.883 sec; CPU: 30.436 sec
real 0m3.907s
user 0m19.186s
sys 0m11.265s
With Patch (rpmalloc):
[M::mem_process_seqs] Processed 120000 reads in 7.530 CPU sec, 0.702 real sec
[main] Real time: 0.830 sec; CPU: 7.640 sec
real 0m0.868s
user 0m7.343s
sys 0m0.327s
Analysis
1. The default malloc build shows extremely high system time (11.265s) compared to the rpmalloc build (0.327s), a difference of roughly 34x
2. Total real time is about 4.5x higher with default malloc (3.907s vs 0.868s)
3. The dramatic difference in system time suggests heavy contention in the memory allocation subsystem
4. The issue only manifests on Cygwin with bwa; the same code performs normally on native Linux and macOS
5. The issue manifests with recent versions of Cygwin but not with older versions
The issue becomes more pronounced with higher thread counts
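To help separate the allocator from the rest of BWA, here is a minimal sketch of a stand-alone stress test. It is not taken from BWA or from the patch; the names run_alloc_stress, alloc_worker, worker_arg and MAX_THREADS are hypothetical, and the allocation sizes are arbitrary. It only assumes that many threads calling malloc/free concurrently is enough to expose the contention, which may not match BWA's real allocation pattern.

```c
/* Hypothetical micro-benchmark, not part of BWA: every thread repeatedly
 * allocates and frees small blocks through the default malloc. */
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64

typedef struct {
    long iterations;  /* malloc/free pairs this thread should perform */
    long completed;   /* pairs that actually succeeded */
} worker_arg;

static void *alloc_worker(void *p)
{
    worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        size_t size = 16 + (size_t)(i % 64) * 16;  /* 16..1024 bytes */
        char *block = malloc(size);
        if (block == NULL)
            break;
        /* Volatile write so the compiler cannot elide the malloc/free pair. */
        ((volatile char *)block)[0] = 1;
        free(block);
        arg->completed++;
    }
    return NULL;
}

/* Runs nthreads workers; returns the total number of successful
 * malloc/free pairs, or -1 on bad arguments or thread creation failure. */
long run_alloc_stress(int nthreads, long iterations)
{
    pthread_t tid[MAX_THREADS];
    worker_arg args[MAX_THREADS];
    long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iterations < 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].completed = 0;
        if (pthread_create(&tid[i], NULL, alloc_worker, &args[i]) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += args[i].completed;
    }
    return started == nthreads ? total : -1;
}
```

Calling run_alloc_stress(11, 1000000) from a small main, building with gcc -O2 -pthread, and running the result under time on both Cygwin and Linux should show whether the sys time gap appears without any BWA code involved.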
The patched code is in the Cygwin branch of this repository:
https://github.com/WGSExtract/bwa.git
A simple test suite is available at the link below; run it with bash testsuite.sh. It includes bwa_working.exe, a binary compiled with an older version of Cygwin.
https://drive.google.com/file/d/1jtbQVUAcCmpJM-8Exi0C6pzDXcEo4cV6/view?usp=drive_link
Regards,
Teemu Nätkinniemi