Extreme slowdown due to malloc?

Achim Gratz Stromeko@nexgo.de
Mon Dec 21 20:52:26 GMT 2020


I've been experimenting a bit with ZStandard dictionaries.  The
dictionary builder is probably not the most optimized piece of software
and if you feed it large amounts of data it needs quite a lot of
cycles.  So I thought I'd run some of this on Cygwin since that machine is
faster and has more threads than my Linux box.  Unfortunately that plan
shattered due to extreme slowness of the first (single-threaded) part of
the dictionary builder that sets up the partial suffix array.

|------+---------------+---------------|
|      | Linux         | Cygwin        |
|      | E3-1225v3     | E3-1276v3     |
|      | 4C/4T         | 4C/8T         |
|      | 3.2/3.6GHz    | 3.6/4.0GHz    |
|------+---------------+---------------|
|  100 | 00:14 /   55s | 00:23 /  126s |
|  200 | 00:39 /  145s | 01:10 /  241s |
|  400 | 01:12 /  266s | 01:25 /  322s |
|  800 | 02:06 /  466s | 11:12 / 1245s |
| 1600 | 03:57 /  872s | > 2hr         |
| 3200 | 08:03 / 1756s | n/a           |
| 6400 | 16:17 / 3581s | n/a           |
|------+---------------+---------------|

The obvious differences are that I/O takes a lot longer on Cygwin
(roughly a minute to read all the data) and that I get an insane number
of page faults on Windows (as reported by time) vs. none on Linux.
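
To narrow down where the page faults come from, the counters that time
reports can also be read from inside a process around a single
allocation.  The following is only a minimal sketch in Python, not part
of the dictionary builder: the helper names (page_faults, touch_pages,
faults_during) and the 64 MiB buffer size are made up for illustration,
and what the counters actually contain depends on the platform.

```python
import resource


def page_faults():
    """Return (minor, major) page-fault counts for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def touch_pages(size_bytes, page_size=None):
    """Allocate a zeroed buffer and write one byte per page so each page gets mapped."""
    if page_size is None:
        page_size = resource.getpagesize()
    buf = bytearray(size_bytes)
    for offset in range(0, size_bytes, page_size):
        buf[offset] = 1
    return buf


def faults_during(func, *args):
    """Run func(*args) and return (result, minor faults added, major faults added)."""
    before_minor, before_major = page_faults()
    result = func(*args)
    after_minor, after_major = page_faults()
    return result, after_minor - before_minor, after_major - before_major


if __name__ == "__main__":
    # 64 MiB is an arbitrary illustrative size, not taken from the measurements above.
    _, minor, major = faults_during(touch_pages, 64 * 1024 * 1024)
    print(f"minor faults: {minor}, major faults: {major}")
```

Running the same script on both machines would show whether a plain
allocate-and-touch pattern already differs, or whether the difference
only appears with the allocation pattern of the dictionary builder.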

While doing that I also noticed that top shows the program taking 100%
CPU in the multithreaded portion of the program, while it should show
close to 800% at that time.  I'm not sure if that information just isn't
available on Windows or if procps-ng needs to look someplace else for
that to be shown as expected.
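
One way to cross-check the top figure independently of procps-ng is to
compare the CPU time a process consumed with the elapsed wall-clock time.
The sketch below does that in Python; cpu_utilization and spin are
hypothetical helper names, and the busy loop is only a stand-in workload.
time.process_time() counts CPU time across all threads of the process,
so a program keeping eight threads busy should come out near 800%.

```python
import time


def cpu_utilization(work, *args):
    """Run work(*args); return (result, CPU time as a percentage of wall time)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = work(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return result, 100.0 * cpu_used / wall_used


def spin(iterations):
    """Stand-in single-threaded workload: sum of squares below iterations."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


if __name__ == "__main__":
    _, percent = cpu_utilization(spin, 5_000_000)
    print(f"CPU utilization: {percent:.0f}%")
```

This single-threaded loop cannot exceed roughly 100%; the point is only
that the ratio of CPU time to wall time gives a number comparable to what
top is expected to display.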


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Waldorf MIDI Implementation & additional documentation:
http://Synth.Stromeko.net/Downloads.html#WaldorfDocs

