pthread_cond performance discussion

Carlos O'Donell carlos@redhat.com
Wed Mar 18 12:12:43 GMT 2020


On 3/16/20 3:30 AM, liqingqing wrote:
> The new condvar implementation provides stronger ordering
> guarantees. To order waiters without expanding the size of
> pthread_cond_t, it uses a few bits to maintain a state machine
> with two groups of waiters, G1 and G2. The algorithm is very
> clever. But when I tested MySQL performance, I found that the new
> condvar implementation hurts performance on machines with many
> cores. The scenario: on my ARM server, a 4P system with 256 cores
> in total, I ran 200 terminals reading and writing the database,
> and I got better performance with the old algorithm.

Are you able to look at any hardware performance counters to see if
there are increased cache line miss rates?

> I think there may be too much cache false sharing in my
> environment. Has anyone else seen the same problem? And is there
> room to optimize the new algorithm?

I have not seen anyone report a performance problem on large machines.

Unfortunately from an ABI perspective we cannot increase the size of
the structure, nor change the required alignment.
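
To see what that constraint means for cache lines on a given target,
something like the following rough sketch could help. It prints the
size and alignment the ABI fixes for pthread_cond_t, and how many
cache lines one instance touches. The 64-byte line size and the two
helper names (cache_lines_spanned, report_condvar_layout) are
assumptions made for illustration; they are not part of glibc, and
the line size on your ARM server may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines. Adjust for the actual hardware. */
#define ASSUMED_CACHE_LINE 64u

/* Hypothetical helper: number of cache lines touched by an object of
   `size` bytes that starts at address `addr`. */
size_t cache_lines_spanned(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first = addr / ASSUMED_CACHE_LINE;
    uintptr_t last = (addr + size - 1) / ASSUMED_CACHE_LINE;
    return (size_t)(last - first + 1);
}

/* Hypothetical helper: print the ABI-fixed size and alignment of
   pthread_cond_t and where this particular instance lands. */
void report_condvar_layout(const pthread_cond_t *cond)
{
    uintptr_t addr = (uintptr_t)cond;

    printf("sizeof(pthread_cond_t)  = %zu\n", sizeof(pthread_cond_t));
    printf("alignof(pthread_cond_t) = %zu\n", alignof(pthread_cond_t));
    printf("offset in cache line    = %zu\n",
           (size_t)(addr % ASSUMED_CACHE_LINE));
    printf("cache lines spanned     = %zu\n",
           cache_lines_spanned(addr, sizeof(pthread_cond_t)));
}
```

If the required alignment is smaller than the cache line size, an
instance can straddle two lines or share a line with neighbouring
data, depending on where the application places it.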

We may be able to play with the order of the layout of elements
within the condvar. That's something you could experiment with and
report back to the list with your findings.

For example:
- Change the layout by moving elements around to attempt to
  avoid cache-line sharing.
- Recompile glibc.
- Install into your system.
  - PTHREAD_COND_INITIALIZER should be all-zero bytes so you should
    not need to recompile applications.
- Retest performance.
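
The "no need to recompile applications" step depends on the static
initializer being all-zero bytes. A minimal sketch of how you might
confirm that against the libc you build is below. The function names
are hypothetical, and comparing static objects with memcmp assumes
the toolchain zero-fills any padding in static storage, which is
typical but worth keeping in mind.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical helper: returns 1 if PTHREAD_COND_INITIALIZER produces
   an object whose bytes are all zero, 0 otherwise. */
int cond_initializer_is_all_zero(void)
{
    /* Both objects have static storage duration; `zero` has no
       initializer, so it is zero-initialized. */
    static const pthread_cond_t initialized = PTHREAD_COND_INITIALIZER;
    static const pthread_cond_t zero;

    return memcmp(&initialized, &zero, sizeof(pthread_cond_t)) == 0;
}

/* Hypothetical helper: stop early if the assumption does not hold,
   before drawing conclusions from a benchmark run. */
void require_zero_initializer(void)
{
    assert(cond_initializer_is_all_zero());
}
```

If the check fails for your modified build, statically initialized
condvars in already-compiled binaries would no longer match the new
layout, and those applications would need to be rebuilt before
retesting.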

-- 
Cheers,
Carlos.


