This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: pthread_mutex_unlock takes 15x longer than pthread_mutex_lock


Hi,

At least when there is no contention on the lock, the important part of
both the lock and unlock on x86 is an atomic operation to update the lock
state.

The time an atomic operation takes on x86 is highly variable. I believe
this is because a LOCK-prefixed instruction acts as a full barrier: any
pending data in the processor's store buffer has to be flushed into the
memory hierarchy before the atomic operation completes.

I am guessing that your application does a memcpy() or similar of the
packet before the unlock, and flushing that out of the write buffers is
causing the timing you observe.

(Incidentally, I learnt about this because I too observed very similar
behavior in a packet processing application.)

One possible mitigation is to use a pthread spinlock instead of a pthread
mutex; the former is released (on x86) by a normal write, not an atomic op.
This will make the unlock faster (but possibly take longer for the unlock
to be observed by other threads).

Cheers,
Ralph.


On Mon, Oct 2, 2017 at 9:20 AM, Akash Baid <akashbaid@gmail.com> wrote:

> Hi,
>
> I have a packet processing application which takes and releases a
> pthread_mutex_lock for every packet (around 200k times a second). When
> profiling the performance through Intel VTune, I noticed that the unlock
> operation takes about 370ns compared to only 25ns for the lock operation.
> The numbers are averaged over millions of operations and are independent of
> the size of the critical section.
>
> I suspected lock contention to be playing a role here and thus have ensured
> that no other thread accesses this lock but the numbers still hover around
> the same mark. Any ideas as to whether this is expected behavior and if not
> what might be going on? My understanding was that the unlock operation
> under no contention should be free of system calls and extremely fast. In
> fact, single stepping through the instructions suggests that there aren't any
> system calls, but maybe the LOCK assembly instruction inside the low-level
> lll_unlock routine is taking long in the unlock case for some reason?
>
> The application is using the libpthread dynamic shared library, glibc
> version 2.13, on an x86_64 GNU/Linux VM on an Intel Xeon E5 2.5 GHz
> machine.
>

