This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH][AArch64] Single thread lowlevellock optimization

From: Torvald Riegel <triegel at redhat dot com>
To: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>, nd at arm dot com
Date: Tue, 20 Jun 2017 15:47:45 +0200
Subject: Re: [PATCH][AArch64] Single thread lowlevellock optimization
References: <59440699.6080900@arm.com>

On Fri, 2017-06-16 at 17:26 +0100, Szabolcs Nagy wrote:
> Do single thread lock optimization in aarch64 libc. Atomic operations
> hurt the performance of some single-threaded programs using stdio
> (usually getc/putc in a loop).
> 
> Ideally such optimization should be done at a higher level and in a
> target independent way as in
> https://sourceware.org/ml/libc-alpha/2017-05/msg00479.html
> but that approach will need more discussion so do it in lowlevellocks,
> similarly to x86, until there is consensus.

I disagree that this is sufficient reason not to do the right thing here
(i.e., optimize in the high-level algorithm).  What further discussion is
needed regarding the high-level use case?

> Differences compared to the current x86_64 behaviour:
> - The optimization is not silently applied to shared locks, in that
> case the build fails.
> - Unlock assumes the futex value is 0 or 1, there are no waiters to
> wake (that would not work in single thread and libc does not use
> such locks, to be sure lll_cond* is undefed).
> 
> This speeds up a getchar loop about 2-4x depending on the cpu,
> while only causing around 5-10% regression for the multi-threaded case

What measurement of what benchmark resulted in that number (the latter
one)?  Without details of what you are measuring this isn't meaningful.

> (other libc internal locks are not expected to be performance
> critical or significantly affected by this change).

Why do you think this is the case?
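
For readers following the thread, here is a minimal sketch of the kind of
single-thread fast path being discussed.  It is not the patch and not
glibc code: the names single_threaded, demo_lock_t, demo_lock, demo_unlock
and demo_count_bytes are hypothetical, and the contended path simply spins
where a real low-level lock would block on a futex.  The lock word only
takes the values 0 and 1, matching the "no waiters to wake" assumption
quoted above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: true while the process has only one thread.
   How a real libc tracks this is outside the scope of this sketch. */
static bool single_threaded = true;

/* Lock word: 0 = unlocked, 1 = locked.  There is no "locked with
   waiters" state, so unlock never has to wake anyone. */
typedef struct { atomic_int futex; } demo_lock_t;

static void demo_lock(demo_lock_t *l)
{
    if (single_threaded) {
        /* Fast path: a relaxed store avoids the atomic
           read-modify-write that the thread attributes the cost to. */
        atomic_store_explicit(&l->futex, 1, memory_order_relaxed);
        return;
    }
    /* Slow path: compare-and-swap 0 -> 1 with acquire ordering.
       This sketch spins on failure; a real lock would futex-wait. */
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(
               &l->futex, &expected, 1,
               memory_order_acquire, memory_order_relaxed))
        expected = 0;
}

static void demo_unlock(demo_lock_t *l)
{
    /* The value is known to be 0 or 1, so storing 0 is enough.
       Only the multi-threaded case needs release ordering. */
    atomic_store_explicit(&l->futex, 0,
                          single_threaded ? memory_order_relaxed
                                          : memory_order_release);
}

/* Stand-in for a getc-style loop that takes the lock once per byte. */
static size_t demo_count_bytes(demo_lock_t *l, const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        demo_lock(l);
        n++;
        demo_unlock(l);
    }
    return n;
}
```

The high-level alternative raised in this reply would instead test the
single-thread condition once in the caller (for example, in the stdio
function itself) and skip the lock and unlock calls entirely, rather than
branching inside every low-level lock operation.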

Follow-Ups:
- Re: [PATCH][AArch64] Single thread lowlevellock optimization
  - From: Szabolcs Nagy

References:
- [PATCH][AArch64] Single thread lowlevellock optimization
  - From: Szabolcs Nagy
