This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Simplify strncat.
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Thu, 18 Dec 2014 21:39:01 -0200
- Subject: Re: [PATCH] Simplify strncat.
- Authentication-results: sourceware.org; auth=none
- References: <20141216202438 dot GA5612 at domone> <20141216203638 dot D237C2C2448 at topped-with-meat dot com> <20141217163206 dot GA26604 at domone> <5491CBFD dot 60503 at linux dot vnet dot ibm dot com> <20141218195941 dot GB29110 at domone>
On 18-12-2014 17:59, OndÅej BÃlka wrote:
> On Wed, Dec 17, 2014 at 04:31:25PM -0200, Adhemerval Zanella wrote:
>> On 17-12-2014 14:32, OndÅej BÃlka wrote:
>>> On Tue, Dec 16, 2014 at 12:36:38PM -0800, Roland McGrath wrote:
>>>> Use '\0', not '\000'.
>>>>
>>>> IIRC we have a general policy about having a benchtests case and citing
>>>> numbers on that.
>>> That is not policy as benchtest results on string function are still
>>> meaningless. Here I could simply improve benchtest score by inlining
>>> strnlen and memcpy implementations, for example with
>>>
>>> #define memcpy memcpy2
>>> #include <string/memcpy.c>
>>>
>>> As it avoids call overhead. However it would be big mistake to do that
>>> "optimization" as strncat is cold function while memcpy is likely in
>>> memory it degrade performance due to additonal icache misses.
>>>
>> This is a fair criticize, but benchtests from string functions are far for
>> 'meanigless'. On powerpc side, for instance, I noted unaligned cases
>> that current code for strcpy was outperforming. It leads to check more
>> result using different workloads and profiler and code a better strategy.
>> Which resulted in a better implementation in the end (patch just posted).
>>
> Which is quite dangerous, if benchmark is unreliable you cannot be sure
> if that improvement is real or will turn into regression. That could now
> easily be case as function in benchtest predicts all branches perfectly
> which makes path fast in benchtest does not happen in reality. I will post
> several changes and we will see truth (I did not read patch so I cannot say
> what will happen.)
>
And that is *exactly* why we should push for more patches and a better coverage
in benchmarks. I do see your criticizes as positive, what I do not see is just
bashing someone that use the benchmarks it without providing anything better
instead.
And do not get me wrong here, but I checked sometime ago your personal page
and your benchmark work and I lost interested because it was *way* confusing
and without much organization. And that is exactly what I am trying to avoid
here: comments in separate threads with meaningful comments, but lost without
proper organization.
For instance, I noticed you used x86_64 specific instructions to flush the
cache and other things to try mimic a real world usage. I think is a good idea
try to came up with a benchmark that may use some arch-specific, but we will
need some discussion on how to characterize it, how to code for multiple archs,
etc.