This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PING][PATCHv3 1/2] aarch64: Hoist ZVA check out of the memset function



On 11/10/2017 19:21, Wilco Dijkstra wrote:
> Adhemerval Zanella wrote:
> 
>> My idea is to avoid having different code paths in different files that
>> are 'injected' by macros or ifdefs.  Getting all of them in one place is
>> better imho than spreading them over multiple files.  Now, what you are
>> advocating is a different topic: whether the modifications of the generic
>> implementation are valuable.
> 
> Well I'm talking about the patch as proposed. I can no longer understand
> the order or the logic, and it is impossible to figure out which of the many
> possible cases may have changed alignment of the code - whether by
> accident or on purpose.

Right, that is a fair point. Which alternative do you prefer for this kind of
change? Different implementations with more linear code (potentially
duplicating some of it), or a common implementation with macros in each
file?

> 
>> Ideally I would prefer to have a more concise selection as:
>>
>>    1. A memset that reads dczid_el0 using mrs (as the default one).
>>       This will be selected as default and thus should not incur any
>>       regression.
>>
>>    2. A memset that reads the zva size from global variable and is selected
>>       solely by falkor (and ideally it would handle 64 and 128 cacheline
>>       sizes).
>>
>>    3. no zva variant in the case of zva being 0.
> 
> I don't see how that makes sense. If we decide to use an ifunc, then you
> no longer need to check the ZVA size. So I just don't understand why we
> need a global at all...

This takes into consideration your point about possible regressions, since
the first option would be essentially what we have now.  But I do think
Siddhesh's proposal is a good way forward, and I can't see how it might
regress on current hardware, since it in fact removes the requirement
to get the zva code (and thus the cycles spent adjusting the loop based on it).
So what kind of regressions do you have in mind with the current proposal?
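
To make the idea of hoisting the check concrete, here is a minimal sketch in
plain C of decoding the register value once at selection time instead of on
every memset call. It is not the code from the patch: the variant names and
both helper functions are hypothetical, and a real resolver would obtain the
value with "mrs ..., dczid_el0" rather than take it as a parameter. The only
architectural facts assumed are that bit 4 (DZP) of DCZID_EL0 means DC ZVA is
prohibited and that bits [3:0] (BS) hold log2 of the block size in 4-byte
words.

```c
#include <stdint.h>

/* Hypothetical variant names, for illustration only.  */
enum memset_variant
{
  MEMSET_NO_ZVA,       /* DC ZVA is prohibited.  */
  MEMSET_ZVA_64,       /* 64-byte ZVA block.  */
  MEMSET_ZVA_128,      /* 128-byte ZVA block.  */
  MEMSET_ZVA_GENERIC   /* Any other size; the loop adapts at run time.  */
};

/* Decode a DCZID_EL0 value.  Returns the ZVA block size in bytes,
   or 0 when the DZP bit says DC ZVA must not be used.  */
static unsigned int
zva_size_from_dczid (uint64_t dczid)
{
  if (dczid & (1u << 4))
    return 0;
  /* BS is log2 of the block size in words; a word is 4 bytes.  */
  return 4u << (dczid & 0xf);
}

/* Pick a variant once, so the chosen memset no longer has to
   re-check the ZVA size on each call.  */
static enum memset_variant
select_memset (uint64_t dczid)
{
  switch (zva_size_from_dczid (dczid))
    {
    case 0:
      return MEMSET_NO_ZVA;
    case 64:
      return MEMSET_ZVA_64;
    case 128:
      return MEMSET_ZVA_128;
    default:
      return MEMSET_ZVA_GENERIC;
    }
}
```

With this shape the per-call cost of reading and decoding the register moves
into the one-time selection, which is why I would not expect the selected
variants to be slower than what we have today.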

Siddhesh also stated that it improves performance on different
micro-architectures (it also handles the cases where the configuration does
not advertise a zva size).  I will try to run this patch on the hardware I
have access to here to check for possible regressions, but I recall I already
tested it on APM X-GENE and it showed no regression.

