This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: HWCAP is method to determine cpu features, not selection mechanism.





> On Jun 10, 2015, at 11:09 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
> 
>> On Wed, Jun 10, 2015 at 11:21:54AM -0300, Adhemerval Zanella wrote:
>> 
>> 
>>> On 10-06-2015 11:16, Szabolcs Nagy wrote:
>>>> On 10/06/15 14:35, Adhemerval Zanella wrote:
>>>> I agree that adding an API to modify the current hwcap is not a good
>>>> approach. However the costs you are assuming here are *very* x86 biased,
>>>> where you need only one instruction (movl <variable>(%rip), %<destination>)
>>>> to load an external variable defined in a shared library, whereas for
>>>> powerpc it is more costly:
>>> 
>>> debian codesearch found 4 references to __builtin_cpu_supports
>>> all seem to avoid using it repeatedly.
>>> 
>>> multiversioning dispatch only happens at startup (for a small
>>> number of functions according to existing practice).
>>> 
>>> so why is hwcap expected to be used in hot loops?
>>> 
>> 
>> Good question, I do not know and I believe Steve could answer this
>> better than me.  I am only advocating here that assuming x86 costs
>> for powerpc is not the way to evaluate this patch.
> 
> Sorry, but your details don't matter when the underlying idea is just bad.
> Even if getting hwcap otherwise took 20 cycles, it would still be a bad
> idea. As you need to read hwcap only once at initialization, bringing up
> its cost is completely irrelevant.
> 
> First, as I explained, the major flaw of Steve's approach: how exactly do
> you ensure that gcc won't insert a newer instruction that would lead to a
> crash on an older platform?
> 
> Second, it makes no sense. If you are in a situation where hwcap
> access becomes noticeable in a profile, then the check itself is also
> noticeable in the profile. So use ifunc, which will save you the
> additional cycles spent on checking hwcap bits.
> 
> A programmer who uses hwcap in a hot loop is just incompetent. It stays
> constant while the application runs. So he should make several copies of
> the loop, each compiled with the appropriate options.
> 
> Then, even if the compiler handled these issues correctly, you would
> probably lose more on missed compiler optimizations than your supposed
> gain. The compiler can select a suboptimal path, as it doesn't want to
> expand the function too much due to size concerns.
> 
> That's quite easy to show; for example, the following would be an order
> of magnitude slower with hwcap than with ifuncs. The reason is that even
> gcc-5.1 doesn't split it into two branches, each doing a shift. Instead
> it emits a div instruction, which takes forever.
> 
> int hwcap;
> unsigned int foo(unsigned int i)
> {
>  int d = 8;
>  if (hwcap & 42)
>    d = 4;
>  return i / d;
> }
> 
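
For illustration, here is a rough sketch of the split that the quoted
example says gcc does not perform on its own: two copies of foo, each with
a constant divisor, and a selector that is meant to be called once at
startup. This uses a plain function pointer rather than a real ifunc
resolver, purely to keep the sketch portable. The names foo_div4, foo_div8,
select_foo and FEATURE_MASK are hypothetical; the mask value 42 is taken
from the quoted code and does not correspond to any real hwcap bit.

```c
/* Hypothetical mask, copied from the `hwcap & 42` test in the quoted code. */
#define FEATURE_MASK 42UL

/* Each copy divides by a compile-time constant power of two, so the
   compiler is free to emit a shift instead of a div instruction. */
unsigned int foo_div4(unsigned int i) { return i / 4; }
unsigned int foo_div8(unsigned int i) { return i / 8; }

typedef unsigned int (*foo_fn)(unsigned int);

/* Choose the variant from a hwcap-style bitmask.  Intended to be called
   once at initialization; hot code then calls through the returned
   pointer and never tests the bits again. */
foo_fn select_foo(unsigned long hwcap)
{
  return (hwcap & FEATURE_MASK) ? foo_div4 : foo_div8;
}
```

A caller would do something like `foo_fn foo = select_foo(hwcap);` before
entering the loop and then call `foo(i)` inside it. An ifunc resolver does
the same selection at symbol binding time, which also removes the visible
function pointer from the source.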

