This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: question regarding div / std::div implementation
- From: Daniel Gutson <daniel dot gutson at tallertechnologies dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Wed, 20 Apr 2016 18:15:50 -0300
- Subject: Re: question regarding div / std::div implementation
- Authentication-results: sourceware.org; auth=none
- References: <CAF5HaEXKZ7j-gbZPiWPhDpx7=R0zm1xYvXNYCNUMG4WeZS532Q at mail dot gmail dot com> <5717DF65 dot 5060606 at linaro dot org> <CAF5HaEWdpAGiXtCO36u3F0QGAXfVHL+qkY+RLsszpv7paPVdMg at mail dot gmail dot com> <5717E68D dot 2020905 at linaro dot org> <CAF5HaEWuSS7dEEM6ogU6TCqQ-Y7OM9DC1HFCeuuAGL0vKtL2Kw at mail dot gmail dot com> <5717EB44 dot 5020508 at linaro dot org>
On Wed, Apr 20, 2016 at 5:49 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
> On 20-04-2016 17:36, Daniel Gutson wrote:
>> On Wed, Apr 20, 2016 at 5:29 PM, Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>> On 20-04-2016 17:07, Daniel Gutson wrote:
>>>> On Wed, Apr 20, 2016 at 4:58 PM, Adhemerval Zanella
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>>
>>>>> On 20-04-2016 16:44, Daniel Gutson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> is there any reason that std::div / cstdlib div is not implemented
>>>>>> in such a way that it is expanded to
>>>>>> the assembly instruction -when available- that calculates both the
>>>>>> remainder and the quotient,
>>>>>> e.g. x86's div?
>>>>>>
>>>>>> For example, why not an inline function with inline assembly? Or,
>>>>>> should this require a gcc built-in?
>>>>>
>>>>> I believe it is because nobody has implemented this optimization, and
>>>>> my feeling is that if this is a hotspot in your application you
>>>>> will probably gain more by rewriting it than by using the
>>>>> libc call.
>>>>
>>>> Then it won't be portable, or at least not optimally portable, by which
>>>> I mean that the optimization shows up whenever my target supports it.
>>>> Suppose I need to provide my application for several architectures: I
>>>> would expect to be able to write it using standard functions and have
>>>> it optimized for each platform.
>>>>
>>>> I'm reporting it in bugzilla and asking to assign it to one of my team members.
>>
>> FWIW, https://sourceware.org/bugzilla/show_bug.cgi?id=19974
>>
>>>
>>> I do not really get what exactly you are referring to as non-portable,
>>> since the glibc div code is implemented in stdlib/div.c and it will
>>> generate the idivl instruction on x86_64 for all supported chips. And
>>
>> I don't see it generating the idivl instruction, but
>> callq 400430 <div@plt>
>> so I think it should be implemented as an inline function maybe with
>> inline assembly
>> (or rely on the pattern recognition as you suggest below).
>
> Of course it will generate a libcall, since the stdlib.h header declares
> it as an external function and the compiler does not have any information
> on how to lower it.
>
>>
>>> afaik this is true for all supported architectures (I am not
>>> aware of any architecture that added a more optimized
>>> division/modulus operation with a *different* opcode).
>>
>> Could you please post an example and the gcc command line call where
>> you do get the idiv?
>
> I mean when building stdlib/div.c itself.
>
>>
>>>
>>> I mean to use the integer operation directly instead of using the
>>> libcall. The code is quite simple:
>>>
>>> div_t
>>> div (int numer, int denom)
>>> {
>>>   div_t result;
>>>
>>>   result.quot = numer / denom;
>>>   result.rem = numer % denom;
>>>
>>>   return result;
>>> }
>>
>>>
>>> You can try to add an inline version to the headers, such as the ones
>>> for string.h, but I would strongly recommend that you either work on
>>> your application if this is the hotspot (by using the operations
>>> directly instead) or on the compiler side to make it handle div as
>>> a builtin (and thus avoid the libcall).
>>
>> Why should this be a builtin? I can implement it in gcc, but I still
>> don't see why I should pass the burden to the compiler
>> when it is a matter of library implementation.
>
> Because carrying such an implementation adds header complexity and a
> maintenance burden; just check the string{2}.h header cleanup Wilco is
> pushing.
>
> IMHO I do not see a compelling reason to use inline
> assembly for such an operation, and I would avoid adding an inline
> operation just to remove the libcall.
OK with no inline asm, but a libcall might be expensive, especially in a
tight loop, and it interferes with branch prediction;
a builtin is nonportable as well.
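
To make the inline alternative concrete, here is a minimal sketch of what a
header-level version could look like. It is only an illustration: the name
inline_div is hypothetical and is not part of glibc or of any proposed patch.
The body is the same computation as the stdlib/div.c code quoted above, and
the assumption is that, once the function is visible at the call site, an
optimizing compiler can typically serve both the quotient and the remainder
from a single division instruction on targets that provide one.

```c
#include <stdlib.h>   /* div_t */

/* Hypothetical helper, NOT a glibc interface: same arithmetic as
   stdlib/div.c, but defined inline so no call through the PLT is
   needed and the compiler is free to combine / and %.  */
static inline div_t
inline_div (int numer, int denom)
{
  div_t result;

  result.quot = numer / denom;   /* truncates toward zero (C99) */
  result.rem = numer % denom;    /* has the sign of numer (C99) */

  return result;
}
```

A caller would use it exactly like div, e.g. inline_div (17, 5) gives
quot 3 and rem 2. Whether a single instruction is actually emitted depends
on the compiler, the optimization level and the target.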
--
Daniel F. Gutson
Engineering Manager
San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina
Phone: +54 351 4217888 / +54 351 4218211
Skype: dgutson
LinkedIn: http://ar.linkedin.com/in/danielgutson