This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: PATCH: optimized libm single precision routines: erfcf, erff,expf for x86_64.


Hello,

The attached updated version is now substantially hand-tuned assembly code.
We look forward to this change being accepted and released.

Performance on the benchmark we use (time):

                     Istanbul   Atom     Nehalem   AVX
GLIBC-master:        148        224      122       246
Attached version:     51.93      97.23    33.12     26.71
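
The speedup implied by these timings can be read off with a short script. This is only a sketch: the numbers are copied from the table above, the units are whatever the benchmark reports, and the names `timings`, `speedup` and `speedups` are made up for illustration.

```python
# Timings copied from the table above: (GLIBC-master, attached version).
# Units are those reported by the benchmark; only the ratio matters here.
timings = {
    "Istanbul": (148.0, 51.93),
    "Atom": (224.0, 97.23),
    "Nehalem": (122.0, 33.12),
    "AVX": (246.0, 26.71),
}


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline / candidate


def speedups(table: dict) -> dict:
    """Map each CPU name to the speedup of the attached version."""
    return {cpu: speedup(base, new) for cpu, (base, new) in table.items()}


if __name__ == "__main__":
    for cpu, ratio in speedups(timings).items():
        print(f"{cpu:<10}{ratio:.2f}x")
```

The ratios say nothing about accuracy or about inputs outside the benchmark; they only restate the table.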


ChangeLog:

2012-02-22  Liubov Dmitrieva  <liubov.dmitrieva@gmail.com>

       * sysdeps/x86_64/fpu/e_expf.S: New file.



Thanks.

--
Liubov Dmitrieva
Intel Corporation

2012/2/17 H.J. Lu <hjl.tools@gmail.com>:
> On Thu, Feb 16, 2012 at 2:00 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 02/16/2012 12:11 PM, Dmitrieva Liubov wrote:
>>> +     movss   %xmm0, -16(%rsp)        /* save SP x*K/log(2)+RS */
>>> +     movss   -16(%rsp), %xmm1        /* load SP x*K/log(2)+RS */
>>
>> What's up with these sorts of obvious compiler-generated bits of silliness?
>>
>> You stated that you do not plan to provide the C source because you "believe
>> that the assembly should be faster."  Given turds like the above, I do not
>> accept this assertion without proof.
>>
>> Given this routine does all scalar code, I don't see why it might not be
>> faster for all of the other targets as well.
>>
>>
>
> This code does look bad:
>
> +       cvtsd2ss        %xmm0, %xmm0    /* SP x*K/log(2)+RS */
> +       movss   %xmm0, -16(%rsp)        /* save SP x*K/log(2)+RS */
> +       movss   -16(%rsp), %xmm1        /* load SP x*K/log(2)+RS */
>
> They can be replaced by
>
> cvtsd2ss        %xmm0, %xmm1
>
> Also do we need to do it like:
>
> +       movss   %xmm0, -8(%rsp)         /* Save argument in current frame */
>
> I think you can simply remove it and do
>
>        /* Here if 2^(-28)<=|x|<125*log(2) */
>        cvtss2sd        %xmm0, %xmm3    /* Load x converted to double precision */
>
> --
> H.J.
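
The reasoning behind the suggested replacement can be checked with a small model. This is a sketch, not the patch's code: it assumes the default round-to-nearest mode and in-range inputs, models a single-precision register as a 32-bit integer, and uses hypothetical helper names (`cvtsd2ss_bits`, `store_reload_bits` and the two sequence functions). It shows that the store and reload through the stack slot only copy bits, so the value in %xmm1 is the same as with a direct `cvtsd2ss %xmm0, %xmm1`.

```python
import struct


def cvtsd2ss_bits(x: float) -> int:
    """Model cvtsd2ss: round a double to single precision, return its bits.

    Assumes round-to-nearest and a value that fits in single precision
    (struct raises OverflowError for finite values that are too large).
    """
    return struct.unpack("<I", struct.pack("<f", x))[0]


def store_reload_bits(bits: int) -> int:
    """Model `movss %xmm0, -16(%rsp)` then `movss -16(%rsp), %xmm1`."""
    slot = bits.to_bytes(4, "little")     # write the 4-byte stack slot
    return int.from_bytes(slot, "little")  # read it back unchanged


def three_instruction_sequence(x: float) -> int:
    """Convert, spill to the stack, reload into the destination register."""
    return store_reload_bits(cvtsd2ss_bits(x))


def single_instruction(x: float) -> int:
    """Convert straight into the destination register."""
    return cvtsd2ss_bits(x)


if __name__ == "__main__":
    for value in (0.1, 1.0, -3.5, 88.7):
        same = three_instruction_sequence(value) == single_instruction(value)
        print(value, hex(single_instruction(value)), same)
```

One difference the model ignores: the original sequence also leaves the converted value in %xmm0, so the one-instruction form is only a drop-in replacement if later code does not read %xmm0 expecting the single-precision result.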

Attachment: e_expf.patch
Description: Binary data

