This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Another tweak to the multiplication algorithm

From: OndÅej BÃlka <neleai at seznam dot cz>
To: "Joseph S. Myers" <joseph at codesourcery dot com>
Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, libc-alpha at sourceware dot org
Date: Thu, 14 Feb 2013 22:18:38 +0100
Subject: Re: [PATCH] Another tweak to the multiplication algorithm
References: <20130213142137.GU15748@spoyarek.pnq.redhat.com><Pine.LNX.4.64.1302131753180.13055@digraph.polyomino.org.uk><20130214034947.GY15748@spoyarek.pnq.redhat.com><Pine.LNX.4.64.1302141312390.18169@digraph.polyomino.org.uk><20130214135654.GH11884@spoyarek.pnq.redhat.com><Pine.LNX.4.64.1302141803210.21966@digraph.polyomino.org.uk>

On Thu, Feb 14, 2013 at 06:11:22PM +0000, Joseph S. Myers wrote:
> On Thu, 14 Feb 2013, Siddhesh Poyarekar wrote:
> 
> > No, your analysis is spot on w.r.t. the number of operations being
> > larger; 1.5x in fact, so I don't have any analysis that says
> > otherwise.  The reason this works though, is that most numbers falling
> > into the slow path will have a large enough number of digits.  If
> > there is a computation that requires less number of digits (say, < 4
> > in mp_no) then it should never get into the mpa code since the fast
> > paths should take care of them.
> 
> The reference I have to hand (Brent and Zimmermann, Modern Computer 
> Arithmetic) suggests that for multiplication below the Karatsuba range you 
> should use a simple algorithm in which the fundamental operation to 
> optimize is multiplying an array of words by another word and accumulating 
> the result in another array of words - not anything involving more 
> complicated tricks as in your patch.  In GMP's lowlevel interface that 
> fundamental operation is mpn_addmul_1.
> 
A choice of range depends how good you optimize smaller multiplications.

Optimizing small no is quite tricky if you want effectively use avx. 
You for fixed size expand loop and with shuffles try to rearange multiplicantions (Possibly using karatsuba)
 it in no logical order to maximize troughput.

A of generic karatsuba depends on how good was optimization above.

> Unless you have evidence from other implementations of multiple-precision 
> arithmetic that the tricks in your patch form a useful part of optimal 
> implementations in some range, I'd suggest looking elsewhere for causes of 
> slowness in the mpa code (and maybe even considering reimplementing it in 
> terms of the __mpn_* functions from GMP that are included in libc, 
> although any partial conversion to those functions might be inefficient 
> because of converting between different representations of 
> multiple-precision numbers - the mpa one, and the GMP array of limbs).
>

References:
- [PATCH] Another tweak to the multiplication algorithm
  - From: Siddhesh Poyarekar
- Re: [PATCH] Another tweak to the multiplication algorithm
  - From: Joseph S. Myers
- Re: [PATCH] Another tweak to the multiplication algorithm
  - From: Siddhesh Poyarekar
- Re: [PATCH] Another tweak to the multiplication algorithm
  - From: Joseph S. Myers
- Re: [PATCH] Another tweak to the multiplication algorithm
  - From: Siddhesh Poyarekar
- Re: [PATCH] Another tweak to the multiplication algorithm
  - From: Joseph S. Myers

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]