This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 1/25] Remove nested functions: crypt/md5-crypt.c


On Tue, May 20, 2014 at 02:54:41PM +0000, Joseph S. Myers wrote:
> On Tue, 20 May 2014, Siddhesh Poyarekar wrote:
> 
> > On Tue, May 20, 2014 at 06:23:40PM +0400, Konstantin Serebryany wrote:
> > > Do you think such a benchmark is possible at all?
> > > I've made a simple test (attached) and profiled it with "perf":
> > >     85.26%    a.out  libcrypt-2.15.so   [.] _ufc_doit_r
> > >      6.13%    a.out  libcrypt-2.15.so   [.] _ufc_mk_keytab_r
> > >      2.40%    a.out  libcrypt-2.15.so   [.] _ufc_setup_salt_r
> > >      1.56%    a.out  libcrypt-2.15.so   [.] __crypt_r
> > >      1.51%    a.out  libcrypt-2.15.so   [.] _ufc_output_conversion_r
> > >      1.35%    a.out  libcrypt-2.15.so   [.] crypt
> > > 
> > > As you can see, crypt_r itself takes a tiny fraction of time,
> > > most of it is spent in _ufc_* which are defined in another module.
> > > Any changes in crypt itself (unless you do something insane) will not
> > > be observable in profile.
> > 
> > You'll need to choose inputs so that __md5_crypt_r is called -
> > crypt/crypt-entry.c should help you with that.  If __md5_crypt_r takes
> > a tiny fraction of time again, then this change should be safe, but I
> > would prefer that the inputs get fed into the benchtests anyway so
> > that we can automatically track performance of crypt for those inputs
> > in future.
> 
> Another thing that can be done is to look at the code generated for the 
> relevant file before and after the patch - I'd expect very little change.
> 
Yes, I wanted to write that benchmarking this change is probably
pointless, as gcc should optimize both versions the same way (if there
were consistently a difference in one direction, we could report that
as a gcc regression).

Also, the perf data show that it's a cold function, so you should
improve readability, not performance (measuring a difference is
possible, but you would need to run the benchmark for a few days and
wait until the standard deviation drops enough to get a statistically
significant result).

About the only way this could make a difference is if the function gets
inlined in one version and not in the other.
That is out of scope for microbenchmarks: instruction cache misses
could easily make an inlined function slower in the wild, yet you would
measure it as faster when calling it in a tight loop.

As far as the actual change is concerned 

