This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] avoid initialization overhead in strcoll


On Fri, Jun 05, 2015 at 12:08:37AM +0530, Siddhesh Poyarekar wrote:
> On Thu, May 28, 2015 at 05:39:01PM +0200, Leonhard Holz wrote:
> > This patch removes some initialization overhead in the hot path of strcoll. It
> > improves the file listing benchmark by 20% on x86.
> 
> You've removed a memset call which seems to be the reason for the 20%
> overhead; I can't see anything else that could have done this, but I
> haven't looked hard enough.
> 
> This change however should be split up to only include the useful
> stuff.
> 
> > diff --git a/string/strcoll_l.c b/string/strcoll_l.c
> > index 0fa005f..14bfb8b 100644
> > --- a/string/strcoll_l.c
> > +++ b/string/strcoll_l.c
> > @@ -62,7 +62,6 @@ typedef struct
> >    int len;			/* Length of the current sequence.  */
> >    size_t val;			/* Position of the sequence relative to the
> >  				   previous non-ignored sequence.  */
> > -  size_t idxnow;		/* Current index in sequences.  */
> 
> This should go in as a separate change since it is valid without any
> performance justification.  idxnow is unused.  Please post this as a
> separate patch.
> 
> >    size_t idxmax;		/* Maximum index in sequences.  */
> >    size_t idxcnt;		/* Current count of indices.  */
> >    size_t backw;			/* Current Backward sequence index.  */
> > @@ -313,20 +312,22 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2,
> > __locale_t l)
> >    assert (((uintptr_t) extra) % __alignof__ (extra[0]) == 0);
> >    assert (((uintptr_t) indirect) % __alignof__ (indirect[0]) == 0);
> > 
> > -  int result = 0, rule = 0;
> > -
> > +  int result = 0;
> >    coll_seq seq1, seq2;
> > -  memset (&seq1, 0, sizeof (seq1));
> > -  seq2 = seq1;
> > +  seq1.len = 0;
> > +  seq1.idxmax = 0;
> > +  seq1.rule = 0;
> > +  seq2.len = 0;
> > +  seq2.idxmax = 0;
> 
> I'm surprised that this actually makes a difference.  The memset
> should have been optimized out and replaced with movs.  The change to
> remove the RULE variable seems unrelated to me and should just be
> avoided if it does not result in any interesting code change..
> 
> Can you please post a more detailed justification of what caused the
> performance improvement?
> 
Main difference from assembly output is saving 64 bytes on stack space,
otherwise looks like random renaming of jumps.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]