This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.

Re: [PATCH] Three regex speedups, one of which is actually a bugfix

From: "Paolo Bonzini" <bonzini at gnu dot org>
To: "Jakub Jelinek" <jakub at redhat dot com>
Cc: <libc-alpha at sources dot redhat dot com>
Date: Thu, 1 Jan 2004 21:58:22 +0100
Subject: Re: [PATCH] Three regex speedups, one of which is actually a bugfix
References: <003601c3d067$a12596f0$fcde1d97@philo> <20040101123157.GE2020@sunsite.ms.mff.cuni.cz>

> Why context : 10? 4 bits are enough IMHO.

I keep confusing contexts and constraints.  The latter are 10 bits wide.

> If fetching preg->newline_anchor contributes to the speedup, then that
> argument should be removed, not changed.

preg->newline_anchor is not needed unless the character is a newline.  Passing preg
instead of preg->newline_anchor saves a memory access for almost all calls to
re_string_context_at.  Fetching it from the re_string_t or from the re_regex_t does
not save anything, unless you also move word_char to the re_string_t so that
you can remove the argument altogether.  That is a somewhat complementary
optimization that can be made in a follow-up patch.
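
To illustrate the point about the deferred load, here is a minimal sketch.  It is
not the glibc code: struct sketch_regex, sketch_context_at and the SKETCH_CONTEXT_*
values are hypothetical stand-ins, and the real re_string_context_at handles more
cases (string boundaries, multibyte input).  It only shows that the callee
dereferences preg on the rare newline path, whereas a caller passing
preg->newline_anchor by value has to load it on every call.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the glibc definitions.  */
enum { SKETCH_CONTEXT_WORD = 1, SKETCH_CONTEXT_NEWLINE = 2 };

struct sketch_regex
{
  int newline_anchor;
};

/* Classify the character at buf[idx].  word_char is a 256-entry table
   whose nonzero entries mark word characters.  */
static int
sketch_context_at (const unsigned char *buf, size_t idx,
                   const struct sketch_regex *preg,
                   const unsigned char *word_char)
{
  unsigned char c = buf[idx];

  if (word_char[c])
    return SKETCH_CONTEXT_WORD;

  /* preg->newline_anchor is read only when c is a newline, so the
     common case never touches *preg.  */
  if (c == '\n' && preg->newline_anchor)
    return SKETCH_CONTEXT_NEWLINE;

  return 0;
}
```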

> It is not initialized always, because in the common case there is no
> \<, \>, \b, \B, \w and \W in regular expression and so differentiating
> between word and non-word characters is not needed at all.

But every match goes down to build_tr_table, which calls IS_WORD_CHAR 256 times and
brings __ctype_b_loc high in the profile.  That's why it's better to always
initialize word_char.  If you use a cached bitset instead of calling isalnum, the
lookup costs the same whether the cached bitset is correct (initialized with isalnum)
or all-zeros (unless you go down into branch prediction, which is overkill, isn't it?).
Using a flag to avoid iswalnum calls in IS_WIDE_WORD_CHAR is again a complementary
optimization, which can be done with a separate patch.
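
A rough sketch of the cached bitset, to make the idea concrete.  The names
(sketch_bitset, sketch_init_word_char, sketch_is_word_char) are made up for this
mail, and the word-character test (isalnum or underscore) is my assumption of what
IS_WORD_CHAR checks.  The table is filled once; after that, 256 lookups are plain
bit tests and never reach the ctype tables.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the glibc definitions.  */
#define SKETCH_SBC_MAX 256
#define SKETCH_WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define SKETCH_BITSET_WORDS \
  ((SKETCH_SBC_MAX + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

typedef unsigned long sketch_bitset[SKETCH_BITSET_WORDS];

/* Fill the bitset once, using the current locale's isalnum.  */
static void
sketch_init_word_char (sketch_bitset set)
{
  size_t i;
  int ch;

  for (i = 0; i < SKETCH_BITSET_WORDS; ++i)
    set[i] = 0;

  for (ch = 0; ch < SKETCH_SBC_MAX; ++ch)
    if (isalnum (ch) || ch == '_')
      set[ch / SKETCH_WORD_BITS] |= 1UL << (ch % SKETCH_WORD_BITS);
}

/* Cached lookup: one load, one shift, one mask; no ctype call.  */
static int
sketch_is_word_char (const sketch_bitset set, unsigned char ch)
{
  return (int) ((set[ch / SKETCH_WORD_BITS] >> (ch % SKETCH_WORD_BITS)) & 1);
}
```

The lookup does the same work whether the bitset was initialized or left all-zeros,
which is why initializing it unconditionally does not slow down the common case.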

Paolo

Follow-Ups:
- Re: [PATCH] Three regex speedups, one of which is actually a bugfix
  - From: Jakub Jelinek

References:
- [PATCH] Three regex speedups, one of which is actually a bugfix
  - From: Paolo Bonzini
- Re: [PATCH] Three regex speedups, one of which is actually a bugfix
  - From: Jakub Jelinek