Bug 1302 - improve regex bitset performance (don't assume 32 bits)
Summary: improve regex bitset performance (don't assume 32 bits)
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: regex (show other bugs)
Version: 2.3.5
: P2 normal
Target Milestone: ---
Assignee: GOTO Masanori
URL:
Keywords:
Depends on: 1278 1285
Blocks:
  Show dependency treegraph
 
Reported: 2005-09-06 07:32 UTC by Paul Eggert
Modified: 2020-08-05 06:52 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
bitset performance / portability improvements (6.56 KB, patch)
2005-09-06 07:33 UTC, Paul Eggert
Details | Diff
bitset performance / portability improvements (revised) (6.58 KB, patch)
2005-09-06 17:48 UTC, Paul Eggert
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Eggert 2005-09-06 07:32:20 UTC
The regex bitset code currently uses an array of unsigned int words to
represent a bitset, and assumes these words are exactly 32 bits wide.
On modern 64-bit hosts, using a 64-bit word would require
approximately half the instructions for the bitset-related code.
Also, it's better not to assume 32-bits everywhere.  I'll attach a
patch.
Comment 1 Paul Eggert 2005-09-06 07:33:10 UTC
Created attachment 650 [details]
bitset performance / portability improvements
Comment 2 Paul Eggert 2005-09-06 17:48:51 UTC
Created attachment 651 [details]
bitset performance / portability improvements (revised)

Derek Price reported that the previous patch
elicits a "suggest parentheses around + or - in
operand of &" with gcc -Wall.  The revised patch
adds a pair of parentheses to avoid this warning.
Comment 3 Ulrich Drepper 2005-09-28 17:33:13 UTC
I've checked in a significantly changed version of this patch.
Comment 4 Sourceware Commits 2020-08-05 06:52:05 UTC
The master branch has been updated by Paul Eggert <eggert@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2cc478ed1be82711a6cac15aae683530b2e6732b

commit 2cc478ed1be82711a6cac15aae683530b2e6732b
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Tue Aug 4 23:45:27 2020 -0700

    Copy regex_internal.h from Gnulib
    
    Sync this file from Gnulib, thus incorporating the following
    fix for a bug with regexps with 16 or more subexpressions:
    * posix/regex_internal.h (struct re_backref_cache_entry):
    Use bitset_word_t as the type of eps_reachable_subexps_map,
    instead of unsigned short int.  This fixes a bug I introduced
    to glibc in 2005-09-28T17:33:18Z!drepper@redhat.com (glibc commit
    2c05d33f90861d074dc12808dafbde30f487b1a0, BZ #1302).
    Remove unused member 'unused'.