The regex bitset code currently uses an array of unsigned int words to represent a bitset, and assumes these words are exactly 32 bits wide. On modern 64-bit hosts, using a 64-bit word would require approximately half the instructions for the bitset-related code. Also, it's better not to assume 32-bits everywhere. I'll attach a patch.
Created attachment 650 [details] bitset performance / portability improvements
Created attachment 651 [details] bitset performance / portability improvements (revised) Derek Price reported that the previous patch elicits a "suggest parentheses around + or - in operand of &" with gcc -Wall. The revised patch adds a pair of parentheses to avoid this warning.
I've checked in a significantly changed version of this patch.
The master branch has been updated by Paul Eggert <eggert@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2cc478ed1be82711a6cac15aae683530b2e6732b commit 2cc478ed1be82711a6cac15aae683530b2e6732b Author: Paul Eggert <eggert@cs.ucla.edu> Date: Tue Aug 4 23:45:27 2020 -0700 Copy regex_internal.h from Gnulib Sync this file from Gnulib, thus incorporating the following fix for a bug with regexps with 16 or more subexpressions: * posix/regex_internal.h (struct re_backref_cache_entry): Use bitset_word_t as the type of eps_reachable_subexps_map, instead of unsigned short int. This fixes a bug I introduced to glibc in 2005-09-28T17:33:18Z!drepper@redhat.com (glibc commit 2c05d33f90861d074dc12808dafbde30f487b1a0, BZ #1302). Remove unused member 'unused'.