[PATCH] improve regex performance
Isamu Hasegawa
isamu@yamato.ibm.com
Wed Jun 27 06:11:00 GMT 2001
Hello,
Subject: Re: [PATCH] improve regex performance
From: Ulrich Drepper <drepper@redhat.com>
Date: 27 Jun 2001 00:13:47 -0700
Message-ID: < m37kxyfl90.fsf@otr.mynet >
> [snip]
> Only the one test commented out with DOES_NOT_WORK doesn't work.
> It also fails for the single-byte case which is worrisome.
I think it is a limitation of the regex library.
In this case, ".*" in the RE "G.*ran" can examine up to 20480 byte.
However in the input file ChangeLog.8, the fist 'G' is "GLIBC_2.1"
in line 29. So ".*" examine all of remains of ChangeLog.8, which
are about 400KB.
When I cut off the first 'G' and following 20480byte as the new
input file, the test case works.
And when I add any one character to it, the test case doesn't work.
On the other hand the regular expression "G.\\{0,20480\\}ran" works,
but the RE "G.\\{0,20481\\}ran" doesn't work.
(the upper bound "20480" may differ from architecture to architecture.)
In addition, we can alleviate this limitation by increasing the value
of re_max_failures, which is defined at line 1502 in regex.c(rev 1.100).
And we may have to increase the value of RE_DUP_MAX too.
However I think the best solution is re-writing regex library from
scratch.
> If you don't see any problems with your patch I'd like to check it in.
After re-building all of the glibc, I can't see the problem now.
(It seems that I just broke somewhere in my source tree, sorry...)
Then, could you check it in please?
ChangeLog entries are:
2001-06-20 Isamu Hasegawa <isamu@yamato.ibm.com>
* posix/regex.c(FREE_WCS_BUFFERS): New macro to free buffers.
(re_search_2): invoke convert_mbs_to_wcs and FREE_WCS_BUFFERS.
(wcs_re_match_2_internal): Check whether the wcs buffers need
seting up or not, and skip seting up routin if not needed.
2001-06-26 Isamu Hasegawa <isamu@yamato.ibm.com>
* posix/regex.c(count_mbs_length): Use binary search for
optimization.
In addition,
> needed is much better (only between 2000% and 8000% slower :-).
I take back that I said it is reasonable...
Meanwhile perhaps it might be just a bug of the wcs version,
I'll investigate a little more, and if I found any problem, I'll
report ASAP.
Thanks,
--
Isamu Hasegawa
IBM Japan, Ltd.
More information about the Libc-alpha
mailing list