This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [PATCH] improve regex performance


Isamu Hasegawa <isamu@yamato.ibm.com> writes:

> I think it is a limitation of the regex library.
> 
> In this case, ".*" in the RE "G.*ran" can examine up to 20480 bytes.
> However, in the input file ChangeLog.8, the first 'G' is in "GLIBC_2.1"
> on line 29.  So ".*" examines all the rest of ChangeLog.8, which
> is about 400KB.

Yes, I understand this.  But the problem is not this limit, it's how
regex recovers from hitting it.  That is, it does not simply discard
the one candidate match that ran into the limit.  Regex should just
skip this one input character and continue looking so that the
interesting matches are found.
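
The recovery described above can be sketched as a search loop in which
a failed candidate costs exactly one start position.  This is only an
illustration, not the glibc code: match_at() is a hypothetical toy
matcher hard-wired to "G.*ran", and the limit parameter merely stands
in for the 20480-byte bound mentioned in the quoted text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy matcher for the fixed RE "G.*ran", anchored at
   text[start].  '.' does not match a newline, and ".*" may cover at
   most `limit` bytes.  Returns the end offset of the longest match,
   or -1 if no match starts here (including when the limit is hit).  */
static long
match_at (const char *text, size_t len, size_t start, size_t limit)
{
  long best = -1;

  if (start >= len || text[start] != 'G')
    return -1;

  for (size_t i = start + 1; i + 3 <= len && i - (start + 1) <= limit; ++i)
    {
      if (memcmp (text + i, "ran", 3) == 0)
        best = (long) (i + 3);  /* Greedy: keep the longest match so far.  */
      if (text[i] == '\n')
        break;                  /* '.' must not cross a newline.  */
    }
  return best;
}

/* Search loop: when a candidate fails, skip that one start character
   and keep looking instead of abandoning the whole search.  Returns
   the match end offset and stores the start in *match_start, or
   returns -1 if nothing matches.  */
long
search (const char *text, size_t len, size_t limit, size_t *match_start)
{
  for (size_t start = 0; start < len; ++start)
    {
      long end = match_at (text, len, start, limit);
      if (end >= 0)
        {
          *match_start = start;
          return end;
        }
      /* Failed or hit the limit: only this candidate is dropped.  */
    }
  return -1;
}
```

Called on "GLIBC_2.1\nno match here\nGrand total ran\n", search() drops
the candidate at the first 'G' and reports the match on the third line
(offsets 24 to 39).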

Besides, in the test case I don't have RE_DOT_NEWLINE set.  This means
that regex should stop looking for a match as soon as it sees a
newline.  This would definitely eliminate the whole problem.
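
A small way to see the effect of '.' not matching a newline is the
POSIX interface, where the REG_NEWLINE flag should give the same
behaviour as leaving RE_DOT_NEWLINE unset in the GNU syntax bits.  The
helper name first_match() and the sample text are assumptions made for
this sketch; they are not taken from the test case discussed here.

```c
#include <regex.h>
#include <stddef.h>

/* Compile `pattern` with `cflags`, search `text`, and store the
   overall match in *m.  Returns 0 on a match, nonzero on a compile
   error or when nothing matches.  */
int
first_match (const char *pattern, const char *text, int cflags,
             regmatch_t *m)
{
  regex_t re;
  int rc = regcomp (&re, pattern, cflags);

  if (rc != 0)
    return rc;

  rc = regexec (&re, text, 1, m, 0);
  regfree (&re);
  return rc;
}
```

With the text "GLIBC_2.1\nno match here\nGrand total ran\n" and the RE
"G.*ran", compiling with REG_EXTENDED | REG_NEWLINE yields a match at
offsets 24 to 39, entirely within the third line.  Without REG_NEWLINE
the match starts at offset 0 and ".*" runs across both newlines.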

> Then, could you check it in please?

Of course, I just did.  Thanks again.

> I take back what I said about it being reasonable...
> Meanwhile, perhaps it is just a bug in the wcs version.
> I'll investigate a little more, and if I find any problem, I'll
> report it ASAP.

Well, I expect some overhead.  There is significant work to be done
and your changes already made a giant impact.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

