[PATCH] Speed-up character range regexes by up to 2x

bonzini paolo.bonzini@polimi.it
Mon Jan 12 13:07:00 GMT 2004


>>What follows the review of the "gawk guy"'s regex patch:
>>
>> > +#ifdef RE_ENABLE_I18N
>> >    int icase = (dfa->mb_cur_max == 1 && (bufp->syntax & RE_ICASE));
>> > +#else
>> > +  int icase = (bufp->syntax & RE_ICASE);
>> > +#endif
>>
>>This is unneeded.
>>    
>>
I mean that you probably added these fixes when MB_CUR_MAX was used 
because somebody got link errors for MB_CUR_MAX undefined; but now, 
dfa->mb_cur_max cannot possibly be undefined and will always be 1 if 
!RE_ENABLE_I18N.

>>> @@ -2558,8 +2564,8 @@
>>>                 ? __btowc (start_ch) : start_elem->opr.wch);
>>>      end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
>>>               ? __btowc (end_ch) : end_elem->opr.wch);
>>> -    cmp_buf[0] = start_wc;
>>> -    cmp_buf[4] = end_wc;
>>> +    cmp_buf[0] = start_wc != WEOF ? start_wc : start_ch;
>>> +    cmp_buf[4] = end_wc != WEOF ? end_wc : end_ch;
>>>      if (wcscoll (cmp_buf, cmp_buf + 4) > 0)
>>>        return REG_ERANGE;
>> 
>>I am not sure this is the fix; maybe it is better not to include the 
>>character set if start_wc == WEOF || end_wc == WEOF, or to return 
>>REG_ERANGE?
>>    
>>
>I got WEOF and things core dumped (or equivalent).
>  
>
Yes, it seems to me too that the case must be handled.

Paolo




More information about the Libc-alpha mailing list