This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.
Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.
Hi!

There is at least one more use of uninitialized data, which may even crash: tip_context handling. It can be seen e.g. on Daniel's testcase:

    #include <sys/types.h>
    #include <regex.h>

    int main()
    {
      regex_t reg;
      regmatch_t pm[1];

      regcomp (&reg, "man", REG_ICASE);
      return regexec (&reg, "pipenightdreams", 1, pm, 0);
    }

Here, re_search_internal calls re_string_allocate with len = 15 and init_len = 5. Then the loop in re_search_internal (it doesn't matter whether with or without my patch from today) skips everything until "ms" at the end, thus match_first is 13 and re_string_reconstruct is called on it. re_string_reconstruct calls:

    pstr->tip_context = re_string_context_at (pstr, offset - 1, eflags, newline);

but mbs[12] is well beyond pstr->valid_len, and even beyond pstr->buf_len, so with bad luck it could crash, and tip_context will certainly be set incorrectly.

This works only in regexec-style searching (i.e. start 0, range positive) and matching if MBS ICASE or MBS translate is used and input_len for pstr is bigger than MB_CUR_MAX, or if mbs points into raw_mbs (i.e. non-MBS, no ICASE, no translate). Backward searching, or increasing offset by more than buf_len, is broken.

For backward searching, I'm afraid we need to check the last MB_CUR_MAX chars before raw_mbs + raw_mbs_idx and see what the last multibyte char is. For UTF-8 this is trivial, just search backwards for the first byte with the top bit clear, but for other charsets it may be more difficult.

Another thing I'm not sure about is the re_string_context_at implementation for MBS. Assuming all supported MBS locales have newline as the single byte '\n', there is IMHO a problem with:

    #define IS_WORD_CHAR(ch) (isalnum (ch) || (ch) == '_')

    c = re_string_byte_at (input, idx);
    if (IS_WORD_CHAR (c))
      return CONTEXT_WORD;

Shouldn't this use re_string_wchar_at and iswalnum for MBS locales?

Jakub
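To illustrate the backward scan for the last multibyte character in the UTF-8 case, here is a minimal sketch. It is not taken from the glibc regex code: the helper name utf8_last_char_start is hypothetical, and it assumes the buffer holds valid UTF-8. Note that it uses a slightly different stop condition than the one described above: instead of looking for a byte with the top bit clear, it steps back over continuation bytes (those of the form 10xxxxxx), so it also finds the lead byte when the preceding character is itself multibyte.

```c
#include <stddef.h>

/* Hypothetical helper, not a glibc function: return the index of the
   first byte of the last UTF-8 character that ends just before
   buf[end].  Assumes buf[0..end) is valid UTF-8 and end > 0.  */
static size_t
utf8_last_char_start (const unsigned char *buf, size_t end)
{
  size_t i = end - 1;
  size_t steps = 0;

  /* Continuation bytes look like 10xxxxxx.  A UTF-8 character has at
     most three of them, so step back at most three times.  */
  while (i > 0 && steps < 3 && (buf[i] & 0xC0) == 0x80)
    {
      --i;
      ++steps;
    }

  return i;
}
```

In the regex code, something along these lines would presumably be applied to the bytes just before raw_mbs + raw_mbs_idx. Charsets other than UTF-8 have no such self-synchronizing property and would need a different approach.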