This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



re_string bugs


Hi!

There is at least one more use of uninitialized data, which may even crash:
tip_context handling.
It can be seen e.g. with Daniel's testcase:

#include <sys/types.h>
#include <regex.h>

int main()
{
  regex_t reg;
  regmatch_t pm[1];
  regcomp (&reg, "man", REG_ICASE);
  return regexec (&reg, "pipenightdreams", 1, pm, 0);
}

Here, re_search_internal calls re_string_allocate with len = 15 and
init_len = 5.
Then the loop in it (with or without my patch from today) skips
everything until "ms" at the end, so match_first is 13 and
re_string_reconstruct is called with that offset.
re_string_reconstruct then calls:
      pstr->tip_context = re_string_context_at (pstr, offset - 1, eflags,
                                                newline);
but mbs[12] is well beyond pstr->valid_len, and even well beyond
pstr->buf_len, so with bad luck it could crash; in any case tip_context
will be set incorrectly.
This works only for regexec-style searching (i.e. start 0, range positive)
and matching, and then only if MBS ICASE or MBS translate is in use and
input_len for pstr is bigger than MB_CUR_MAX, or if mbs points into raw_mbs
(i.e. non-MBS, no ICASE, no translate).
Backward searching, or increasing the offset by more than buf_len, is broken.
For backwards searching, I'm afraid we need to check the last MB_CUR_MAX
chars before raw_mbs + raw_mbs_idx and see what the last multibyte char is.
For UTF-8 this is trivial, just search backwards for the first byte with the
top bit clear, but for other charsets it may be more difficult.
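
As a rough illustration of the UTF-8 case only, the sketch below steps back
over continuation bytes (those of the form 10xxxxxx) to find where the
character ending just before a given position starts. Note this is a slightly
different test from looking for a byte with the top bit clear, since it also
stops at a multibyte lead byte. The helper name utf8_prev_char_start is made up
for this example and is not part of regex; it assumes the buffer holds valid
UTF-8 and that idx > 0.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the regex code.
   Return the index of the first byte of the UTF-8 character that ends
   just before buf[idx].  Assumes buf[0..idx) is valid UTF-8 and idx > 0.  */
static size_t
utf8_prev_char_start (const unsigned char *buf, size_t idx)
{
  size_t i = idx - 1;

  /* Continuation bytes look like 10xxxxxx; skip back over them until a
     single-byte character or a multibyte lead byte is reached.  */
  while (i > 0 && (buf[i] & 0xC0) == 0x80)
    --i;
  return i;
}
```

For a stateful or non-self-synchronizing charset no such byte-level test
exists, which is why the general case is harder.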

Another thing I'm not sure about is the re_string_context_at implementation
for MBS: assuming all supported MBS locales have newline as the single byte
'\n', there is IMHO a problem with
#define IS_WORD_CHAR(ch) (isalnum (ch) || (ch) == '_')
  c = re_string_byte_at (input, idx);
  if (IS_WORD_CHAR (c))
    return CONTEXT_WORD;
Shouldn't this use re_string_wchar_at and iswalnum for MBS locales?
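
To make the question concrete, here is a minimal sketch of what a wide
character variant of the word test might look like. The function name
is_wide_word_char is hypothetical and only for illustration; how the wide
character would be fetched from the re_string (e.g. via re_string_wchar_at)
is left out, and WEOF is assumed to mean "no complete character here".

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical wide character counterpart of IS_WORD_CHAR.
   Returns nonzero if WC is alphanumeric in the current locale or is an
   underscore; WEOF is never a word character.  */
static int
is_wide_word_char (wint_t wc)
{
  return wc != WEOF && (iswalnum (wc) || wc == (wint_t) L'_');
}
```

The byte-based isalnum test only ever sees one byte of a multibyte
character, whereas a test like this classifies the whole character.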

	Jakub

