This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] regexec: Fix off-by-one bug in weight comparison [BZ #23036]


* Carlos O'Donell:

>> -			while (cnt <= weight_len
>> -			       && (weights[equiv_class_idx + 1 + cnt]
>> -				   == weights[idx + 1 + cnt]))
>
> Here we start at count 0 and go to <= weight_len.
>
> This is one byte too far.
>
> In an N-length weight:
>
> |L01234...[N-1]|
>  ^^        ^
>  ||        |--- weights [idx + 1 + (weight_len - 1)]
>  ||--- weights[idx + 1]
>  |--- weights[idx]
>
> L == N == weight_len
>
> So the loop for cnt <= weight_len goes one byte beyond the weights array.

Right.  I wasn't able to derive the data layout from
locale/programs/ld-collate.c, but there's this code in
string/strxfrm_l.c:

/* Find next weight and rule index.  Inlined since called for every char.  */
static __always_inline size_t
find_idx (const USTRING_TYPE **us, int32_t *weight_idx,
	  unsigned char *rule_idx, const locale_data_t *l_data, const int pass)
{
  int32_t tmp = findidx (l_data->table, l_data->indirect, l_data->extra, us,
			 -1);
  *rule_idx = tmp >> 24;
  int32_t idx = tmp & 0xffffff;
  size_t len = l_data->weights[idx++];

  /* Skip over indices of previous levels.  */
  for (int i = 0; i < pass; i++)
    {
      idx += len;
      len = l_data->weights[idx++];
    }

  *weight_idx = idx;
  return len;
}

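This makes it abundantly clear that the length element does not count
itself in the length: a stored length of N is followed by exactly N
weight elements, and the next length element comes right after them.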
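
To make the layout concrete, here is a minimal sketch of a comparison
that stays within bounds, assuming the weights are stored as an array
of unsigned char in which each sequence is a length element followed by
that many weights.  The names weights_equal and skip_levels are
hypothetical helpers written for this illustration; they are not glibc
functions, and this is not the text of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compare the weight sequences that start at
   IDX_A and IDX_B.  Each sequence is assumed to be one length element
   L followed by exactly L weights, so the valid offsets past the
   length element are 0 .. L-1.  */
static bool
weights_equal (const unsigned char *weights, size_t idx_a, size_t idx_b)
{
  size_t len = weights[idx_a];
  if (len != weights[idx_b])
    return false;

  size_t cnt = 0;
  /* Strict "<": using "<=" here would read the element after the
     sequence, which is the off-by-one discussed above.  */
  while (cnt < len
	 && weights[idx_a + 1 + cnt] == weights[idx_b + 1 + cnt])
    ++cnt;

  /* Every weight matched only if the loop ran to the end.  */
  return cnt == len;
}

/* Hypothetical helper following the same stepping as the loop in
   find_idx: skip PASS sequences starting at IDX, store the length of
   the selected sequence in *LEN_OUT and return the index of its first
   weight.  */
static size_t
skip_levels (const unsigned char *weights, size_t idx, int pass,
	     size_t *len_out)
{
  size_t len = weights[idx++];
  for (int i = 0; i < pass; i++)
    {
      idx += len;
      len = weights[idx++];
    }
  *len_out = len;
  return idx;
}
```

With the made-up table { 2, 10, 20, 1, 30, 2, 10, 20 }, the sequences
at indices 0 and 5 compare equal, and skipping one level from index 0
lands on index 4 with length 1.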

