This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] regexec: Fix off-by-one bug in weight comparison [BZ #23036]
* Carlos O'Donell:
>> - while (cnt <= weight_len
>> - && (weights[equiv_class_idx + 1 + cnt]
>> - == weights[idx + 1 + cnt]))
>
> Here we start at count 0 and go to <= weight_len.
>
> This is one byte too far.
>
> In an N-length weight:
>
> |L01234...[N-1]|
>  ^^         ^
>  ||         |--- weights[idx + 1 + (weight_len - 1)]
>  ||--- weights[idx + 1]
>  |--- weights
>
> L == N == weight_len
>
> So the loop for cnt <= weight_len goes one byte beyond the weights array.
Right. I wasn't able to derive the data layout from
locale/programs/ld-collate.c, but there's this code in
string/strxfrm_l.c:

/* Find next weight and rule index. Inlined since called for every char. */
static __always_inline size_t
find_idx (const USTRING_TYPE **us, int32_t *weight_idx,
          unsigned char *rule_idx, const locale_data_t *l_data, const int pass)
{
  int32_t tmp = findidx (l_data->table, l_data->indirect, l_data->extra, us,
                         -1);
  *rule_idx = tmp >> 24;
  int32_t idx = tmp & 0xffffff;
  size_t len = l_data->weights[idx++];

  /* Skip over indices of previous levels. */
  for (int i = 0; i < pass; i++)
    {
      idx += len;
      len = l_data->weights[idx++];
    }

  *weight_idx = idx;
  return len;
}

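This makes it clear that the length element does not count itself:
after reading the length with weights[idx++], the loop advances idx by
exactly len to land on the next level's length element. The weights of
one level therefore occupy weights[idx + 1] through
weights[idx + weight_len], so a comparison loop over them has to stop
at cnt < weight_len.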
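
To make the layout concrete, here is a minimal, self-contained sketch. It is
not glibc code: the demo_weights table and the helpers level_start and
same_first_level are hypothetical names invented for illustration, and the
table contents are made up rather than taken from any locale. The sketch
assumes the layout read off find_idx above (a length element followed by that
many weights, repeated per level) and assumes that "cnt < len" is the intended
bound for the comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical weights table (not from a real locale).  Each level is
   stored as a length element followed by that many weight elements;
   the length element does not count itself.  */
static const unsigned char demo_weights[] = {
  /* entry at index 0:  level 0 */ 2, 10, 11,
  /*                    level 1 */ 1, 20,
  /* entry at index 5:  level 0 */ 2, 10, 11,
  /*                    level 1 */ 1, 21,
  /* entry at index 10: level 0 */ 2, 10, 12,
};

/* Return the index of the first weight of level PASS for the entry
   whose first length element is at IDX, and store that level's length
   in *LEN.  This mirrors the skipping loop in find_idx.  */
static size_t
level_start (const unsigned char *weights, size_t idx, int pass, size_t *len)
{
  assert (pass >= 0);
  *len = weights[idx++];
  for (int i = 0; i < pass; i++)
    {
      idx += *len;              /* Skip the weights of this level.  */
      *len = weights[idx++];    /* Read the next level's length.  */
    }
  return idx;
}

/* Return nonzero if the first-level weights of the entries at IDX_A and
   IDX_B are identical.  Only elements idx + 1 .. idx + len are read.  */
static int
same_first_level (const unsigned char *weights, size_t idx_a, size_t idx_b)
{
  size_t len = weights[idx_a];
  if (len != weights[idx_b])
    return 0;

  size_t cnt = 0;
  while (cnt < len
         && weights[idx_a + 1 + cnt] == weights[idx_b + 1 + cnt])
    ++cnt;
  return cnt == len;
}
```

With this made-up table, same_first_level (demo_weights, 0, 5) is nonzero
because both entries carry the first-level weights 10, 11, while
same_first_level (demo_weights, 0, 10) is zero. A "cnt <= len" bound would
additionally read the element after the last weight, which in this layout is
the next level's length element (or whatever follows the entry).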