This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] regexec: Fix off-by-one bug in weight comparison [BZ #23036]


On 07/09/2018 01:20 PM, Florian Weimer wrote:
> 2018-07-09  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #23036]
> 	* posix/regexec.c (check_node_accept_bytes): When comparing
> 	weights, do not compare an extra byte after the end of the
> 	weights.

I assume that this fixes the issue and that you have minimally passed
the regression testing for x86_64. I would like a test for this, but it
can wait. This regression is a blocker for the release, and I appreciate
your root cause analysis of the issue.

If I understand correctly this is a "off-by-one" error and I'll explain below.

In the future please try to keep the change to the absolute minimum instead
of rewriting it and hoisting constants out of loops. It certainly made the
review longer than it should have been. We can always clean this up later.

Given all of my assumptions and admonitions, this looks OK.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> diff --git a/posix/regexec.c b/posix/regexec.c
> index 63aef97535..73644c2341 100644
> --- a/posix/regexec.c
> +++ b/posix/regexec.c
> @@ -3878,30 +3878,27 @@ check_node_accept_bytes (const re_dfa_t *dfa, Idx node_idx,
>  	      indirect = (const int32_t *)
>  		_NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB);
>  	      int32_t idx = findidx (table, indirect, extra, &cp, elem_len);
> +	      int32_t rule = idx >> 24;
> +	      idx &= 0xffffff;

OK.

>  	      if (idx > 0)
> -		for (i = 0; i < cset->nequiv_classes; ++i)
> -		  {
> -		    int32_t equiv_class_idx = cset->equiv_classes[i];
> -		    size_t weight_len = weights[idx & 0xffffff];
> -		    if (weight_len == weights[equiv_class_idx & 0xffffff]
> -			&& (idx >> 24) == (equiv_class_idx >> 24))

Check length and rule is the same.

> -		      {
> -			Idx cnt = 0;
> -
> -			idx &= 0xffffff;
> -			equiv_class_idx &= 0xffffff;
> -
> -			while (cnt <= weight_len
> -			       && (weights[equiv_class_idx + 1 + cnt]
> -				   == weights[idx + 1 + cnt]))

Here we start at count 0 and go to <= weight_len.

This is one byte too far.

In an N-length weight:

|L01234...[N-1]|
 ^^        ^
 ||        |--- weights [idx + 1 + (weight_len - 1)]
 ||--- weights[idx + 1]
 |--- weights

L == N == weight_len

So the loop for cnt <= weight_len goes one byte beyond the weights array.

> -			  ++cnt;
> -			if (cnt > weight_len)
> -			  {
> -			    match_len = elem_len;
> -			    goto check_node_accept_bytes_match;
> -			  }
> -		      }
> -		  }
> +		{
> +		  size_t weight_len = weights[idx];
> +		  for (i = 0; i < cset->nequiv_classes; ++i)
> +		    {
> +		      int32_t equiv_class_idx = cset->equiv_classes[i];
> +		      int32_t equiv_class_rule = equiv_class_idx >> 24;
> +		      equiv_class_idx &= 0xffffff;
> +		      if (weights[equiv_class_idx] == weight_len

OK, check the lengths are the same.

> +			  && equiv_class_rule == rule

OK, check the rule is the same.

> +			  && memcmp (weights + idx + 1,
> +				     weights + equiv_class_idx + 1,
> +				     weight_len) == 0)

OK. Use memcmp to compare only weight_len bytes, not weight_len + 1 bytes.

> +			{
> +			  match_len = elem_len;
> +			  goto check_node_accept_bytes_match;


OK.

> +			}
> +		    }
> +		}
>  	    }
>  	}
>        else
> 

The new code is much more readable too, you can clearly see what is being
compared and why. It is better than what we had before.

-- 
Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]