Bug 1278 - regex undefined behavior with shifting past word length
Summary: regex undefined behavior with shifting past word length
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: regex
Version: 2.3.5
Importance: P2 normal
Target Milestone: ---
Assignee: GOTO Masanori
URL:
Keywords:
Depends on:
Blocks: 1302
Reported: 2005-08-31 19:36 UTC by Paul Eggert
Modified: 2018-04-19 14:47 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
shift-related patches for regex (2.58 KB, patch)
2005-08-31 19:37 UTC, Paul Eggert

Description Paul Eggert 2005-08-31 19:36:46 UTC
The regex code sometimes shifts a word by a value greater than the word size,
which has undefined behavior.  While fixing this, I also fixed a
few other porting glitches that are related. I'll attach a patch.
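
As a minimal sketch of the class of problem described here (not the attached patch itself): in C, shifting by a count greater than or equal to the width of the promoted operand is undefined, so portable code has to test the count first. The helper names shl_checked and low_bits_mask and the macro UINT_BITS are hypothetical, and the sketch assumes unsigned int has no padding bits.

```c
#include <limits.h>

/* Width of unsigned int in bits (assumes no padding bits).  */
#define UINT_BITS (sizeof (unsigned int) * CHAR_BIT)

/* Hypothetical helper: shift WORD left by N bits, yielding 0 when N is
   at least the word width instead of invoking undefined behavior.  */
static unsigned int
shl_checked (unsigned int word, unsigned int n)
{
  return n < UINT_BITS ? word << n : 0u;
}

/* Hypothetical helper: a mask of the low N bits, for 0 <= N <= UINT_BITS.
   The N == UINT_BITS case is handled separately because 1u << N would
   be undefined there.  */
static unsigned int
low_bits_mask (unsigned int n)
{
  return n < UINT_BITS ? (1u << n) - 1u : (unsigned int) -1;
}
```

With these helpers, low_bits_mask (UINT_BITS) yields UINT_MAX, whereas the naive (1u << UINT_BITS) - 1u has no defined result.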
Comment 1 Paul Eggert 2005-08-31 19:37:17 UTC
Created attachment 633
shift-related patches for regex
Comment 2 paolo.bonzini@lu.unisi.ch 2005-09-01 07:03:20 UTC
Subject: Re: regex undefined behavior with shifting past word length

The last hunk is surely wrong.  I really meant ~0.

Paolo
Comment 3 Andreas Schwab 2005-09-01 10:00:11 UTC
-1 is better. 
Comment 4 Paul Eggert 2005-09-01 22:29:29 UTC
The last hunk is purely for ports to ones' complement and
signed-magnitude hosts.  It has no effect in the normal case.

For example, on a ones' complement host, ~0 has the numeric value
zero, i.e., ~0 == 0.  Also, ~0 is of type int.  When ~0 is converted
to unsigned int, it is converted by value, not by bit-pattern.  (The C
Standard requires this.)  Hence ((unsigned) ~0) is equivalent to
((unsigned) 0), which in turn is equivalent to 0u, which is zero.

The same problem occurs with signed-magnitude hosts.  It also occurs
with unsigned short int (the type being used here).

Admittedly this is a minor point since such hosts are rare, but it's
easy to do portably so we might as well do it that way.
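
A minimal sketch of the by-value conversion rule, for illustration only: converting -1 to an unsigned type adds one more than the type's maximum value, so the result is that maximum (all bits set) regardless of how the host represents negative integers. The function names below are hypothetical.

```c
#include <limits.h>

/* Converting -1 to an unsigned type is defined by value: the result is
   -1 + (MAX + 1) == MAX, whatever the signed representation is.  */
static unsigned short
all_ones_ushort (void)
{
  return (unsigned short) -1;
}

static unsigned int
all_ones_uint (void)
{
  return (unsigned int) -1;
}
```

Both functions return the maximum value of their type (USHRT_MAX and UINT_MAX). The result of converting ~0 instead depends on the numeric value ~0 has on the host.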
Comment 5 paolo.bonzini@lu.unisi.ch 2005-09-02 06:17:14 UTC
Subject: Re: regex undefined behavior with shifting past word length

>For example, on a one's complement host, ~0 has the numeric value
>zero, i.e., ~0 == 0.  Also, ~0 is of type int.  When ~0 is converted
>to unsigned int, it is converted by value, not by bit-pattern.  (The C
>Standard requires this.)  Hence ((unsigned) ~0) is equivalent to
>((unsigned) 0), which in turn is equivalent to 0u, which is zero.
>  
>
So you want ~0u, but not -1.

Paolo
Comment 6 Andreas Schwab 2005-09-02 10:20:45 UTC
-1 when cast to unsigned is exactly the same as ~0u and also works with any 
other unsigned type regardless of its width, whereas ~0u doesn't. 
Comment 7 Paul Eggert 2005-09-02 23:17:21 UTC
Andreas is right.  For example, "unsigned long int x = ~0u;" will not
have an all-1s value on most 64-bit hosts.
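
The width problem can be sketched as follows; the function names are hypothetical. ~0u has type unsigned int, so when it is stored in a wider unsigned type it is widened by value and the upper bits stay zero, while -1 is converted modulo one more than the destination type's maximum.

```c
#include <limits.h>

/* ~0u is an unsigned int equal to UINT_MAX; widening it to
   unsigned long keeps that value, so any higher bits are zero.  */
static unsigned long
from_tilde_0u (void)
{
  unsigned long x = ~0u;
  return x;
}

/* -1 is converted by value to unsigned long, giving ULONG_MAX
   regardless of the width of unsigned long.  */
static unsigned long
from_minus_1 (void)
{
  unsigned long x = -1;
  return x;
}
```

On a host where unsigned long is wider than unsigned int, from_tilde_0u () is smaller than ULONG_MAX, while from_minus_1 () equals ULONG_MAX on every host.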

In this particular hunk, ~0u would also work since the destination
type is unsigned short int.  So if you'd really rather use ~0u I
guess that would be OK.  However, as a style matter, it is confusing
to use ~0u in some unsigned contexts, while using -1 in other unsigned
contexts.  Since -1 always works, it's more consistent to use it in
all unsigned contexts.

For example, suppose someone later changes eps_reachable_subexps_map
from unsigned short int to unsigned long int, for performance reasons.
If the code used ~0u here, it would have to be changed to ~ (unsigned
long int) 0, and it's quite possible that people would forget to make
that change.  Whereas if we simply change it to -1 now, it will work
regardless of later changes like this.

I should mention that the situation is different in signed contexts.
In general one must use ~ (SIGNED_TYPE) 0 in that case to get an
all-1s pattern.  But signed bit-twiddling is trickier (since one must
in general worry about ~0 == 0 and overflow issues), and I'd rather
that the regex code stuck with unsigned bit-twiddling.
Comment 8 Ulrich Drepper 2005-09-06 23:29:57 UTC
It is ridiculous to care about 1-complement and "signed-magnitude" hosts.

I've applied most of the patch.