Bug 5945

Summary: regoff_t has wrong POSIX type
Product: glibc Reporter: vIk34 <vik>
Component: regex Assignee: Ulrich Drepper <drepper.fsp>
Status: SUSPENDED ---    
Severity: enhancement CC: bugdal, eggert, filbranden, fweimer, glibc-bugs-regex, glibc-bugs
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description vIk34 2008-03-16 14:01:07 UTC
According to POSIX, the type regoff_t should be able to hold at least the
largest value that can be stored in either off_t or ssize_t; see
http://www.opengroup.org/onlinepubs/009695399/basedefs/regex.h.html
In glibc's regex.h it is defined as 'int', so it cannot hold an off_t on a
64-bit machine, or on a 32-bit machine where 64-bit off_t support is enabled
(#define _FILE_OFFSET_BITS 64).
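
To make the width mismatch concrete, here is a minimal sketch of a check for whether a given offset is representable in the platform's regoff_t. The helper name offset_fits_regoff is hypothetical (it is not part of glibc or POSIX), and the code assumes a two's complement regoff_t without padding bits.

```c
#include <limits.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: returns 1 if 'off' can be stored in regoff_t
   without loss, 0 otherwise. */
static int offset_fits_regoff(long long off)
{
    /* Largest positive value of a signed type as wide as regoff_t
       (assumption: two's complement, no padding bits, and regoff_t
       is no wider than long long). */
    long long max = (long long) ((~0ULL >> 1)
        >> ((sizeof(long long) - sizeof(regoff_t)) * CHAR_BIT));

    return off >= -max - 1 && off <= max;
}
```

With regoff_t defined as a 32-bit int, this would return 0 for any offset of 2 GiB or more, even though such an offset fits in a 64-bit off_t or ssize_t.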
Comment 1 Ulrich Drepper 2008-03-30 04:30:56 UTC
This is known but obviously cannot easily be fixed.  Suspended until somebody
takes this serious to actually take a stab at a solution.
Comment 2 Paolo Bonzini 2008-08-18 10:09:30 UTC
You mean, it cannot be easily fixed because it breaks the ABI?
Comment 3 Ulrich Drepper 2008-08-18 14:04:36 UTC
(In reply to comment #2)
> You mean, it cannot be easily fixed because it breaks the ABI?

Yes.
Comment 4 Paolo Bonzini 2010-09-09 15:44:41 UTC
On bug-gnulib, the following suggestion was made by Bruno Haible:

> [glibc could] offer some preprocessor macro that makes regoff_t 64-bit wide -
> like it was done for off_t.
> 
> Would glibc need to export additional symbols for this? Yes.
> 
> Would a compiled glibc need to contain two copies of the regex code? No, the
> 32-bit version could be a thin wrapper around the 64-bit version.

I guess this would count as "somebody takes this serious to actually take a stab
at a solution".  Would _REGEX_OFFSET_BITS be okay for you as a macro?
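
A rough sketch of the "thin wrapper" part of that suggestion, under stated assumptions: the wide implementation fills in 64-bit match offsets, and the legacy entry point only copies them into the old int-based layout. The type and function names below (wide_regmatch_t, narrow_regmatch_t, narrow_matches) are hypothetical and do not exist in glibc; how the wrapper should report an offset that does not fit is also an assumption.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: the wide one as a 64-bit regmatch_t might look,
   the narrow one mirroring today's int-based regmatch_t. */
typedef struct { int64_t rm_so, rm_eo; } wide_regmatch_t;
typedef struct { int rm_so, rm_eo; } narrow_regmatch_t;

/* Copy results from the wide implementation into the legacy layout.
   Returns 0 on success, -1 if an offset does not fit in an int.
   Unused groups (rm_so == rm_eo == -1) pass through unchanged. */
static int narrow_matches(const wide_regmatch_t *wide,
                          narrow_regmatch_t *narrow, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (wide[i].rm_so > INT_MAX || wide[i].rm_eo > INT_MAX)
            return -1;
        narrow[i].rm_so = (int) wide[i].rm_so;
        narrow[i].rm_eo = (int) wide[i].rm_eo;
    }
    return 0;
}
```

The point of the sketch is only that the 32-bit interface needs no second copy of the matcher: it can call the wide code and narrow the results afterwards.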
Comment 5 Paolo Bonzini 2011-06-17 10:41:59 UTC
*** Bug 12900 has been marked as a duplicate of this bug. ***
Comment 6 Paul Eggert 2013-02-08 01:51:43 UTC
The original bug report is old, and POSIX has changed in the meantime: regoff_t is now required to be at least as large as ptrdiff_t and ssize_t. (Previously this was off_t and ssize_t.)  See:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/regex.h.html
Comment 7 Rich Felker 2013-02-08 02:32:02 UTC
Yes, thanks for updating/clarifying that. Is there any chance of this ever getting fixed? I suspect there may even be obscure vulnerabilities related to this: if you can somehow pass a string longer than 4 GB to regexec, the match offsets get truncated, and the caller may then either dereference memory at a negative offset, exposing data it should not, or treat non-matching data early in the string as a match.

Obviously these could be closed by making the interface even more non-conforming and rejecting offsets that would overflow, but I think the proper solution is to add a versioned symbol and fix the type.
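
To illustrate the two failure modes, here is a small model of what happens when a 64-bit offset is stored in a 32-bit regoff_t. It assumes the usual wraparound behaviour (keep the low 32 bits, reinterpret as signed); in C the actual conversion is implementation-defined, so this is a sketch of the common case, and the helper name store_in_32bit_regoff is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of storing a 64-bit offset in a 32-bit signed
   regoff_t, assuming wraparound: keep the low 32 bits and reinterpret
   them as a two's complement value. */
static int32_t store_in_32bit_regoff(int64_t off)
{
    uint32_t low = (uint32_t) off;   /* reduction modulo 2^32, well-defined */

    if (low <= INT32_MAX)
        return (int32_t) low;
    /* Top bit set: the stored value reads back as negative. */
    return (int32_t) ((int64_t) low - 4294967296LL);
}
```

Under this model an offset of 3 GiB reads back as -1073741824 (the negative-offset case), while an offset of 4 GiB + 10 reads back as 10 (the case where data early in the string is mistaken for the match).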