[RFC] newlib/libc/include/langinfo.h: nl_langinfo enum off by one error causing pointer overwrite

Corinna Vinschen vinschen@redhat.com
Thu Aug 22 19:13:02 GMT 2024


On Aug 22 11:14, Brian Inglis wrote:
> On 2024-08-22 02:07, Corinna Vinschen wrote:
> > Hi Brian,
> > 
> > On Aug 21 17:58, Brian Inglis wrote:
> > > if __HAVE_LOCALE_INFO__ is defined, then _NL_MESSAGES_CODESET is defined
> > > instead of _NL_CTYPE_CODESET
> > 
> > There is no _NL_CTYPE_CODESET, only CODESET, and the value of
> > _NL_MESSAGES_CODESET is what it is for backward compatibility.
> > 
> > The values must not change.
> 
> What about the pointer positions in the array?
> 
> > > demonstration of pointer overwrite by nl_langinfo dump program generated
> > > with additional langinfo.h hack after fix:
> > 
> > Can you please provide your STC?
> 
> Attached, with log, assuming current langinfo.h.
> Also attached is my original program, for use with appropriately fixed
> and/or hacked up langinfo.h, with my nl_items label array adjusted to match.
> 
> > > if __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is
> > > also not defined, so it is unclear if _NL_MESSAGES_CODESET should be
> > > defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor
> > > __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field
> > > depending on those definitions
> > 
> > As I wrote above, the values must not change.  And given _NL_CTYPE_CODESET
> > doesn't really exist (but CODESET does), I'm not sure I understand the
> > problem here.  As on Linux we have a value _NL_CTYPE_CODESET_NAME which
> > is equivalent to CODESET.
> 
> An instance of a CODESET item and pointer exists, before the wide messages
> fields, and not accounted for in the enum defined in langinfo.h, used to
> retrieve the values by nl_langinfo(3).

I don't understand this sentence.  There is a _NL_MESSAGES_CODESET at this
point, and the value is *not* supposed to be a pointer into an array or
struct.  Whatever is wrong, it's not the values defined in langinfo.h

> What is shown in the enum as the _NL_MESSAGE_CODESET item appears to be
> generated from the LC_CTYPE locale entry, and added to the buffer and
> pointer array after the _NL_CTYPE_MB_CUR_MAX pointer, before the
> _NL_CTYPE_OUTDIGIT?_MB item pointers.
> 
> What is not shown in the enum is the CODESET item generated from the
> LC_MESSAGES locale entry, and added to the buffer and pointer array after
> the _NL_MONETARY_WNEGATIVE_SIGN pointer before the _NL_MESSAGES_WYESEXPR
> pointer.
> 

This is a wrong assumption.  Compare with newlib/libc/locale/nl_langinfo.c,
line 183.

> As a result, the underlying pointers in the pointer array are off by one in
> the wide messages fields, and the _NL_COLLATE_CODESET pointer clobbers what
> was stored in the _NL_MESSAGES_WNOSTR position in the pointer array.

Yes, youre' right, there's an off by one.  But this can't be fixed by
chaning nl_item values.

The culprit is actually in nl_langinfo.c.  The internal nl_ext array
erronously contains an entry for lc_monetary_T::codeset which is not
part of the _NL_LOCALE_EXTENDED entries.  This moves the wide char
yes/no strings accidentally by one.  The patch is simple:

diff --git a/newlib/libc/locale/nl_langinfo.c b/newlib/libc/locale/nl_langinfo.c
index c34a7d131376..4477d833bec1 100644
--- a/newlib/libc/locale/nl_langinfo.c
+++ b/newlib/libc/locale/nl_langinfo.c
@@ -160,7 +160,6 @@ static struct _nl_item_t
   _NLITEM (monetary, wmon_thousands_sep),
   _NLITEM (monetary, wpositive_sign),
   _NLITEM (monetary, wnegative_sign),
-  _NLITEM (messages, codeset),
   _NLITEM (messages, wyesexpr),
   _NLITEM (messages, wnoexpr),
   _NLITEM (messages, wyesstr),

> If __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is
> also not defined, so it is unclear if _NL_MESSAGES_CODESET should be
> defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor
> __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field
> depending on those definitions?

No, see above.  There is no definition of _NL_CTYPE_CODESET, only
_NL_CTYPE_CODESET_NAME == CODESET.  _NL_MESSAGES_CODESET is exactly
where it has to be. If you start moving _NL_* values around, you *will*
break backward compatibility with existing executables.

However, there *is* another problem.  _NL_MESSAGES_CODESET is defined if
__HAVE_LOCALE_INFO__ is defined.  OTOH, the internal datatype struct
lc_messages_T (see libc/locale/setlocale.h) defines the codeset member
only if __HAVE_LOCALE_INFO_EXTENDED__ is defined.  This is no problem at
all for Cygwin, which defines __HAVE_LOCALE_INFO_EXTENDED__ anyway, but
a target only defining __HAVE_LOCALE_INFO__ would probably see an
error during compilation because lc_messages_T::codeset will be
undefined.

This is... weird.

Does no other target except Cygwin define __HAVE_LOCALE_INFO__???

Anyway, I'll fix it.


Thanks,
Corinna



More information about the Newlib mailing list