[RFC] newlib/libc/include/langinfo.h: nl_langinfo enum off by one error causing pointer overwrite
Corinna Vinschen
vinschen@redhat.com
Thu Aug 22 19:13:02 GMT 2024
On Aug 22 11:14, Brian Inglis wrote:
> On 2024-08-22 02:07, Corinna Vinschen wrote:
> > Hi Brian,
> >
> > On Aug 21 17:58, Brian Inglis wrote:
> > > if __HAVE_LOCALE_INFO__ is defined, then _NL_MESSAGES_CODESET is defined
> > > instead of _NL_CTYPE_CODESET
> >
> > There is no _NL_CTYPE_CODESET, only CODESET, and the value of
> > _NL_MESSAGES_CODESET is what it is for backward compatibility.
> >
> > The values must not change.
>
> What about the pointer positions in the array?
>
> > > demonstration of pointer overwrite by nl_langinfo dump program generated
> > > with additional langinfo.h hack after fix:
> >
> > Can you please provide your STC?
>
> Attached, with log, assuming current langinfo.h.
> Also attached is my original program, for use with appropriately fixed
> and/or hacked up langinfo.h, with my nl_items label array adjusted to match.
>
> > > if __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is
> > > also not defined, so it is unclear if _NL_MESSAGES_CODESET should be
> > > defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor
> > > __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field
> > > depending on those definitions
> >
> > As I wrote above, the values must not change. And given _NL_CTYPE_CODESET
> > doesn't really exist (but CODESET does), I'm not sure I understand the
> > problem here. As on Linux we have a value _NL_CTYPE_CODESET_NAME which
> > is equivalent to CODESET.
>
> An instance of a CODESET item and pointer exists, before the wide messages
> fields, and not accounted for in the enum defined in langinfo.h, used to
> retrieve the values by nl_langinfo(3).
I don't understand this sentence. There is a _NL_MESSAGES_CODESET at this
point, and the value is *not* supposed to be a pointer into an array or
struct. Whatever is wrong, it's not the values defined in langinfo.h
> What is shown in the enum as the _NL_MESSAGE_CODESET item appears to be
> generated from the LC_CTYPE locale entry, and added to the buffer and
> pointer array after the _NL_CTYPE_MB_CUR_MAX pointer, before the
> _NL_CTYPE_OUTDIGIT?_MB item pointers.
>
> What is not shown in the enum is the CODESET item generated from the
> LC_MESSAGES locale entry, and added to the buffer and pointer array after
> the _NL_MONETARY_WNEGATIVE_SIGN pointer before the _NL_MESSAGES_WYESEXPR
> pointer.
>
This is a wrong assumption. Compare with newlib/libc/locale/nl_langinfo.c,
line 183.
> As a result, the underlying pointers in the pointer array are off by one in
> the wide messages fields, and the _NL_COLLATE_CODESET pointer clobbers what
> was stored in the _NL_MESSAGES_WNOSTR position in the pointer array.
Yes, youre' right, there's an off by one. But this can't be fixed by
chaning nl_item values.
The culprit is actually in nl_langinfo.c. The internal nl_ext array
erronously contains an entry for lc_monetary_T::codeset which is not
part of the _NL_LOCALE_EXTENDED entries. This moves the wide char
yes/no strings accidentally by one. The patch is simple:
diff --git a/newlib/libc/locale/nl_langinfo.c b/newlib/libc/locale/nl_langinfo.c
index c34a7d131376..4477d833bec1 100644
--- a/newlib/libc/locale/nl_langinfo.c
+++ b/newlib/libc/locale/nl_langinfo.c
@@ -160,7 +160,6 @@ static struct _nl_item_t
_NLITEM (monetary, wmon_thousands_sep),
_NLITEM (monetary, wpositive_sign),
_NLITEM (monetary, wnegative_sign),
- _NLITEM (messages, codeset),
_NLITEM (messages, wyesexpr),
_NLITEM (messages, wnoexpr),
_NLITEM (messages, wyesstr),
> If __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is
> also not defined, so it is unclear if _NL_MESSAGES_CODESET should be
> defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor
> __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field
> depending on those definitions?
No, see above. There is no definition of _NL_CTYPE_CODESET, only
_NL_CTYPE_CODESET_NAME == CODESET. _NL_MESSAGES_CODESET is exactly
where it has to be. If you start moving _NL_* values around, you *will*
break backward compatibility with existing executables.
However, there *is* another problem. _NL_MESSAGES_CODESET is defined if
__HAVE_LOCALE_INFO__ is defined. OTOH, the internal datatype struct
lc_messages_T (see libc/locale/setlocale.h) defines the codeset member
only if __HAVE_LOCALE_INFO_EXTENDED__ is defined. This is no problem at
all for Cygwin, which defines __HAVE_LOCALE_INFO_EXTENDED__ anyway, but
a target only defining __HAVE_LOCALE_INFO__ would probably see an
error during compilation because lc_messages_T::codeset will be
undefined.
This is... weird.
Does no other target except Cygwin define __HAVE_LOCALE_INFO__???
Anyway, I'll fix it.
Thanks,
Corinna
More information about the Newlib
mailing list