Bug 11828 - Please provide supported equivalents of _NL_*
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: unspecified
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2010-07-22 00:06 UTC by Samuel Thibault
Modified: 2015-08-27 22:05 UTC

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Build: x86_64-pc-linux-gnu
Last reconfirmed:
fweimer: security-


Description Samuel Thibault 2010-07-22 00:06:21 UTC
Hello,

I am calling nl_langinfo (_NL_TIME_WEEK_1STDAY); which is supposed to
return an integer like 19971130, but I am getting 0x888888880130bc3a,
i.e. (0x8888888800000000 | 19971130). Looking at __nl_langinfo_l code,
I can see:

return (char *) data->values[index].string;

where string is a member of

  union locale_data_value
  {
    const uint32_t *wstr;
    const char *string;
    unsigned int word;          /* Note endian issues vs 64-bit pointers.  */
  }

and indeed I can read

locale/categories.def:  DEFINE_ELEMENT (_NL_TIME_WEEK_1STDAY,     "week-1stday",         std, word)

I guess the union gets loaded through the word member only, thus
leaving the higher part of the string member uninitialized?  Note that
I am using MALLOC_PERTURB_=$RANDOM; without it the problem disappears.
Comment 1 Roland McGrath 2010-07-22 00:19:56 UTC
Indeed, the other word of the union is uninitialized.  It's easy to make sure
it's zero.  But I wonder about that usage mode.  Does it work correctly (without
provoking this bug) on big-endian machines?
Comment 2 Samuel Thibault 2010-07-22 00:37:38 UTC
It completely fails on sparc64 indeed: it returns 0x130bc3a00000000
(and 0x130bc3a2d2d2d2d with MALLOC_PERTURB_=1234)
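
The values reported so far can be reproduced with a small model of the union slot. This is only a sketch, not glibc code: the helper name slot_as_pointer and the explicit byte-order flag are assumptions made for illustration. It stores a 32-bit word at offset 0 of an 8-byte slot, leaves the other four bytes at whatever fill byte the allocator wrote, and reads the slot back as a 64-bit pointer value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not glibc code: an 8-byte union slot whose first four
   bytes hold the 32-bit `word` member and whose remaining bytes keep whatever
   the allocator left there (`fill`).  Returns the slot read back as a 64-bit
   pointer value in the requested byte order.  */
static uint64_t
slot_as_pointer (uint32_t word, unsigned char fill, int big_endian)
{
  unsigned char slot[8];
  uint64_t ptr = 0;
  int i;

  /* Simulate stale allocator contents in the whole slot.  */
  memset (slot, fill, sizeof slot);

  /* Store `word` at offset 0, as writing the union's word member would.  */
  for (i = 0; i < 4; i++)
    slot[i] = (unsigned char) (big_endian ? word >> (24 - 8 * i)
                                          : word >> (8 * i));

  /* Read all eight bytes back as one pointer-sized integer.  */
  for (i = 0; i < 8; i++)
    ptr |= (uint64_t) slot[i] << (big_endian ? 56 - 8 * i : 8 * i);

  return ptr;
}
```

With these assumptions, slot_as_pointer (19971130, 0x88, 0) gives 0x888888880130bc3a as in the description, and slot_as_pointer (19971130, 0x2d, 1) gives 0x0130bc3a2d2d2d2d as seen on sparc64 with MALLOC_PERTURB_=1234.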
Comment 3 Jakub Jelinek 2010-07-22 05:21:17 UTC
You haven't provided the testcase, but from what you say I'd say it is a user
error.  nl_langinfo returns the pointer from the union, so if you need the word
instead, you need to:
union { char *str; unsigned int word; } u;
u.str = nl_langinfo (...);
xxx = u.word;
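
Spelled out as a complete helper, the suggestion above might look like the following sketch. It assumes glibc (the _NL_* items are glibc-only and, per this discussion, not a supported interface), and the name langinfo_word is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* assumption: building against glibc */
#endif
#include <langinfo.h>

/* Hypothetical helper (not part of glibc): fetch a word-typed locale item.
   nl_langinfo hands back the union slot reinterpreted as char *, so the
   word member is read back through a union instead of casting the pointer
   to an integer.  */
static unsigned int
langinfo_word (nl_item item)
{
  union { char *str; unsigned int word; } u;

  u.str = nl_langinfo (item);
  return u.word;
}
```

Under that assumption, langinfo_word (_NL_TIME_WEEK_1STDAY) should yield a YYYYMMDD-style integer such as the 19971130 mentioned in the description, regardless of what the other bytes of the slot contain.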
Comment 4 Samuel Thibault 2010-07-22 08:35:01 UTC
Where is that documented? 
Comment 5 Andreas Schwab 2010-07-22 08:46:34 UTC
nl_langinfo is only documented to be able to return string properties of the
locale.
Comment 6 Samuel Thibault 2010-07-22 08:54:18 UTC
I know that; that's why I'm asking for documentation. Since it's currently documented that way, I had assumed that the char * had to be cast to a scalar. Only now that I have actually read the source code do I know it's not done that way. It really needs documentation and/or a fix.
Comment 7 Jakub Jelinek 2010-07-22 09:00:05 UTC
Why?
nl_langinfo is only documented for a couple of values, see
http://www.opengroup.org/onlinepubs/9699919799/basedefs/langinfo.h.html
The rest is undocumented, so if you call nl_langinfo with such arguments, it is
implementation defined behavior.
A quick google query would tell you what you need to do...
Comment 8 Samuel Thibault 2010-07-22 09:14:08 UTC
"implementation-defined" doesn't mean that the implementation doesn't have to document it. On the contrary, I'd say. That's why I'm again asking for documentation in e.g. the glibc info.

Now, as you say, quick google query. That gives me
mail.gnome.org/archives/hildon-list/2008-August/msg00000.html

langinfo = nl_langinfo(_NL_TIME_WEEK_1STDAY);
week_origin = GPOINTER_TO_INT(langinfo);

as well as http://www.mail-archive.com/rrd-developers@lists.oetiker.ch/msg03613.html

long week_1stday_l = (long) nl_langinfo (_NL_TIME_WEEK_1STDAY);

Eventually I got to http://sourceware.org/bugzilla/show_bug.cgi?id=5486 which boils down to exactly the same thing I'm asking now: please document this glibc-only behavior.
Comment 9 Samuel Thibault 2010-07-22 09:15:19 UTC
Also, it looks very odd that the user has to define the union himself. Shouldn't that be defined in langinfo.h? (I'd consider that that alone would be enough for some minimal documentation).

Comment 10 Andreas Schwab 2010-07-22 12:11:03 UTC
The names starting with _NL are all internal to glibc.  All supported properties
are strings.
Comment 11 Samuel Thibault 2010-07-22 12:16:37 UTC
That should be documented then.  And I'd thus turn this bug into "please provide supported *WEEK* langinfo items", as all calendar applications need these.
Comment 12 Andreas Schwab 2010-07-22 12:21:02 UTC
ISO/IEC 9899:1999

7.1.3 Reserved identifiers

- All identifiers that begin with an underscore and either an uppercase letter or
another underscore are always reserved for any use.
Comment 13 Samuel Thibault 2010-07-22 12:39:29 UTC
Mmm, but there are at least _GNU_SOURCE, _IONBF & such which fall in that area, as well as stdio_ext.h functions...

Well, anyway, let's turn the bug into "please support these!":

There is a lot of very useful locale information that langinfo could provide (paper dimensions, calendar layout), but the discussion in this bug says that these items are not supported. Actually, a lot of them are not even used inside glibc (e.g. _NL_PAPER_HEIGHT), so I'm wondering why they are there at all if one cannot assume that they are supported.

Please provide supported equivalents to these _NL_* langinfo items so applications can use them.