This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Should glibc provide a builtin C.UTF-8 locale?


On Thu, Oct 29, 2015 at 07:20:48PM +0100, Mike FABIAN wrote:
> Rich Felker <dalias@libc.org> wrote:
> 
> >> LC_CTYPE
> >>    almost the same
> >>    - C.UTF-8 just copies the LC_CTYPE from "i18n" (which is kept
> >>      in sync with the latest Unicode release using some scripts) and
> >>      adds "translit_combining".
> >
> > So C.UTF-8 will have the full character-class data? I'm in favor of
> > that but just want to clarify, since omitting it would also be
> > possible.
> 
> Yes, with the patch I made, it has the full character-class data,
> i.e. exactly the same as in the i18n file.

Sounds good.
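
For anyone who wants to check the character-class behaviour on their own system, here is a minimal sketch. It assumes a locale named "C.UTF-8" is installed and that wchar_t values are Unicode code points (true on glibc); the function name is made up for illustration. It returns -1 if the locale is not available, 1 if U+00E9 is classified as alphabetic, and 0 otherwise.

```c
#include <assert.h>
#include <locale.h>
#include <wctype.h>

/* Hypothetical helper, not part of glibc.
 * Returns -1 if "C.UTF-8" cannot be selected, 1 if U+00E9 (e with acute)
 * is classified as alphabetic under it, 0 if it is not. */
int utf8_c_locale_classifies_e_acute(void)
{
    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL)
        return -1;                      /* locale not installed here */

    /* Assumption: wide characters hold Unicode code points. */
    wint_t e_acute = 0x00E9;
    int is_alpha = iswalpha(e_acute) != 0;

    /* Go back to the plain "C" locale, which always exists. */
    char *restored = setlocale(LC_CTYPE, "C");
    assert(restored != NULL);

    return is_alpha;
}
```

With full character-class data one would expect 1 here; a C.UTF-8 that only carried ASCII classes would give 0.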

> >> LC_MONETARY
> >>    - C.UTF-8 tries to agree with C/POSIX as much as possible
> >>      and thus uses "USD" for int_curr_symbol, "$" for currency_symbol,
> >>      and "." for mon_decimal_point.
> >
> > This is incorrect, at least based on the spec. C requires the values
> > for int_curr_symbol and currency_symbol to be "" in the C locale (7.11
> > Localization <locale.h>, paragraph 2). I think the values you cited
> > are from en_US.
> 
> I wanted to fill in something for int_curr_symbol and currency_symbol
> mainly because "localedef" complains when these fields are empty
> and refuses to generate the binary locale unless one uses the force
> option:
> 
>      -c, --force
>           Write the output files  even if warnings were generated
>           about the input file.
> 
> and this might make one miss real errors.
> 
> Maybe "localedef" should be adapted to allow empty values
> for these two fields if the locale to be generated is C.UTF-8?

Yes, I think so. Putting en_US values in there is inappropriate and
makes this locale not much of a C.UTF-8 locale but just a
slightly-different variant of en_US.
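
To illustrate what the C standard requires of the plain "C" locale, here is a minimal sketch that queries localeconv() and checks that both currency strings are empty. The function name is made up for illustration; only setlocale() and localeconv() are standard.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper, not part of any libc.
 * Returns 1 if int_curr_symbol and currency_symbol are both ""
 * in the "C" locale, 0 otherwise. */
int c_locale_currency_is_empty(void)
{
    if (setlocale(LC_ALL, "C") == NULL)
        return 0;

    const struct lconv *lc = localeconv();
    assert(lc != NULL);

    /* Print the two fields so the result can be inspected by eye. */
    printf("int_curr_symbol=\"%s\" currency_symbol=\"%s\"\n",
           lc->int_curr_symbol, lc->currency_symbol);

    return lc->int_curr_symbol[0] == '\0' &&
           lc->currency_symbol[0] == '\0';
}
```

Running the same check after selecting "C.UTF-8" instead of "C" would show whether a given build of that locale carries the en_US-style values.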

> >> LC_MESSAGES
> >>    - C.UTF-8 uses the same as C/POSIX
> >>      (for example yesexpr "^[yY]" and noexpr "^[nN]")
> >>    - i18n.UTF-8 apparently tries to avoid English
> >>      (for example yesexpr  "^[+1]" and noexpr "^[-0]")
> >
> > What about error messages? This is probably off-topic, but it might be
> > nice if i18n used the actual errno macro names as strings ("ENOENT",
> > etc.) if it doesn't already.
> 
> There was nothing for error messages in the i18n file, nor
> in C/POSIX.

OK. The reason I raise this is that I actually got several user
requests for musl to use the raw E* macro names rather than
descriptive English strings in the C locale. I don't think glibc would
want to make such a change in the C locale (and we probably wouldn't
in musl either), but the i18n locale might be a nice place to
experiment with it.
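
As a rough sketch of what "raw E* macro names" could look like from the caller's side, the following maps a few errno values to their macro names and falls back to strerror() for everything else. Both function names are made up for illustration, and the table covers only a handful of values; it is not how glibc or musl implement anything.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: macro name for a few common errno values,
 * or NULL if the value is not in this small illustrative table. */
const char *errno_name(int err)
{
    switch (err) {
    case ENOENT: return "ENOENT";
    case EACCES: return "EACCES";
    case EINVAL: return "EINVAL";
    case ENOMEM: return "ENOMEM";
    default:     return NULL;
    }
}

/* Hypothetical helper: prefer the macro name, otherwise use the
 * descriptive string that strerror() provides. */
const char *describe_error(int err)
{
    assert(err >= 0);
    const char *name = errno_name(err);
    return name != NULL ? name : strerror(err);
}
```

A locale that wanted this behaviour would in effect make strerror() itself return the first form.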

Rich

