Hi,
while working on finalizing locale support for Cygwin it suddenly
occurred to me that we have a problem in wprintf and wcsftime.
Let's assume a funny combination of localization variables in the user's
environment:
LANG=de_DE.utf8
LC_TIME=ja_JP.eucjp
LC_NUMERIC=en_US.iso88591
Yes, it's pretty unlikely, but nevertheless possible and valid.
So, at setlocale time we read and store the localized strings in the
codeset specified by the localization variable:
- __locale_charset() returns UTF-8
- __get_current_time_locale() returns data stored in EUC-JP
- __get_current_numeric_locale() returns data stored in ISO-8859-1
- localeconv() returns decimal_point and thousands_sep stored in
ISO-8859-1, and all other strings from the LC_MONETARY category
in UTF-8.
- nl_langinfo() returns CODESET as UTF-8, strings from the LC_TIME
category in EUC-JP, strings from LC_MESSAGES in UTF-8, and
RADIXCHAR and THOUSEP in ISO-8859-1.
This is no problem at all as long as you call the multibyte variants,
printf and strftime. The user gets what she asked for, and who are we
to ask the user for the reason behind this choice?
However, it is a problem in the wprintf and wcsftime functions. The
problem is that we have decimal_point, thousands_sep and all the LC_TIME
variables stored in some arbitrary multibyte codeset. Since we need the
widechar representation, wprintf and wcsftime have to convert the
strings using some mbtowc function. But the mbtowc functions always
assume the multibyte charset defined by __locale_charset().
Consequently, the conversion results in invalid strings.
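
To illustrate the mismatch, here is a minimal sketch. It does not use
newlib's internal conversion functions; latin1_to_wc and utf8_to_wc are
hypothetical hand-written decoders, and the byte 0xA0 (no-break space in
ISO-8859-1) is just an assumed example of a non-ASCII separator. The
point is only that the same byte decodes fine with the charset it was
stored in and fails with the LC_CTYPE charset.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical ISO-8859-1 decoder: each byte is the code point of the
   same value.  Returns the number of bytes consumed, or -1. */
static int latin1_to_wc(wchar_t *out, const unsigned char *s, size_t n)
{
    assert(out != NULL && s != NULL);
    if (n == 0)
        return -1;
    *out = (wchar_t)s[0];
    return 1;
}

/* Hypothetical, simplified UTF-8 decoder (1- to 3-byte sequences only;
   it does not reject overlong 3-byte forms or surrogates).
   Returns the number of bytes consumed, or -1 on invalid input. */
static int utf8_to_wc(wchar_t *out, const unsigned char *s, size_t n)
{
    assert(out != NULL && s != NULL);
    if (n == 0)
        return -1;
    if (s[0] < 0x80) {                       /* plain ASCII */
        *out = (wchar_t)s[0];
        return 1;
    }
    if (s[0] >= 0xC2 && s[0] <= 0xDF         /* 2-byte lead byte */
        && n >= 2 && (s[1] & 0xC0) == 0x80) {
        *out = (wchar_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0                /* 3-byte lead byte */
        && n >= 3 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *out = (wchar_t)(((s[0] & 0x0F) << 12)
                         | ((s[1] & 0x3F) << 6)
                         | (s[2] & 0x3F));
        return 3;
    }
    /* Stray continuation byte (0x80..0xBF) or unsupported lead byte. */
    return -1;
}
```

Feeding the single byte 0xA0 to latin1_to_wc yields U+00A0, while
utf8_to_wc rejects it, because 0xA0 on its own is a continuation byte
in UTF-8. That is the situation wprintf is in when LC_NUMERIC data is
stored in ISO-8859-1 but the conversion assumes UTF-8.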
AFAICS, there are two possible approaches to fix this problem:
- Store the charset not only for LC_CTYPE, but for each localization
category, and provide a function to request the charset.
This also requires storing the associated multibyte to widechar
conversion functions, obviously, and calling the correct functions
from wprintf and wcsftime.
- Redefine the locale data structs so that they contain multibyte and
widechar representations of all strings. Use the multibyte strings
in the multibyte functions, the widechar strings in the widechar
functions.
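
A rough sketch of how the second approach could look, just to make the
idea concrete. All names here (mbs_to_wcs_fn, numeric_locale_sketch,
numeric_locale_load, latin1_mbstowcs) are made up for illustration and
are not the existing newlib structs or functions. The converter passed
at load time stands in for the per-category conversion function from
the first approach; the wide strings are filled in once, when the
locale data is loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define WSTR_MAX 8   /* assumed upper bound, for the sketch only */

/* Hypothetical converter type: decodes a NUL-terminated multibyte
   string in one fixed charset.  Returns the number of wide chars
   written (without the terminator), or (size_t)-1 on error. */
typedef size_t (*mbs_to_wcs_fn)(wchar_t *dst, const char *src,
                                size_t dstlen);

/* Example converter for ISO-8859-1: one byte, one code point. */
static size_t latin1_mbstowcs(wchar_t *dst, const char *src,
                              size_t dstlen)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dstlen == 0)
        return (size_t)-1;
    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= dstlen)      /* keep room for the terminator */
            return (size_t)-1;
        dst[i] = (wchar_t)(unsigned char)src[i];
    }
    dst[i] = L'\0';
    return i;
}

/* Hypothetical LC_NUMERIC data holding both representations. */
struct numeric_locale_sketch {
    const char *decimal_point;             /* for printf   */
    const char *thousands_sep;
    wchar_t wdecimal_point[WSTR_MAX];      /* for wprintf  */
    wchar_t wthousands_sep[WSTR_MAX];
};

/* Fill in the wide strings once, using the converter that matches the
   charset of this category rather than the LC_CTYPE charset.
   Returns 0 on success, -1 if a string cannot be converted. */
static int numeric_locale_load(struct numeric_locale_sketch *loc,
                               const char *dp, const char *ts,
                               mbs_to_wcs_fn conv)
{
    assert(loc != NULL && dp != NULL && ts != NULL && conv != NULL);
    loc->decimal_point = dp;
    loc->thousands_sep = ts;
    if (conv(loc->wdecimal_point, dp, WSTR_MAX) == (size_t)-1)
        return -1;
    if (conv(loc->wthousands_sep, ts, WSTR_MAX) == (size_t)-1)
        return -1;
    return 0;
}
```

With something like this, the multibyte functions would keep reading
decimal_point and thousands_sep, and the widechar functions would read
wdecimal_point and wthousands_sep without converting anything at call
time. The cost is extra memory per category and a conversion at
setlocale time.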