codeset problems in wprintf and wcsftime

Andy Koppe
Fri Apr 2 21:00:00 GMT 2010

Corinna Vinschen:
> That should have been
>  return __get_current_ctype_locale ()->mb_cur_max[0];
> I've fixed that locally.

I think there's another issue here. Since _ctype_locale_buf is meant
to be filled from a text file in non-Cygwin systems, the mb_cur_max
char would be an ASCII digit rather than a binary one, so that needs
to be accounted for:

return __get_current_ctype_locale ()->mb_cur_max[0] - '0';

And one more: __mb_cur_max is still being defined for systems with
__HAVE_LOCALE_INFO__, even though they'd no longer use it. Although,
since we're still using the global __mbtowc and __wctomb variables,
can't we just continue to use __mb_cur_max as well as __locale_charset
rather than go through __get_current_ctype_locale()?

>> But I'm puzzled about why that indirection into _ctype_locale_buf is
>> there in the first place. Why not just store mb_cur_max directly in
>> the lc_ctype_T struct?
> No, that spoils the way the data is read from locale files.  For
> Cygwin that would be no problem, but it doesn't match the way the
> __part_load_locale function works.

I see, but it seems a shame to fit everything around the needs of the
input function, rather than make the input function convert to a
format that's best for the rest of the program. I fear those extra
indirections will cancel out much of the work you previously did to
speed up multibyte conversion.

Regarding __part_load_locale(), I don't get the bit about the cache,

	 * Record the successful parse in the cache.
	locale_buf = lbuf;

locale_buf is a parameter, i.e. a local variable, so that assignment
will be forgotten the next time we get to __part_load_locale.

>> The __ctype_locale_buf also holds the name of the selected charset,
>> and I don't like the charset and mb_cur_max being mixed up in that
>> way, and I think this is really ugly:
>> /* Max encoding_len + NUL byte + 1 byte mb_cur_max plus trailing NUL byte */
>> #define _CTYPE_BUF_SIZE       34

How about:

>> static char _ctype_locale_buf[_CTYPE_BUF_SIZE];
>> What's the idea behind that? And indeed, why not store the codeset
>> string directly in an array in lc_ctype_T?
> Again, that's how __part_load_locale works.  Compare with BSD.

You're right, that scheme does make a lot of sense for all the other
locale categories, which actually do consist of strings and which
aren't terribly performance-critical.

Multibyte conversion, however, needs to be as efficient as possible
because many programs pump a lot of data through those functions. And
of course it also impacts the performance of Cygwin's filesystem.
Hence I think making the charset name and mb_cur_max more directly
accessible might make quite a difference.

>> struct lc_ctype_T *
>> __get_current_ctype_locale(void) {
>>       return (_ctype_using_locale
>>               ? &_ctype_locale
>>               : (struct lc_ctype_T *)&_C_ctype_locale);
>> }
>> Checking whether we're in the "C" locale every single time seems
>> rather inefficient. Wouldn't it be better to initialise _ctype_locale
>> with the contents _C_ctype_locale and later copy it in again when the
>> C locale is selected?
> That's exactly how it's implemented in the other files, like
> lmonetary.c, etc.  There is no comparison with the "C" locale, rather
> it just depends on the _ctype_using_locale variable.The idea (from BSD)
> is that the function always returns a valid structure, even while the
> __part_load_locale function is changing the contents of _ctype_locale_buf.

I'm not convinced that this is necessary. If locale-dependent
functions are used while another thread is calling setlocale(), random
(i.e. undefined) behaviour has to be expected. If anything, one would
expect either the previous or the new locale to be used, but not the C

If it is necessary, then how about replacing the _ctype_using_locale
variable with a pointer that points to either _ctype_locale or
_C_ctype_locale? That would at least eliminate a branch.


More information about the Newlib mailing list