codeset problems in wprintf and wcsftime

Corinna Vinschen
Fri Apr 2 21:03:00 GMT 2010

On Apr  2 20:09, Andy Koppe wrote:
> Corinna Vinschen:
> > That should have been
> >
> >  return __get_current_ctype_locale ()->mb_cur_max[0];
> >
> > I've fixed that locally.
> I think there's another issue here. Since _ctype_locale_buf is meant
> to be filled from a text file in non-Cygwin systems, the mb_cur_max
> char would be an ASCII digit rather than a binary one, so that needs
> to be accounted for:
> return __get_current_ctype_locale ()->mb_cur_max[0] - '0';

No.  See lmonetary.c.
> And one more: __mb_cur_max is still being defined for systems with
> __HAVE_LOCALE_INFO__, even though they'd no longer use it. Although,
> since we're still using the global __mbtowc and __wctomb variables,
> can't we just continue to use __mb_cur_max as well as __locale_charset
> rather than go through __get_current_ctype_locale()?

__mb_cur_max is defined for *all* systems.  It has to be kept in sync
for applications which have been built using the old definition of
MB_CUR_MAX.  That's the entire reason for it.  The global variables
__mbtowc and __wctomb are supposed to be stored in the locale_t
structure at one point as well, even on systems only supporting a single
locale_t.  The entire locale_t stuff isn't entirely thought out yet, but
basically the global variables will only be used by systems which don't
support __HAVE_LOCALE_INFO__ at all.

> > No, that spoils the way the data is read from locale files.  For
> > Cygwin that would be no problem, but it doesn't match the way the
> > __part_load_locale function works.
> I see, but it seems a shame to fit everything around the needs of the
> input function, rather than make the input function convert to a
> format that's best for the rest of the program. I fear those extra
> indirections will cancel out much of the work you previously did to
> speed up multibyte conversion.

I don't think so.  Except for iso and cp functions, the other functions
don't need to access the locale info a lot.  Only the function pointers
are fetched from there eventually, but as my speedup changes showed, the
worst thing are the function calls itself, at least on ix86.

> Regarding __part_load_locale(), I don't get the bit about the cache,
> particularly:
> 	/*
> 	 * Record the successful parse in the cache.
> 	 */
> 	locale_buf = lbuf;
> locale_buf is a parameter, i.e. a local variable, so that assignment
> will be forgotten the next time we get to __part_load_locale.

Yes, that's broken.  __part_load_locale still needs some work, but I
didn't consider this important as long as nobody is actually using it.

> >> The __ctype_locale_buf also holds the name of the selected charset,
> >> and I don't like the charset and mb_cur_max being mixed up in that
> >> way, and I think this is really ugly:
> >>
> >> /* Max encoding_len + NUL byte + 1 byte mb_cur_max plus trailing NUL byte */
> >> #define _CTYPE_BUF_SIZE       34
> How about:

Sure, fine with me.  I fixed that locally.

> > Again, that's how __part_load_locale works.  Compare with BSD.
> You're right, that scheme does make a lot of sense for all the other
> locale categories, which actually do consist of strings and which
> aren't terribly performance-critical.

lmonetary does not consist of strings in the first place.

> Multibyte conversion, however, needs to be as efficient as possible
> because many programs pump a lot of data through those functions. And
> of course it also impacts the performance of Cygwin's filesystem.
> Hence I think making the charset name and mb_cur_max more directly
> accessible might make quite a difference.

As I said before, the main problem are the lot of function calls with
many arguments.  The locale_t information will be accessible via the
reent structure, so in the long run we can get rid of at least one
additional argument to the functions again.  This will help the
performance more than the extra indirection will hit it, as far as I can

> If it is necessary, then how about replacing the _ctype_using_locale
> variable with a pointer that points to either _ctype_locale or
> _C_ctype_locale? That would at least eliminate a branch.

That's a good idea.


Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

More information about the Newlib mailing list