codeset problems in wprintf and wcsftime

Corinna Vinschen
Sat Apr 3 09:04:00 GMT 2010

On Apr  2 22:34, Andy Koppe wrote:
> Corinna Vinschen wrote:
> > On Apr  2 20:09, Andy Koppe wrote:
> >> I think there's another issue here. Since _ctype_locale_buf is meant
> >> to be filled from a text file in non-Cygwin systems, the mb_cur_max
> >> char would be an ASCII digit rather than a binary one, so that needs
> >> to be accounted for:
> >>
> >> return __get_current_ctype_locale ()->mb_cur_max[0] - '0';
> >
> > No.  See lmonetary.c.
> But that's got the M_ASSIGN_CHAR macro and cnv function for converting
> ASCII numbers in the locale file to binary. I can't see anything
> similar for mb_cur_max.

I'm talking about the mechanism of overwriting a string with its numerical
value, which can be used here as well.  This isn't in the file yet since
no target uses this part of the code.  The code isn't even in the
repository yet, right?
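
A minimal sketch of that overwrite mechanism, modelled on what cnv() and
M_ASSIGN_CHAR do in lmonetary.c.  The struct and the demo_* names are made
up for illustration and are not the actual newlib definitions; the sketch
also assumes the field points into a writable buffer loaded from the
locale file.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ctype locale structure. */
struct demo_ctype_locale {
    const char *codeset;
    const char *mb_cur_max;   /* points into the loaded (writable) file buffer */
};

/* Same idea as cnv() in lmonetary.c: ASCII number -> binary char value. */
static char demo_cnv(const char *str)
{
    long i = strtol(str, NULL, 10);

    if (i == -1)
        i = CHAR_MAX;
    return (char)i;
}

/* Overwrite the first byte of the string with its numeric value.
   To be called once, right after the locale file has been loaded. */
static void demo_fixup_mb_cur_max(struct demo_ctype_locale *loc)
{
    ((char *)loc->mb_cur_max)[0] = demo_cnv(loc->mb_cur_max);
}

/* After the fixup, readers use the byte as a binary value; no "- '0'". */
static int demo_mb_cur_max(const struct demo_ctype_locale *loc)
{
    return loc->mb_cur_max[0];
}
```

With the fixup applied at load time, the accessor can return
mb_cur_max[0] directly, whether the data came from a text file or from a
binary default.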

> >  The global variables
> > __mbtowc and __wctomb are supposed to be stored in the locale_t
> > structure at one point as well, even on systems only supporting a single
> > locale_t.
> So why not treat __mb_cur_max and __lc_ctype_charset the same way?
> Global now, part of locale_t later.

I don't get that.  __lc_ctype_charset will go away for targets
supporting locales.  Even today, no other code accesses the charset
directly, only via the __locale_charset() function.  Consequently we
have to do the same for __mb_cur_max.  It doesn't matter where
__mb_cur_max actually lives; what matters is that applications don't
access it as a variable, but only via a function call, so that we can
implement a new scheme without affecting newly built applications.  And
__mb_cur_max, just like __lc_ctype_charset, will depend on the
per-reent/per-thread locale in the future.  This requires storing them
somewhere in the locale_t structure.  My *current* solution is to store
them in the lc_ctype structure, which will become part of the locale_t
structure.  Whether that's good or bad is not important.  What matters
is that applications don't depend on the place where it's stored.
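
A rough sketch of the accessor idea.  All demo_* names and the struct
layout are hypothetical and only illustrate the point: the application
sees a function call, so the storage location can move later without
the application noticing.

```c
/* Hypothetical layout; where the value is stored is a private detail. */
struct demo_lc_ctype {
    char charset[32];
    int  mb_cur_max;
};

struct demo_locale {
    struct demo_lc_ctype ctype;
};

/* Today: a single global locale.  Later this could be looked up per
   reent or per thread without changing any caller. */
static struct demo_locale demo_global_locale = { { "ASCII", 1 } };

static struct demo_locale *demo_current_locale(void)
{
    return &demo_global_locale;
}

/* The only interface applications are supposed to use. */
int demo_locale_mb_cur_max(void)
{
    return demo_current_locale()->ctype.mb_cur_max;
}

const char *demo_locale_charset(void)
{
    return demo_current_locale()->ctype.charset;
}

/* What a header would expose instead of a variable reference. */
#define DEMO_MB_CUR_MAX (demo_locale_mb_cur_max())
```

An application built against DEMO_MB_CUR_MAX only ever emits a call to
demo_locale_mb_cur_max(), so moving the value into a per-thread
structure later needs no rebuild.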

> >> I see, but it seems a shame to fit everything around the needs of the
> >> input function, rather than make the input function convert to a
> >> format that's best for the rest of the program. I fear those extra
> >> indirections will cancel out much of the work you previously did to
> >> speed up multibyte conversion.
> >
> > I don't think so.  Except for iso and cp functions, the other functions
> > don't need to access the locale info a lot.
> They may not actually need it, but the likes of mbrtowc() and
> wcrtomb() all call __locale_charset() anyway. And if that does an
> additional function call and pointer access, that's gonna make a
> difference.

What additional function call?  Each of these functions has access
to the reent context, which in turn will have a pointer to the locale_t,
which in turn stores the required information.  The __locale_charset ()
call will go away if we implement the charset enum as you propose.
All these calls get rid of that function call and one parameter when
calling the next function in the chain.  Eventually you're in __mbtowc
or __wctomb, none of which need the charset anyway, except for
__iso_mbtowc/wctomb and __cp_mbtowc/wctomb.  And these functions will
have direct access to the charset table via reent->locale.  We can even
store the pointer to the charset conversion table for the iso and cp
functions right in the locale_t structure.  How is that going to slow
things down?  Quite the contrary!
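
To illustrate the access path, here is a sketch of a single-byte
conversion function that fetches its table through reent->locale.  The
struct layouts, the conv_table member and the demo_* names are
assumptions made for this example, not existing newlib code.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical locale object: for iso/cp charsets it carries a pointer
   to a 128-entry table covering the bytes 0x80..0xff. */
struct demo_locale {
    const wchar_t *conv_table;   /* NULL if bytes map 1:1 */
};

/* Hypothetical reent context with a pointer to the current locale. */
struct demo_reent {
    struct demo_locale *locale;
};

/* Single-byte conversion: no charset parameter and no extra function
   call, just two pointer dereferences to reach the table. */
static int demo_iso_mbtowc(struct demo_reent *r, wchar_t *pwc,
                           const char *s, size_t n)
{
    unsigned char ch;
    wchar_t wc;

    if (s == NULL)
        return 0;               /* stateless encoding */
    if (n == 0)
        return -2;              /* incomplete input */

    ch = (unsigned char)*s;
    if (ch < 0x80 || r->locale->conv_table == NULL)
        wc = ch;
    else
        wc = r->locale->conv_table[ch - 0x80];

    if (pwc != NULL)
        *pwc = wc;
    return ch != 0;             /* 0 for the NUL byte, else 1 byte consumed */
}
```

In this sketch the per-call cost is r->locale followed by ->conv_table,
and the charset no longer has to be passed down the call chain.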

> > Yes, that's broken.  __part_load_locale still needs some work, but I
> > didn't consider this important as long as nobody is actually using it.
> Alright, so newlib's on-disk locale format essentially is still undefined?

Basically it's a descendant of the BSD format, and it's evolving right
now.  We can still change things around until the first target actually
starts to use that code.


Corinna Vinschen
Cygwin Project Co-Leader
Red Hat
