This is the mail archive of the
mailing list for the glibc project.
Re: regression caused by fix of bug #13691
> > what is the fallback/alternate locale that should be used instead?
Yes, right. In fact, Vietnamese users were among the first to embrace
UTF-8 locales, between 2001 and 2004.
> the source and testsuite appear to indicate that
> state-dependent and stateless encodings are supported, are the
> comments wrong or am I misunderstanding something?
Stateful encodings are supported by the iconv() API. This API has a way
for the application to tell the converter "the end of the input string is
reached, please kick all pending output to the output buffers". It is
this notification which allows the converter to hold off from producing
output immediately in case of ambiguities.
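
A minimal sketch of that end-of-input notification, in C. The helper name
convert_and_flush is made up for illustration; the iconv() calls are the
standard POSIX ones. The notification is the second iconv() call, the one
with a NULL input buffer and a non-NULL output buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` bytes from encoding `from` to
   encoding `to`, then tell the converter that the input has ended.
   Returns the number of bytes written to `out`, or (size_t)-1 on failure. */
size_t convert_and_flush(const char *to, const char *from,
                         const char *in, size_t inlen,
                         char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;          /* converter not available */

    char *inp = (char *)in;         /* iconv() takes char **, but only reads the input */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    /* Step 1: ordinary conversion.  A stateful converter is allowed to
       hold back output here if the input so far is ambiguous. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);

    /* Step 2: "the end of the input string is reached".  The converter
       writes any pending output and returns to its initial state. */
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : (size_t)(outp - out);
}
```

With a stateless pair such as ASCII to UTF-8, step 2 produces nothing extra.
With a stateful source encoding (for example, assuming the converter is
installed under the name TCVN5712-1), step 2 is the call that releases
whatever the converter was still holding.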
For mbrtowc() such a notification does not exist, and since mbrtowc() is
the central function for multi-byte processing in locales, stateful locale
encodings cannot be supported (unless you make them stateless by producing
decomposed Unicode or private-use-area characters).
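
For comparison, here is a rough sketch of a typical mbrtowc() decoding loop.
The helper name decode_chunk is hypothetical. The point to notice is the
argument list of mbrtowc(): a pointer, a byte count, and a state object.
Nothing in it lets the caller say "this was the last chunk".

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode up to `n` bytes at `s` in the encoding of the
   current locale.  `*ps` carries the conversion state between calls, so the
   caller can pass successive chunks of a longer input.
   Returns the number of wide characters stored in `out`, or (size_t)-1 if
   an invalid byte sequence is found. */
size_t decode_chunk(const char *s, size_t n,
                    wchar_t *out, size_t outcap, mbstate_t *ps)
{
    assert(s != NULL && out != NULL && ps != NULL);

    size_t count = 0;
    while (n > 0 && count < outcap) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, n, ps);

        if (len == (size_t)-1)
            return (size_t)-1;   /* invalid byte sequence */
        if (len == (size_t)-2)
            break;               /* incomplete character; its bytes are now in *ps */

        out[count++] = wc;
        if (len == 0)
            break;               /* decoded the NUL character: stop here */
        s += len;
        n -= len;
    }
    /* After the final chunk there is no further call to make: whatever a
       stateful converter might still be holding has no way out. */
    return count;
}
```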
The references to the word 'stateful' in wcsmbs/wcsmbsload.c date back to
1999, at the beginning of the locale support in glibc; the major work was
done in 2000-2001.
> What is the effect of
> removing a locale from localedata/SUPPORTED? Is it still installed?
> Is it still available for use with iconv but remains broken? I would like
> to understand, as Ryan does too, the implications of your patch.
Distributors normally don't use the localedata/SUPPORTED file from glibc;
they have their own list, AFAIK. Some distributors add locales for more
countries. Other distributors install only UTF-8 locales...
For those people who run "make localedata/install-locales", the removal
from localedata/SUPPORTED has the effect that "locale -a" won't display
this locale any more, and setlocale() calls for this locale will fail.
It will not have an effect on iconv() - the list of converters usable
by iconv() is determined through the file iconvdata/gconv-modules.
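
The two lookups are independent, which a small probe can show. This is only
a sketch; the helper names locale_available and converter_available are made
up, and which names succeed depends entirely on what is installed on the
system.

```c
#include <assert.h>
#include <iconv.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if setlocale() accepts `name`, else 0.
   The locale that was active before the call is restored afterwards. */
int locale_available(const char *name)
{
    assert(name != NULL);

    /* Copy the current setting: the string returned by setlocale() may be
       overwritten by the next setlocale() call. */
    const char *cur = setlocale(LC_ALL, NULL);
    char *saved = NULL;
    if (cur != NULL) {
        size_t len = strlen(cur) + 1;
        saved = malloc(len);
        if (saved != NULL)
            memcpy(saved, cur, len);
    }

    int ok = setlocale(LC_ALL, name) != NULL;

    if (saved != NULL) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return ok;
}

/* Hypothetical helper: returns 1 if iconv_open() finds a converter from
   `from` to `to`, else 0.  This does not consult the installed locales. */
int converter_available(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}
```

On a system where the locale has been dropped but the converter module is
still installed, one would expect locale_available() to return 0 for the
locale name while converter_available() still returns 1 for the charset.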
> - It was mentioned in  that vi_VN.TCVN is the only state-dependent
> character encoding being used by Debian and that it's broken.
Yup. I agree with the findings. Goto Masanori's workaround is only a workaround:
it tells mbrtowc() to return to initial state by appending a NUL character
to the input. But programs which use mbrtowc() generally don't do this.
While mbstowcs() assumes a NUL-terminated input string, a sequence of calls
to mbrtowc() does not assume that.
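
To make the difference concrete, here is a small sketch with two made-up
helpers. The first sees the input the way mbstowcs() does, the second the
way a length-delimited mbrtowc() loop does. It says nothing about how the
workaround itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count wide characters the way mbstowcs() sees the
   input.  The terminating NUL is what marks the end of the input.
   Counts at most 64 characters; returns (size_t)-1 on invalid input. */
size_t count_nul_terminated(const char *s)
{
    assert(s != NULL);
    wchar_t buf[64];
    return mbstowcs(buf, s, 64);
}

/* Hypothetical helper: count wide characters the way a typical mbrtowc()
   loop sees the input: exactly `n` bytes, with no terminator and therefore
   no end-of-input signal. */
size_t count_length_delimited(const char *s, size_t n)
{
    assert(s != NULL);
    mbstate_t st;
    memset(&st, 0, sizeof st);      /* initial conversion state */

    size_t count = 0;
    while (n > 0) {
        size_t len = mbrtowc(NULL, s, n, &st);
        if (len == 0 || len == (size_t)-1 || len == (size_t)-2)
            break;                  /* NUL, invalid, or incomplete: stop */
        count++;
        s += len;
        n -= len;
    }
    return count;
}
```

A program that calls count_length_delimited("abcdef", 3) never hands the
NUL to mbrtowc() at all, which is the situation the workaround does not
cover.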
> - LFS notes in  that vi_VN.TCVN is broken and removes it from
> - The FreeDesktop standard in  says GLIBC doesn't support vi_VN.TCVN.
Thanks for having dug out these additional witnesses.