Scripts tend to use LC_ALL=C.UTF-8 instead of LC_ALL=C for UTF-8 support and to behave in a locale-independent manner. However $LANGUAGE is still taken into account by glibc:

xvii% LANGUAGE=fr_FR LC_ALL=C.UTF-8 cp
cp: opérande de fichier manquant
Saisissez « cp --help » pour plus d'informations.
xvii% LANGUAGE=fr_FR LC_ALL=C cp
cp: missing file operand
Try 'cp --help' for more information.

Both should have output in English. Glibc should apply the same rules with C.UTF-8 as with C locales.

Also reported in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719590
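The same comparison can be made from C without going through cp. The following is only a sketch: the helper name lookup_message is hypothetical, and the text domain "coreutils" and the msgid "missing file operand" are assumptions based on the cp output above. It also assumes that a French coreutils catalog is installed in the default catalog directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <libintl.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: set LANGUAGE, switch to LOCALE_NAME, and look up
   MSGID in DOMAIN.  Returns NULL if the environment cannot be changed or
   the locale is not installed.  */
const char *
lookup_message (const char *locale_name, const char *language,
                const char *domain, const char *msgid)
{
  if (setenv ("LANGUAGE", language, 1) != 0)
    return NULL;
  if (setlocale (LC_ALL, locale_name) == NULL)
    return NULL;
  /* dgettext returns MSGID itself when no translation is used.  */
  return dgettext (domain, msgid);
}
```

Calling lookup_message ("C", "fr_FR", "coreutils", "missing file operand") and then the same with "C.UTF-8" from a main function, and printing both results, shows whether the two locales are treated alike on a given system.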
There is no C.UTF-8 locale in glibc.
(In reply to Andreas Schwab from comment #1)
> There is no C.UTF-8 locale in glibc.

That's strange, because on the Subversion mailing list it was regarded as standard. Subversion works well only in UTF-8 locales, and the suggested solution was to use C.UTF-8: http://mail-archives.apache.org/mod_mbox/subversion-users/201307.mbox/%3C51DC54AD.7010601@wandisco.com%3E
I have filed bug #17318 requesting the inclusion of a C.UTF-8 locale in upstream glibc (actually prompted by https://bugzilla.redhat.com/show_bug.cgi?id=902094, but I found this bug while looking to see if anyone else had already made the request)
glibc doesn't provide a C.UTF-8, so any bug report about it makes no sense
While it's true that glibc itself doesn't provide a C.UTF-8 locale, does that really make this bug report invalid?

The Debian-derived family of distros default to adding a C.UTF-8 locale at the distro level, but it doesn't quite work as expected, as it's missing some of the special casing afforded the default C locale. The specific one covered by this BZ is the fact that LC_ALL=C will make glibc ignore the LANGUAGE setting, but LC_ALL=C.UTF-8 doesn't.

Another possible way of phrasing the request would be for all "C.*" locales to ignore the LANGUAGE setting the same way the unmodified "C" locale does, rather than special casing "C.UTF-8". I'm not *personally* aware of any such locales in widespread use other than "C.UTF-8", but that doesn't mean there aren't any.
(In reply to Nick Coghlan from comment #5)

bugs in distros aren't really the domain of glibc upstream. if you think the proposal in bug 17318 has limitations or you have concerns, you should post it there or the mailing list thread on the topic.
I filed #17318 because Fedora doesn't want to add C.UTF-8 independently of upstream glibc (at least in part to avoid inconsistencies like the one reported here). However, I also interpret the current bug closure as categorically rejecting the notion of treating C.UTF-8 the same as the C locale when it comes to the LANGUAGE variable, which doesn't seem like the correct outcome. If I've misunderstood what "CLOSED INVALID" means and the intent is for bug #17318 to include the behaviour requested here, then yes, I would consider that a reasonable way to resolve this issue.
glibc 2.35 has C.UTF-8 now, so it would make sense to reopen this.
Reopening as glibc 2.35 has C.UTF-8 (comment 8).
C is special-cased here:

  /* Ignore LANGUAGE and its system-dependent analogon if the locale is set
     to "C" because
     1. "C" locale usually uses the ASCII encoding, and most international
        messages use non-ASCII characters. These characters get displayed
        as question marks (if using glibc's iconv()) or as invalid 8-bit
        characters (because other iconv()s refuse to convert most non-ASCII
        characters to ASCII). In any case, the output is ugly.
     2. The precise output of some programs in the "C" locale is specified
        by POSIX and should not depend on environment variables like
        "LANGUAGE" or system-dependent information. We allow such programs
        to use gettext(). */
  if (strcmp (locale, "C") == 0)
    return locale;

It looks like the locale name is not embedded in the locale data itself, so identifying C.UTF-8 based on its name might not be so simple here.
(In reply to Florian Weimer from comment #10)
> C is special-cased here:
[...]
>   if (strcmp (locale, "C") == 0)
>     return locale;
>
> It looks like the locale name is not embedded in the locale data itself, so
> identifying C.UTF-8 based on its name might not be so simple here.

Do you mean that locale is not the string "C.UTF-8" (while setlocale() returns the expected "C.UTF-8")?
(In reply to Vincent Lefèvre from comment #11)
> (In reply to Florian Weimer from comment #10)
> > C is special-cased here:
> [...]
> >   if (strcmp (locale, "C") == 0)
> >     return locale;
> >
> > It looks like the locale name is not embedded in the locale data itself, so
> > identifying C.UTF-8 based on its name might not be so simple here.
>
> Do you mean that locale is not the string "C.UTF-8" (while setlocale()
> returns the expected "C.UTF-8")?

There are aliases such as "C.utf8", which we would have to recognize as well. Not doing that would make things worse, I think.
(In reply to Florian Weimer from comment #12)
> There are aliases such as "C.utf8", which we would have to recognize as
> well. Not doing that would make things worse, I think.

OK, though I would say that this is mainly useful for scripts, which could always use "C.UTF-8" for better portability. BTW, Debian currently doesn't support aliases for "C.UTF-8" (at least by default).
*** Bug 29777 has been marked as a duplicate of this bug. ***
Fix pushed for 2.39:

commit 2897b231a6b71ee17d47d3d63f1112b2641a476c
Author: Bruno Haible <bruno@clisp.org>
Date:   Mon Sep 4 15:31:36 2023 +0200

    intl: Treat C.UTF-8 locale like C locale (BZ# 16621)

    The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
    says that "Setting LC_ALL=C.UTF-8 will ignore LANGUAGE just like it
    does with LC_ALL=C." This patch implements it.

    * intl/dcigettext.c (guess_category_value): Treat C.<encoding> locale
    like the C locale.

    Reviewed-by: Florian Weimer <fweimer@redhat.com>

I'm going to post my test, too.
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c52c2c32db15aba8bbe1a0b4d3235f97d9c1a525

commit c52c2c32db15aba8bbe1a0b4d3235f97d9c1a525
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Nov 20 16:03:11 2023 +0100

    intl: Add test case for bug 16621

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Additional fix for 2.39:

commit d0aefec49941cf6d97e2244d6aa20bafc26d5942
Author: Bruno Haible <bruno@clisp.org>
Date:   Tue Dec 12 09:45:16 2023 +0100

    intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621)

    The previous commit was incomplete: gettext() still returns a
    translation if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo
    exists. This patch prohibits the translation also in this case.

    * gettext-runtime/intl/dcigettext.c (DCIGETTEXT): Treat C.<encoding>
    locale like the C locale.

    Reviewed-by: Florian Weimer <fweimer@redhat.com>
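For readers who want the rule from the two commit messages in one place, here is a simplified model of it. It is not the code from intl/dcigettext.c: the function names are hypothetical, and the prefix test is only an assumption about how "C.<encoding>" could be recognized. A prefix test has the side effect of covering spellings such as "C.utf8", which addresses the alias concern raised earlier in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true for "C" and for any "C.<encoding>" spelling,
   for example "C.UTF-8" or "C.utf8".  */
bool
is_c_like_locale (const char *locale)
{
  return locale[0] == 'C' && (locale[1] == '\0' || locale[1] == '.');
}

/* Part 1 (model): choose the language list used for catalog lookup.
   A C-like locale ignores LANGUAGE; otherwise a non-empty LANGUAGE
   takes precedence over the locale name.  */
const char *
pick_message_language (const char *locale, const char *language)
{
  if (is_c_like_locale (locale))
    return locale;
  if (language != NULL && language[0] != '\0')
    return language;
  return locale;
}

/* Part 2 (model): CATALOG_TEXT is what a catalog lookup produced, or NULL
   if nothing was found.  In a C-like locale the msgid is returned
   untranslated even when a catalog entry exists.  */
const char *
final_message (const char *locale, const char *msgid,
               const char *catalog_text)
{
  if (is_c_like_locale (locale) || catalog_text == NULL)
    return msgid;
  return catalog_text;
}
```

With this model, pick_message_language ("C.UTF-8", "fr_FR") yields "C.UTF-8" rather than "fr_FR", and final_message ("C.UTF-8", msgid, text) yields msgid regardless of text, which corresponds to the English output requested in the original report.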