Bug 17318 - [RFE] Provide a C.UTF-8 locale by default
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 16621
Reported: 2014-08-27 12:57 UTC by Nick Coghlan
Modified: 2017-01-31 12:59 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Nick Coghlan 2014-08-27 12:57:32 UTC
Fedora doesn't currently provide the C.UTF-8 locale. In the RFE requesting it (https://bugzilla.redhat.com/show_bug.cgi?id=902094), it was suggested that a more appropriate approach would be for it to be provided as part of upstream glibc, at which point Fedora would inherit it by default.
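
For anyone wanting to check whether a given system already ships the locale, here is a minimal sketch in Python. The helper name `locale_available` is hypothetical (it is not part of glibc or the Python standard library), and the check only covers the LC_CTYPE category, so treat it as an illustration rather than a definitive test.

```python
import locale


def locale_available(name):
    """Return True if setlocale() accepts `name` for LC_CTYPE on this system.

    Hypothetical helper for illustration only; the current LC_CTYPE
    setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # The C library does not know this locale name.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "C.UTF-8"):
        print(candidate, locale_available(candidate))
```

The result for "C.UTF-8" depends on the distribution and on which locale data is installed, so no expected output is given here.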

Hence, this RFE to request the inclusion of a C.UTF-8 locale by default.

My personal interest relates to Python 3, where "LANG=C" misconfigures a few aspects to use ASCII, when they really should be using UTF-8. While I'd actually like to fix that on the Python side in the long run, being able to set "LANG=C.UTF-8" instead is a solution that already works for existing versions of Python 3.
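
To make the ASCII problem concrete, the sketch below shows what happens when text containing a non-ASCII character is encoded with ASCII versus UTF-8, and prints the encoding Python derives from the current locale. The helper `can_encode` is a hypothetical name used only for this illustration. The preferred encoding reported depends on the Python version and the environment, since later Python releases changed how the C locale is handled.

```python
import locale


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without error."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    sample = "caf\u00e9"  # contains one non-ASCII character
    # Encoding Python picks up from the locale settings (e.g. LANG).
    print("preferred encoding:", locale.getpreferredencoding(False))
    print("ascii:", can_encode(sample, "ascii"))
    print("utf-8:", can_encode(sample, "utf-8"))
```

Running the script once with LANG=C and once with LANG=C.UTF-8 (where that locale exists) is one way to compare the reported preferred encoding.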

Bug #16621 suggests that C.UTF-8 may actually require special casing in glibc in order to be handled correctly. If that's accurate, then it would strengthen the case for including the locale in the upstream library.
Comment 1 Carlos O'Donell 2015-02-11 15:39:27 UTC
(In reply to Nick Coghlan from comment #0)
> Fedora doesn't currently provide the C.UTF-8 locale. In the RFE requesting
> it (https://bugzilla.redhat.com/show_bug.cgi?id=902094), it was suggested
> that a more appropriate approach would be for it to be provided as part of upstream
> glibc, at which point Fedora would inherit it by default.
> 
> Hence, this RFE to request the inclusion of a C.UTF-8 locale by default.
> 
> My personal interest relates to Python 3, where "LANG=C" misconfigures a few
> aspects to use ASCII, when they really should be using UTF-8. While I'd
> actually like to fix that on the Python side in the long run, being able to
> set "LANG=C.UTF-8" instead is a solution that already works for existing
> versions of Python 3.
> 
> Bug #16621 suggests that C.UTF-8 may actually require special casing in
> glibc in order to be handled correctly. If that's accurate, then it would
> strengthen the case for including the locale in the upstream library.

I agree that this is a good idea. Someone needs to do the work and submit it to libc-alpha. It's not all that easy, and consensus needs to be reached about the inclusion of ~1.5MB of UTF-8 data into the runtime.
Comment 2 Nick Coghlan 2015-02-25 23:02:06 UTC
Reference to the libc-alpha mailing list discussion with additional technical details: https://sourceware.org/ml/libc-alpha/2015-02/msg00247.html