This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale
- From: Zack Weinberg <zackw at panix dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, Leonhard Holz <leonhard dot holz at web dot de>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 29 Mar 2016 18:17:26 -0400
- References: <56D3F8F0 dot 8070401 at web dot de> <56FAA951 dot 7020000 at redhat dot com> <alpine dot DEB dot 2 dot 10 dot 1603292143440 dot 15654 at digraph dot polyomino dot org dot uk>
On Tue, Mar 29, 2016 at 6:05 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Tue, 29 Mar 2016, Carlos O'Donell wrote:
>
>> I believe this is technically inaccurate since it allows all 4-byte
>> sequences, when in reality the limit is at U+10FFFF?
>
> That glibc accepts UTF-8 according to the definition in the 2003 edition
> of ISO 10646 rather than the definition in the 2011 and later editions is
> a known issue. I've filed bug 19883 for it since I couldn't find an
> existing bug report in Bugzilla.
Note that the U+10FFFF limit equates to a Y2541 bug: at the present
(post-2000) rate of codepoint assignment, the code space would not run
out until around the year 2541. See
https://gist.github.com/zackw/f2e74a8d7b31baa88002 for the calculations
and a pretty graph.
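
For anyone following along, here is a minimal sketch of the distinction
discussed above: accepting every structurally valid 4-byte sequence
versus also enforcing the U+10FFFF ceiling. This is not glibc's code;
the function names decode_utf8_4byte and is_valid_utf8_4byte_strict are
made up for illustration, and the sketch only handles the 4-byte case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a glibc function.
   Decode one 4-byte UTF-8 sequence starting at s.  Returns true and
   stores the decoded value in *cp if the bytes are structurally a
   4-byte sequence (lead byte 11110xxx, then three 10xxxxxx bytes).
   No range check is applied, so values up to 0x1FFFFF are accepted.  */
static bool
decode_utf8_4byte (const unsigned char *s, uint32_t *cp)
{
  if ((s[0] & 0xF8) != 0xF0)
    return false;
  for (size_t i = 1; i < 4; i++)
    if ((s[i] & 0xC0) != 0x80)
      return false;
  *cp = ((uint32_t) (s[0] & 0x07) << 18)
        | ((uint32_t) (s[1] & 0x3F) << 12)
        | ((uint32_t) (s[2] & 0x3F) << 6)
        | (uint32_t) (s[3] & 0x3F);
  return true;
}

/* Hypothetical stricter check: additionally reject overlong encodings
   (values below U+10000) and anything above U+10FFFF.  */
static bool
is_valid_utf8_4byte_strict (const unsigned char *s)
{
  uint32_t cp;
  return decode_utf8_4byte (s, &cp) && cp >= 0x10000 && cp <= 0x10FFFF;
}
```

With these definitions, F4 8F BF BF (U+10FFFF) passes both checks, while
F4 90 80 80 decodes to 0x110000 and is accepted only by the permissive
decoder.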
zw