This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale

From: Zack Weinberg <zackw at panix dot com>
To: Joseph Myers <joseph at codesourcery dot com>
Cc: "Carlos O'Donell" <carlos at redhat dot com>, Leonhard Holz <leonhard dot holz at web dot de>, GNU C Library <libc-alpha at sourceware dot org>
Date: Tue, 29 Mar 2016 18:17:26 -0400
Subject: Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale
Authentication-results: sourceware.org; auth=none
References: <56D3F8F0 dot 8070401 at web dot de> <56FAA951 dot 7020000 at redhat dot com> <alpine dot DEB dot 2 dot 10 dot 1603292143440 dot 15654 at digraph dot polyomino dot org dot uk>

On Tue, Mar 29, 2016 at 6:05 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Tue, 29 Mar 2016, Carlos O'Donell wrote:
>
>> I believe this is technically inaccurate since it allows all 4-byte
>> sequences, when in reality the limit is at U+10FFFF?
>
> That glibc accepts UTF-8 according to the definition in the 2003 edition
> of ISO 10646 rather than the definition in the 2011 and later editions is
> a known issue.  I've filed bug 19883 for it since I couldn't find an
> existing bug report in Bugzilla.

Note that, at the present (post-2000) rate of codepoint assignment,
the U+10FFFF limit amounts to a Y2541 bug: the code space would not
run out until around the year 2541.  See
https://gist.github.com/zackw/f2e74a8d7b31baa88002 for the
calculations and a pretty graph.
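
To make the point about 4-byte sequences concrete, here is a minimal
sketch in C. It is not glibc's decoder; the function names
decode_utf8_4byte and within_unicode_limit are hypothetical and exist
only for this illustration. The 4-byte form carries 21 payload bits, so
structurally it can encode values up to 0x1FFFFF, while the limit
mentioned above stops at 0x10FFFF. A decoder that checks only the byte
structure therefore accepts values above the limit unless it adds a
separate range check.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 4-byte UTF-8 sequence.
   Checks byte structure and rejects overlong forms, but does NOT
   apply the U+10FFFF limit.  Returns -1 on malformed input. */
static int32_t decode_utf8_4byte(const unsigned char *s)
{
    /* Lead byte must look like 11110xxx. */
    if ((s[0] & 0xF8) != 0xF0)
        return -1;

    /* Each of the three trailing bytes must look like 10xxxxxx. */
    for (int i = 1; i < 4; i++)
        if ((s[i] & 0xC0) != 0x80)
            return -1;

    /* 3 + 6 + 6 + 6 = 21 payload bits. */
    int32_t cp = ((int32_t)(s[0] & 0x07) << 18)
               | ((int32_t)(s[1] & 0x3F) << 12)
               | ((int32_t)(s[2] & 0x3F) << 6)
               |  (int32_t)(s[3] & 0x3F);

    /* Values below 0x10000 fit in fewer bytes: overlong encoding. */
    if (cp < 0x10000)
        return -1;

    return cp;
}

/* Hypothetical helper: the extra range check for the U+10FFFF limit. */
static int within_unicode_limit(int32_t cp)
{
    return cp >= 0 && cp <= 0x10FFFF;
}
```

With these helpers, the bytes F4 8F BF BF decode to 0x10FFFF and pass
the range check, whereas F4 90 80 80 decodes to 0x110000 and
F7 BF BF BF to 0x1FFFFF; both are structurally valid but fail
within_unicode_limit.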

zw
