This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Leonhard Holz <leonhard dot holz at web dot de>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 29 Mar 2016 22:05:28 +0000
- Subject: Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale
- References: <56D3F8F0 dot 8070401 at web dot de> <56FAA951 dot 7020000 at redhat dot com>
On Tue, 29 Mar 2016, Carlos O'Donell wrote:
> I believe this is technically inaccurate since it allows all 4-byte
> sequences, when in reality the limit is at U+10FFFF?
That glibc accepts UTF-8 according to the definition in the 2003 edition
of ISO 10646 rather than the definition in the 2011 and later editions is
a known issue. I've filed bug 19883 for it since I couldn't find an
existing bug report in Bugzilla. I don't think it's particularly relevant
to any patch not aiming to fix that bug, but:
> You need not fix it, but we should add a comment saying that for the
> sake of simpler code we're allowing those 4-byte sequences which are
> not normally accepted.
I'd think a reference to this code in bug 19883 might be more useful, or
something in that bug giving a standard (greppable) wording for a comment
that identifies places needing an update for the current UTF-8 (or in some
cases UCS-4) definition, with such a comment then added to this code.
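
For illustration only, the stricter check being discussed (a 4-byte UTF-8
sequence whose code point does not exceed U+10FFFF) could look roughly like
the sketch below. It is not taken from the patch or from glibc: the function
name utf8_4byte_in_range and the wording of the comment that mentions the
bug are both hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not glibc code.  Return true if the four bytes
   at S form a UTF-8 sequence encoding a code point in the range
   U+10000..U+10FFFF.  */
static bool
utf8_4byte_in_range (const unsigned char *s)
{
  /* The lead byte must be 11110xxx.  */
  if ((s[0] & 0xf8) != 0xf0)
    return false;

  /* Each of the three continuation bytes must be 10xxxxxx.  */
  for (size_t i = 1; i < 4; i++)
    if ((s[i] & 0xc0) != 0x80)
      return false;

  /* Assemble the 21 payload bits into a code point.  */
  uint32_t cp = ((uint32_t) (s[0] & 0x07) << 18)
                | ((uint32_t) (s[1] & 0x3f) << 12)
                | ((uint32_t) (s[2] & 0x3f) << 6)
                | (uint32_t) (s[3] & 0x3f);

  /* Illustrative comment wording only (see bug 19883): reject overlong
     forms below U+10000 and values above U+10FFFF.  */
  return cp >= 0x10000 && cp <= 0x10ffff;
}
```

Code that only tests the lead byte pattern accepts every 4-byte sequence,
including lead bytes 0xf5..0xf7 and 0xf4 followed by a continuation byte of
0x90 or above, all of which decode to values past U+10FFFF.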
--
Joseph S. Myers
joseph@codesourcery.com