This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale

From: Joseph Myers <joseph at codesourcery dot com>
To: Carlos O'Donell <carlos at redhat dot com>
Cc: Leonhard Holz <leonhard dot holz at web dot de>, GNU C Library <libc-alpha at sourceware dot org>
Date: Tue, 29 Mar 2016 22:05:28 +0000
Subject: Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale
References: <56D3F8F0 dot 8070401 at web dot de> <56FAA951 dot 7020000 at redhat dot com>

On Tue, 29 Mar 2016, Carlos O'Donell wrote:

> I believe this is technically inaccurate since it allows all 4-byte
> sequences, when in reality the limit is at U+10FFFF?

That glibc accepts UTF-8 according to the definition in the 2003 edition 
of ISO 10646 rather than the definition in the 2011 and later editions is 
a known issue.  I've filed bug 19883 for it since I couldn't find an 
existing bug report in Bugzilla.  I don't think it's particularly relevant 
to any patch not aiming to fix that bug, but:

> You need not fix it, but we should add a comment saying that for the
> sake of simpler code we're allowing those 4-byte sequences which are
> not normally accepted.

I'd think a reference to this code in bug 19883 might be more useful - or 
something in that bug giving a standard (greppable) wording for a comment 
identifying places needing updating for the current UTF-8 (or in some 
cases UCS-4) definition, with such a comment added in this code.
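
To make the difference between the two definitions concrete, here is a minimal sketch in C. It is not glibc code: the names decode_utf8_4byte and within_current_utf8_range are hypothetical, and it covers only the 4-byte case discussed above. A 4-byte sequence can structurally encode values up to 0x1FFFFF, while the current definition stops at U+10FFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration only, not taken from glibc.
   Decodes a 4-byte UTF-8 sequence (lead byte 11110xxx followed by
   three 10xxxxxx continuation bytes).  Returns true and stores the
   value in *cp if the sequence is structurally well formed and not
   overlong.  It deliberately does NOT enforce the U+10FFFF limit,
   mirroring the permissive behaviour discussed in this thread.  */
static bool
decode_utf8_4byte (const unsigned char *s, uint32_t *cp)
{
  if ((s[0] & 0xF8) != 0xF0)
    return false;
  for (int i = 1; i < 4; i++)
    if ((s[i] & 0xC0) != 0x80)
      return false;

  uint32_t c = ((uint32_t) (s[0] & 0x07) << 18)
               | ((uint32_t) (s[1] & 0x3F) << 12)
               | ((uint32_t) (s[2] & 0x3F) << 6)
               | (uint32_t) (s[3] & 0x3F);

  /* Values below 0x10000 fit in three bytes or fewer (overlong).  */
  if (c < 0x10000)
    return false;

  *cp = c;
  return true;
}

/* The extra check required by the current UTF-8 definition.  */
static bool
within_current_utf8_range (uint32_t cp)
{
  return cp <= 0x10FFFF;
}
```

Under these assumptions, the bytes F4 8F BF BF decode to 0x10FFFF and pass both checks, whereas F4 90 80 80 decode to 0x110000, which only the permissive decoder accepts.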

-- 
Joseph S. Myers
joseph@codesourcery.com

Follow-Ups:
- Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale
  - From: Zack Weinberg

References:
- Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale
  - From: Carlos O'Donell
