This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Question about iconv, UTF 8/16/32 and error reporting due to UTF-16 surrogates.
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Rich Felker <dalias at libc dot org>, Stefan Liebler <stli at linux dot vnet dot ibm dot com>, Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 3 Dec 2015 22:49:59 +0000
- Subject: Re: Question about iconv, UTF 8/16/32 and error reporting due to UTF-16 surrogates.
- References: <565EDF7C dot 9020808 at linux dot vnet dot ibm dot com> <565EE434 dot 4090205 at redhat dot com> <20151203214417 dot GZ3818 at brightrain dot aerifal dot cx> <5660C328 dot 5040207 at redhat dot com>
On Thu, 3 Dec 2015, Florian Weimer wrote:
> On 12/03/2015 10:44 PM, Rich Felker wrote:
>
> > The relevant term is "Unicode Scalar Values", and these are exactly
> > the integers 0-0xd7ff and 0xe000-0x10ffff. UTFs assign a unique
> > encoding (in terms of code units) to each Unicode Scalar Value,
> > and are not defined for any other integers. Likewise, UCS (16 or 32)
> > does not include values which are not Unicode Scalar Values.
>
> The term Unicode Scalar Value did not exist when Unicode support was
> added to glibc. For example, all the references I have readily at hand
> (I can't find the 10646 CD right now) imply that UCS-4 in ISO/IEC
> 10646:2000 still had 31 bits and not the range restriction you gave.
My previous look at that question in
<https://sourceware.org/ml/libc-alpha/2012-09/msg00112.html> indicates
that the restriction was introduced some time between 2008 and 2011. I
haven't gone further into SC2 documents to date the change more precisely.
> The question is what glibc should do -- implement historic definitions,
> preserving the meaning of charset names for backwards compatibility, or
> tweak the implementations as the definitions evolve.
I think we need to implement the current meanings of those names.
--
Joseph S. Myers
joseph@codesourcery.com
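
For illustration only, the range Rich Felker gives above for Unicode Scalar
Values (0-0xd7ff and 0xe000-0x10ffff) can be sketched as a small C predicate.
The name is_unicode_scalar_value is hypothetical; it is not a glibc or iconv
function, and this is not a claim about how glibc's converters check their
input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: returns true if c is a
   Unicode Scalar Value, i.e. in 0..0xD7FF or 0xE000..0x10FFFF.
   The excluded gap 0xD800..0xDFFF is the UTF-16 surrogate range. */
static bool
is_unicode_scalar_value (uint32_t c)
{
  return c <= 0xD7FF || (c >= 0xE000 && c <= 0x10FFFF);
}
```

Under the current definition, a value such as 0xD800 (a surrogate) or
0x110000 (above the last code point) fails this check, while the historic
31-bit UCS-4 range mentioned by Florian would have admitted both.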