This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Question about iconv, UTF 8/16/32 and error reporting due to UTF-16 surrogates.

From: Joseph Myers <joseph at codesourcery dot com>
To: Florian Weimer <fweimer at redhat dot com>
Cc: Rich Felker <dalias at libc dot org>, Stefan Liebler <stli at linux dot vnet dot ibm dot com>, Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Thu, 3 Dec 2015 22:49:59 +0000
Subject: Re: Question about iconv, UTF 8/16/32 and error reporting due to UTF-16 surrogates.
Authentication-results: sourceware.org; auth=none
References: <565EDF7C dot 9020808 at linux dot vnet dot ibm dot com> <565EE434 dot 4090205 at redhat dot com> <20151203214417 dot GZ3818 at brightrain dot aerifal dot cx> <5660C328 dot 5040207 at redhat dot com>

On Thu, 3 Dec 2015, Florian Weimer wrote:

> On 12/03/2015 10:44 PM, Rich Felker wrote:
> 
> > The relevant term is "Unicode Scalar Values", and these are exactly
> > the integers 0-0xd7ff and 0xe000-0x10ffff. UTFs assign a unique
> > encoding (in terms of code units) to each Unicode Scalar Value,
> > and are not defined for any other integers. Likewise, UCS (16 or 32)
> > does not include values which are not Unicode Scalar Values.
> 
> The term Unicode Scalar Value did not exist when Unicode support was
> added to glibc.  For example, all the references I have readily at hand
> (I can't find the 10646 CD right now) imply that UCS-4 in ISO/IEC
> 10646:2000 still had 31 bits and not the range restriction you gave.

My previous look at that question in 
<https://sourceware.org/ml/libc-alpha/2012-09/msg00112.html> indicates 
that the restriction was introduced some time between 2008 and 2011.  I 
haven't gone further into SC2 documents to date the change more precisely.

> The question is what glibc should do: implement historic definitions,
> preserving the meaning of charset names for backwards compatibility, or
> tweak the implementations as the definitions evolve.

I think we need to implement the current meanings of those names.
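
For illustration only, the range restriction quoted above (Unicode Scalar 
Values being exactly 0-0xd7ff and 0xe000-0x10ffff) can be sketched as a 
simple check.  This is not glibc's iconv code; the function name 
is_unicode_scalar_value is hypothetical and chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: return true if C is a
   Unicode Scalar Value, i.e. lies in 0..0xD7FF or 0xE000..0x10FFFF.  */
bool
is_unicode_scalar_value (uint32_t c)
{
  /* The UTF-16 surrogate range is excluded from the scalar values.  */
  if (c >= 0xD800 && c <= 0xDFFF)
    return false;

  /* Anything above 0x10FFFF is outside the Unicode code space, even
     though it fits in the 31 bits of the historic UCS-4 definition.  */
  return c <= 0x10FFFF;
}
```

Under the current definitions, a converter for UTF-8, UTF-16 or UTF-32 
would presumably treat any value failing such a check as an invalid input 
or unconvertible character, whereas the historic 31-bit UCS-4 definition 
would have accepted it.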

-- 
Joseph S. Myers
joseph@codesourcery.com
