This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Question about iconv, UTF 8/16/32 and error reporting due to UTF-16 surrogates.


On Thu, 3 Dec 2015, Florian Weimer wrote:

> On 12/03/2015 10:44 PM, Rich Felker wrote:
> 
> > The relevant term is "Unicode Scalar Values", and these are exactly
> > the integers 0-0xd7ff and 0xe000-0x10ffff. UTFs assign a unique
> > encoding (in terms of code units) to each Unicode Scalar Value,
> > and are not defined for any other integers. Likewise, UCS (16 or 32)
> > does not include values which are not Unicode Scalar Values.
> 
> The term Unicode Scalar Value did not exist when Unicode support was
> added to glibc.  For example, all the references I have readily at hand
> (I can't find the 10646 CD right now) imply that UCS-4 in ISO/IEC
> 10646:2000 still had 31 bits and not the range restriction you gave.

My previous look at that question in 
<https://sourceware.org/ml/libc-alpha/2012-09/msg00112.html> indicates 
that the restriction was introduced some time between 2008 and 2011.  I haven't gone 
further into SC2 documents to pin down the time of the change more precisely.

> The question is what glibc should do: implement historic definitions,
> preserving the meaning of charset names for backwards compatibility, or
> tweak the implementations as the definitions evolve.

I think we need to implement the current meanings of those names.
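
To make the two definitions under discussion concrete, here is a minimal
sketch in C of the range checks involved.  It is not taken from glibc's
iconv code; the function names are hypothetical, and the second predicate
models only the 31-bit width mentioned above for the older UCS-4, not any
other rule of that edition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if C is a Unicode Scalar Value, i.e. in
   0-0xd7ff or 0xe000-0x10ffff (the surrogate range is excluded).  */
static bool
is_unicode_scalar_value (uint32_t c)
{
  return c <= 0xd7ff || (c >= 0xe000 && c <= 0x10ffff);
}

/* Hypothetical helper: true if C fits in 31 bits, which is the only
   property of the older UCS-4 definition assumed here.  */
static bool
fits_in_31_bits (uint32_t c)
{
  return c <= 0x7fffffff;
}
```

Under the current meaning, a converter would presumably reject any input
for which is_unicode_scalar_value returns false, such as 0xd800 or
0x110000, even though both pass the looser 31-bit check.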

-- 
Joseph S. Myers
joseph@codesourcery.com
