This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.

[Bug localedata/19932] mbrtowc returns (size_t) -1 in C locale

From: "eggert at gnu dot org" <sourceware-bugzilla at sourceware dot org>
To: libc-locales at sourceware dot org
Date: Sat, 09 Apr 2016 17:56:51 +0000
Subject: [Bug localedata/19932] mbrtowc returns (size_t) -1 in C locale
Auto-submitted: auto-generated
References: <bug-19932-716 at http dot sourceware dot org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=19932

--- Comment #3 from Paul Eggert <eggert at gnu dot org> ---
(In reply to Bruno Haible from comment #2)
> Thus the mapping table would
> - map x (0 <= x <= 0x7F) to Unicode x,
> - map x (0x80 <= x <= 0xFF) to Unicode 0xDF80+x (or similar).

Emacs maps the latter to 0x3FFF00+x (that is, 0x3FFF80 through 0x3FFFFF),
I suppose under the theory that these integers are not Unicode code points,
and thus won't be conflated with private-use Unicode characters. We could be
"compatible" with Emacs. Are there other examples in the wild of this sort of
thing, or is the Emacs precedent good enough?
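
To make the Emacs-style mapping concrete, here is a minimal sketch in C. It
assumes that byte x lands at 0x3FFF00+x, so the bytes 0x80..0xFF occupy
0x3FFF80..0x3FFFFF. The names RAW_BYTE_OFFSET and c_locale_byte_to_wide are
hypothetical and exist only for this illustration; this is not glibc code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Emacs-style offset, so raw byte x becomes 0x3FFF00 + x. */
#define RAW_BYTE_OFFSET UINT32_C(0x3FFF00)

/* Hypothetical helper: map one byte of a C-locale string to a wide value. */
static uint32_t c_locale_byte_to_wide(unsigned char b)
{
    assert(b <= 0xFF);          /* guard against exotic CHAR_BIT > 8 */
    if (b <= 0x7F)
        return b;               /* ASCII maps to the same Unicode code point */
    return RAW_BYTE_OFFSET + b; /* 0x80..0xFF -> 0x3FFF80..0x3FFFFF */
}
```

Every result above 0x7F is larger than 0x10FFFF, so it cannot collide with
any Unicode code point, private-use or otherwise.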

> Should we create a new encoding with this property?
> Or change the mapping tables of ANSI_X3.4-1968?

It is a bit of a dilemma. Would it make sense to change iconv so that it
recognizes values like 0x3FFF80 as corresponding to encoding-error bytes? iconv
could then behave the same way as before, even if we change the mapping tables
of ANSI_X3.4-1968.
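
A rough sketch of the check a converter would need, assuming the range
0x3FFF80..0x3FFFFF is reserved for encoding-error bytes. The function names
are hypothetical, and this is not how iconv is structured internally; it only
shows the recognition and the inverse mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: raw bytes 0x80..0xFF are represented as 0x3FFF00 + byte. */
#define RAW_BYTE_OFFSET UINT32_C(0x3FFF00)
#define RAW_BYTE_FIRST  UINT32_C(0x3FFF80)
#define RAW_BYTE_LAST   UINT32_C(0x3FFFFF)

/* True if wc stands for an undecodable byte rather than a character. */
static bool is_raw_byte_value(uint32_t wc)
{
    return wc >= RAW_BYTE_FIRST && wc <= RAW_BYTE_LAST;
}

/* Recover the original byte; the caller must check is_raw_byte_value first. */
static unsigned char raw_byte_from_value(uint32_t wc)
{
    assert(is_raw_byte_value(wc));
    return (unsigned char)(wc - RAW_BYTE_OFFSET);
}
```

A converter could call is_raw_byte_value on each input value and report an
encoding error for those that match, which would preserve the existing
behavior for callers.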
