This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/19932] mbrtowc returns (size_t) -1 in C locale


https://sourceware.org/bugzilla/show_bug.cgi?id=19932

--- Comment #3 from Paul Eggert <eggert at gnu dot org> ---
(In reply to Bruno Haible from comment #2)
> Thus the mapping table would
> - map x (0 <= x <= 0x7F) to Unicode x,
> - map x (0x80 <= x <= 0xFF) to Unicode 0xDF80+x (or similar).
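A minimal sketch of the table quoted above, assuming the offset is exactly 0xDF80 (the proposal itself says "or similar"). The function and macro names are hypothetical, not glibc identifiers. With x in 0x80..0xFF, the high half lands on U+E000..U+E07F, which is the start of the Unicode Private Use Area. That is the conflation discussed in the reply below. Under such a table every byte decodes, so mbrtowc in the C locale would never need to return (size_t) -1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offset taken from the quoted proposal ("0xDF80+x (or similar)"). */
#define HIGH_BYTE_OFFSET 0xDF80u

/* Map one byte to a code point: 0x00..0x7F stay ASCII,
   0x80..0xFF become HIGH_BYTE_OFFSET + byte (U+E000..U+E07F). */
static uint32_t c_locale_byte_to_ucs(unsigned char byte)
{
    if (byte <= 0x7F)
        return byte;
    return HIGH_BYTE_OFFSET + byte;
}

/* Inverse mapping: return the original byte, or -1 if cp is outside the table. */
static int c_locale_ucs_to_byte(uint32_t cp)
{
    if (cp <= 0x7F)
        return (int) cp;
    if (cp >= HIGH_BYTE_OFFSET + 0x80 && cp <= HIGH_BYTE_OFFSET + 0xFF)
        return (int) (cp - HIGH_BYTE_OFFSET);
    return -1;
}
```

Because the table is a bijection on 0x00..0xFF, any byte string round-trips through it unchanged.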

Emacs maps the latter to 0x3FFF80+x, I suppose under the theory that these
integers are not Unicode code points, and thus won't be conflated with
private-use Unicode characters. I suppose we could be "compatible" with Emacs.
Are there other examples in the wild of this sort of thing, or is the Emacs
precedent good enough?
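For comparison, here is a sketch of the Emacs-style variant. It assumes the convention of Emacs's BYTE8_TO_CHAR macro, which, as we understand Emacs's character.h, adds 0x3FFF00 to the byte. Raw bytes 0x80..0xFF therefore occupy 0x3FFF80..0x3FFFFF, that is, 0x3FFF80 plus the byte's offset above 0x80. All of these values lie above U+10FFFF, the largest Unicode code point, so they cannot collide with private-use characters. The C names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Emacs convention: raw byte b becomes b + 0x3FFF00. */
#define RAW_BYTE_BASE 0x3FFF00u
#define MAX_UNICODE   0x10FFFFu

/* ASCII maps to itself; 0x80..0xFF map to 0x3FFF80..0x3FFFFF. */
static uint32_t emacs_style_byte_to_char(unsigned char byte)
{
    if (byte <= 0x7F)
        return byte;
    return RAW_BYTE_BASE + byte;
}

/* True if c is one of the 128 raw-byte values. */
static int is_raw_byte_char(uint32_t c)
{
    return c >= RAW_BYTE_BASE + 0x80 && c <= RAW_BYTE_BASE + 0xFF;
}
```

One consequence of this design is that a wchar_t holding such a value is recognizably "not a character" to any code that checks against U+10FFFF.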

> Should we create a new encoding with this property?
> Or change the mapping tables of ANSI_X3.4-1968?

It is a bit of a dilemma. Would it make sense to change iconv so that it
recognizes values like 0x3FFF80 as corresponding to encoding-error bytes? iconv
could then behave the same way as before, even if we change the mapping tables
of ANSI_X3.4-1968.
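The iconv idea can be sketched as follows, under the assumption that the changed ANSI_X3.4-1968 table uses the Emacs-style values. A hypothetical iconv-like step converts bytes to UCS-4 but treats any raw-byte value as an encoding error (EILSEQ), just as the old table did for bytes 0x80..0xFF. This is an illustration, not glibc's iconv internals or API. mbrtowc could then decode every byte, while iconv's observable behavior stays the same.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RAW_BYTE_BASE 0x3FFF00u

/* Assumed new table: ASCII to itself, 0x80..0xFF to RAW_BYTE_BASE + byte. */
static uint32_t table_lookup(unsigned char b)
{
    return b <= 0x7F ? b : RAW_BYTE_BASE + b;
}

static int is_raw_byte_char(uint32_t c)
{
    return c >= RAW_BYTE_BASE + 0x80 && c <= RAW_BYTE_BASE + 0xFF;
}

/* Convert n bytes to UCS-4, rejecting raw-byte values as the old table did.
   Returns the number of values written; on error stores the offset of the
   offending byte in *bad_pos, sets errno to EILSEQ, and returns (size_t) -1. */
static size_t bytes_to_ucs4_strict(const unsigned char *in, size_t n,
                                   uint32_t *out, size_t *bad_pos)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t c = table_lookup(in[i]);
        if (is_raw_byte_char(c)) {
            *bad_pos = i;
            errno = EILSEQ;
            return (size_t) -1;
        }
        out[i] = c;
    }
    return n;
}
```

The key design point is that the error check runs on the table's output value rather than on the input byte, so the table can change while the error behavior stays the same.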

