This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: iconv_open behaviour on EILSEQ





On Sun, 5 May 2002, Stefan Hoffmeister wrote:

> : On Sun, 5 May 2002 00:58:54 -0700 (PDT), Paul Eggert wrote:
>
> >> Date: Sun, 5 May 2002 00:53:04 -0400 (EDT)
> >> From: Jungshik Shin <jshin@mailaps.org>
> >>
> >> What is it supposed to do when it encounters a *valid*
> >> byte sequence in the specified source codeset which cannot be converted
> >> to the specified target codeset.
> >
> >POSIX 1003.1-2001 says it "shall perform an implementation-defined
> >conversion on this character."


> ... which does not happen in the iconv() conversion of glibc 2.2.4 (SuSE
> 7.3).

> Example: When converting Unicode to locale-specific data (ISO 8859-1,
> for instance), conversion stops with a return value of -1 and EILSEQ if
> the Unicode character in question cannot be converted into a
> locale-specific character (try some Cyrillic input).

  As I wrote in my email, this is what the current implementation
of iconv(3) in glibc 2.2.x does. That is, an invalid byte sequence
and a valid but unconvertible byte sequence are treated exactly
the same.
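
  A minimal sketch of that behaviour, assuming a glibc-style iconv(3)
that stops with EILSEQ on unconvertible input. The helper name
try_utf8_to_latin1() is made up for illustration; only iconv_open(),
iconv() and iconv_close() are real library calls.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: try to convert `in` (UTF-8) to ISO-8859-1.
   Returns 0 on success, otherwise the errno value left behind by
   iconv_open() or iconv(). */
int try_utf8_to_latin1(const char *in, size_t inlen,
                       char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;

    char *inp = (char *)in;  /* iconv() takes a non-const char ** */
    int err = 0;
    if (iconv(cd, &inp, &inlen, &out, &outlen) == (size_t)-1)
        err = errno;         /* save before iconv_close() can touch it */

    iconv_close(cd);
    return err;
}
```

  With such an implementation, both "\xD0\x96" (U+0416, valid UTF-8
with no ISO-8859-1 equivalent) and "\xFF" (not valid UTF-8 at all)
should come back as EILSEQ, so the error code alone does not say
which case occurred.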

> Comparing current iconv() behaviour to how Microsoft implemented and
> documented WideCharToMultiByte()
>
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_2bj9.asp
>
> it seems that iconv() behaviour is not in line with POSIX 1003.1-2001.

  Writing a function similar to WideCharToMultiByte() would not be
hard with the current implementation of iconv(3) in glibc 2.2.x.
However, it would be rather difficult to do that if iconv(3) does an
implementation-defined conversion (whatever it is) as specified by POSIX
2001.  For instance, if '?' is used as a replacement character, there's
no way for a caller to tell whether '?' in the output is the replacement
character for unconvertible characters or genuinely present in the input.
I wish POSIX had stipulated that iconv(3) return -1 and set errno to Exxxx
which is not EILSEQ (or return -2 and set errno to EILSEQ if defining
another errno is not desirable). With that, it'd be clear to a caller
whether it is an invalid byte sequence or a valid but unconvertible byte
sequence that makes iconv(3) stop. And, if necessary, one can write
a wrapper function similar to WideCharToMultiByte().
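
  A rough sketch of such a wrapper, under several assumptions: the
input is UTF-8, the target encoding is stateless, the default character
is a valid single byte in that encoding, and iconv(3) stops with EILSEQ
as glibc 2.2.x does. The name utf8_to_mb_with_default() and its
parameters are made up here. Note that it cannot tell the two EILSEQ
cases apart, so invalid input is replaced as well.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper: convert UTF-8 to `tocode`, writing `def` for
   every input character on which iconv() stops with EILSEQ.
   *used_default is set to 1 if any substitution was made.
   Returns the number of bytes written, or (size_t)-1 on any other
   failure (iconv_open error, E2BIG, EINVAL, no room for `def`). */
size_t utf8_to_mb_with_default(const char *tocode,
                               const char *in, size_t inlen,
                               char *out, size_t outlen,
                               char def, int *used_default)
{
    assert(in != NULL && out != NULL && used_default != NULL);

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;
    char *outp = out;
    int failed = 0;
    *used_default = 0;

    while (inlen > 0) {
        if (iconv(cd, &inp, &inlen, &outp, &outlen) != (size_t)-1)
            break;                      /* all input consumed */
        if (errno != EILSEQ || outlen == 0) {
            failed = 1;
            break;
        }
        /* Skip the offending character: one lead byte plus any
           UTF-8 continuation bytes (10xxxxxx) that follow it. */
        inp++, inlen--;
        while (inlen > 0 && ((unsigned char)*inp & 0xC0) == 0x80)
            inp++, inlen--;
        *outp++ = def;
        outlen--;
        *used_default = 1;
    }

    iconv_close(cd);
    return failed ? (size_t)-1 : (size_t)(outp - out);
}
```

  Converting "a", U+0416, "?" to ISO-8859-1 with '?' as the default
then yields two '?' bytes in the output, and only the used_default
flag tells the caller that one of them is a substitution.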

   To me, the current implementation (which doesn't appear
compliant with POSIX 2001) is more convenient than what POSIX 2001 says
iconv(3) has to do, but it'd be even better if there were a way to tell
whether it's an invalid byte sequence or a valid but unconvertible one.

  BTW, is '//TRANSLIT' documented in POSIX 2001 or is it
a glibc (and libiconv) extension?

   Jungshik Shin

