This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Why does iconv signal EILSEQ with legal sequences (deviation from standard?)


Hi guys,

I get the following error when trying to convert between charsets:
-----------------------------------------------
[ash@Stamat ~]$ env | grep -i LANG
LANG=bg_BG.UTF-8
[ash@Stamat ~]$ echo àà > check # two accented a's
[ash@Stamat ~]$ iconv -f UTF-8 -t ASCII check
iconv: illegal input sequence at position 0
[ash@Stamat ~]$
-----------------------------------------------

Looking at libc/iconv/iconv_prog.c, I see that this message is printed
because iconv() failed with the EILSEQ error.
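
To take the command-line tool out of the picture, the same conversion can be tried directly against the library. The following is only a minimal sketch, not code from iconv_prog.c: the helper name utf8_to_ascii and its err/consumed out-parameters are made up for illustration, and the only library calls used are iconv_open(), iconv() and iconv_close().

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from iconv_prog.c): convert the UTF-8 string
   `in` to ASCII into `out`.  Returns iconv()'s return value.  On failure
   *err holds the errno value; *consumed is the number of input bytes
   that were converted before iconv() stopped. */
static size_t utf8_to_ascii(const char *in, char *out, size_t outsize,
                            int *err, size_t *consumed)
{
    *err = 0;
    *consumed = 0;
    if (outsize == 0) {
        *err = E2BIG;
        return (size_t)-1;
    }

    iconv_t cd = iconv_open("ASCII", "UTF-8");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1) {
        *err = errno;
        return (size_t)-1;
    }

    char *inptr = (char *)in;       /* iconv() takes char **, input is not modified */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;   /* keep room for the terminating NUL */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc == (size_t)-1)
        *err = errno;               /* EILSEQ, EINVAL or E2BIG */

    *consumed = (size_t)(inptr - in);
    *outptr = '\0';
    iconv_close(cd);
    return rc;
}
```

Called with a UTF-8 encoded accented a such as "\xc3\xa0\xc3\xa0", I would expect glibc to return (size_t)-1 with *err == EILSEQ and *consumed == 0, which would correspond to the "position 0" in the message above. Other implementations may instead substitute a character and return a positive count of non-reversible conversions.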

1. The GNU libc manual page states that:
http://www.gnu.org/software/libc/manual/html_node/Generic-Conversion-Interface.html#Generic-Conversion-Interface
EILSEQ
        The conversion stopped because of an invalid byte sequence in
        the input. After the call, *inbuf points at the first byte of
        the invalid byte sequence. 

2. The Single UNIX ® Specification, Version 2
Copyright © 1997 The Open Group

http://www.opengroup.org/onlinepubs/007908799/xsh/iconv.html

What is written is:
[EILSEQ]
        Input conversion stopped due to an input byte that does not
        belong to the input codeset.
        
Several paragraphs above that, the same page says:

If iconv() encounters a character in the input buffer that is valid, but
for which an identical character does not exist in the target codeset,
iconv() performs an implementation-dependent conversion on this
character.
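
One candidate for such an "implementation-dependent conversion" is the "//TRANSLIT" suffix that the Linux iconv_open(3) manual page describes for the target codeset name. The sketch below assumes that suffix is available; the helper name from_utf8 is hypothetical, and whether an accented a comes out as "a", "?" or something else is left to the implementation and the locale, so no particular output is claimed here.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the UTF-8 string `in` to the codeset named
   by `tocode` (for example "ASCII" or "ASCII//TRANSLIT").
   Returns 0 on success, -1 on failure with errno set. */
static int from_utf8(const char *tocode, const char *in,
                     char *out, size_t outsize)
{
    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;                  /* errno set by iconv_open(), e.g. EINVAL */

    char *inptr = (char *)in;       /* iconv() takes char **, input is not modified */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;   /* keep room for the terminating NUL */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    int saved = errno;              /* iconv_close() may overwrite errno */
    *outptr = '\0';
    iconv_close(cd);
    errno = saved;
    return rc == (size_t)-1 ? -1 : 0;
}
```

Comparing from_utf8("ASCII", ...) with from_utf8("ASCII//TRANSLIT", ...) on the same input shows whether the failure comes from the input being invalid or from the target codeset lacking the character.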

3. ISO C Amendment 1 (MSE)
http://www.unix.org/version2/whatsnew/login_mse.html
EILSEQ 

An invalid wide-character encoding, or a sequence of bytes which do not
form a valid multibyte character, was encountered.

4. The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_03.html


So: the input I tried to convert is valid UTF-8. Why was EILSEQ raised?

Kind regards:
al_shopov

