This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Why does iconv signal EILSEQ with legal sequences (deviation from standard?)
- From: Alexander Shopov <ash at contact dot bg>
- To: libc-alpha <libc-alpha at sourceware dot org>
- Date: Thu, 18 May 2006 17:08:31 +0300
- Subject: Why does iconv signal EILSEQ with legal sequences (deviation from standard?)
Hi guys,
I get the following error when trying to convert between charsets:
-----------------------------------------------
[ash@Stamat ~]$ env | grep -i LANG
LANG=bg_BG.UTF-8
[ash@Stamat ~]$ echo àà > check # two accented a's
[ash@Stamat ~]$ iconv -f UTF-8 -t ASCII check
iconv: illegal input sequence at position 0
[ash@Stamat ~]$
-----------------------------------------------
Looking at libc/iconv/iconv_prog.c, I see that this message is printed
when the conversion fails with the EILSEQ error.
1. The GNU libc manual page states that:
http://www.gnu.org/software/libc/manual/html_node/Generic-Conversion-Interface.html#Generic-Conversion-Interface
EILSEQ
The conversion stopped because of an invalid byte sequence in
the input. After the call, *inbuf points at the first byte of
the invalid byte sequence.
2. The Single UNIX ® Specification, Version 2
Copyright © 1997 The Open Group
http://www.opengroup.org/onlinepubs/007908799/xsh/iconv.html
What is written is:
[EILSEQ]
Input conversion stopped due to an input byte that does not
belong to the input codeset.
Several paragraphs above that, the same page says:
If iconv() encounters a character in the input buffer that is valid, but
for which an identical character does not exist in the target codeset,
iconv() performs an implementation-dependent conversion on this
character.
3. ISO C Amendment 1 (MSE)
http://www.unix.org/version2/whatsnew/login_mse.html
EILSEQ
An invalid wide-character encoding, or a sequence of bytes which do not
form a valid multibyte character, was encountered.
4. The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_03.html
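Now, my input is valid UTF-8; the characters simply have no ASCII
equivalent. Going by the texts above, I would expect an
implementation-dependent conversion rather than an error. Why has
EILSEQ been raised?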
Kind regards:
al_shopov