Bug 14008 - iconv behavior is incorrect when character is not present in dest charset
Summary: iconv behavior is incorrect when character is not present in dest charset
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-21 19:24 UTC by Rich Felker
Modified: 2016-08-26 14:06 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Test case showing the bug (206 bytes, text/x-csrc)
2012-04-21 19:24 UTC, Rich Felker

Description Rich Felker 2012-04-21 19:24:45 UTC
Created attachment 6358
Test case showing the bug

Per POSIX:

"If iconv() encounters a character in the input buffer that is valid, but for which an identical character does not exist in the target codeset, iconv() shall perform an implementation-defined conversion on this character."

And:

"The iconv() function shall update the variables pointed to by the arguments to reflect the extent of the conversion and return the number of non-identical conversions performed. If the entire string in the input buffer is converted, the value pointed to by inbytesleft shall be 0. If the input conversion is stopped due to any conditions mentioned above, the value pointed to by inbytesleft shall be non-zero and errno shall be set to indicate the condition. If an error occurs, iconv() shall return (size_t)-1 and set errno to indicate the error."

However, glibc's iconv is buggy: instead of performing an implementation-defined conversion, it returns (size_t)-1 when a character from the input character set does not exist in the output character set. I am attaching a simple test program that shows the issue, based on incorrect test code I found in glib:

https://bugzilla.gnome.org/show_bug.cgi?id=674540
Comment 1 Milan Bouchet-Valat 2016-02-06 15:57:04 UTC
This is indeed highly annoying. At the very least, the documentation should be updated to state which errno value signals that a sequence that cannot be represented in the target encoding was encountered (it is EILSEQ). It should also mention that glibc does not comply with POSIX on this point.

The documentation is also unclear where it says "If all input from the input buffer is successfully converted and stored in the output buffer, the function returns the number of non-reversible conversions performed." [1] Since sequences that cannot be represented in the target encoding are said to trigger an error, they never contribute to that return value. FWIW, POSIX says "non-identical conversions" rather than "non-reversible".

Finally, the part saying "future versions will provide better ones, but they are not yet finished" [1] could also be removed, since I assume backward compatibility will be preserved, won't it?


[1] http://www.gnu.org/software/libc/manual/html_node/Generic-Conversion-Interface.html