This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/2253] unicode combining accents can't be iconv-ed to latin (and others)
- From: "bugdal at aerifal dot cx" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: Sun, 06 May 2012 12:23:53 +0000
- Subject: [Bug libc/2253] unicode combining accents can't be iconv-ed to latin (and others)
- Auto-submitted: auto-generated
- References: <bug-2253-131@http.sourceware.org/bugzilla/>
http://sourceware.org/bugzilla/show_bug.cgi?id=2253
Rich Felker <bugdal at aerifal dot cx> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx
--- Comment #3 from Rich Felker <bugdal at aerifal dot cx> 2012-05-06 12:23:53 UTC ---
Please mark this bug as INVALID. This conversion definitely should NOT happen
unless //TRANSLIT is used. By default iconv should accurately reflect the
one-to-one nature of Unicode's round-trip mappings to legacy character sets.
This is important because many users of iconv attempt a conversion to a
particular legacy character set to determine whether the data can be faithfully
stored in that character set, e.g. for compression or transmission purposes. As
an example, mutt does this to choose the charset to send messages in, using a
user-provided list of charsets to try. I believe several IM clients also do it.
If iconv silently converted (for example) U+0061 U+0300 to U+00E0, such
applications would wrongly assume that this destructive conversion was
faithful, causing them to lose data. (That is, converting back would not
faithfully restore the original data, and the application should have just
stored it in the original UTF-8, but it had no way to know this because iconv
lied.)
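
The kind of check described above could look roughly like the following. This
is only a sketch, not mutt's actual code: the helper names converts_faithfully
and pick_charset are made up for illustration, the converted output is
discarded, and stateful target encodings (which would need a final flush call)
are ignored. It treats both an iconv error and a nonzero count of irreversible
conversions as "cannot be stored faithfully".

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the NUL-terminated UTF-8 input converts
   to `charset` with no irreversible conversions, 0 if it does not, and -1 if
   the conversion itself is unsupported. */
static int converts_faithfully(const char *charset, const char *utf8)
{
    iconv_t cd = iconv_open(charset, "UTF-8");  /* note: no //TRANSLIT */
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)utf8;   /* iconv takes char **, input is not modified */
    size_t inleft = strlen(utf8);
    int ok = 1;

    while (ok && inleft > 0) {
        char buf[256];         /* scratch output, discarded after each pass */
        char *out = buf;
        size_t outleft = sizeof buf;
        size_t r = iconv(cd, &in, &inleft, &out, &outleft);

        if (r == (size_t)-1) {
            if (errno != E2BIG)    /* EILSEQ or EINVAL: not representable */
                ok = 0;
            /* E2BIG only means the scratch buffer filled up; loop again */
        } else if (r != 0) {
            ok = 0;                /* irreversible conversions were made */
        }
    }

    iconv_close(cd);
    return ok;
}

/* Hypothetical helper: returns the first charset in the NULL-terminated
   candidate list that can hold the text faithfully, else "UTF-8". */
static const char *pick_charset(const char *const *candidates,
                                const char *utf8)
{
    for (size_t i = 0; candidates[i] != NULL; i++)
        if (converts_faithfully(candidates[i], utf8) == 1)
            return candidates[i];
    return "UTF-8";
}
```

With the behavior argued for here, precomposed U+00E0 passes the check for
ISO-8859-1, while U+0061 U+0300 fails it and the caller keeps UTF-8. If iconv
composed the sequence silently and reported success, this check would have no
way to detect the loss.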