This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug libc/2253] unicode combining accents can't be iconv-ed to latin (and others)


http://sourceware.org/bugzilla/show_bug.cgi?id=2253

Rich Felker <bugdal at aerifal dot cx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx

--- Comment #3 from Rich Felker <bugdal at aerifal dot cx> 2012-05-06 12:23:53 UTC ---
Please mark this bug as INVALID. This conversion definitely should NOT happen
unless //TRANSLIT is used. By default iconv should accurately reflect the
one-to-one nature of Unicode's round-trip mappings to legacy character sets.
This is important because many users of iconv attempt a conversion to a
particular legacy character set to determine whether the data can be stored
faithfully in that character set, e.g. for compression or transmission purposes. As
an example, mutt does this to choose the charset to send messages in, using a
user-provided list of charsets to try. I believe several IM clients also do it.
If iconv silently converted (for example) U+0061 U+0300 to U+00E0, such
applications would wrongly assume that this destructive conversion was
faithful, causing them to lose data. (That is, converting back would not
faithfully restore the original data, and the application should have just
stored it in the original UTF-8, but it had no way to know this because iconv
lied.)
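
The charset-probing pattern described above can be sketched roughly as
follows. This is only an illustration, not mutt's actual code: it uses
Python's strict codecs as a stand-in for iconv without //TRANSLIT, and the
function name pick_charset, the candidate list, and the UTF-8 fallback are
all assumptions made for the example.

```python
def pick_charset(text, candidates):
    """Return the first candidate charset that stores `text` faithfully.

    Hypothetical helper: falls back to "utf-8" when no candidate works.
    """
    for charset in candidates:
        try:
            # Strict encoding raises instead of substituting or composing,
            # which is the behaviour the comment asks of plain iconv.
            encoded = text.encode(charset)
        except UnicodeEncodeError:
            continue
        # Faithful means the round trip restores the exact code points.
        if encoded.decode(charset) == text:
            return charset
    return "utf-8"


candidates = ["ascii", "iso-8859-1"]  # assumed user-provided preference list

precomposed = "\u00e0"   # U+00E0, exists in Latin-1
decomposed = "a\u0300"   # U+0061 U+0300, no one-to-one Latin-1 mapping

print(pick_charset(precomposed, candidates))  # iso-8859-1
print(pick_charset(decomposed, candidates))   # utf-8
```

If the converter silently composed U+0061 U+0300 into U+00E0, the second
call would report iso-8859-1 and the original sequence could not be
recovered on the way back.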


