This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: [PATCH] Remove 0x005C conversion from __jisx0208_from_ucs4_lat1 for ISO-2022-JP
- From: GOTO Masanori <gotom at debian dot or dot jp>
- To: Ulrich Drepper <drepper at redhat dot com>
- Cc: GOTO Masanori <gotom at debian dot or dot jp>, libc-alpha at sources dot redhat dot com, ukai at debian dot or dot jp
- Date: Fri, 10 Sep 2004 12:17:01 +0900
- Subject: Re: [PATCH] Remove 0x005C conversion from __jisx0208_from_ucs4_lat1 for ISO-2022-JP
- References: <811xib64v6.wl@omega.webmasters.gr.jp><81vfenqv1u.wl@omega.webmasters.gr.jp><41408EE1.3050402@redhat.com>
At Thu, 09 Sep 2004 10:12:01 -0700,
Ulrich Drepper wrote:
> The implemented behavior has been added by default and changing this
> will break code. Since I did not decide on this myself back when there
> are definitely opposite sides on this issue within the group of people
> affected.
Thanks for your reply. I did not know that this behavior was added
on request. I checked the cvs diff and cvs log for jis0208.c, but this
part does not seem to have been modified since the first check-in.
Do you remember who requested this part? I guess they had some
reason (such as keeping EUC-JP/SJIS conversion reversible), so I would
like to contact them, learn the reason, and discuss this problem.
> In this case it is better to be conservative and not change
> anything.
In this case, the character \ (reverse solidus) is altered by a
round trip (even ISO-2022-JP <-> ISO-2022-JP). I wonder why that is an
acceptable conversion. There are some irreversible round trips in
EUC-JP and SJIS as well, but this patch is specific to ISO-2022-JP.
I hope that you, as the original author, will read the patch
again.
> You will have to do much better than just saying "some people
> complained". You have to show that changing this does break anything
> significantly _and_ that people cannot live without this change (despite
> the existing behavior being in place for 7 years now).
The actual cases that raised this problem were mainly mail user
agents (sylpheed and mutt) and IRC clients (xchat) that use iconv().
We usually use EUC-JP in unix environments (and SJIS in some
environments such as Windows and Macintosh). ISO-2022-JP is used
mainly in emacs, HTML, IRC and email (an RFC defines that ISO-2022-JP
should be used in Japanese mail, and ISO-2022-JP is also popular on
Japanese IRC).
But emacsen do not use iconv(). SJIS/EUC-JP can also be used in HTML,
and they are becoming the majority in Japanese HTML. Until a few years
ago, the major mailers and IRC clients in unix environments were
emacsen-based (Mew, Wanderlust, liece, irchat-jp, and so on). They also
do not use the glibc iconv() ISO-2022-JP conversion. (Note that I am an
emacsen-based user, so I hardly ever see this kind of problem in my
usual environment.)
Moreover, this problem occurs only with the sequence "A\b" (where A is
a JISX0208 character and b is an ISO-646 character). It appears rarely
in Japanese text.
One recent example appeared with mutt. A mutt user wants to send a
.po file that includes text like: msgstr "AAA\n". However, in the mail
it becomes "AAAcn", where c is the fullwidth reverse solidus. The
reverse solidus \ is undesirably changed, and the .po format is broken
for no reason.
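As a rough illustration of the "A\b" case above, here is a minimal Python sketch. It is a toy model, not glibc's code: toy_encode, round_trip and the backslash_in_jisx0208 flag are hypothetical names, and it assumes that a backslash directly following a JISX0208 character is kept in the JISX0208 state and written as 0x2140, which is how the behavior is described in this thread. Decoding uses Python's own iso2022_jp codec rather than glibc iconv().

```python
ESC_ASCII = b"\x1b(B"      # escape sequence: switch to ASCII (ISO-646)
ESC_JISX0208 = b"\x1b$B"   # escape sequence: switch to JIS X 0208


def toy_encode(text, backslash_in_jisx0208=True):
    """Toy ISO-2022-JP encoder (hypothetical, for illustration only).

    With backslash_in_jisx0208=True, a backslash that directly follows a
    JIS X 0208 character is written as JIS X 0208 0x2140 instead of
    switching back to ASCII (the assumed behavior under discussion).
    """
    out = bytearray()
    in_jisx0208 = False
    for ch in text:
        if ch == "\\" and in_jisx0208 and backslash_in_jisx0208:
            out += b"\x21\x40"              # JIS X 0208 0x2140
        elif ord(ch) < 0x80:
            if in_jisx0208:
                out += ESC_ASCII
                in_jisx0208 = False
            out.append(ord(ch))
        else:
            # EUC-JP stores a JIS X 0208 code as two bytes with the high bit set.
            euc = ch.encode("euc_jp")
            if len(euc) != 2 or euc[0] < 0xA1:
                raise ValueError(f"not a JIS X 0208 character: {ch!r}")
            if not in_jisx0208:
                out += ESC_JISX0208
                in_jisx0208 = True
            out += bytes(b & 0x7F for b in euc)
    if in_jisx0208:
        out += ESC_ASCII                    # always finish in the ASCII state
    return bytes(out)


def round_trip(text, **kwargs):
    """Encode with the toy encoder, then decode with Python's iso2022_jp codec."""
    return toy_encode(text, **kwargs).decode("iso2022_jp")


if __name__ == "__main__":
    po_line = 'msgstr "\u3042\\n"'          # \u3042 is HIRAGANA LETTER A
    print(round_trip(po_line))
    print(round_trip(po_line, backslash_in_jisx0208=False))
```

In this model only a backslash that immediately follows a JISX0208 character is affected; a backslash after ASCII text is left alone, which matches the observation that the sequence is rare in ordinary Japanese text.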
So I'm not surprised that it went unnoticed even though it has existed for 7 years.
Regards,
-- gotom