This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: [PATCH] Remove 0x005C conversion from __jisx0208_from_ucs4_lat1 for ISO-2022-JP
- From: Fumitoshi UKAI <ukai at debian dot or dot jp>
- To: Ulrich Drepper <drepper at redhat dot com>
- Cc: GOTO Masanori <gotom at debian dot or dot jp>, libc-alpha at sources dot redhat dot com, ukai at debian dot or dot jp
- Date: Sat, 11 Sep 2004 15:19:58 +0900
- Subject: Re: [PATCH] Remove 0x005C conversion from __jisx0208_from_ucs4_lat1 for ISO-2022-JP
- Organization: Debian JP Project
- References: <811xib64v6.wl@omega.webmasters.gr.jp><81vfenqv1u.wl@omega.webmasters.gr.jp><41408EE1.3050402@redhat.com>
At Thu, 09 Sep 2004 10:12:01 -0700,
Ulrich Drepper wrote:
> GOTO Masanori wrote:
> > Could someone take and look at this patch? I heard and reported from
> > various Japanese users about this problem for a long time.
>
> The implemented behavior has been added by default and changing this
> will break code. Since I did not decide on this myself back when there
> are definitely opposite sides on this issue within the group of people
> affected. In this case it is better to be conservative and not change
> anything.
The current behavior breaks the ISO-2022-JP encoding, so we can't use iconv(3)
to convert to ISO-2022-JP.
Would you please explain (or tell me who can explain) why this is reasonable
behavior?
For example, this is a valid ISO-2022-JP sequence:
$ printf "\x1b\x24\x42\x24\x22\x1b\x28\x4a\x5c\x61\x1b\x24\x42\x24\x22\x1b\x28\x42\x5c\x61\x1b\x24\x42\x24\x22\x21\x40\x1b\x28\x42\x61\n"
It decodes as
[HIRAGANA LETTER A] [YEN SIGN] [LATIN SMALL LETTER A]
[HIRAGANA LETTER A] [REVERSE SOLIDUS] [LATIN SMALL LETTER A]
[HIRAGANA LETTER A] [FULLWIDTH REVERSE SOLIDUS] [LATIN SMALL LETTER A]
In ISO-2022-JP, these 3 characters (YEN SIGN, REVERSE SOLIDUS, FULLWIDTH
REVERSE SOLIDUS) can be represented without any confusion.
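To make the point concrete, here is a minimal Python sketch (not glibc code) that splits the example bytes by the active character set. The names ESCAPES, SAMPLE and tokenize are hypothetical, and only the escape sequences that ISO-2022-JP (RFC 1468) defines are handled. It shows that the three characters arrive as three different (charset, code) pairs, so a converter has enough information to keep them apart.

```python
# Escape sequences defined for ISO-2022-JP (RFC 1468).
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

# The same bytes as the printf example in this mail.
SAMPLE = (b"\x1b\x24\x42\x24\x22\x1b\x28\x4a\x5c\x61"
          b"\x1b\x24\x42\x24\x22\x1b\x28\x42\x5c\x61"
          b"\x1b\x24\x42\x24\x22\x21\x40\x1b\x28\x42\x61\n")


def tokenize(data: bytes):
    """Return a list of (charset, hex code) pairs, one per character."""
    charset = "ASCII"  # ISO-2022-JP text starts in ASCII
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            # Three-byte escape: switch the active character set.
            # Raises KeyError for escapes outside this sketch's table.
            charset = ESCAPES[data[i:i + 3]]
            i += 3
        elif charset.startswith("JIS X 0208") and data[i] >= 0x21:
            # Two-byte character in JIS X 0208.
            out.append((charset, data[i:i + 2].hex()))
            i += 2
        else:
            # Single-byte character (ASCII, JIS X 0201 Roman, or control).
            out.append((charset, data[i:i + 1].hex()))
            i += 1
    return out


if __name__ == "__main__":
    for charset, code in tokenize(SAMPLE):
        print(f"{charset:18} 0x{code}")
```

In this listing, byte 0x5c appears once under JIS X 0201 Roman (YEN SIGN) and once under ASCII (REVERSE SOLIDUS), while 0x2140 under JIS X 0208 is the third, distinct character.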
But when it is passed through "iconv -f ISO-2022-JP -t ISO-2022-JP", we get
[HIRAGANA LETTER A] [YEN SIGN] [LATIN SMALL LETTER A]
[HIRAGANA LETTER A] [FULLWIDTH REVERSE SOLIDUS] [LATIN SMALL LETTER A]
[HIRAGANA LETTER A] [FULLWIDTH REVERSE SOLIDUS] [LATIN SMALL LETTER A]
So it breaks REVERSE SOLIDUS by converting it to FULLWIDTH REVERSE SOLIDUS.
Why?
> You will have to do much better than just saying "some people
> complained". You have to show that changing this does not break anything
> significantly _and_ that people cannot live without this change (despite
> the existing behavior being in place for 7 years now).
I believe we had not used iconv(3) for this purpose for a long time, but
recently many applications have begun to use iconv(3), so this problem has
appeared. (Actually, at first I thought it was an application bug rather
than an iconv(3) bug, and so did others.)
Regards,
Fumitoshi UKAI