This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: EUC-KR and WON SIGN
- To: libc-alpha at sources dot redhat dot com
- Subject: Re: EUC-KR and WON SIGN
- From: Jungshik Shin <jshin at pantheon dot yale dot edu>
- Date: Tue, 7 Nov 2000 15:31:12 -0500 (EST)
- Reply-To: jshin at pantheon dot yale dot edu
Hi,
It was brought to my attention that recent snapshots of glibc have \0x5c
in EUC-KR mapped to U20A9 instead of U005C. Below is what I think about the
issue. Please accept my apology for not being able to make my message
a part of the thread started by Bruno on the topic. When the thread
was started, I wasn't on the list, and I couldn't find the message IDs of
the messages in the thread to reference them in my message header.
>> Bruno Haible wrote:
> Ulrich Drepper wrote:
>> because that makes more sense than mapping it to backslash (and on Unix
>> we don't use that character as a directory/filename separator). Recall
>> that Jungshik Shin wrote on 2000-09-25 (talking about JOHAB, but I
>> suspect this also holds for EUC-KR):
>> "To represent WON SIGN, people usually use FULL-WIDTH WON SIGN"
> No, this seems not to be true. EUC-KR breaks ISO C, so what? If
> this is what everybody else is using there will be no change. And the
> information I have at the moment (from at least two sources) shows that
> the Won character is normally used.
Could you tell me which two sources you have in favor of \0x5c in EUC-KR
being half-width WON SIGN? In MS-Windows/MS-DOS, \0x5c is rendered
with a glyph in the shape of WON SIGN, but that does NOT necessarily mean
that its SEMANTICS is WON SIGN. Even in MS-Windows/MS-DOS, \0x5c is
NOTHING MORE than a directory separator. I'm sure that even in MS-Windows
/MS-DOS, FULL WIDTH WON SIGN (\0xa3dc) is much more frequently used as
WON SIGN than \0x5c. When it comes to the Unix/X11 world, \0x5c in EUC-KR
has NEVER been regarded as WON SIGN. It has always been used as BACK
SLASH, both in semantics and in the shape of the glyph. (Possibly except
for Solaris 2.x and AIX, there are no X11 fonts for ISO-646-KR/KS X 1003
with HALF-WIDTH WON SIGN at \0x5c. We just use fonts for US-ASCII.)
IMHO, what has to be done here is the following:
A. In EUC-KR module of gconv, the following mappings have to be
used.
     EUC-KR         Unicode
     ------         -------
     \0x5c    <->   U005C
     \0xa3dc  <->   UFFE6  : FULL WIDTH WON SIGN
     \0xa3dc  <-    U20A9  : neither round-trip nor width-preserving
                             (i.e. unidirectional)
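
The mappings in A can be sketched as a pair of lookup tables. This is only
an illustration in Python, not glibc's gconv code: the names DECODE, ENCODE,
decode_char, encode_char and round_trips are hypothetical, only the code
points under discussion are covered, and U+FFE6 is used because that is the
code point Unicode assigns to FULLWIDTH WON SIGN.

```python
# Hypothetical tables covering only the code points discussed in A.
DECODE = {
    b"\x5c": "\u005c",      # 0x5c   -> REVERSE SOLIDUS (backslash)
    b"\xa3\xdc": "\uffe6",  # 0xa3dc -> FULLWIDTH WON SIGN
}

# Every decode entry is reversible ...
ENCODE = {ch: raw for raw, ch in DECODE.items()}
# ... plus one encode-only entry: WON SIGN falls back to the full-width form.
ENCODE["\u20a9"] = b"\xa3\xdc"


def decode_char(raw: bytes) -> str:
    return DECODE[raw]


def encode_char(ch: str) -> bytes:
    return ENCODE[ch]


def round_trips(ch: str) -> bool:
    """True if encoding and then decoding gives back the same character."""
    return decode_char(encode_char(ch)) == ch
```

With these tables the backslash and FULL WIDTH WON SIGN round-trip, while
U20A9 is encoded as \0xa3dc and comes back as the full-width form, which is
the unidirectional case in the last row.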
B. In the locale definition for ko_KR.eucKR (if you don't insist on a
single source for a single locale regardless of the encoding used),
we can have CURRENCY SIGN defined as UFFE6 instead of U20A9. On
the other hand, in ko_KR.utf8 we can have CURRENCY SIGN defined as U20A9.
I guess this doesn't necessarily increase the size of the glibc distribution,
because the locale definition file for ko_KR.eucKR can
just refer to the locale definition file for generic ko_KR everywhere
except for CURRENCY SIGN, which has to be overridden to be UFFE6 instead of U20A9.
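
One way to see why the two locales might want different currency signs is
to check which candidate survives a round trip through the locale's charset.
The sketch below uses Python's built-in euc_kr codec purely as a stand-in;
it is not glibc's converter, and its tables may differ from glibc's. The
helper name representable is hypothetical.

```python
def representable(ch: str, codec: str) -> bool:
    """True if ch can be encoded in codec and decodes back to itself."""
    try:
        return ch.encode(codec).decode(codec) == ch
    except UnicodeError:
        return False


# Candidate currency signs for the Korean locales.
for name, ch in [("FULLWIDTH WON SIGN", "\uffe6"), ("WON SIGN", "\u20a9")]:
    for codec in ("euc_kr", "utf-8"):
        print(f"U+{ord(ch):04X} {name:<18} {codec:<6} {representable(ch, codec)}")
```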
C. If you're concerned about the usage in MS-DOS/MS-Windows,
you can map \0x5c in UHC/Windows-949 to U20A9 (but NOT in EUC-KR module)
     UHC            Unicode
     ---            -------
     \0x5c    <->   U20A9
     \0xa3dc  <->   UFFE6  : FULL WIDTH WON SIGN
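
For comparison, proposal C differs from A only in the single-byte entry. A
minimal sketch with hypothetical table and function names, again limited to
the code points discussed here and not taken from glibc:

```python
# Hypothetical EUC-KR table (proposal A): 0x5c stays a backslash.
EUC_KR = {b"\x5c": "\u005c", b"\xa3\xdc": "\uffe6"}

# Hypothetical UHC/Windows-949 table (proposal C): 0x5c becomes WON SIGN.
UHC = {b"\x5c": "\u20a9", b"\xa3\xdc": "\uffe6"}


def is_round_trip(table: dict) -> bool:
    """A table round-trips if no two byte sequences share a character."""
    return len(set(table.values())) == len(table)


def differing_bytes(a: dict, b: dict) -> list:
    """Byte sequences that the two tables decode differently."""
    return sorted(raw for raw in a if a[raw] != b.get(raw))
```

Both tables are one-to-one, so in the UHC case U20A9 no longer needs the
one-way fallback; the only byte on which the two tables disagree is \0x5c.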
D. One could arguably do the same for JOHAB, as it's primarily used
in the MS-DOS world. Yes, this is a reversal of my previous
position, but note that as far as EUC-KR is concerned,
I'm standing by what I wrote before (i.e. \0x5c should
be left alone and mapped identically to U005C).
I guess this is also in line with what has been done for EUC-JP and SJIS
(CP94?) pair and EUC-CN and CP950(?) pair in glibc.
Best,
Jungshik Shin