This is the mail archive of the mailing list for the glibc project.




It was brought to my attention that recent snapshots of glibc have \0x5c
in EUC-KR mapped to U20A9 instead of U005C. Below is what I think about the
issue.  Please accept my apology for not being able to make my message
a part of the thread started by Bruno on the topic. When the thread
was started, I wasn't on the list, and I couldn't find the message IDs of
the messages in the thread to refer to them in my message header.

>> Bruno Haible wrote:
> Ulrich Drepper wrote:
>> because that makes more sense than mapping it to backslash (and on Unix
>> we don't use that character as a directory/filename separator). Recall
>> that Jungshik Shin wrote on 2000-09-25 (talking about JOHAB, but I
>> suspect this also holds for EUC-KR):
>>      "To represent WON SIGN, people usually use FULL-WIDTH WON SIGN"

> No, this seems not to be true.  EUC-KR breaks ISO C, so what?  If
> this is what everybody else is using there will be no change.  And the
> information I have at the moment (from at least two sources) shows that
> the Won character is normally used.

Could you tell me which two sources you have in favor of \0x5c in EUC-KR
being half-width WON SIGN? In MS-Windows/MS-DOS, \0x5c is rendered
with a glyph in the shape of WON SIGN, but that does NOT necessarily mean
that its SEMANTICS is WON SIGN. Even in MS-Windows/MS-DOS, \0x5c is
NOTHING MORE than a directory separator.  I'm sure even in MS-Windows/
MS-DOS, FULL WIDTH WON SIGN (\0xa3dc) is much more frequently used as
WON SIGN than \0x5c.  As for the Unix/X11 world, \0x5c in EUC-KR
has NEVER been regarded as WON SIGN. It has always been used as BACK
SLASH both in semantics and in the shape of the glyph.  (With the possible
exception of Solaris 2.x and AIX, there are no X11 fonts for ISO-646-KR/
KS X 1003 with HALF-WIDTH WON SIGN at \0x5c. We just use fonts for US-ASCII.)
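
To make the distinction between the three code points concrete, here is a
small sketch that asks Python's bundled Unicode database for their names and
East Asian Width classes. The helper name `describe` is made up for this
illustration, and the snippet says nothing about how any converter maps
\0x5c; it only shows that backslash, WON SIGN and FULLWIDTH WON SIGN (listed
in the Unicode database at U+FFE6) are separate characters with different
widths.

```python
import unicodedata

# The three code points discussed in this message.
CANDIDATES = ["\u005c", "\u20a9", "\uffe6"]


def describe(ch):
    """Return (code point, Unicode name, East Asian Width class) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )


for ch in CANDIDATES:
    print(describe(ch))
```

The width classes are "Na" (narrow), "H" (halfwidth) and "F" (fullwidth),
which is why a mapping between the fullwidth and halfwidth forms cannot be
width-preserving.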

IMHO, what has to be done here is:

  A. In the EUC-KR module of gconv, the following mappings have to be used:

      EUC-KR      Unicode
      ------      ------
      \0x5c   <-> U005C
      \0xa3dc <-> UFFE6    : FULL WIDTH WON SIGN
      \0xa3dc <-  U20A9    : NOT round-trip  nor width-preserving 
                             (i.e. Unidirectional)

  B. In the locale definition for ko_KR.eucKR (if you don't insist on a
  single source for a single locale regardless of the encoding used),
  we can have CURRENCY SIGN defined as UFFE6 instead of U20A9. On
  the other hand, in ko_KR.utf8 we can have CURRENCY SIGN defined as U20A9.
  I guess this doesn't necessarily increase the size of the glibc distribution,
  because the locale definition file for ko_KR.eucKR can
  just refer to the locale definition file for generic ko_KR everywhere
  except for CURRENCY SIGN, which has to be overridden to be UFFE6 instead of U20A9.

  C. If you're concerned about the usage in MS-DOS/MS-Windows, 
     you can map \0x5c in UHC/Windows-949  to U20A9 (but NOT in EUC-KR module)
     UHC        Unicode
     ---        -------
     \0x5c   <-> U20A9
     \0xa3dc <-> UFFE6    : FULL WIDTH WON SIGN

  D. One could arguably do the same for JOHAB, as it's primarily used
     in the MS-DOS world. Yes, this is a reversal of my previous
     position, but note that as far as EUC-KR is concerned,
     I'm standing by what I wrote before (i.e. \0x5c should
     be left alone and mapped identically to U005C).
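
The mappings proposed in A and C can be sketched as plain lookup tables.
This is only an illustration under my own assumptions: the names `DECODE`,
`ENCODE`, `build_encode` and `round_trips` are hypothetical, the tables hold
just the entries discussed here, and none of it is the actual gconv module
code. FULLWIDTH WON SIGN is written with the code point the Unicode database
gives for it, U+FFE6.

```python
# Hypothetical tables for illustration; NOT the actual glibc gconv data.
BACKSLASH, WON, FULLWIDTH_WON = "\u005c", "\u20a9", "\uffe6"

# Decoding direction (bytes -> Unicode) for the two charsets in items A and C.
DECODE = {
    "EUC-KR": {b"\x5c": BACKSLASH, b"\xa3\xdc": FULLWIDTH_WON},
    "UHC": {b"\x5c": WON, b"\xa3\xdc": FULLWIDTH_WON},
}


def build_encode(charset):
    """Invert the decode table, then add the one-way entry from item A."""
    enc = {ch: raw for raw, ch in DECODE[charset].items()}
    if charset == "EUC-KR":
        # Unidirectional: U+20A9 can be written out as \xa3\xdc, but
        # \xa3\xdc always decodes back to the fullwidth form.
        enc[WON] = b"\xa3\xdc"
    return enc


ENCODE = {name: build_encode(name) for name in DECODE}


def round_trips(charset, ch):
    """True if encoding ch and decoding the result gives ch back."""
    raw = ENCODE[charset].get(ch)
    return raw is not None and DECODE[charset][raw] == ch


for charset in DECODE:
    for ch in (BACKSLASH, WON, FULLWIDTH_WON):
        print(charset, f"U+{ord(ch):04X}", round_trips(charset, ch))
```

In this sketch the UHC table has no entry for U+005C, because item C lists
none; a real converter would have to decide what to do with it.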

I guess this is also in line with what has been done for EUC-JP and SJIS
(CP94?)  pair and EUC-CN and CP950(?) pair in glibc.


Jungshik Shin
