This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Improving the IBM437/860/861/862/863/865, MIK encodings

From: Alexander Shopov <ash at contact dot bg>
To: libc-alpha at sourceware dot org
Date: Thu, 11 May 2006 12:02:44 +0300
Subject: Improving the IBM437/860/861/862/863/865, MIK encodings

Hi guys,
The encodings IBM437/860/861/862/863/865, MIK implemented in GNU libc all contain the following characters:

<U00DF>     /xe1         LATIN SMALL LETTER SHARP S (German)
<U00B5>     /xe6         MICRO SIGN
<U03C6>     /xed         GREEK SMALL LETTER PHI
<U03B5>     /xee         GREEK SMALL LETTER EPSILON

These correspond to Microsoft's versions of the codepages.
However, earlier versions by IBM map these code points to different characters:

<U03B2>     /xe1         GREEK SMALL LETTER BETA
<U03BC>     /xe6         GREEK SMALL LETTER MU
<U2205>     /xed         EMPTY SET
<U2208>     /xee         ELEMENT OF

The characters in each pair look almost identical:
IBM: β μ ∅ ∈
MICROSOFT: ß µ φ ε

These resources:

1. Markus Kuhn: UTF-8 and Unicode FAQ for Unix/Linux
http://www.cl.cam.ac.uk/~mgk25/unicode.html#conv
2. Unicode Inc.: Listing of the differences between IBM's mappings
http://www.unicode.org/Public/MAPPINGS/VENDORS/IBM/readme.txt

suggest that a converter would be better if

both <U00DF> and <U03B2> were mapped to /xe1,
both <U00B5> and <U03BC> were mapped to /xe6,
both <U03C6> and <U2205> were mapped to /xed,
both <U03B5> and <U2208> were mapped to /xee

when converting from encodings containing these characters to one of IBM437/860/861/862/863/865, MIK.

When converting from IBM437/860/861/862/863/865, MIK to another encoding, we can keep the current behaviour for compatibility.

1. Would GNU libc's maintainers be interested if I tried to implement such a change?
2. Could someone guide me through this process? I am unsure where to begin and will need a mentor.

Kind regards:
al_shopov

Follow-Ups:
- Re: Improving the IBM437/860/861/862/863/865, MIK encodings
  - From: Ulrich Drepper
