This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Improving the IBM437/860/861/862/863/865, MIK encodings
- From: Alexander Shopov <ash at contact dot bg>
- To: libc-alpha at sourceware dot org
- Date: Thu, 11 May 2006 12:02:44 +0300
- Subject: Improving the IBM437/860/861/862/863/865, MIK encodings
Hi guys,
The IBM437/860/861/862/863/865 and MIK encodings implemented in GNU libc all map the following bytes to these characters:
<U00DF> /xe1 LATIN SMALL LETTER SHARP S (German)
<U00B5> /xe6 MICRO SIGN
<U03C6> /xed GREEK SMALL LETTER PHI
<U03B5> /xee GREEK SMALL LETTER EPSILON
These correspond to Microsoft's versions of the code pages.
However, earlier IBM versions of these code pages define the same bytes differently:
<U03B2> /xe1 GREEK SMALL LETTER BETA
<U03BC> /xe6 GREEK SMALL LETTER MU
<U2205> /xed EMPTY SET
<U2208> /xee ELEMENT OF
The two sets of characters are visually extremely similar:
IBM: β μ ∅ ∈
MICROSOFT: ß µ φ ε
The following resources:
1. Markus Kuhn: UTF-8 and Unicode FAQ for Unix/Linux
http://www.cl.cam.ac.uk/~mgk25/unicode.html#conv
2. Unicode Inc.: Listing of the differences between IBM's mappings
http://www.unicode.org/Public/MAPPINGS/VENDORS/IBM/readme.txt
suggest that a converter would be better if, when converting from an encoding that contains these characters to one of IBM437/860/861/862/863/865 or MIK,
both <U00DF> and <U03B2> were mapped to /xe1,
both <U00B5> and <U03BC> were mapped to /xe6,
both <U03C6> and <U2205> were mapped to /xed, and
both <U03B5> and <U2208> were mapped to /xee.
When converting from IBM437/860/861/862/863/865 or MIK to another encoding, we can keep the current mappings for compatibility.
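To make the proposed asymmetry concrete, here is a minimal Python sketch. It is not glibc code: the names DECODE, IBM_ALIASES, ENCODE, encode and decode are hypothetical, only the four disputed bytes are modelled, and bytes 0x00-0x7F are assumed to be plain ASCII. Decoding keeps the current Microsoft-style characters, while encoding accepts both the Microsoft and the older IBM character for each disputed byte.

```python
# Hypothetical sketch of the proposed one-way fallback, not glibc's code.
# Only the four disputed bytes are modelled; bytes 0x00-0x7F are assumed
# to be ASCII, and every other byte/character is treated as unmapped.

# Byte -> Unicode: unchanged, Microsoft-style mapping.
DECODE = {
    0xE1: "\u00DF",  # LATIN SMALL LETTER SHARP S
    0xE6: "\u00B5",  # MICRO SIGN
    0xED: "\u03C6",  # GREEK SMALL LETTER PHI
    0xEE: "\u03B5",  # GREEK SMALL LETTER EPSILON
}

# Encode-only aliases for IBM's earlier interpretation of the same bytes.
IBM_ALIASES = {
    "\u03B2": 0xE1,  # GREEK SMALL LETTER BETA
    "\u03BC": 0xE6,  # GREEK SMALL LETTER MU
    "\u2205": 0xED,  # EMPTY SET
    "\u2208": 0xEE,  # ELEMENT OF
}

# Unicode -> byte: inverse of DECODE plus the aliases, so two characters
# now map to each disputed byte (a many-to-one table).
ENCODE = {ch: byte for byte, ch in DECODE.items()}
ENCODE.update(IBM_ALIASES)


def encode(text: str) -> bytes:
    """Encode text, raising ValueError for characters with no mapping."""
    out = bytearray()
    for ch in text:
        if ch in ENCODE:
            out.append(ENCODE[ch])
        elif ord(ch) < 0x80:
            out.append(ord(ch))  # assumed ASCII lower half
        else:
            raise ValueError(f"no mapping for U+{ord(ch):04X}")
    return bytes(out)


def decode(data: bytes) -> str:
    """Decode bytes; disputed bytes always yield the Microsoft characters."""
    chars = []
    for b in data:
        if b in DECODE:
            chars.append(DECODE[b])
        elif b < 0x80:
            chars.append(chr(b))  # assumed ASCII lower half
        else:
            raise ValueError(f"byte 0x{b:02X} not covered by this sketch")
    return "".join(chars)


if __name__ == "__main__":
    print(encode("\u03B2").hex())          # IBM beta -> e1
    print(decode(encode("\u03B2")))        # comes back as sharp s
```

Because decoding always yields the Microsoft character, text containing β does not round-trip: it is encoded as /xe1 and decoded as ß. That loss is the compatibility trade-off of keeping the existing byte-to-Unicode direction unchanged.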
1. Would the GNU libc maintainers be interested if I tried to implement such a change?
2. Can someone guide me through this process? I am unsure where to begin and will need a mentor on this.
Kind regards:
al_shopov