This is the mail archive of the
mailing list for the Cygwin project.
cyg1.7 - DOS character remapping: change request.
- From: Linda Walsh <cygwin at tlinx dot org>
- To: "cygwin at cygwin dot com" <cygwin at cygwin dot com>
- Date: Mon, 23 Nov 2009 15:59:28 -0800
- Subject: cyg1.7 - DOS character remapping: change request.
Was thinking about one or two mods for the new characters that are being remapped to the 'private area', but also about a compatibility bug.
Maybe I'll get the bug out of the way first.
Filenames created on a samba share are not visible on the server
as anything resembling what I used on Cygwin. I see that as pretty
bad (scale 1-10, maybe a 7).
One possible way to ameliorate that problem is in the first suggestion
I thought of.
Instead of using random characters out of the 'random free area' --
which could display as anything if you aren't in Cygwin, depending
on what charset you have loaded -- why not use 'dedicated' Unicode
characters that map to the signs for those characters? They aren't
exactly equivalent, as they include some built-in display spacing,
BUT they would display a colon as a colon, "*" as an asterisk, and so on.
There are reserved and 'fullwidth' forms of each of the characters
that need remapping. The fullwidth forms add some visual space
around the characters though it's still only 1 Unicode character.
The mappings I'd suggest as default mappings are as follows:
dosch  U-char  Unicode-val  ;Unicode Comment
"      ＂      U+FF02       ;FULLWIDTH QUOTATION MARK
*      ＊      U+FF0A       ;FULLWIDTH ASTERISK
:      ：      U+FF1A       ;FULLWIDTH COLON
<      ＜      U+FF1C       ;FULLWIDTH LESS-THAN SIGN
>      ＞      U+FF1E       ;FULLWIDTH GREATER-THAN SIGN
?      ？      U+FF1F       ;FULLWIDTH QUESTION MARK
|      ｜      U+FF5C       ;FULLWIDTH VERTICAL LINE
All of the above are the 'FULLWIDTH' versions of each of those
characters. They aren't a perfect substitute, as they don't
have the same ascii values, and, correctly implemented, they
will display a bit of extra white space around the char.
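To make the proposed table concrete, here is a rough sketch in Python of what the substitution amounts to. It relies on the fact that each fullwidth form in the table sits exactly 0xFEE0 above its ASCII code point. The function names are mine, purely for illustration; this is not how Cygwin implements its remapping.

```python
# Reserved DOS/Windows filename characters from the table above.
DOS_RESERVED = '"*:<>?|'

# U+FF01..U+FF5E mirror U+0021..U+007E at a fixed offset.
FULLWIDTH_OFFSET = 0xFEE0

# Translation tables keyed by code point, as str.translate expects.
TO_FULLWIDTH = {ord(c): ord(c) + FULLWIDTH_OFFSET for c in DOS_RESERVED}
FROM_FULLWIDTH = {wide: narrow for narrow, wide in TO_FULLWIDTH.items()}


def to_fullwidth(name: str) -> str:
    """Replace each reserved character with its fullwidth form."""
    return name.translate(TO_FULLWIDTH)


def from_fullwidth(name: str) -> str:
    """Reverse the substitution (hypothetical helper, for illustration)."""
    return name.translate(FROM_FULLWIDTH)


if __name__ == "__main__":
    stored = to_fullwidth('What is this?: a "test"')
    print(stored)
    print(from_fullwidth(stored))
```

One caveat with a reversible mapping like this: a filename that already contained a genuine fullwidth colon would come back as an ASCII colon, so the round trip is only exact for names that started out as ASCII in those positions.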
But on the positive side -- and important for Unicode parsing --
each of the fullwidth forms has the same character class as its
ASCII counterpart. So the fullwidth quote has the property
'quotation mark'. So if the name was read by a Unicode-capable
program (like perl), it could process the special characters in
whatever way it would have processed the real ASCII character.
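The character-class point can be checked with any Unicode-aware tool. As a small sketch, here it is with Python's standard unicodedata module rather than perl (the helper names are hypothetical): it compares the general category of each ASCII character with that of its fullwidth form, and also checks that NFKC normalization folds the fullwidth form back to ASCII.

```python
import unicodedata

# (ASCII char, fullwidth char) pairs for the seven remapped characters.
PAIRS = [(c, chr(ord(c) + 0xFEE0)) for c in '"*:<>?|']


def same_category(ascii_ch: str, wide_ch: str) -> bool:
    """True if both characters share a Unicode general category."""
    return unicodedata.category(ascii_ch) == unicodedata.category(wide_ch)


def folds_to_ascii(ascii_ch: str, wide_ch: str) -> bool:
    """True if NFKC normalization maps the fullwidth form to ASCII."""
    return unicodedata.normalize("NFKC", wide_ch) == ascii_ch


if __name__ == "__main__":
    for a, w in PAIRS:
        print(a, f"U+{ord(w):04X}", unicodedata.category(w),
              unicodedata.name(w), same_category(a, w), folds_to_ascii(a, w))
```

A program that wants the plain ASCII character back could therefore normalize the name with NFKC, though that also folds other compatibility characters, so it is broader than just these seven.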
The other benefit is that a great many Unicode charsets have mappings
for the fullwidth forms. In every charset I tried, they
displayed as what they are (with slight stylistic variations).
They also displayed in my shell client on Linux as their ASCII
equivalents, and displayed in the three Windows command-line clients I
tried (Console2, mintty, standard cygwin).
I'd consider that a strong plus for compatibility -- since outside
of cygwin, use of the private area will often get you question marks
or little square boxes or nothing at all, but this way, visually,
at least, they'll look close to how they are intended to look.
FWIW, one can use the fullwidth forms of / and \ in pathnames where
the OS treats them as normal characters.
That's the most important suggestion I have.
As a /possible/ further suggestion (one that would take more work, and
that I don't believe is as important if the above change to use
fullwidth characters is made), Cygwin could allow user assignment
of which Unicode char to substitute for each special character.
The chars could be specified by their HTML id, or a pseudo-id if none
exists, so the syntax would look like:
That would allow someone to assign any value they wanted. An
advantage of that is that those who still want to access 'ADS',
could have "col=U+003A" in their CYGWIN var, and they'd still get
I strongly hope and urge you to use the FULLWIDTH equivalents of
the characters you are using. They are shell safe and display
properly. I've been using a few of them, like the colon and the
occasional 'slash', to properly display song titles in my music