This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: cyg1.7 - DOS character remapping: change request.

# Eric Blake: ... [believes the current scheme has round-trip mapping, and that this is more valuable than users being able to identify their files in the OS GUI or on a Linux server]

# Linda W. replies to Eric:
  [points out that the current system already uses valid Unicode values
  (as others have pointed out, not all UTF-16 values are valid Unicode
  values) and that the current system is already breaking round trip
  mapping, so this is not a valid point with the current encoding]

# Corinna Vinschen writes:
Right, and we use them to map characters from the base plane.  There's
no area in the entire Unicode plane which would not conflict one way or
the other.

But there are "probabilities of conflict". Cygwin wants to allow
the use of the 7 deadly chars by mapping them 'randomly' to some
"hopefully" unused area. I say let's map them to their visual
equivalents. That way, they have a strong chance of being recognized
as correct in Explorer and in Linux GUIs, whether they are on SMB
mappings or have been copied across.

As it is, those characters, in Explorer or on a Linux server, will
look like random blanks, boxes or other garbage, and the files won't
be identifiable. I see that as bad. The display equivalents
would look enough like the ASCII characters to allow recognition
of what the filename is meant to be. Best of all, that displayed
value would be a constant based on the reserved Unicode value
of those characters. With any character set that displays those
values, they would always display as their ASCII equivalents.
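
To make the two proposals concrete, here is a minimal Python sketch. It assumes the current scheme adds 0xF000 to each reserved character (landing in the private use area), and that the "visual equivalents" are the Unicode fullwidth forms (ASCII code point plus 0xFEE0). The thread does not name the exact block, so the fullwidth choice is an assumption, as are the function names.

```python
# The seven characters Windows forbids in filenames (path separators aside).
RESERVED = '"*:<>?|'

PUA_OFFSET = 0xF000        # assumed: private use area mapping, ':' -> U+F03A
FULLWIDTH_OFFSET = 0xFEE0  # assumed: fullwidth forms, ':' -> U+FF1A

def remap(name: str, offset: int) -> str:
    """Shift each reserved character by `offset`; leave everything else alone."""
    return "".join(chr(ord(c) + offset) if c in RESERVED else c for c in name)

def unmap(name: str, offset: int) -> str:
    """Reverse remap(): shift the mapped code points back down to ASCII."""
    targets = {chr(ord(c) + offset) for c in RESERVED}
    return "".join(chr(ord(c) - offset) if c in targets else c for c in name)

name = "notes: draft?.txt"
pua = remap(name, PUA_OFFSET)
wide = remap(name, FULLWIDTH_OFFSET)

print(ascii(pua))   # private use code points have no agreed glyph
print(wide)         # fullwidth forms render as recognizable punctuation
```

On a system without a glyph for U+F03A, the first name shows a box or blank where the colon was; the second shows a fullwidth colon in any font that covers the fullwidth block.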

We're using the same mapping as Interix does, so we're at
least compatible with one other product.  The only alternative is
not to map ASCII chars at all and revert this change.

  Interix is an MS product.  MS is not noted for following standards,
but for doing their darndest to do harm to standards.  They chose their
values before Unicode was standardized.

Any other standards group I know of is going UTF-8. All of the Linux
distributions I know are going UTF-8. I'd like to see Cygwin
go that way too. Using the visual encodings for the deadly 7 will
allow the chars to look correct on Windows (in Explorer and Unicode
browsers like IE and FF) as well as on Linux.

Oh and, btw., the conversion between base plane and private use area is
only done for system objects like filenames.  It's not done for every
multibyte to widechar conversion within the application itself.  So,
*if* you have collisions, they will only occur for filenames, which are
rather unlikely (not impossible, I know) to use these private use

  I am aware of this. It's when looking at the files in Explorer or on
Linux that I want to see something that looks like a colon when I
put in a colon.  :-)
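
The collision question can be illustrated with a short sketch. It assumes the same two offsets as above (0xF000 for the private use mapping, 0xFEE0 for a fullwidth mapping, the latter being my guess at what "visual equivalents" would mean in practice). Either scheme fails to round-trip a filename that already contains one of its own target code points; what differs is which names are affected.

```python
RESERVED = '"*:<>?|'

def encode(name: str, offset: int) -> str:
    """Map reserved ASCII characters up by `offset`."""
    return "".join(chr(ord(c) + offset) if c in RESERVED else c for c in name)

def decode(name: str, offset: int) -> str:
    """Map the shifted code points back to ASCII."""
    targets = {chr(ord(c) + offset) for c in RESERVED}
    return "".join(chr(ord(c) - offset) if c in targets else c for c in name)

def round_trips(name: str, offset: int) -> bool:
    return decode(encode(name, offset), offset) == name

already_wide = "title\uff1a part 1"  # user typed a real fullwidth colon
already_pua = "title\uf03a part 1"   # name already holds U+F03A

for label, offset in (("private use", 0xF000), ("fullwidth", 0xFEE0)):
    print(label, round_trips(already_wide, offset), round_trips(already_pua, offset))
```

A name containing a genuine fullwidth colon survives the private use scheme but not the fullwidth one, and the reverse holds for a name containing U+F03A.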

  If you were strongly concerned about mapping collisions,
     you could:

1) use a single env var to turn it off or on, OR
2) use the HTML entities to provide valid mappings, OR
3) do either of the above in the registry
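
Options 1 and 2 could be sketched roughly as follows. The environment variable name `REMAP_RESERVED` and both function names are hypothetical, invented here for illustration; nothing like them exists in Cygwin as far as this thread says. The sketch also escapes `&` itself, which the proposal does not mention, so that decoding stays unambiguous.

```python
import os

RESERVED = '"*:<>?|'

def entity_escape(name: str) -> str:
    """Replace reserved characters with numeric HTML entities, e.g. ':' -> '&#58;'.

    '&' is escaped as well so that a literal '&#58;' in a name is not
    mistaken for an escaped colon when decoding.
    """
    return "".join(f"&#{ord(c)};" if c in RESERVED + "&" else c for c in name)

def to_windows_name(name: str, environ=os.environ) -> str:
    """Apply the escape unless the (hypothetical) switch turns it off."""
    if environ.get("REMAP_RESERVED", "on") == "off":
        return name  # passed through untouched; Windows would reject reserved chars
    return entity_escape(name)

print(to_windows_name("a:b", {"REMAP_RESERVED": "on"}))
print(to_windows_name("a:b", {"REMAP_RESERVED": "off"}))
```

The entity form uses only characters Windows accepts in filenames, so it collides with nothing in Unicode, at the cost of longer and less readable names.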

  But barring any other changes, I'd really (like, pretty please!)
like to see them mapped to their reserved-visual but semantically
impotent equivalents.  After all, that's one reason those characters
are there! :-)

