getclip and putclip garble unicode characters
Mon Jul 5 10:04:21 GMT 2021
Replying to myself...
Mark Geisert wrote:
> Hi Leonid (?),
> Миронов Леонид Владимирович via Cygwin wrote:
>> getclip and putclip from cygutils-extra garble unicode characters: non-latin
>> characters copied to clipboard in windows are replaced with question marks when
>> retrieved with getclip in cygwin, and non-latin characters copied to clipboard
>> using putclip are pasted in windows looking like utf-8 displayed in cp1252
>> but can be retrieved with getclip exactly as pasted, so it looks like the
>> problem is not in the way the data is copied but in the way cygwin and windows
>> communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
>> codepage is set to cp1251 - 1251, not 1252.
> Thanks for the report. I will investigate.
I believe I have a local testcase similar to your report: if I select a region of
text in a message from the Cygwin mailing list digest, and that message contains
Cyrillic characters, getclip replaces those characters with '?' on output.
Since Thomas suggested an alternative, using 'cat < /dev/clipboard', I tried that
as well and see that here UTF-8 is output and the Cyrillic characters are intact.
So I've modified getclip to understand what MS calls CF_UNICODETEXT from the
clipboard and have it converted to UTF-8 for output. Thus my new getclip can
duplicate what the alternative does. (What getclip could understand previously
was CF_TEXT ("normal" ANSI characters) or CYGWIN_NATIVE (an internal Cygwin format
that makes your putclip + getclip example work)).
How about I generate a test version of the cygutils package with this updated
getclip and you can see if it solves your issue?