Re: rxvt, ssh and utf8 - partial success

James Garrison wrote:
Baurjan Ismagulov wrote:

On Fri, Jun 04, 2004 at 10:19:01AM -0700, Brian Dessent wrote:

I'd love to know why one or the other terminal setting can't just work
for everything.

This happens due to the following difference in the terminfo entries:

- acsc=``aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
+ acsc=+\257\,\256-\^0\333`\004a\261f\370g\361h\260j\331k\277l\332m\300n\305o~p\304q\304r\304s_t\303u\264v\301w\302x\263y\363z\362{\343|\330}\234~\376,

Handling this in rxvt should solve your problem (I wonder if there are any reasons not to do that).

Not sure what you mean by 'handling'.  Are you saying rxvt needs to be
modified, or just terminfo?

I uploaded the rxvt-cygwin terminfo file from Cygwin onto the Linux
system (into ~/.terminfo/r/rxvt-cygwin).  Setting TERM=rxvt-cygwin now
allows the curses-based program to draw boxes correctly, but
apparently some substitution is going on because it's using plain old
hyphens and vertical bars for lines and plus signs for corners.
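
For reference, the per-user location used above follows the usual ncurses directory layout, where a compiled entry lives in a subdirectory named after the first letter of the terminal name. A minimal sketch of that naming rule, assuming this layout; the helper name `user_terminfo_path` is made up for illustration, and other search locations such as $TERMINFO or the system directories are not modeled:

```python
from pathlib import PurePosixPath


def user_terminfo_path(term: str, home: str) -> PurePosixPath:
    """Build <home>/.terminfo/<first letter of term>/<term>.

    Hypothetical helper: it only models the per-user directory,
    not the full search order a curses library may use.
    """
    return PurePosixPath(home) / ".terminfo" / term[0] / term


print(user_terminfo_path("rxvt-cygwin", "/home/jhg"))
```

If the file is anywhere else under ~/.terminfo, a lookup for TERM=rxvt-cygwin would not be expected to find it there.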

Terminfo doesn't seem to know anything about Unicode as far as I can
tell (or does it?).  That leads to the question of how putting the
terminfo file on the Linux system caused the curses-based program to
output single ASCII characters where previously it was sending Unicode
sequences... something understood how to interpret the Unicode box-
drawing characters and replaced them with the nearest ASCII matches
"+", "|" and "-".

However, this IS NOT happening with Unicode quote characters.  Here's
a snippet from the man page for terminfo itself, as displayed:

       Entries  in  terminfo  consist  of  a  sequence of ‘,’ separated fields
       (embedded commas may be escaped with a backslash or notated  as  \054).
       White  space  after  the ‘,’ separator is ignored.  The first entry for
       each terminal gives the names which are known for the  terminal,  sepa-
       rated  by  ‘|’  characters.   The  first  name given is the most common

Those "‘" sequences turn out to be \xE2\x80\x99, which is the UTF-8 encoding of the Unicode character "Right Single Quotation Mark" (U+2019).
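
The byte sequence can be checked independently of any terminal. A small sketch in Python (the function name `describe` is just for illustration): it decodes the three bytes as UTF-8 and, for comparison, shows what the same bytes look like if something treats them as a single-byte code page instead. Windows-1252 is used here purely as an example of such a code page, not as a claim about what rxvt is doing.

```python
import unicodedata


def describe(raw: bytes) -> str:
    """Decode raw as UTF-8 and name the first resulting character."""
    ch = raw.decode("utf-8")[0]
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


quote_bytes = b"\xe2\x80\x99"
print(describe(quote_bytes))  # U+2019 RIGHT SINGLE QUOTATION MARK

# The matching opening quote differs only in the last byte.
print("\u2018".encode("utf-8"))  # b'\xe2\x80\x98'

# The same three bytes read as one-byte-per-character text.
print(quote_bytes.decode("cp1252"))
```

A terminal that is not in UTF-8 mode would be expected to show three separate characters for each quote, along the lines of the last result.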

Here's the full terminfo entry (decompiled with infocmp):

# Reconstructed via infocmp from file: /home/jhg/.terminfo/r/rxvt-cygwin
rxvt-cygwin|rxvt terminal emulator (X Window System) on cygwin,
    am, bce, xenl, eo, km, mir, msgr, xon,
    cols#80, it#8, lines#24, colors#8, pairs#64,
    acsc=+\257\,\256-\^0\333`\004a\261f\370g\361h\260j\331k\277l\332m\300n\305o~p\304q\304r\304s_t\303u\264v\301w\302x\263y\363z\362{\343|\330}\234~\376,
    bel=^G, cr=^M, csr=\E[%i%p1%d;%p2%dr, tbc=\E[3g,
    clear=\E[H\E[2J, el1=\E[1K, el=\E[K, ed=\E[J,
    hpa=\E[%i%p1%dG, cup=\E[%i%p1%d;%p2%dH, cud1=^J, home=\E[H,
    civis=\E[?25l, cub1=^H, cnorm=\E[?25h, cuf1=\E[C, cuu1=\E[A,
    cvvis=\E[?25h, dch1=\E[P, dl1=\E[M, enacs=\E(B\E)0, smacs=^N,
    blink=\E[5m, bold=\E[1m, smcup=\E7\E[?47h, smir=\E[4h,
    rev=\E[7m, smso=\E[7m, smul=\E[4m, rmacs=^O, sgr0=\E[m\017,
    rmcup=\E[2J\E[?47l\E8, rmir=\E[4l, rmso=\E[27m, rmul=\E[24m,
    flash=\E[?5h\E[?5l, is1=\E[?47l\E=\E[?1l,
    is2=\E[r\E[m\E[2J\E[H\E[?7h\E[?1;3;4;6l\E[4l,
    ich1=\E[@, il1=\E[L, ka1=\EOw, ka3=\EOy, kb2=\EOu, kbs=^H,
    kcbt=\E[Z, kc1=\EOq, kc3=\EOs, kdch1=\E[3~, kcud1=\E[B,
    kend=\E[8~, kent=\EOM, kel=\E[8\^,
    kf0=\E[21~, kf1=\E[11~, kf10=\E[21~, kf11=\E[23~, kf12=\E[24~,
    kf13=\E[25~, kf14=\E[26~, kf15=\E[28~, kf16=\E[29~, kf17=\E[31~,
    kf18=\E[32~, kf19=\E[33~, kf2=\E[12~, kf20=\E[34~, kf3=\E[13~,
    kf4=\E[14~, kf5=\E[15~, kf6=\E[17~, kf7=\E[18~, kf8=\E[19~,
    kf9=\E[20~, kfnd=\E[1~, khome=\E[7~, kich1=\E[2~, kcub1=\E[D,
    kmous=\E[M, knp=\E[6~, kpp=\E[5~, kcuf1=\E[C, kDC=\E[3$,
    kslt=\E[4~, kEND=\E[8$, kHOM=\E[7$, kLFT=\E[d, kNXT=\E[6$,
    kPRV=\E[5$, kRIT=\E[c, kcuu1=\E[A, rmkx=\E>, smkx=\E=,
    op=\E[39;49m, dch=\E[%p1%dP, dl=\E[%p1%dM, cud=\E[%p1%dB,
    ich=\E[%p1%d@, il=\E[%p1%dL, cub=\E[%p1%dD, cuf=\E[%p1%dC,
    cuu=\E[%p1%dA,
    rs1=\E>\E[1;3;4;5;6l\E[?7h\E[m\E[r\E[2J\E[H,
    rs2=\E[r\E[m\E[2J\E[H\E[?7h\E[?1;3;4;6l\E[4l\E>,
    rc=\E8, vpa=\E[%i%p1%dd, sc=\E7, ind=^J, ri=\EM,
    s0ds=\E(B, s1ds=\E(0, setab=\E[4%p1%dm, setaf=\E[3%p1%dm,
    hts=\EH, ht=^I,
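
To make the acsc difference easier to read: the capability is a string of character pairs, where the first character of each pair is the VT100 line-drawing name (for example q for a horizontal line, x for a vertical line) and the second is what gets sent to the terminal. Below is a rough Python sketch that splits both acsc strings into pairs. The function names are made up for illustration, only the escapes that actually occur in these two strings are handled (octal, backslash-literal, and ^X), and reading the high bytes as CP437 is an assumption on my part, not something the entry states.

```python
def unescape_terminfo(text: str) -> bytes:
    """Expand octal (\\ooo), literal (\\, \\^ \\\\) and ^X escapes.

    Deliberately partial: \\E, \\n and friends are not handled because
    they do not occur in the acsc strings below.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            if text[i + 1] in "01234567":
                j = i + 1
                # Consume at most three octal digits.
                while j < len(text) and j < i + 4 and text[j] in "01234567":
                    j += 1
                out.append(int(text[i + 1:j], 8) & 0xFF)
                i = j
            else:
                out.append(ord(text[i + 1]))  # backslash-escaped literal
                i += 2
        elif ch == "^" and i + 1 < len(text):
            out.append(ord(text[i + 1]) & 0x1F)  # control character
            i += 2
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)


def parse_acsc(acsc: str) -> dict:
    """Map each VT100 ACS name to the byte the entry sends for it."""
    raw = unescape_terminfo(acsc)
    return {chr(raw[k]): raw[k + 1] for k in range(0, len(raw) - 1, 2)}


VT100_STYLE = "``aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~"
CYGWIN = (r"+\257\,\256-\^0\333`\004a\261f\370g\361h\260j\331k\277l\332"
          r"m\300n\305o~p\304q\304r\304s_t\303u\264v\301w\302x\263y\363"
          r"z\362{\343|\330}\234~\376")

plain = parse_acsc(VT100_STYLE)
cygwin = parse_acsc(CYGWIN)

for name in "qxlkmj":
    byte = cygwin[name]
    # Assumption: the high bytes are CP437 code points.
    print(name, chr(plain[name]), hex(byte), bytes([byte]).decode("cp437"))
```

In the first string every name maps to itself, so the terminal is expected to do the line drawing once the alternate character set is switched on. In the rxvt-cygwin entry the names map to bytes above 0x7F, which only produce lines if whatever displays them interprets those bytes in the matching code page.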

Any insight into what's going on and how to make it work correctly would be greatly appreciated.

James Garrison                                Athens Group, Inc.                    5608 Parkcrest Dr                    Austin, TX 78731
PGP: RSA=0x92E90A3B DH/DSS=0x498D331C         (512) 345-0600 x150

