internationalized text processing

Charles Wilson
Mon May 23 14:51:00 GMT 2011

On 5/23/2011 9:44 AM, Nellis, Kenneth wrote:
> OTOH, the cygutils "ascii -e" utility does not 
> recognize that my LANG specifies UTF-8 and outputs garbage for the 
> extended half. Should this be considered a bug?

No, ascii is deliberately intended to be stupid.  In fact, the -e option
itself is simply a workaround.  Originally, ascii always printed the
first 256 codepoints unconditionally.  With the change in 1.7 to
defaulting to UTF-8, we recognized that this was bad, and changed
ascii's default behavior to print only the first 128 codepoints, and
added -e to restore the original behavior.
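
To make the "garbage" concrete, here is a rough sketch. It is not the actual cygutils source: raw_table() is a hypothetical stand-in that assumes ascii emits one raw byte per value. Values 0-127 are valid UTF-8 on their own, but bytes 128-255 are only meaningful as parts of multi-byte sequences, so a UTF-8 terminal cannot render the extended half.

```python
# Hypothetical stand-in for the table ascii prints: one raw byte per value.
# Assumption: the real tool writes single bytes; this is not its source code.
def raw_table(extended: bool = False) -> bytes:
    limit = 256 if extended else 128   # "-e" restores the full 256 values
    return bytes(range(limit))


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(raw_table()))               # True
print(decodes_as_utf8(raw_table(extended=True)))  # False
```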

A bit of history: ascii was originally written as a diagnostic tool so
that folks could check whether their font settings and CYGWIN variable
settings were correct (old, obsolete, do not do this anymore: the CYGWIN
variable contained charset:oem), so that the DOS line-drawing characters
could be used in a bash shell (running in a cmd box).

In other words, it was a poor man's hack to get CP437 working.

This predated the "real" codepage and $LANG handling introduced in
cygwin-1.7.
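
For illustration only, here is how three CP437 line-drawing bytes relate to their Unicode equivalents, using Python's built-in cp437 codec. The byte values are picked as an example and are not taken from ascii's output. Each single OEM byte becomes a three-byte sequence in UTF-8, which is why the old one-byte-per-character check says nothing useful on a UTF-8 terminal.

```python
# Three CP437 bytes: top-left corner, horizontal line, top-right corner.
box = bytes([0xDA, 0xC4, 0xBF])

# Interpret the bytes as CP437 (the old DOS/OEM codepage).
text = box.decode("cp437")

# Show the resulting Unicode code points.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)  # ['U+250C', 'U+2500', 'U+2510']

# The same characters in UTF-8 take three bytes each.
utf8_hex = text.encode("utf-8").hex(" ")
print(utf8_hex)  # e2 94 8c e2 94 80 e2 94 90
```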

Now, with "real" $LANG handling, line draw stuff Just Works(tm) when
LANG=*.UTF-8, at least for ncurses programs: try
from the ncurses-demo package, in a bash shell running in a cmd box (or
pstree -G).  So, ascii (-e) as a diagnostic tool is kinda...not needed
anymore.

Thus, I'm not too fussed about this "bug" in an obsolete and no longer
needed diag tool -- but I also don't see a need to remove it from
cygutils. So...mark this "bug" as either NOTABUG or WONTFIX. :-)

With regards to your other questions...I dunno.  Maybe somebody else does.

