internationalized text processing
Mon May 23 14:51:00 GMT 2011
On 5/23/2011 9:44 AM, Nellis, Kenneth wrote:
> OTOH, the cygutils "ascii -e" utility does not
> recognize that my LANG specifies UTF-8 and outputs garbage for the
> extended half. Should this be considered a bug?
No, ascii is deliberately intended to be stupid. In fact, the -e option
itself is simply a workaround. Originally, ascii always printed the
first 256 codepoints unconditionally. With the change in 1.7 to
defaulting to UTF-8, we recognized that this was bad, and changed
ascii's default behavior to print only the first 128 codepoints, and
added -e to restore the original behavior.
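To illustrate why the extended half comes out as garbage under a UTF-8 locale, here is a minimal sketch. It is not the cygutils source; it assumes only that the tool writes one raw byte per codepoint, and the helper names (raw_table, decodes_as_utf8) are made up for the illustration.

```python
def raw_table(extended=False):
    """Bytes a naive table printer would emit: one raw byte per codepoint.

    Assumption: this mimics 'ascii' (128 codepoints) versus 'ascii -e' (256).
    """
    limit = 256 if extended else 128
    return bytes(range(limit))


def decodes_as_utf8(data):
    """Return True if the byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


basic = raw_table()
extended = raw_table(extended=True)

# The lower half is plain ASCII, which is also valid UTF-8.
print(decodes_as_utf8(basic))       # True

# Lone bytes 0x80..0xFF are not valid UTF-8, so a UTF-8 terminal shows garbage.
print(decodes_as_utf8(extended))    # False

# Read as CP437 instead, the same bytes are the old DOS characters;
# 0xC4 is the horizontal line-drawing character.
line_char = extended[128:].decode("cp437")[0xC4 - 0x80]
print("U+%04X" % ord(line_char))    # U+2500
```

The same bytes are fine in an OEM codepage and meaningless as UTF-8, which is why the output of -e depends entirely on what the terminal thinks the encoding is.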
A bit of history: ascii was originally written as a diagnostic tool, so
that folks could check whether their font settings and CYGWIN var
settings were correct (old, obsolete, do not do this anymore: the CYGWIN
var contained charset:oem), so that the DOS line-drawing characters
could be used in a bash shell (running in a cmd box).
I.e., a poor man's hack to get CP437 working.
This predated the "real" codepage and $LANG handling in cygwin-1.7.
Now, with "real" $LANG handling, line draw stuff Just Works(tm) when
LANG=*.UTF-8, at least for ncurses programs: try
from the ncurses-demo package, in a bash shell running in a cmd box (or
pstree -G). So, ascii (-e) as a diagnostic tool is kinda...not needed
anymore.
Thus, I'm not too fussed about this "bug" in an obsolete and no longer
needed diag tool -- but I also don't see a need to remove it from
cygutils. So...mark this "bug" as either NOTABUG or WONTFIX. :-)
With regard to your other questions...I dunno. Maybe somebody else does.