[glibc x gettext] Bad po files ?
Bruno Haible
haible@ilog.fr
Thu Apr 19 11:24:00 GMT 2001
Andreas Jaeger writes:
> "Rodrigo Barbosa (aka morcego)" <rodrigob@conectiva.com.br> writes:
> > Looks like zh_TW.po is also buggy. The error follow, both from msgfmt and
> > iconv:
> >
> > $ msgfmt zh_TW.po -o zh_TW.gmo
> > zh_TW.po:563: invalid control sequence
> > zh_TW.po:564: end-of-line within string
This is a different issue: the PO file contains extraneous
backslashes (in particular, after the Chinese word "allowed").
The second byte of that particular ideograph is 0x5C, the ASCII
backslash, in the Big5 encoding. Such extraneous backslashes were
needed with earlier (non-multibyte-aware) versions of msgfmt.
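To see the collision concretely, here is a small Python sketch. It assumes the ideograph in question is 許 ("to allow"); the message does not name the character, so that choice is only an illustration.

```python
# Assumption: the ideograph is 許 ("to allow"); the message does not name it.
ideograph = "許"

# Big5 encodes this character as two bytes: a lead byte and a trail byte.
encoded = ideograph.encode("big5")
lead, trail = encoded

# The trail byte has the same value as an ASCII backslash, which is why a
# parser that is not multibyte-aware mistakes it for the start of an escape.
is_backslash = trail == ord("\\")
print(f"lead=0x{lead:02X} trail=0x{trail:02X} backslash={is_backslash}")
```

A byte-oriented parser reading this sequence sees a backslash followed by whatever comes next, which matches the "invalid control sequence" complaint quoted above.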
The fix is to change 0x5C 0x5C into 0x5C in those lines where msgfmt
complains.
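A rough sketch of that fix in Python follows. The function names and the `bad_lines` parameter are hypothetical; the line numbers are meant to be the ones msgfmt reports. It assumes Big5 lead bytes lie in the range 0x81..0xFE and drops one 0x5C only when it directly follows a two-byte character whose trail byte is also 0x5C.

```python
def fix_line(line):
    """Drop the spurious 0x5C that follows a Big5 character ending in 0x5C."""
    fixed = bytearray()
    i = 0
    while i < len(line):
        byte = line[i]
        # Assumption: bytes 0x81..0xFE start a two-byte Big5 character.
        if 0x81 <= byte <= 0xFE and i + 1 < len(line):
            fixed += line[i:i + 2]
            i += 2
            # Trail byte is 0x5C and is followed by another 0x5C:
            # the second one is the extraneous backslash, so skip it.
            if fixed[-1] == 0x5C and i < len(line) and line[i] == 0x5C:
                i += 1
        else:
            fixed.append(byte)
            i += 1
    return bytes(fixed)


def fix_po_bytes(data, bad_lines):
    """Apply fix_line only to the 1-based line numbers in bad_lines."""
    lines = data.split(b"\n")
    for lineno in bad_lines:
        if 1 <= lineno <= len(lines):
            lines[lineno - 1] = fix_line(lines[lineno - 1])
    return b"\n".join(lines)


# Hypothetical two-line sample: line 2 carries the doubled backslash.
sample = b'msgid "allowed"\nmsgstr "\xb3\x5c\x5c"\n'
repaired = fix_po_bytes(sample, {2})
```

Restricting the change to the reported lines keeps genuine escape sequences elsewhere in the file untouched. The file has to be processed as raw bytes, since decoding it as Big5 first would hide the extra backslash behind a valid character.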
C programs that use Big5 encoding in string literals will have to be
changed similarly once gcc supports multibyte encodings in source
files, as required for ISO C 99 compliance.
> > $ iconv --from-code=big5 --to-code=iso-8859-1 zh_TW.po -o output
> > iconv: illegal input sequence at position 654
It makes no sense to attempt to convert Chinese text full of
ideographs to the Western ISO-8859-1 encoding.
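As a small illustration in Python (using 許 purely as an example ideograph, which is an assumption and not taken from the file), ISO-8859-1 has no code point for such a character, whereas a Unicode encoding such as UTF-8 can represent it:

```python
# Assumption: 許 stands in for any ideograph in the PO file.
big5_bytes = b"\xb3\x5c"
text = big5_bytes.decode("big5")

# UTF-8 can represent every character that Big5 can.
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 only covers U+0000..U+00FF, so the conversion has to fail.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print("UTF-8 round trip ok:", utf8_bytes.decode("utf-8") == text)
print("ISO-8859-1 possible:", latin1_ok)
```

So a failure from this kind of conversion says nothing about whether the Big5 input itself is valid.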
> --- po/zh_TW.po 2000/08/28 07:56:32 1.1
> +++ po/zh_TW.po 2001/04/19 17:35:31
> @@ -2,6 +2,11 @@
> # Copyright (C) 2000 Free Software Foundation, Inc.
> # Tung-Han Hsieh <thhsieh@linux.org.tw>, 2000
> # Yuan-Chung Cheng <platin@ch.ntu.edu.tw>, 2000
> +# This file is currently not installed since it contains illegal
> +# multibyte characters. Just run either of these:
> +# $ msgfmt zh_TW.po -o zh_TW.gmo
> +# $ iconv --from-code=big5 --to-code=iso-8859-1 zh_TW.po -o output
> +# to see the errors.
> #
> msgid ""
> msgstr ""
The multibyte characters are valid; only some backslashes are
spurious.
Bruno