This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
Re: Bad Continuation of Multi-Byte UTF-8 Sequence
Michael Westbay wrote:
> While the encoding is part of the specification, it's optional to support
> multiple encodings. Saxon, for example, only supports UTF-8, US-ASCII, and
> ISO-8859-1 (all of which are exact subsets of UTF-8).
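ISO-8859-1 is not a subset of UTF-8. If you have a stream of bytes that
represents text in the ISO-8859-1 encoding, it is not in general a valid
UTF-8 stream: any character above 0x7F (e.g. "é", byte 0xE9) takes one
byte in ISO-8859-1 but two bytes in UTF-8. Only a US-ASCII stream is
guaranteed to be a valid UTF-8 stream as well.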
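A minimal Java sketch of the point above (the class and method names are illustrative, not from this thread): it encodes the same text as ISO-8859-1 and as UTF-8, then checks whether each byte sequence decodes as strict UTF-8. The decoder is configured to report errors instead of silently substituting replacement characters, which is roughly what an XML parser does when it stops with a "bad continuation" style error.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1VsUtf8 {

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw on bad input rather than replace it.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ascii = "DocBook";
        String accented = "r\u00e9sum\u00e9"; // "résumé": U+00E9 is outside US-ASCII

        byte[] asciiAsLatin1 = ascii.getBytes(StandardCharsets.ISO_8859_1);
        byte[] accentedAsLatin1 = accented.getBytes(StandardCharsets.ISO_8859_1);
        byte[] accentedAsUtf8 = accented.getBytes(StandardCharsets.UTF_8);

        // ASCII-only text has identical bytes in ISO-8859-1 and UTF-8.
        System.out.println(isValidUtf8(asciiAsLatin1));    // true
        // In ISO-8859-1, é is the single byte 0xE9. As UTF-8, 0xE9 starts a
        // three-byte sequence, but the next byte (0x73, 's') is not a
        // continuation byte, so decoding fails.
        System.out.println(isValidUtf8(accentedAsLatin1)); // false
        // UTF-8 writes each é as two bytes (0xC3 0xA9).
        System.out.println(accentedAsLatin1.length + " vs " + accentedAsUtf8.length); // 6 vs 8
    }
}
```

The same check on the UTF-8 bytes of "résumé" succeeds, so the failure comes from the encoding mismatch, not from the characters themselves.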
> You must not deal with languages that have multiple encodings. The reason I
> prefer to use Xalan/Xerces over Saxon is this very issue: the Apache XML/XSL
> tools allow the encoding to be specified on a per-document basis. The loss
> in speed is made up for in versatility.
You can still use Saxon and use the -x and -y parameters to change the parser
used to process XML and XSL files. For example, I am using the Crimson parser,
which supports every encoding supported by my JVM, something like 150
different encodings.
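A rough Java sketch of what per-document encoding support looks like in practice. Assumptions: it uses the JAXP parser bundled with the JDK rather than Crimson or Saxon's -x/-y switches, and the class name and sample documents are hypothetical. The parser is handed raw bytes and picks the decoding from each document's own encoding declaration; the last line prints how many charsets the running JVM offers, which varies by JVM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingDemo {

    // Parses raw bytes; the parser reads the XML declaration to decide how
    // to decode them, instead of relying on one global encoding setting.
    public static String rootText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String body = "<para>r\u00e9sum\u00e9</para>";

        // Same content, two different byte encodings, each declared in its own prolog.
        byte[] latin1Doc = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8Doc = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);

        // Both documents yield the same text once decoded per their declarations.
        System.out.println(rootText(latin1Doc).equals(rootText(utf8Doc)));

        // Charsets known to this JVM; the count depends on the JVM and platform.
        System.out.println(Charset.availableCharsets().size() + " charsets available");
    }
}
```

Feeding the ISO-8859-1 bytes to a parser that only assumes UTF-8 would instead fail in the way the previous example shows.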
-----------------------------------------------------------------
Jirka Kosek
e-mail: jirka@kosek.cz
http://www.kosek.cz