This is the mail archive of the mailing list .

Re: Bad Continuation of Multi-Byte UTF-8 Sequence

Michael Westbay wrote:

> While the encoding is part of the specification, it's optional to support
> multiple encodings.  Saxon, for example, only supports UTF-8, USASCII, and
> ISO-8859-1 (all of which are exact subsets of UTF-8).

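ISO-8859-1 is not a subset of UTF-8. If you have a stream of bytes that
represents some text in the ISO-8859-1 encoding, it is not, in general, a
valid UTF-8 stream: any character above 0x7F is a single byte in ISO-8859-1
but a multi-byte sequence in UTF-8. Only a US-ASCII stream is always a
valid UTF-8 stream as well.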
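
A minimal sketch of this point, assuming a Java 7 or later JVM (for
StandardCharsets). The class name Latin1VsUtf8 and the helper isValidUtf8
are made up for illustration. It encodes the same text in each encoding and
checks whether a strict UTF-8 decoder accepts the bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Latin1VsUtf8 {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café": three ASCII letters plus U+00E9

        byte[] ascii = "Kosek".getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1); // é is the single byte 0xE9
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // é is the two bytes 0xC3 0xA9

        System.out.println("US-ASCII bytes valid as UTF-8:   " + isValidUtf8(ascii));  // true
        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1)); // false
        System.out.println("UTF-8 bytes valid as UTF-8:      " + isValidUtf8(utf8));   // true
    }
}
```

In the ISO-8859-1 case the byte 0xE9 looks to a UTF-8 decoder like the start
of a three-byte sequence with no continuation bytes after it, which is the
kind of input that produces the error in the subject line.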

> You must not deal with languages that have multiple encodings.  The reason I
> prefer to use Xalan/Xerces over Saxon is this very issue: the Apache XML/XSL
> tools allow the encoding to be specified on a per-document basis.  The loss
> in speed is made up for in versatility.

You can still use Saxon and pass the -x and -y parameters to change the
parser used to process the XML and XSL files, respectively. For example, I
am using the Crimson parser, which supports all encodings supported by my
JVM: about 150 different encodings.
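
To see what a particular JVM offers, a small sketch like the following can
list its encodings. It assumes Java 1.4 or later (for java.nio.charset); the
class name JvmEncodings is made up for illustration, and the count varies
between JVM vendors and versions. Whether a given parser accepts exactly
this set is also an assumption that depends on the parser.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

// Hypothetical helper class, for illustration only.
final class JvmEncodings {

    // Canonical charset names mapped to the charsets this JVM can decode and encode.
    static SortedMap<String, Charset> supported() {
        return Charset.availableCharsets();
    }

    public static void main(String[] args) {
        SortedMap<String, Charset> charsets = supported();
        System.out.println("Encodings supported by this JVM: " + charsets.size());

        // Print each canonical name together with its aliases.
        for (Charset cs : charsets.values()) {
            System.out.println(cs.name() + " " + cs.aliases());
        }
    }
}
```

An encoding named in an XML declaration can be checked individually with
Charset.isSupported(name), which also accepts aliases.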

  Jirka Kosek