This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list .


Re: Bad Continuation of Multi-Byte UTF-8 Sequence

To: docbook-apps at lists dot oasis-open dot org
Subject: Re: DOCBOOK-APPS: Bad Continuation of Multi-Byte UTF-8 Sequence
From: Michael Westbay <westbay at users dot sourceforge dot net>
Date: Sun, 24 Jun 2001 21:56:12 +0900
References: <Pine.LNX.4.21.0106191258550.21161-100000@miami.datech2.er.heitec.net><87elsb48e6.fsf@nwalsh.com> <5.1.0.14.2.20010624055617.00b2c5f0@127.0.0.1>

To Walsh's comment:

> >Encoding can be specified by this way for external parsed entities,
> >version pseudoattribute is optional - moreover some XML processors are
> >unable to process external entity if it contains version information in
> >its declaration.

Pawson-san wrote:

> Surely this is a weakness in the XML spec then? I'm stuffed if I need
> an external parsed entity in a different encoding?

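While the encoding declaration is part of the specification, a processor is 
only required to support UTF-8 and UTF-16; supporting other encodings is 
optional.  Saxon, for example, only supports UTF-8, US-ASCII, and ISO-8859-1 
(all of whose character sets are subsets of Unicode, though only US-ASCII is 
byte-for-byte compatible with UTF-8).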
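
A minimal Python sketch of how these encodings relate at the byte level.  The 
sample strings and the helper name is_valid_utf8 are made up for illustration 
and are not from any tool discussed here.  It checks whether US-ASCII bytes 
and ISO-8859-1 bytes can be read back as UTF-8:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample text; the e-acute is U+00E9.
plain = "cafe au lait"
accented = "caf\u00e9 au lait"

ascii_bytes = plain.encode("ascii")
latin1_bytes = accented.encode("iso-8859-1")  # e-acute becomes the single byte 0xE9
utf8_bytes = accented.encode("utf-8")         # e-acute becomes the two bytes 0xC3 0xA9

ascii_ok = is_valid_utf8(ascii_bytes)    # pure ASCII is always valid UTF-8
latin1_ok = is_valid_utf8(latin1_bytes)  # 0xE9 followed by a space is not

print("US-ASCII bytes valid as UTF-8:  ", ascii_ok)
print("ISO-8859-1 bytes valid as UTF-8:", latin1_ok)
```

Feeding ISO-8859-1 bytes to a decoder that expects UTF-8 is one common way to 
end up with the kind of error named in this thread's subject line.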

You evidently don't have to deal with languages that have multiple encodings.  
The reason I prefer Xalan/Xerces over Saxon is this very issue: the Apache 
XML/XSL tools allow the encoding to be specified on a per-document basis.  
The loss in speed is made up for in versatility.

What this feature allows me to do is take a document produced by one 
engineer on a Windows box in Shift_JIS, then process it on my FreeBSD box 
with an XSL(T) stylesheet that is encoded in EUC-JP.  (For HTML, I often have 
the output encoding set in the XSL to be ISO-2022-JP.)
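
To make the encodings in that workflow concrete, here is a small Python 
sketch.  It is not how Xalan/Xerces do it internally; it only uses Python's 
standard codecs to show the same text in the three encodings.  The sample 
string and the helper name transcode are assumptions for illustration:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


# Hypothetical sample text: the word "Nihongo" (Japanese) in kanji.
text = "\u65e5\u672c\u8a9e"

# The document as it might arrive from the Windows box.
sjis_bytes = text.encode("shift_jis")

# The same text as it would be stored on a box that uses EUC-JP.
euc_bytes = transcode(sjis_bytes, "shift_jis", "euc_jp")

# The same text in the 7-bit ISO-2022-JP encoding chosen for output.
jis_bytes = transcode(euc_bytes, "euc_jp", "iso2022_jp")

for name, data in [("Shift_JIS", sjis_bytes),
                   ("EUC-JP", euc_bytes),
                   ("ISO-2022-JP", jis_bytes)]:
    print(f"{name:12} {data.hex(' ')}")
```

The three byte sequences differ, yet each decodes to the same characters, 
which is why every document has to declare the encoding it was saved in.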

I was recently told (but didn't confirm) that Danish has a number of 
different encodings as well depending on platform.

Where i18n and l10n are concerned, this is a strength in the XML spec, not a 
weakness.

-- 
Michael Westbay
Work: Beacon-IT http://www.beacon-it.co.jp/
Home:           http://www.seaple.icc.ne.jp/~westbay
Commentary:     http://www.japanesebaseball.com/

Follow-Ups:
- Re: Bad Continuation of Multi-Byte UTF-8 Sequence
  - From: Dave Pawson
- Re: Bad Continuation of Multi-Byte UTF-8 Sequence
  - From: Jirka Kosek

References:
- Re: Bad Continuation of Multi-Byte UTF-8 Sequence
  - From: Holger Rauch
- Re: Bad Continuation of Multi-Byte UTF-8 Sequence
  - From: Norman Walsh
- Re: Bad Continuation of Multi-Byte UTF-8 Sequence
  - From: Dave Pawson
