This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
Re: Choosing a characterset for DocBook
- From: "Christopher R. Maden" <crism at maden dot org>
- To: docbook-apps at lists dot oasis-open dot org
- Date: Fri, 15 Mar 2002 03:06:40 -0800
- Subject: Re: DOCBOOK-APPS: Choosing a characterset for DocBook
- References: <5.1.0.14.0.20020315020409.038e95c0@mail.maden.org>
At 02:58 AM 3/15/02, Jens Stavnstrup wrote:
>On Fri, 15 Mar 2002, Christopher R. Maden wrote:
> > 1) Do all of your entities (i.e., files) have encoding declarations? What
> > are they? Remember that UTF-8 is the default unless you explicitly specify
> > a different encoding (or use a byte-order mark, in which case UTF-16 is the
> > default).
>
>The encoding chosen is, as stated above, ISO-8859-1, and yes, that is
>specified in the XML declaration.
OK - then somehow SAXON isn't honoring that.
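
To illustrate the default-encoding rule from point 1, here is a minimal sketch. It uses Python's bundled ElementTree (expat) parser as a stand-in, not SAXON, and the sample document and the helper name parse_text are made up for the illustration. The same ISO 8859-1 bytes parse correctly when the declaration is present and are rejected as malformed UTF-8 when it is missing or ignored.

```python
import xml.etree.ElementTree as ET

# The same content, "café" with e-acute stored as the single byte 0xE9
# (ISO 8859-1), once with and once without an encoding declaration.
DECLARED = b'<?xml version="1.0" encoding="ISO-8859-1"?><para>caf\xe9</para>'
UNDECLARED = b'<para>caf\xe9</para>'


def parse_text(data: bytes):
    """Return the root element's text, or None if the parser rejects the bytes.

    Hypothetical helper for this illustration only.
    """
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        # Without a declaration the parser assumes UTF-8, and a lone 0xE9
        # byte is not a valid UTF-8 sequence.
        return None


if __name__ == "__main__":
    print("with declaration:   ", repr(parse_text(DECLARED)))
    print("without declaration:", repr(parse_text(UNDECLARED)))
```

If a parser behaves like the second case on a file that does carry the declaration, the declaration is probably being bypassed somewhere between the file and the parser.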
> > 2) How are you invoking the parser? From within SAXON, obviously - is
> > SAXON being called from the command line, or within another program? What
> > exactly are the parameters it's being passed?
>
>From Ant, with no specific parameters specified. (What are you referring to, BTW?)
>
>I am still using Saxon 6.4.4, and checking the Change history in 6.5.1, I
>do not see any specific problem with using ISO-8859-1.
SAXON definitely does not have a problem with ISO 8859-1. So somehow it's
being told to expect UTF-8. Exactly what are you using in Ant to call
SAXON? I haven't done a lot of work with Ant - is SAXON being instructed
to read the documents from the filesystem, or are they being passed as a
stream of some sort to SAXON?
>My problem is not so much which encoding I choose (if there are any bugs,
>e.g. characters the parser can't accept, I can fix them), but rather
>keeping my colleagues from running into these issues.
Once you can get SAXON to correctly read in ISO 8859-1 data, you shouldn't
have any problems; nearly every Windows and UNIX tool in a western European
environment can edit this encoding. The biggest problem you'll run into is
Windows users using the 128-159 (0x80-0x9F) range for things like curly
quotes and ellipses; in ISO 8859-1 these positions are control characters,
and while not illegal, will not mean what the Windows user thinks they mean.
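
The 128-159 problem can be sketched as follows. This assumes the Windows editors in question save in code page 1252, which is the usual western European Windows default but is an assumption here. The helper find_c1_bytes is hypothetical and only flags bytes for a human to review.

```python
import unicodedata

# Three bytes a Windows editor typically writes for an ellipsis and a pair
# of curly double quotes (assuming code page 1252).
SUSPECT = bytes([0x85, 0x93, 0x94])

# Read as ISO 8859-1, each byte maps to a C1 control character.
as_latin1 = SUSPECT.decode("iso-8859-1")

# Read as Windows-1252, the same bytes are the punctuation the user meant.
as_cp1252 = SUSPECT.decode("cp1252")


def find_c1_bytes(data: bytes):
    """Return (offset, byte) pairs for every byte in the 0x80-0x9F range.

    Hypothetical helper: a hit in a file declared as ISO 8859-1 suggests
    the file was really saved as Windows-1252.
    """
    return [(i, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


if __name__ == "__main__":
    for ch in as_latin1:
        print("ISO 8859-1: U+%04X category %s" % (ord(ch), unicodedata.category(ch)))
    for ch in as_cp1252:
        print("cp1252:     U+%04X %s" % (ord(ch), unicodedata.name(ch)))
    print(find_c1_bytes(b"He said \x93hello\x94"))
```

Running such a check over the source files before they reach the parser is one way to catch these bytes early; what to replace them with is still a judgment call for the author.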
~Chris
--
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA