This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: how to make the XP parser recognize xml encoding


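I had a similar problem when using ISO-8859-7 encoding as input and wanting UTF-8 encoding as output. What I did was parse the original XML source and replace each character with a code above 127 (that is, every non-ASCII character) with &#unicode;, its numeric character reference, where "unicode" stands for the character's Unicode code point.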
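A minimal sketch of that pre-processing step, assuming the input file is ISO-8859-7 and that non-ASCII characters appear only in text content and attribute values (character references are not allowed inside element or attribute names, so a file with Greek tag names would need a different approach). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EscapeNonAscii {

    // Replace every character outside the ASCII range (code point > 127)
    // with a decimal numeric character reference such as &#945;.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java EscapeNonAscii in.xml out.xml
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Decode the source with the encoding it was actually written in.
        String xml = new String(Files.readAllBytes(in), Charset.forName("ISO-8859-7"));
        // The result is pure ASCII, which is also valid ISO-8859-7,
        // so the existing encoding declaration can stay unchanged.
        Files.write(out, escapeNonAscii(xml).getBytes(StandardCharsets.US_ASCII));
    }
}
```

Because the output contains only ASCII bytes, any parser will decode it identically whatever the declaration says, which is why the header does not need to change.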

I can then feed the XML source into Xerces and Xalan (using the XSLT... classes), and the output is automatically converted to UTF-8. I don't even have to change the XML declaration to encoding="utf-8" (although with a different set of tools you might have to). Note that with these tools, UTF-8 is the only option for the output encoding.
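For comparison, a sketch of that second step using the JAXP/TrAX API (javax.xml.transform), which Xalan implements. It is an assumption that this resembles the "XSLT..." classes mentioned here; the sketch simply uses whatever transformer the JDK provides and asks for UTF-8 output explicitly through OutputKeys.ENCODING. The class name Utf8Output is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Utf8Output {

    // Run the document through a transformer and serialize it as UTF-8.
    // newTransformer() with no stylesheet is the identity transform;
    // pass a StreamSource for an .xsl file to apply a real stylesheet.
    static byte[] toUtf8(byte[] xmlBytes) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Give the parser raw bytes so it reads the encoding declaration itself.
        t.transform(new StreamSource(new ByteArrayInputStream(xmlBytes)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        // ASCII-only input: the Greek letter alpha is written as a character reference.
        String src = "<?xml version=\"1.0\" encoding=\"iso-8859-7\"?><r>&#945;</r>";
        byte[] result = toUtf8(src.getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```

The character reference in the input comes out as a real UTF-8 encoded character in the output, even though the input declaration still says iso-8859-7.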

Hope this helps.

> George Prezerakos, Ph.D.
> Mobile Internet Applications Development
> 
> Ericsson Hellas S.A.	Phone: + 301 96 01 441 (ext. 966)
> 33, Zeppou Str., 	                Mobile: + 3 0945 545282
> 166 75 - Glyfada,                     	
> Athens-Greece                     	
> E-mail: george.prezerakos@etg.ericsson.se
> 


-----Original Message-----
From: Tom Wang [mailto:tomw@b-bop.com]
Sent: Tuesday, June 27, 2000 9:04 PM
To: xsl-list@mulberrytech.com
Subject: how to make the XP parser recognize xml encoding


Hi,

This is an interesting problem.  I would appreciate it if anyone could offer
me some help with the following.  Here's my source XML:

<?xml version="1.0" encoding="iso-8859-1"?>
...

The XML file contains non-ASCII characters, so it must be read using the
encoding specified in the document itself.  I'm using James Clark's XT engine
(com.jclark.xsl.sax.XSLProcessor) and XP parser (com.jclark.xml.sax.Driver).
I construct a FileReader for the above XML file, then use it to construct an
InputSource that I feed into the XSL processor.  But somehow the XP parser
does not recognize the encoding in the XML declaration.  I actually put
garbage there (e.g., encoding="xxx") and the results came out the same.

I traced into James Clark's code and found that because the InputSource is
built from a FileReader, it uses encoding "UTF-16" for all character streams.
Help please?
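What Tom found matches the documented SAX InputSource contract: when an InputSource carries a character stream, the parser reads it directly and disregards any encoding declaration; only when it carries a byte stream does the parser work out the encoding itself, from the declaration. A FileReader is a character stream that has already decoded the file with the platform's default charset, so encoding="xxx" makes no difference. The sketch below uses the JDK's built-in JAXP SAX parser rather than XP (whose Driver presumably follows the same convention), and the class EncodingDemo is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    // Parse a document and return all of its character data.
    static String textOf(InputSource source)
            throws ParserConfigurationException, SAXException, IOException {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "cafe" with an e-acute, stored as ISO-8859-1 bytes as it would be on disk.
        byte[] bytes = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><r>caf\u00e9</r>"
                .getBytes(StandardCharsets.ISO_8859_1);

        // The problem: a Reader decodes the bytes before the parser sees them,
        // so the declaration is ignored. Decoding with the wrong charset
        // (UTF-8 here) turns the 0xE9 byte into the replacement character.
        String wrong = textOf(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)));

        // Fix 1: pass a byte stream; the parser then honours the declaration.
        // For a file: new InputSource(new FileInputStream("doc.xml"))
        String viaStream = textOf(new InputSource(new ByteArrayInputStream(bytes)));

        // Fix 2: if a Reader is unavoidable, give it the right charset explicitly
        // instead of relying on FileReader's platform default.
        String viaReader = textOf(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1)));

        System.out.println(wrong + " / " + viaStream + " / " + viaReader);
    }
}
```

Fix 1 is usually preferable, because the document then stays self-describing: whatever encoding its declaration names is the one the parser uses.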

Thanks!

-Tom



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


