This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Unicode usage
- From: "Julian Reschke" <julian dot reschke at gmx dot de>
- To: <xsl-list at lists dot mulberrytech dot com>
- Date: Fri, 25 Jan 2002 17:33:19 +0100
- Subject: RE: [xsl] Unicode usage
- Reply-to: xsl-list at lists dot mulberrytech dot com
Well Thomas,
you have proved that programs which do not know about UTF-8 will not display
it properly. Big deal. (Sorry). I don't think that anything except HTML user
agents is relevant here.
Julian
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Thomas B.
> Passin
> Sent: Friday, January 25, 2002 5:21 PM
> To: xsl-list@lists.mulberrytech.com
> Subject: Re: [xsl] Unicode usage
>
>
> [Julian Reschke]
>
> > It would depend on the User Agent, not the platform. If this is actually
> > true for any "recent" version of IE (let's say, since 4.0), I'd like to see
> > some evidence before I believe it :-)
> >
> >
>
> I just did an experiment that verified what each of us said. I created an
> XML file on my Windows 2000 machine with a ® in it. I transformed it
> with an identity transform twice, first with encoding='utf-8' and second
> with encoding='iso-8859-1'.
>
> Looking at the hex bytes, the iso results contained a hex AE byte, which is
> correct for character 174. The utf-8 results contained the two hex
> bytes C2 AE, which I presume is right for utf-8. Both results
> displayed the registered trademark symbol, the one with the r in a
> circle.
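
The byte values reported above can be checked without an XSLT processor. This is only a minimal sketch in Python (not something used in the thread); the helper name hex_bytes is made up for illustration, and the codec names are the standard Python ones:

```python
# U+00AE REGISTERED SIGN, the character used in the experiment (decimal 174).
REGISTERED = "\u00ae"


def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex.

    Hypothetical helper, introduced only for this illustration.
    """
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


latin1_hex = hex_bytes(REGISTERED, "iso-8859-1")  # one byte per character
utf8_hex = hex_bytes(REGISTERED, "utf-8")         # two bytes for U+0080..U+07FF

print("iso-8859-1:", latin1_hex)
print("utf-8:     ", utf8_hex)
```

In utf-8 every character from U+0080 to U+07FF takes two bytes, which is why the single iso-8859-1 byte AE becomes C2 AE.
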
>
> I copied the results to a floppy and took it over to my Win95/SP2 computer,
> then displayed the results in IE 5.5. Both files displayed the same,
> showing the right symbol. This is what you said would happen.
>
> I also loaded each result into Notepad on Win95. Notepad displayed the iso
> file correctly, but not the utf-8 result (it showed that "A" character with
> a little circle above it, ahead of the trademark symbol). This is what I
> was suggesting would happen. BTW, Notepad on the Win2000 computer did
> display both results correctly.
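
The stray extra character in front of the symbol is what one would expect if a viewer reads the utf-8 bytes one byte per character. A rough sketch in Python, assuming (this is a guess, not something stated in the thread) that the viewer uses the windows-1252 code page:

```python
# The two bytes the utf-8 output contains for the registered sign.
utf8_bytes = "\u00ae".encode("utf-8")  # b'\xc2\xae'

# Assumption: the viewer treats each byte as one windows-1252 character.
misread = utf8_bytes.decode("cp1252")

# A utf-8 aware viewer decodes the same two bytes as a single character.
correct = utf8_bytes.decode("utf-8")

print(len(misread), [f"U+{ord(c):04X}" for c in misread])
print(len(correct), [f"U+{ord(c):04X}" for c in correct])
```

Under that assumption the byte C2 shows up as its own accented capital A (U+00C2), followed by the byte AE shown as the registered sign, so two characters appear where one was intended.
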
>
> Summarizing, what you will see displayed for high-order characters can
> depend on the encoding, OS, and the viewing program. On older versions of
> Windows, at least, non-browsers are likely to display the wrong thing.
>
> In fact, even on my Win2000 machine, using XML Cooktop to run and display
> the transformation gave an incorrect display (and it uses the IE ActiveX
> control to display the results!), so you can't be sure even on Win2000 that
> high-order characters will display the intended way, depending on the app.
>
> Try it yourself on your system. Here are the files:
>
> -----------------------------------------------------------------------
> <?xml version='1.0' encoding='utf-8'?>
> <data>Here is a ==®== character</data>
>
> -----------------------------------------------------------------------
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output encoding='utf-8'/><!--Or change to iso-8859-1-->
>
> <!-- Identity transformation template -->
> <xsl:template match='@*|node()'>
> <xsl:copy>
> <xsl:apply-templates select="@*|node()"/>
> </xsl:copy>
> </xsl:template>
>
> </xsl:stylesheet>
> -----------------------------------------------------------------------
> Cheers,
>
> Tom P
>
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
>