This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: 8bit ascii encoding
- From: "Andrew Welch" <awelch at piper-group dot com>
- To: <xsl-list at lists dot mulberrytech dot com>
- Date: Fri, 23 Aug 2002 13:53:32 +0100
- Subject: RE: [xsl] 8bit ascii encoding
- Reply-to: xsl-list at lists dot mulberrytech dot com
ha! no wonder I get confused...
> If each char (in Unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Unicode 3 requires more than 2 bytes per character even in
> utf-16, the so called surrogate pairs). utf-8 requires 1 - 5 bytes,
> depending on the character.
If my chars are two bytes each then I'm using utf-16, but utf-8 can
consist of 1-5 bytes per char... I think I need to read some more.
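
To make the byte counts in the quoted text concrete, here is a minimal Python sketch (Python and the sample characters are assumptions chosen for illustration; the thread itself contains no code). It reports how many bytes one character occupies in utf-8 and in utf-16. Note that utf-8 as currently specified (RFC 3629) tops out at 4 bytes per character, so the upper bound of 5 quoted above does not match today's definition.

```python
# Sample characters are illustrative assumptions, not taken from the thread.
samples = {
    "A": "ASCII letter",
    "\u00e9": "e-acute, present in iso-8859-1",
    "\u20ac": "euro sign",
    "\U0001d11e": "musical G clef, outside the 16-bit range",
}

def byte_lengths(ch):
    """Return (utf-8 length, utf-16 length) in bytes for one character."""
    # "utf-16-be" avoids the 2-byte byte-order mark that plain "utf-16" prepends.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

for ch, label in samples.items():
    u8, u16 = byte_lengths(ch)
    print(f"U+{ord(ch):04X} {label}: utf-8={u8} bytes, utf-16={u16} bytes")
```

The last sample needs 4 bytes in utf-16 as well: that is the surrogate pair mentioned in the quote.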
At the moment, I'm using the xml output method with ascii encoding, and
telling IE the encoding is utf-8 (in the meta), so any chars not in
ascii should be output as character references and displayed correctly
in IE, as that is set to UTF-8.
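
The effect of serializing with an ascii encoding can be imitated outside XSLT. The sketch below is only an analogy in Python, not the XSLT processor's own code: the helper name to_ascii_with_refs and the sample string are hypothetical. It replaces every non-ascii character with a numeric character reference, then shows that an XML parser turns those references back into the same characters.

```python
import xml.etree.ElementTree as ET

def to_ascii_with_refs(text):
    """Encode text as ASCII bytes, writing other characters as &#N; references."""
    return text.encode("ascii", "xmlcharrefreplace")

original = "caf\u00e9 \u20ac5"          # sample text, an assumption
ascii_bytes = to_ascii_with_refs(original)
print(ascii_bytes)                      # b'caf&#233; &#8364;5'

# Parsing resolves the references, so the reader of the output sees the
# original characters regardless of how they were written in the bytes.
parsed = ET.fromstring(b"<p>" + ascii_bytes + b"</p>").text
print(parsed == original)               # True
```

Because the output bytes are pure ascii, they are also valid utf-8 and iso-8859-1, which is why labelling such output as utf-8 is harmless.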
Currently, this results in any chars not in the ascii range being
displayed as a single square box, which is progress from before, when I
was getting between 3 and 7 chars displayed for any 'special' character...
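
Symptoms like these are typical of bytes being written in one encoding and read in another. The Python sketch below shows the general mechanism only; it is not a diagnosis of the ActiveX control, and the helper name misread is hypothetical.

```python
def misread(text, written_as, read_as):
    """Encode text one way and decode the bytes another, like a mislabelled stream."""
    return text.encode(written_as).decode(read_as, errors="replace")

# utf-8 bytes read as a single-byte encoding: one character becomes three.
print(len(misread("\u20ac", "utf-8", "iso-8859-1")))           # 3

# iso-8859-1 bytes read as utf-8: the lone byte is invalid and is replaced
# by U+FFFD, which some displays might render as a box.
print(misread("\u00e9", "iso-8859-1", "utf-8") == "\ufffd")    # True

# Characters in the ascii range survive, since both encodings agree there.
print(misread("abc", "iso-8859-1", "utf-8"))                   # abc
```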
Anyway, this is getting slightly off-topic and I think I'm fighting a
losing battle, as anything I do has to go through the ActiveX control,
which I haven't got control of (or any understanding of ;) so I'll call
it a day for now.
Thanks for the continuing education in character encoding - one day I
will get it!
cheers
andrew
> -----Original Message-----
> From: David Carlisle [mailto:davidc@nag.co.uk]
> Sent: 23 August 2002 12:33
> To: xsl-list@lists.mulberrytech.com
> Subject: Re: [xsl] 8bit ascii encoding
>
>
>
> > Yeah... anywhere nice?
>
> I would say that it was suitably far from computers, but it seems that
> even 3000m up a Swiss mountain you still expect to find an internet cafe
> these days (I resisted the urge to log in and answer any xsl-list
> messages though :-)
>
> > ha.. nice. After some testing it seems that char references display
> > fine, while characters themselves do not
>
> well presumably they would if you wrote the characters in the right
> encoding. Guessing, it sounds like you are writing bytes that correspond
> to iso-8859-1 characters into a utf-8 encoded stream. If so you'll get
> the wrong characters (or more often an error) except for that part of
> utf-8 that happens to use one byte per character.
>
> > I think the reason IE isn't picking up that each char is two
> > bytes (utf-8)
>
> If each char (in Unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Unicode 3 requires more than 2 bytes per character even in
> utf-16, the so called surrogate pairs). utf-8 requires 1 - 5 bytes,
> depending on the character.
>
>
> > So I guess I have two options...
> >
> > 1. persevere trying to get IE to treat the output as two byte chars
>
> I think your problem is using the phrase "two byte chars", which leads
> to confusion. Characters have a Unicode number but do not correspond
> directly to any number of bytes.
> Different encodings map subsets of the Unicode character set into
> particular byte combinations.
>
>
> > 2. pass through all char refs to the output un-escaped, and let IE
> > escape them...
>
> All character references are replaced by the referenced character by an
> XML parser. So there is no way to "pass through" references unchanged.
> The XSLT system cannot tell whether a reference or a character was in
> the original data.
>
>
> > Is this the best option?
> It is still not clear what you are trying to do, but there should be no
> real reason why your C part cannot handle whatever encoding is coming
> out of the XSLT. It isn't clear from your description whether this is
> utf-8 or utf-16. You may find it easier if you specified
> encoding="iso-8859-1" and used latin-1 in the C part.
>
> David
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list