This is the mail archive of the xsl-list@mulberrytech.com mailing list.

RE: Upper ASCII chars

From: Jay Burgess <jburgess at delanotech dot com>
To: xsl-list at lists dot mulberrytech dot com
Date: Tue, 05 Feb 2002 09:18:02 -0600
Subject: RE: [xsl] Upper ASCII chars
Reply-to: xsl-list at lists dot mulberrytech dot com

[Thanks Jeni, Michael, and David for the replies.  I'll reply to all three 
here.]

 > (Jeni) It depends on what processor you're using.
 >
 > (Michael) Nevertheless, many people do care, so some
 > processors give you a way of controlling it. Saxon has
 > an attribute saxon:character-representation, and I
 > think Xalan has some kind of configuration file.
 >
I am using Xalan.  I'll look for its saxon:character-representation 
equivalent.  That would seem to solve the problem.

 > (Jeni) Out of interest, are you experiencing problems with browsers
 > recognising the character entity references, or is it purely that you
 > don't like the space that they take up, or find them less readable
 > than the native characters?
 >
 > (David) although if you are writing HTML why do you care? the two
 > forms that you show are equivalent to any HTML system.
 >
Unfortunately, it's not an "HTML system". I'm building server-side include 
pages from an XML configuration file.  The parameters for the <SERVLET> 
block (e.g. <param name="input1" value="£©®ÄËÓáöÿ.DTD">) need to be able to 
contain both "lower ASCII" and "upper ASCII" characters.  value="£" is 
completely different data for the SSI parser than value="&pound;".

 > (Michael) Oh dear: "upper ASCII". There's no such thing. ASCII stops
 > at 0x7F. A good first rule in understanding character coding issues
 > is to get your terminology straight!
 >
Yes, ASCII is a 7-bit character encoding.  But in all the years I've been in this 
business, when someone says "upper ASCII", everyone else knows what they're 
talking about. Since my goal was to describe my problem, and all three of you 
seemed to understand the issue, I believe the term accomplished its purpose.

 > (Jeni) As an alternative, you could change the output method to xml
 > and generate well-formed HTML (or full XHTML if you want).
 >
I already tried this, and it led to a different set of problems (mostly 
formatting-related) that I didn't try to address at the time.  If I can't 
get your first suggestion to work, I'll go back to this option and try 
to make it work.

 > (Jeni) [There's been a recent suggestion on xsl-editors@w3.org that
 > a similar functionality to saxon:character-representation be offered
 > in XSLT 2.0 - you might want to post this example there to demonstrate
 > another use case.]
 >
I'll do that this morning.  Thanks for the suggestion.

Jay


-----Original Message-----
From: Jeni Tennison [mailto:jeni@jenitennison.com]
Sent: Tuesday, February 05, 2002 1:35 AM
To: Jay Burgess
Cc: xsl-list@lists.mulberrytech.com
Subject: Re: [xsl] Upper ASCII chars


Hi Jay,

 > I get the following in the file:
 >
 >    <param name="input1"
 > value="&pound;&copy;&reg;&Auml;&Euml;&Oacute;&aacute;&ouml;&yuml;.DTD">
 >
 > What I want, though, is:
 >
 >    <param name="input1" value="£©®ÄËÓáöÿ.DTD">
 >
 > Is there a way to achieve this?

It depends on what processor you're using. The XSLT 1.0 Rec states
that if the output method is html and the processor knows the
character entity reference for a character, then that character may be
output using the character entity reference, which is what you're
experiencing.

Some processors, notably Saxon (someone tell me if other processors
offer this) give you a bit of control over how you want the characters
to be serialized. With Saxon, you can do:

   <xsl:output method="html"
               saxon:character-representation="native;entity" />

to tell Saxon to output non-ASCII characters as native characters when
they can be represented in your character encoding, and as entities
(if Saxon knows such an entity) when they cannot. This should give you
the result that you're after (assuming that the characters that you're
using are representable within your encoding).
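
For reference, a minimal complete stylesheet using that attribute might
look like the sketch below. It assumes Saxon 6.x (whose extension
namespace is http://icl.com/saxon), an ISO-8859-1 output encoding, which
covers all of the characters in £©®ÄËÓáöÿ, and a hypothetical input
document containing param elements; adjust these to your own setup.

   <xsl:stylesheet version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       xmlns:saxon="http://icl.com/saxon">

     <!-- Saxon 6.x extension attribute: write characters natively where
          ISO-8859-1 can represent them, otherwise as entity references -->
     <xsl:output method="html" encoding="ISO-8859-1"
                 saxon:character-representation="native;entity"/>

     <!-- Hypothetical input: copy each param element through unchanged -->
     <xsl:template match="param">
       <xsl:copy-of select="."/>
     </xsl:template>

   </xsl:stylesheet>

Because param is an empty element in HTML, the html output method writes
it without an end tag, as in the output you quoted.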

[There's been a recent suggestion on xsl-editors@w3.org that a similar
functionality to saxon:character-representation be offered in XSLT 2.0
- you might want to post this example there to demonstrate another use
case.]

As an alternative, you could change the output method to xml and
generate well-formed HTML (or full XHTML if you want). The characters
won't be represented as entities in that case because XSLT 1.0
processors can't tell the difference between normal XML and
well-formed HTML, so they won't use HTML entity references for any of
the characters.
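
A minimal sketch of that approach, assuming XSLT 1.0, an ISO-8859-1
output encoding, and a hypothetical input document containing param
elements: with method="xml", characters that the encoding can represent
are written as-is, and any others become numeric character references
(such as &#8364;) rather than named entities.

   <xsl:stylesheet version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

     <!-- XML output method: no named HTML entities are generated.
          omit-xml-declaration keeps the <?xml ...?> line out of the page. -->
     <xsl:output method="xml" encoding="ISO-8859-1"
                 omit-xml-declaration="yes"/>

     <!-- Hypothetical input: copy each param element through unchanged -->
     <xsl:template match="param">
       <xsl:copy-of select="."/>
     </xsl:template>

   </xsl:stylesheet>

One visible difference from the html method is that empty elements are
serialized in XML form, e.g. <param name="input1" value="£©®ÄËÓáöÿ.DTD"/>.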

Out of interest, are you experiencing problems with browsers
recognising the character entity references, or is it purely that you
don't like the space that they take up, or find them less readable
than the native characters?

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Follow-Ups:
- Re: Upper ASCII chars
  - From: Jonathan Perret
