This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: Un-escape and re-transform


Hi

Unfortunately, the solution offered (below) produces an empty result
document (i.e. the page is blank). I'm using MSIE 5.0 with MSXML 3.0.
The templates are read and processed, but no output is produced.

If I add "debugging" text into the xsl, as follows:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="text()">
    (A <xsl:value-of disable-output-escaping="yes" select="."/> A)
  </xsl:template>
  <xsl:template priority="-1"
                match="@* | * | text() | processing-instruction() | comment()">
    (B <xsl:copy>
      (C <xsl:apply-templates
           select="@* | * | text() | processing-instruction() | comment()"/>
C)
    </xsl:copy> B)
  </xsl:template>
</xsl:stylesheet>

I get the following result:
(A A) (B (C C) B)

So, something is happening....
What's going wrong here?

Regards,
Bas Alberts



-----Original Message-----
From: Robert C. Lyons [mailto:boblyons@unidex.com]
Sent: Tuesday, April 10, 2001 16:47
To: xsl-list@lists.mulberrytech.com
Cc: bas.alberts@group2000.nl
Subject: RE: [xsl] Un-escape and re-transform
Importance: High


Bas writes:
> My Content Provider delivers XML files with partially escaped HTML tags,
> for example:
> <content>
>         <web>
>                 &lt;P>This is text.&lt;/P>
>                 &lt;P>This is more text.&lt;/P>
>         </web>
> </content>
>
> My quest is to replace the "&lt;" by the un-escaped "<" character, and
> then redo the XSLT for that <P>...</P> bit.

Bas,

I would beg the Content Provider to place well-formed
HTML (or XHTML) in the XML documents (rather than HTML,
in which the markup is escaped).

A few weeks ago, we had the exact same problem.
We were lucky, since the sender of the XML data was
willing to embed well-formed HTML in the XML document.

I hope that you are as lucky.
If not, then perhaps you could use the following XSLT
stylesheet to unescape the markup:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="text()">
    <xsl:value-of disable-output-escaping="yes" select="."/>
  </xsl:template>

  <xsl:template priority="-1"
                match="@* | * | text() | processing-instruction() | comment()">
    <!-- Identity transformation. -->
    <xsl:copy>
      <xsl:apply-templates
           select="@* | * | text() | processing-instruction() | comment()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

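The problem with this approach is that it
will unescape every escaped markup character,
including those that are not really markup.
For example, the "&lt;" in the emoticon below
would be written out as a bare "<", so the
result would no longer be well-formed: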

<content>
  <web>
    &lt;P>C'est dommage. :-&lt; &lt;/P>
  </web>
</content>

If there's any chance that the escaped
HTML will contain markup characters that are
not really markup, then I think you'll need
to write a more sophisticated unescape
algorithm.
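
As a rough sketch of what such an algorithm might look like in
XSLT 1.0: the stylesheet below only unescapes a "&lt;" when it
starts a tag from a known list, and leaves every other "&lt;"
escaped. The template name "unescape", the parameter "s", the
restriction to text inside <web>, and the tag list (just <P> and
</P>, with no attributes) are all assumptions made for
illustration; a real list would have to match whatever HTML the
Content Provider actually sends. It also still relies on
disable-output-escaping, which is an optional feature that a
processor may ignore.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity transformation for everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: escaped HTML only occurs in text inside <web>. -->
  <xsl:template match="web/text()">
    <xsl:call-template name="unescape">
      <xsl:with-param name="s" select="."/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: walks the string one "<" at a time. -->
  <xsl:template name="unescape">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="contains($s, '&lt;')">
        <xsl:variable name="rest" select="substring-after($s, '&lt;')"/>
        <!-- Text before the "<" is written with normal escaping. -->
        <xsl:value-of select="substring-before($s, '&lt;')"/>
        <xsl:choose>
          <!-- Known tag (assumed list): write the whole tag unescaped. -->
          <xsl:when test="starts-with($rest, 'P>') or
                          starts-with($rest, '/P>')">
            <xsl:value-of disable-output-escaping="yes"
              select="concat('&lt;', substring-before($rest, '>'), '>')"/>
            <xsl:call-template name="unescape">
              <xsl:with-param name="s" select="substring-after($rest, '>')"/>
            </xsl:call-template>
          </xsl:when>
          <!-- Anything else: keep the "<" escaped as ordinary text. -->
          <xsl:otherwise>
            <xsl:text>&lt;</xsl:text>
            <xsl:call-template name="unescape">
              <xsl:with-param name="s" select="$rest"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

With the "C'est dommage" example above, a processor that honours
disable-output-escaping should write the <P> and </P> tags as
markup and leave the "&lt;" of the emoticon escaped. The whole tag
is written in a single unescaped value-of because many serializers
would otherwise write the closing ">" of the tag as "&gt;".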

Hope this helps.

Bob

<sig name    = 'Bob Lyons'
     title   = 'B2B Integration Consultant'
     company = 'Unidex, Inc.'
     phone   = '+1-732-975-9877'
     email   = 'boblyons@unidex.com'
     url     = 'http://www.unidex.com/'
     product = 'XML Convert: transforms flat files to XML and vice versa' />

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

