This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: (text processing) lexical context
- From: Joerg Heinicke <joerg dot heinicke at gmx dot de>
- To: xsl-list at lists dot mulberrytech dot com
- Date: Wed, 24 Apr 2002 09:13:57 +0200
- Subject: Re: [xsl] (text processing) lexical context
- References: <004401c1eb59$657cd0e0$1f2aa58b@KHARTES>
- Reply-to: xsl-list at lists dot mulberrytech dot com
Hello Nicolas,
> <root>
> This is the <w>first</w> <i>sentence</i>. This is the <w>second</w>
> <i>sentence</i>. This is the <w>third</w> <i>sentence</i>.
> </root>
your XML is really badly structured. How should the processor know
where one sentence ends and the next one starts? Can you always rely on
'.' as the marker?
I tried a key-based solution (every node is indexed by the id of the
next sibling text node that contains a '.'):
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- index every node by the id of the next sibling text node
       that contains a '.' -->
  <xsl:key name="sentences" match="node()"
      use="generate-id(following-sibling::text()[contains(., '.')][1])"/>

  <xsl:template match="root">
    <html>
      <ol>
        <!-- one list item per text node that ends a sentence -->
        <xsl:apply-templates select="text()[contains(., '.')]"
            mode="end-of-sentence"/>
      </ol>
    </html>
  </xsl:template>

  <xsl:template match="text()" mode="end-of-sentence">
    <li>
      <!-- everything that precedes this '.' -->
      <xsl:apply-templates select="key('sentences', generate-id(.))"
          mode="rest-of-sentence"/>
      <!-- the part of this text node before the '.' -->
      <xsl:value-of select="substring-before(., '.')"/>
      <xsl:text>.</xsl:text>
    </li>
  </xsl:template>

  <xsl:template match="node()" mode="rest-of-sentence">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- a text node holding the previous '.': keep only what follows it -->
  <xsl:template match="text()[contains(., '.')]" mode="rest-of-sentence">
    <xsl:value-of select="substring-after(., '.')"/>
  </xsl:template>
</xsl:stylesheet>
The output with Xalan:
<html>
<ol>
<li>
This is the <w>first</w>
<i>sentence</i> without a comma.</li>
<li> This is the <w>second</w>
<i>sentence</i>.</li>
<li> This is the <w>third</w>
<i>sentence</i>.</li>
</ol>
</html>
I don't know whether the solution is perfect; errors are a bit difficult
to spot here. But I would start by changing the terrible XML.
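
The stylesheet above copies <w> through unchanged and adds no label, while
the requested output quoted below shows the <w> word as a prefix and
rendered as <b>. The following is only an untested sketch of how the same
key could be pushed in that direction. The variable name "parts" is made up
for illustration, the label is simply the string value of the first <w>
collected for the sentence (so its capitalisation is whatever the source
has), and it still assumes at most one '.' per text node:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- same key as before: nodes indexed by the id of the next
       sibling text node that contains a '.' -->
  <xsl:key name="sentences" match="node()"
      use="generate-id(following-sibling::text()[contains(., '.')][1])"/>

  <xsl:template match="root">
    <html>
      <ol>
        <xsl:apply-templates select="text()[contains(., '.')]"
            mode="end-of-sentence"/>
      </ol>
    </html>
  </xsl:template>

  <xsl:template match="text()" mode="end-of-sentence">
    <!-- "parts" (hypothetical name): all nodes belonging to this sentence -->
    <xsl:variable name="parts" select="key('sentences', generate-id(.))"/>
    <li>
      <!-- label: string value of the first <w> in this sentence -->
      <xsl:value-of select="$parts[self::w][1]"/>
      <xsl:text>: </xsl:text>
      <xsl:apply-templates select="$parts" mode="rest-of-sentence"/>
      <xsl:value-of select="substring-before(., '.')"/>
      <xsl:text>.</xsl:text>
    </li>
  </xsl:template>

  <!-- default: copy the node (e.g. <i>) unchanged -->
  <xsl:template match="node()" mode="rest-of-sentence">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- <w> becomes <b>; this wins over match="node()" by default priority -->
  <xsl:template match="w" mode="rest-of-sentence">
    <b><xsl:value-of select="."/></b>
  </xsl:template>

  <xsl:template match="text()[contains(., '.')]" mode="rest-of-sentence">
    <xsl:value-of select="substring-after(., '.')"/>
  </xsl:template>
</xsl:stylesheet>
```

If a sentence contains no <w>, the label would be empty and the item would
start with ": ", so a real version would probably want an <xsl:if> around it.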
Regards,
Joerg
> would be formatted so that the list would look like:
>
> <html>
> <ol>
> <li>first: This is the <b>first</b> <i>sentence</i>.
> <li>Second: This is the <w>second</b> <i>sentence</i>.
> <li>Third: This is the <b>third</b> <i>sentence</i>.
> </ol>
> </html>
>
> But I can't figure out how I can select the text surrounding the <w>
> element without using <xsl:value-of.../>, which does not allow me to
> process the following <i> element...
>
> i.e., I get
>
> <html>
> <ol>
> <li>first: This is the <b>first</b> sentence.
> <li>Second: This is the <w>second</b> sentence.
> <li>Third: This is the <b>third</b> sentence.
> </ol>
> </html>
>
> and the <i> element is lost...
>
> And I can't do <xsl template match="substring(...)"> because substring
> is not a DOM node.
>
> Help: is there a way to process substrings or stg?
>
> N. Mazziotta