This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: (text processing) lexical context


Hello Nicolas,

> <root>
> This is the <w>first</w> <i>sentence</i>. This is the <w>second</w>
> <i>sentence</i>. This is the <w>third</w> <i>sentence</i>.
> </root>

You have really badly structured XML. How should the processor know
where one sentence ends and the next one starts? Can you always use '.'
as the marker?

I tried a key-based solution: every node is indexed by the generated ID of
the next following-sibling text node that contains a '.', so all nodes
belonging to one sentence share the same key value:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

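<!-- Index every node by the ID of the next following-sibling text node
     that contains a '.', i.e. the text node that ends its sentence. -->
<xsl:key name="sentences" match="node()"
    use="generate-id(following-sibling::text()[contains(., '.')][1])"/>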

<xsl:template match="root">
     <html>
         <ol>
             <!-- One <li> per text node that ends a sentence -->
             <xsl:apply-templates select="text()[contains(., '.')]"
                 mode="end-of-sentence"/>
         </ol>
     </html>
</xsl:template>

<xsl:template match="text()" mode="end-of-sentence">
     <li>
         <!-- All nodes keyed to this text node: the rest of the sentence -->
         <xsl:apply-templates select="key('sentences', generate-id(.))"
             mode="rest-of-sentence"/>
         <xsl:value-of select="substring-before(., '.')"/>
         <xsl:text>.</xsl:text>
     </li>
</xsl:template>

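<!-- Nodes in the middle of a sentence are copied unchanged -->
<xsl:template match="node()" mode="rest-of-sentence">
     <xsl:copy-of select="."/>
</xsl:template>

<!-- A text node that ends the previous sentence contributes only the
     part after its '.' -->
<xsl:template match="text()[contains(., '.')]" mode="rest-of-sentence">
     <xsl:value-of select="substring-after(., '.')"/>
</xsl:template>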

</xsl:stylesheet>

The output with Xalan:

<html>
<ol>
<li>
This is the <w>first</w>
<i>sentence</i> without a comma.</li>
<li> This is the <w>second</w>

<i>sentence</i>.</li>
<li> This is the <w>third</w>
<i>sentence</i>.</li>
</ol>
</html>

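I don't know whether the solution is perfect; it's hard to spot errors in
an example this small. One limitation: substring-before() and
substring-after() split at the first '.', so it only works if every text
node contains at most one sentence end. But I would start by changing the
terrible XML.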
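If the XML can be changed, the grouping problem disappears. A minimal sketch, assuming the source is rewritten so that each sentence is wrapped in a hypothetical <s> element (the element name is illustrative, not something from the original post). Each <s> becomes one <li>, and <w> is rendered as <b> as in the desired output from the question:

```xml
<?xml version="1.0"?>
<!-- Sketch only. Assumes a hypothetical restructured input in which
     every sentence is wrapped in an <s> element, e.g.:
       <root>
         <s>This is the <w>first</w> <i>sentence</i>.</s>
         <s>This is the <w>second</w> <i>sentence</i>.</s>
       </root>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="root">
    <html>
      <ol>
        <!-- One list item per sentence element; whitespace between
             the <s> elements is not selected -->
        <xsl:apply-templates select="s"/>
      </ol>
    </html>
  </xsl:template>

  <xsl:template match="s">
    <!-- Text children are copied by the built-in text() template -->
    <li><xsl:apply-templates/></li>
  </xsl:template>

  <!-- Render <w> as bold, as in the desired output -->
  <xsl:template match="w">
    <b><xsl:apply-templates/></b>
  </xsl:template>

  <!-- Keep <i> as it is -->
  <xsl:template match="i">
    <i><xsl:apply-templates/></i>
  </xsl:template>

</xsl:stylesheet>
```

With one element per sentence, no keys or substring handling are needed: ordinary template matching keeps the inline <i> markup in place.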

Regards,

Joerg



> would be formatted so that the list would look like:
> 
> <html>
> <ol>
> <li>first: This is the <b>first</b> <i>sentence</i>. 
> <li>Second: This is the <b>second</b> <i>sentence</i>. 
> <li>Third: This is the <b>third</b> <i>sentence</i>.
> </ol>
> </html>
> 
> But I can't figure out how I can select the text surrounding the <w>
> element without using <xsl:value-of.../>, which does not allow me to
> process the following <i> element...
> 
> i.e., I get
> 
> <html>
> <ol>
> <li>first: This is the <b>first</b> sentence. 
> <li>Second: This is the <b>second</b> sentence. 
> <li>Third: This is the <b>third</b> sentence.
> </ol>
> </html>
> 
> and the <i> element is lost...
> 
> And I can't do <xsl:template match="substring(...)"> because substring()
> does not return a DOM node.
> 
> Help: is there a way to process substrings or something?
> 
> N. Mazziotta


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

