This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: (text processing) lexical context


One other piece of advice (somewhat heretical for this list): XSLT is not
the only tool in your kitbag. In fact, where you want to identify structure
in the source that's not explicit in the markup, XSLT is often not the best
tool for the job.

You could probably tackle this one more easily by writing a SAX filter that
inserts a <sentence> start tag immediately after <root>, a </sentence> end
tag immediately before </root>, and a </sentence><sentence> pair immediately
after a "." that's followed by whitespace.
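
A minimal sketch of such a filter, assuming Python's standard xml.sax
module (the advice above names no language). The names SentenceFilter and
split_sentences are made up for illustration. The sketch treats the
document element as <root>, buffers character data so that a "." and the
whitespace after it are always seen together, and only splits text that
sits directly under the root, so a period inside <w> or <i> is left alone.

```python
import io
import re
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# A "." that is immediately followed by whitespace marks a sentence boundary.
_PERIOD = re.compile(r"\.(?=\s)")


class SentenceFilter(XMLFilterBase):
    """Hypothetical SAX filter that wraps sentences in <sentence> elements."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._depth = 0   # 0 = outside the document element
        self._text = []   # character data buffered since the last tag

    def _open(self):
        super().startElement("sentence", AttributesImpl({}))

    def _close(self):
        super().endElement("sentence")

    def _flush(self):
        """Emit buffered text, splitting sentences only directly under the root."""
        text = "".join(self._text)
        self._text = []
        if not text:
            return
        if self._depth != 1:
            super().characters(text)
            return
        start = 0
        for match in _PERIOD.finditer(text):
            super().characters(text[start:match.end()])
            self._close()   # </sentence>
            self._open()    # <sentence>
            start = match.end()
        if start < len(text):
            super().characters(text[start:])

    def characters(self, content):
        # Parsers may deliver text in arbitrary chunks, so buffer it.
        self._text.append(content)

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)
        self._depth += 1
        if self._depth == 1:
            self._open()    # <sentence> immediately after the root start tag

    def endElement(self, name):
        self._flush()
        if self._depth == 1:
            self._close()   # </sentence> immediately before the root end tag
        self._depth -= 1
        super().endElement(name)


def split_sentences(xml_text):
    """Run xml_text through SentenceFilter and return the serialized result."""
    out = io.StringIO()
    sax_filter = SentenceFilter(make_parser())
    sax_filter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    sax_filter.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Run on the three-sentence sample from the original question, this wraps
each sentence, child elements included, in its own <sentence>. Because the
last "." is followed by a newline, a literal reading of the rule also
leaves a whitespace-only <sentence> just before </root>, which a real
filter would probably drop.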

Michael Kay
Software AG
home: Michael.H.Kay@ntlworld.com
work: Michael.Kay@softwareag.com

> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of cutlass
> Sent: 24 April 2002 09:04
> To: xsl-list@lists.mulberrytech.com
> Subject: Re: [xsl] (text processing) lexical context
>
>
> Hello Nicolas,
>
> ----- Original Message -----
> From: "Nicolas Mazziotta" <Nicolas.Mazziotta@ulg.ac.be>
>
> > <root>
> > This is the <w>first</w> <i>sentence</i>. This is the <w>second</w>
> > <i>sentence</i>. This is the <w>third</w> <i>sentence</i>.
> > </root>
>
> This particular form of markup keeps cropping up over and over again, and I
> suspect that most people will tell you that it is not so good. The main
> problem with this type of markup is that it tends to be rather open-ended,
> e.g. there could be a variety of elements, nesting structures, etc.
>
> > <html>
> > <ol>
> > <li>first: This is the <b>first</b> <i>sentence</i>.
> > <li>Second: This is the <w>second</b> <i>sentence</i>.
> > <li>Third: This is the <b>third</b> <i>sentence</i>.
> > </ol>
> > </html>
> >
>
> I am assuming you made an error with the opening <w> in the second
> sentence?
>
> Right, so you want to:
>
> a) tokenize each sentence
> b) number with words (i.e. First, Second, Third)
> c) copy all child elements within a sentence across
> d) replace elements with other elements
>
> There are a few approaches:
>
> - You are doing too much in one transform. Yes, it is possible to have one
> large, complicated transform, but why not break it up into small steps so
> you can conceptualise it?
>
> - You can either tokenise each sentence by customising the string tokenise
> function (available in many places, one of them being www.exslt.org),
> splitting on each period,
>
> - or, I suspect, this is a rather good use of Dimitre Novatchev's
> functional library at www.topxml.com.
>
> Both approaches will require a little investment in learning.
>
> The other stuff, like copying or replacing elements and numbering with
> words, will come after you get over the first step.
>
> gl, jim fuller
>
>
> > But I can't figure out how I can select the text surrounding the <w>
> > element without using <xsl:value-of.../>, which does not allow me to
> > process the following <i> element...
> >
> > i.e., I get
> >
> > <html>
> > <ol>
> > <li>first: This is the <b>first</b> sentence.
> > <li>Second: This is the <w>second</b> sentence.
> > <li>Third: This is the <b>third</b> sentence.
> > </ol>
> > </html>
> >
> > and the <i> element is lost...
> >
> > And I can't do <xsl:template match="substring(...)"> because substring
> > is not a DOM node.
> >
> > Help: is there a way to process substrings or something?
> >
> > N. Mazziotta
> >
> >
> >  XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
>




