This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: sorting a list of titles after removal of stopwords and special characters
Trevor Nash wrote:
> What you need is an expression that, given the context of a title
> element, will return a string containing the edited title (stop words
> removed). This cannot be done with standard XSLT, but you have three
> possibilities:
Actually, it's not *impossible* with standard XSLT, although
admittedly it isn't pretty. Assuming that $punctuation is a string
holding the ignorable punctuation characters and that the list of
stopwords is ordered such that 'an' comes before 'a' rather than
after it (string-length() only looks at the first matching node in
document order), you could use:
  concat(
    substring(
      substring(translate(title, $punctuation, ''),
                string-length(
                  $stoplist[starts-with(
                              translate(current()/title,
                                        concat($lowercase, $punctuation),
                                        $uppercase),
                              translate(., $lowercase, $uppercase))]) + 2),
      1 div boolean($stoplist[starts-with(
                                translate(current()/title,
                                          concat($lowercase, $punctuation),
                                          $uppercase),
                                translate(., $lowercase, $uppercase))])),
    substring(
      translate(title, $punctuation, ''),
      1 div not($stoplist[starts-with(
                            translate(current()/title,
                                      concat($lowercase, $punctuation),
                                      $uppercase),
                            translate(., $lowercase, $uppercase))])))
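To see why the long expression works, it may help to look at the
conditional idiom on its own. In XPath 1.0, 1 div true() is 1 and
1 div false() is Infinity, so substring($string, 1 div $test) gives the
whole string when $test is true and an empty string otherwise.
Concatenating two such substrings with opposite tests picks one of
them. The following is only a minimal sketch: the parameter name
'test' and the strings 'stripped' and 'unchanged' are made up for
illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical switch standing in for "a stopword matched" -->
  <xsl:param name="test" select="true()"/>

  <xsl:template match="/">
    <!-- The first substring survives only when $test is true,
         the second only when it is false -->
    <xsl:value-of
      select="concat(substring('stripped', 1 div boolean($test)),
                     substring('unchanged', 1 div not($test)))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should output 'stripped'; with
the parameter's default changed to false() it should output
'unchanged'.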
If we were using XPath 2.0, assuming an if statement similar to
that in XQuery, it would look something like:
  if ($stoplist[starts-with(
                  translate(current()/title,
                            concat($lowercase, $punctuation),
                            $uppercase),
                  translate(., $lowercase, $uppercase))])
  then substring(translate(title, $punctuation, ''),
                 string-length(
                   $stoplist[starts-with(
                               translate(current()/title,
                                         concat($lowercase, $punctuation),
                                         $uppercase),
                               translate(., $lowercase, $uppercase))]) + 2)
  else translate(title, $punctuation, '')
which isn't that much more pleasant.
If the stop words were stored with a trailing space, as:
  <ignore>the </ignore>
  <ignore>an </ignore>
  <ignore>a </ignore>
(which would probably be a good idea anyway, given that quite a few
titles might begin with the letter 'A') then you could use simply:
  substring(translate(title, $punctuation, ''),
            string-length(
              $stoplist[starts-with(
                          translate(current()/title,
                                    concat($lowercase, $punctuation),
                                    $uppercase),
                          translate(., $lowercase, $uppercase))]) + 1)
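No conditional is needed here because, when no stopword matches, the
predicate selects nothing, string-length() of an empty node-set is 0,
and substring(..., 1) returns the whole title. As a rough sketch of
how this expression might sit inside an xsl:sort: the books/book
element names, the my: namespace used to hold the stop list inside the
stylesheet, and the particular punctuation characters are all
assumptions for illustration, not part of the original problem.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:my="urn:example:stoplist"
                exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- Assumed location for the stop list: data embedded in the
       stylesheet itself and read back with document('') -->
  <my:stoplist>
    <ignore>the </ignore>
    <ignore>an </ignore>
    <ignore>a </ignore>
  </my:stoplist>

  <xsl:variable name="stoplist"
                select="document('')/*/my:stoplist/ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Assumed set of ignorable characters; note that it must not
       contain the space -->
  <xsl:variable name="punctuation">.,;:!?'"()</xsl:variable>

  <xsl:template match="books">
    <xsl:for-each select="book">
      <!-- Sort on the title with punctuation and any leading
           stopword removed -->
      <xsl:sort select="substring(translate(title, $punctuation, ''),
                          string-length(
                            $stoplist[starts-with(
                              translate(current()/title,
                                        concat($lowercase, $punctuation),
                                        $uppercase),
                              translate(., $lowercase, $uppercase))]) + 1)"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Given a books element whose book children have the titles The Hobbit,
A Tale of Two Cities, "Emma", An Ideal Husband and Animal Farm, the
titles should come out in the order Animal Farm, "Emma", The Hobbit,
An Ideal Husband, A Tale of Two Cities. This relies on the processor
being able to re-read the stylesheet through document('').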
> 1) You are using Saxon, which has an extension saxon:function
> which lets you write a function in XSLT - more or less the
> contents of your mode="with-stoplist" template.
Just to mention, you can also use func:function from the EXSLT
namespace http://exslt.org/functions in Saxon, 4XSLT, jd.xslt and
libxslt to achieve this. It's more portable to use func:function than
to use saxon:function (because it's available in those other
processors), but they do basically the same thing. See
http://www.exslt.org/func for details.
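For illustration, a func:function version might look something like
the sketch below. It assumes the stop words are stored with a trailing
space as suggested above; the my: namespace, the function name
my:sort-key, the books/book element names and the punctuation set are
invented for the example, and it will only run in a processor that
supports EXSLT functions.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:func="http://exslt.org/functions"
                xmlns:my="urn:example:titles"
                extension-element-prefixes="func"
                exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- Assumed stop list, embedded in the stylesheet -->
  <my:stoplist>
    <ignore>the </ignore>
    <ignore>an </ignore>
    <ignore>a </ignore>
  </my:stoplist>

  <xsl:variable name="stoplist"
                select="document('')/*/my:stoplist/ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="punctuation">.,;:!?'"()</xsl:variable>

  <!-- Hypothetical helper: returns the title with punctuation and
       any leading stopword removed -->
  <func:function name="my:sort-key">
    <xsl:param name="title"/>
    <xsl:variable name="clean"
                  select="translate($title, $punctuation, '')"/>
    <xsl:variable name="upper"
                  select="translate($clean, $lowercase, $uppercase)"/>
    <!-- First stopword (if any) that the title starts with -->
    <xsl:variable name="stop"
                  select="$stoplist[starts-with($upper,
                            translate(., $lowercase, $uppercase))][1]"/>
    <func:result select="substring($clean, string-length($stop) + 1)"/>
  </func:function>

  <xsl:template match="books">
    <xsl:for-each select="book">
      <xsl:sort select="my:sort-key(title)"/>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The sort key is then just my:sort-key(title), which keeps the long
XPath out of the xsl:sort and lets the intermediate values be named.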
Cheers,
Jeni
---
Jeni Tennison
http://www.jenitennison.com/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list