This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Re: xsl


Hi Tom,

> I am trying to find out how to eliminate duplicate hits in a search
> results list which is contained in XML data:
>
> <Result id="100" name="tom" />
> <Result id="100" name="tom" />
> <Result id="100" name="tom" />
> <Result id="100" name="tom" />

Is it the id or the name that indicates a unique hit?  I'll assume
it's the id.

XSLT 1.0 doesn't have a built-in distinct() function (although
there are extension functions like saxon:distinct() that you could
use) so you're *probably* going to be better off addressing the
problem in the search engine rather than using XSLT to do it.

Having said that, you can pick only the unique Result elements by
going through the list of them and only choosing those that don't have
a preceding sibling with the same id:

  Result[not(preceding-sibling::Result/@id = @id)]
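
To show where that expression would sit, here is a minimal sketch of a
complete XSLT 1.0 stylesheet. It assumes that the Result elements are
children of a wrapper element called Results; that name is made up for
the illustration, since the sample data doesn't show the parent.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" />

  <!-- Assumption: Results is a hypothetical wrapper element holding
       the Result elements; adjust the match pattern to your data. -->
  <xsl:template match="Results">
    <Results>
      <!-- Copy each Result that has no earlier sibling with the same id -->
      <xsl:copy-of
        select="Result[not(preceding-sibling::Result/@id = @id)]" />
    </Results>
  </xsl:template>

</xsl:stylesheet>
```

Run against a Results element holding the four identical Result
elements from your example, this should copy just the first one.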

If the Result elements are sorted already for you, such that all the
Results with the same id are grouped together, then you can use the
more efficient:

  Result[not(preceding-sibling::Result[1]/@id = @id)]

This is more efficient because it only checks the
immediately-preceding Result element rather than going through all the
preceding siblings.

If they're not sorted and you have a lot of Result elements, then you
may want to use the Muenchian method. This involves setting up a key
to index into the Result elements by their id:

<xsl:key name="results-by-id" match="Result" use="@id" />

You can then get all the Result elements with an id of 100, for
example, with:

  key('results-by-id', '100')

And you can get the unique results by testing each Result to see
whether it is the first in the list you get when you use the key to
get Result elements with its ID, either using generate-id():

  Result[generate-id() = generate-id(key('results-by-id', @id)[1])]

or through set logic:

  Result[count(.|key('results-by-id', @id)[1]) = 1]

The Muenchian method is quicker than using preceding-sibling:: but it
takes up more memory because the key creates a hashtable.
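
Putting the pieces together, a Muenchian version might look something
like the sketch below. As before, the Results wrapper element is an
assumption made for the illustration, not something taken from your
data; the key is the results-by-id key declared above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" />

  <!-- Index every Result element in the document by its id attribute -->
  <xsl:key name="results-by-id" match="Result" use="@id" />

  <!-- Assumption: Results is a hypothetical wrapper element holding
       the Result elements; adjust the match pattern to your data. -->
  <xsl:template match="Results">
    <Results>
      <!-- Keep a Result only if it is the first node the key returns
           for its own id -->
      <xsl:for-each
        select="Result[generate-id() =
                       generate-id(key('results-by-id', @id)[1])]">
        <xsl:copy-of select="." />
      </xsl:for-each>
    </Results>
  </xsl:template>

</xsl:stylesheet>
```

Note that a key indexes the whole document, so if Result elements with
the same id can appear under more than one parent, only the first one
in document order is kept; the preceding-sibling:: tests, by contrast,
only compare Results that share a parent.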

I hope that helps,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

