This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Extracting a list of unique base URLs from anchors in an HTML document


Hello,

I have a source HTML document that has been converted to XHTML.

This document contains a number of anchor elements (<A>). I want to use XSLT
to extract information about the links contained in the document.

First, I restrict the returned links to links that point to files in the
same folder, like this:

//a[contains(@href,'#') and not(contains(@href, '/'))]

As you can see, I'm also restricting the returned links to the ones that
have a hash (#) character in their href attribute.

So far so good. Now I add a second predicate (formatted for readability): 

//a
[contains(@href,'#') and not(contains(@href, '/'))]
[not(substring-before(@href,'#')=substring-before(preceding::a/@href,'#'))]

The second predicate should (I think) limit the returned node-set to
anchors whose href attribute has a unique base URL (the part before the #).
However, the expression with the second predicate appended still returns
multiple links that share the same base URL. Why?
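
To make the symptom concrete, here is a small Python model of the two predicates. It is only a sketch under stated assumptions: it assumes XPath 1.0 rules, under which a node-set passed to a string function such as substring-before() is converted to the string-value of its first node in document order, so preceding::a/@href contributes only the first preceding href. It also assumes every anchor has an href and that anchors are not nested. The sample hrefs and the function names are hypothetical, not taken from my real document.

```python
def substring_before(s, sep):
    """Model of XPath 1.0 substring-before(): "" when sep does not occur in s."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def select_as_written(hrefs):
    """Model of //a[predicate 1][predicate 2]; hrefs are in document order."""
    selected = []
    for i, href in enumerate(hrefs):
        # Predicate 1: contains(@href,'#') and not(contains(@href,'/'))
        if "#" not in href or "/" in href:
            continue
        # Predicate 2: the node-set preceding::a/@href collapses to its
        # FIRST node in document order ("" if the node-set is empty).
        preceding = hrefs[:i]
        first_preceding = preceding[0] if preceding else ""
        if substring_before(href, "#") != substring_before(first_preceding, "#"):
            selected.append(href)
    return selected


def select_as_intended(hrefs):
    """What I want: keep an anchor only if NO preceding anchor shares its base."""
    selected = []
    for i, href in enumerate(hrefs):
        if "#" not in href or "/" in href:
            continue
        base = substring_before(href, "#")
        if not any(substring_before(p, "#") == base for p in hrefs[:i]):
            selected.append(href)
    return selected


# Hypothetical sample data: hrefs of all anchors, in document order.
sample = ["a.html#x", "b.html#y", "b.html#z", "a.html#w"]
print(select_as_written(sample))
print(select_as_intended(sample))
```

In this model the expression as written only ever compares against the first anchor in the document, which would explain the duplicates. If that is indeed the cause, the comparison probably has to be made against each preceding anchor individually, for example from inside an XSLT template using current(), or with xsl:key and generate-id().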

Thanks in advance,
// tt

P.S. In the text above, "link", "anchor" and "<A>" are interchangeable.

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
