This is the mail archive of the xsl-list@mulberrytech.com mailing list.



pruning nodes not in xpath list


I am trying to use XSLT to solve a "pruning" problem where I do not know the
structure of the source documents.  I want to be able to (programmatically)
generate a stylesheet that will produce a result tree containing only the
nodes selected by a list of AbbreviatedAbsoluteLocationPath expressions,
together with their ancestor elements.  For example, my program might receive
a list of expressions like the following as input:

//product/@sku
//product/cost

I would like to be able to generate a stylesheet that will produce output
documents that exclude all nodes that are not matched by any of the xpath
expressions (apart from the content of matched elements).  The trick is that,
for any node that is matched, I want to keep its ancestor elements so the
document structure does not change (i.e., the same list of xpaths would find
the exact same nodes in the result document).  My
program cannot assume anything about the structure of the source documents,
other than what can be gleaned from the list of xpath expressions.  I have
been reading Michael Kay's book and have tried everything I can think of, but
cannot come up with a workable solution.  In fact, I cannot even get it to
work for a single path that uses the // abbreviation for
"/descendant-or-self::node()/" (e.g., //product/cost).
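A minimal sketch of the "generate a stylesheet" idea, assuming an XSLT 1.0 processor and that every input entry is a location path that selects nodes. A small Python generator (the name make_pruning_stylesheet is hypothetical) joins the paths into one union, $hits, and emits a restricted identity template that copies a node only if it belongs to $keep: the hits, all their ancestors, and the content of selected elements. The test count(. | $keep) = count($keep) is the usual XSLT 1.0 idiom for "is this node in that set"; it is quadratic in the worst case, so very large documents may be slow.

```python
from xml.sax.saxutils import quoteattr

# XSLT 1.0 skeleton; {hits} receives the union of the input paths as a
# quoted, XML-escaped attribute value (assumption: paths select nodes).
TEMPLATE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- every node selected by one of the input paths -->
  <xsl:variable name="hits" select={hits}/>
  <!-- the hits, all their ancestors, and the content of selected elements -->
  <xsl:variable name="keep" select="$hits/ancestor-or-self::node()
      | $hits/descendant-or-self::node()
      | $hits/descendant-or-self::*/@*"/>
  <!-- identity copy restricted to $keep: a node is a member of $keep
       exactly when adding it to the set does not change the count -->
  <xsl:template match="@*|node()">
    <xsl:if test="count(. | $keep) = count($keep)">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
"""


def make_pruning_stylesheet(paths):
    """Return stylesheet text that keeps only the nodes selected by paths.

    Hypothetical helper: assumes each entry is an XPath 1.0 location path.
    """
    union = " | ".join(paths)
    return TEMPLATE.format(hits=quoteattr(union))


print(make_pruning_stylesheet(["//product/@sku", "//product/cost"]))
```

With the two example paths, the generated select attribute reads "//product/@sku | //product/cost". Because quoteattr escapes characters such as < and quotes inside predicates, the generated stylesheet stays well-formed for arbitrary path text.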

Example:  Given the above list of xpath expressions and this document as
input:

<?xml version="1.0"?>
<catalog date="2001-02-27">
   <products>
      <department>
         <name>Mens Pants</name>
         <number>55</number>
         <vendor>
            <name>Levi</name>
            <vid>4456</vid>
            <product sku="123-456">
               <cost>39.99</cost>
               <color>black</color>
               <graphic>some url</graphic>
            </product>
            <product sku="987-654">
               <cost>29.95</cost>
               <color>green</color>
               <graphic>another url</graphic>
            </product>
         </vendor>
      </department>
   </products>
   <total_inventory>
      <average_age unit="days">30</average_age>
      <cost>59032.45</cost>
   </total_inventory>
</catalog>

The transformation should produce the following output (again, apart from what
the xpath expressions reveal, I know nothing about the structure of the input
document):

<?xml version="1.0"?>
<catalog>
   <products>
      <department>
         <vendor>
            <product sku="123-456">
               <cost>39.99</cost>
            </product>
            <product sku="987-654">
               <cost>29.95</cost>
            </product>
         </vendor>
      </department>
   </products>
</catalog>

Any ideas?  Is XSLT the wrong solution for this problem?

From an algorithmic point of view, one solution is to build a node-set of all
nodes that should be copied to the output document.  For each xpath, find the
terminal nodes and add them and all of their ancestor elements to the
node-set.  Once the node-set is built, visit each node in the source tree in
document order; if the node is in the node-set, use <xsl:copy/>.  This would
ensure that each node is copied only once, and in document order, so the tree
structure is preserved.  I cannot figure out a way to implement this algorithm
using XSLT.  This could be done with a DOM (Document), but I don't want to
have to implement all of the code needed to evaluate xpath expressions.
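The collect-then-copy algorithm described above can also be sketched outside XSLT with Python's standard xml.etree.ElementTree, whose limited XPath support happens to cover paths like //product/cost. Assumptions, all hypothetical: only //-rooted paths made of child steps, optionally ending in one @attribute step, are handled; a selected element is copied together with its whole content (as in the expected output, where cost keeps its text); text belonging only to ancestor elements is dropped. The helper names select, prune and _copy are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def select(wrapper, path):
    """Evaluate a restricted path such as //product/cost or //product/@sku.

    Assumption: only '//'-rooted paths of child steps, optionally ending in
    a single '@attr' step, are supported (ElementTree implements only a
    small XPath subset). Returns (element, attribute-name-or-None) pairs.
    """
    if not path.startswith("//"):
        raise ValueError("this sketch only handles //-rooted paths: " + path)
    steps = path[2:].split("/")
    attr = steps.pop()[1:] if steps[-1].startswith("@") else None
    # '//@x' has no element steps left: look at every element instead.
    elements = wrapper.findall(".//" + ("/".join(steps) or "*"))
    if attr is None:
        return [(e, None) for e in elements]
    return [(e, attr) for e in elements if attr in e.attrib]


def prune(root, paths):
    """Return a new element holding only the hits and their ancestors."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    keep, whole, attrs = set(), set(), {}
    # Wrap the root so that a path such as //catalog can match it as well.
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    for path in paths:
        for elem, attr in select(wrapper, path):
            if attr is None:
                whole.add(elem)  # selected element: keep all its content
            else:
                attrs.setdefault(elem, set()).add(attr)
            node = elem
            while node is not None:  # the hit plus every ancestor element
                keep.add(node)
                node = parent.get(node)
    return _copy(root, keep, whole, attrs)


def _copy(elem, keep, whole, attrs):
    """Rebuild elem, visiting children in document order, copying kept nodes."""
    if elem in whole:
        out = copy.deepcopy(elem)
        out.tail = None  # drop text that followed the element in its parent
        return out
    names = attrs.get(elem, set())
    out = ET.Element(elem.tag, {k: v for k, v in elem.attrib.items() if k in names})
    for child in elem:
        if child in keep:
            out.append(_copy(child, keep, whole, attrs))
    return out
```

For example, prune(ET.fromstring(text), ["//product/@sku", "//product/cost"]) on the sample catalog returns catalog/products/department/vendor with both product elements, each keeping only its sku attribute and its cost child; ET.tostring() can serialize the result. Each node is copied once because the copy walks the source tree top-down and only descends into collected elements.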

Thanks!
--
Cliff McBride


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

