This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Generating indexes
- To: "'xsl-list at lists dot mulberrytech dot com'" <xsl-list at lists dot mulberrytech dot com>
- Subject: RE: [xsl] Generating indexes
- From: Stuart Brown <Stuart dot Brown at helicon dot co dot uk>
- Date: Tue, 25 Sep 2001 11:32:13 +0100
- Reply-To: xsl-list at lists dot mulberrytech dot com
If your file were divided into numbered units (1, 1.1, 1.2, 1.2.1, 1.2.2,
etc.), you could grab the number attribute of the nearest enclosing unit of
each individual instance of <index>. References to numbered chunks of text
are not uncommon and are easy enough to navigate (as long as your text chunks
are of a medium size), and they would allow you to repurpose the same data
for eBooks, etc., where pagination varies according to user settings.
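As a rough illustration of the numbered-unit idea, here is a minimal Python sketch. The element name <unit>, the attribute name "number" and the sample document are assumptions made up for the example; only <index> comes from the original question. It walks the tree and pairs each <index> term with the number of its nearest enclosing numbered element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: <unit> and its "number" attribute are
# assumed names, not taken from the original post.
SAMPLE = """
<book>
  <unit number="1">
    <para>Breakfast with <index>yoghurt</index>.</para>
    <unit number="1.2">
      <para>Add <index>milk</index> and an <index>egg</index>.</para>
    </unit>
  </unit>
</book>
"""


def index_entries(xml_text):
    """Return (unit number, term) pairs in document order."""
    root = ET.fromstring(xml_text)
    entries = []

    def walk(element, current_number):
        # An element carrying a number attribute becomes the reference
        # for everything nested inside it.
        number = element.get("number", current_number)
        if element.tag == "index":
            term = "".join(element.itertext()).strip()
            entries.append((number, term))
        for child in element:
            walk(child, number)

    walk(root, None)
    return entries


for number, term in index_entries(SAMPLE):
    print(number, term)
```

The same lookup could presumably be done inside the stylesheet itself; the sketch only shows that the reference is available without any knowledge of pagination.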
-----Original Message-----
From: Gustaf Liljegren [mailto:gustaf.liljegren@xml.se]
Sent: 25 September 2001 11:09
To: XSL List
Subject: [xsl] Generating indexes
I have a document in XML with some words marked up with <index> tags. This
document is later going to be transformed into PDF and printed like a book,
with an index. I'm aiming to do this task automatically.
The general idea is to collect the words and phrases marked-up with <index>,
plus the pages on which they appear, to get a list of all matches, in no
particular order, or possibly document order. In a positional flat file, it
may look like this:
12 yoghurt
153 milk
122 yoghurt
132 egg
43 olive oil
32 egg
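Once such a file exists, grouping it by term is straightforward. The sketch below is only an illustration and assumes each line holds a page number, some whitespace, and then the term (which may itself contain spaces); the function name is hypothetical.

```python
from collections import defaultdict

# The sample matches from the listing above.
MATCHES = """\
12 yoghurt
153 milk
122 yoghurt
132 egg
43 olive oil
32 egg
"""


def group_matches(text):
    """Map each term to a sorted list of the pages it appears on."""
    pages_by_term = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first run of whitespace only, so that
        # multi-word terms such as "olive oil" stay intact.
        page, term = line.split(None, 1)
        pages_by_term[term.strip()].add(int(page))
    return {term: sorted(pages) for term, pages in pages_by_term.items()}


for term, pages in group_matches(MATCHES).items():
    print(term, pages)
```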
As soon as I have the page numbers, I have total control when producing an
index. I can write scripts that handle cases like 121, 123, 124, 125 (should
be "121, 123-125"). I can handle special characters like á, é, å, ä and ö so
that they appear in the correct order, and so on.
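Both post-processing steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of the actual scripts: runs of three or more consecutive pages are merged while pairs are left as they are, and the ordering assumes Swedish conventions (å, ä, ö sort after z, other accented letters sort with their base letter). The function names are hypothetical.

```python
import unicodedata

# Assumption: Swedish ordering, where these letters follow z in this order.
SWEDISH_EXTRA = "åäö"


def collapse_pages(pages):
    """Format page numbers, merging runs of three or more into 'first-last'."""
    pages = sorted(set(pages))
    parts = []
    start = 0
    while start < len(pages):
        end = start
        # Extend the run while the next page is consecutive.
        while end + 1 < len(pages) and pages[end + 1] == pages[end] + 1:
            end += 1
        if end - start >= 2:
            parts.append(f"{pages[start]}-{pages[end]}")
        else:
            # Single pages and pairs are listed individually (an assumption).
            parts.extend(str(p) for p in pages[start:end + 1])
        start = end + 1
    return ", ".join(parts)


def sort_key(term):
    """Sort key: å, ä, ö after z; other accents ignored."""
    key = []
    for ch in unicodedata.normalize("NFC", term.lower()):
        if ch in SWEDISH_EXTRA:
            key.append((1, SWEDISH_EXTRA.index(ch)))
        else:
            # Decompose and keep the base character, so é sorts with e.
            base = unicodedata.normalize("NFD", ch)[0]
            key.append((0, ord(base)))
    return key


print(collapse_pages([121, 123, 124, 125]))
print(sorted(["öl", "ägg", "egg", "ål", "zucchini", "éclair"], key=sort_key))
```

A different language would need a different table of extra letters, or a proper collation library in place of this hand-written key.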
The hard thing is to generate this file of matches.
Of course, XSLT can't know anything about page-numbers, so I guess this is
something that has to be drawn from a rendering engine. Before digging
deeper into this, I wonder if anyone has achieved it, or been successful in
alternative ways.
Just to clarify: I'm not aiming at doing a full-blown index. This should be a
one-level index, and the indexing work (placing <index> tags around certain
words in certain elements) is still a job for a human indexer, or for
intelligent scripts. In fact, I made an indexing script, but it's not
intelligent enough to know about mouse and mice... :-)
Gustaf
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list