This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Generating indexes


I have a document in XML with some words marked up with <index> tags. This
document is later going to be transformed into PDF and printed like a book,
with an index. I'm aiming to do this task automatically.

The general idea is to collect the words and phrases marked up with <index>,
plus the pages on which they appear, to get a list of all matches, in no
particular order, or possibly in document order. As a positional flat file,
it might look like this:

12    yoghurt
153   milk
122   yoghurt
132   egg
43    olive oil
32    egg
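
To make the format concrete, here is a minimal Python sketch of how such a
file could be read back into a term-to-pages mapping. The function name
parse_matches and the "page number, whitespace, term" layout are assumptions
based only on the sample above, not a fixed format.

```python
from collections import defaultdict


def parse_matches(text):
    """Group page numbers by term from 'page<whitespace>term' lines."""
    pages_by_term = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only once so multi-word terms like "olive oil" stay whole.
        page, term = line.split(None, 1)
        pages_by_term[term.strip()].add(int(page))
    # Sort each term's pages; duplicates were already dropped by the set.
    return {term: sorted(pages) for term, pages in pages_by_term.items()}


sample = """\
12    yoghurt
153   milk
122   yoghurt
132   egg
43    olive oil
32    egg
"""

index = parse_matches(sample)
for term in sorted(index):
    print(term, index[term])
```

The order of the input lines does not matter here, which matches the "no
particular order" idea: grouping and sorting happen after collection.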

Once I have the page numbers, I have total control over producing the
index. I can write scripts that handle cases like 121, 123, 124, 125 (which
should become "121, 123-125"), and I can handle special characters like á, é,
å, ä and ö so that they sort in the correct order, and so on.
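
As a rough illustration of that post-processing, the following Python sketch
collapses consecutive pages into ranges and builds a sort key for accented
terms. Both function names are hypothetical. Two assumptions are baked in:
any run of two or more consecutive pages is merged (so 12, 13 becomes
"12-13"), and the ordering follows Swedish convention, with å, ä and ö after
z and other accented letters such as á and é sorted like their base letters.
Another language would need a different rule.

```python
import unicodedata


def collapse_pages(pages):
    """Format page numbers, merging consecutive runs: 123,124,125 -> 123-125."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page          # extend the current run
        else:
            runs.append([page, page])   # start a new run
    return ", ".join(
        str(start) if start == end else f"{start}-{end}"
        for start, end in runs
    )


# Assumption: Swedish ordering. These ASCII characters sort just after 'z'.
SWEDISH_TAIL = {"å": "{", "ä": "|", "ö": "}"}


def sort_key(term):
    """Sort key: å, ä, ö after z; other accents folded to the base letter."""
    key = []
    # NFC first, so that å, ä, ö are single precomposed characters.
    for ch in unicodedata.normalize("NFC", term).lower():
        if ch in SWEDISH_TAIL:
            key.append(SWEDISH_TAIL[ch])
        else:
            # Decompose (á -> a + combining accent) and drop the accent.
            decomposed = unicodedata.normalize("NFD", ch)
            key.append(
                "".join(c for c in decomposed if not unicodedata.combining(c))
            )
    return "".join(key)


print(collapse_pages([121, 123, 124, 125]))
print(sorted(["öl", "zebra", "ägg", "élan", "egg"], key=sort_key))
```

The first call prints "121, 123-125", the case mentioned above. A
locale-aware collation library would be the more robust choice for sorting;
the hand-written key is only meant to show the idea.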

The hard thing is to generate this file of matches.

Of course, XSLT can't know anything about page numbers, so I guess this is
something that has to be drawn from a rendering engine. Before digging
deeper into this, I wonder if anyone has achieved it, or been successful in
alternative ways.

Just to clarify: I'm not aiming at a full-blown index. This should be a
one-level index, and the indexing work (placing <index> tags around certain
words in certain elements) is still a job for a human indexer, or for
intelligent scripts. In fact, I made an indexing script, but it's not
intelligent enough to know about mouse and mice... :-)

Gustaf



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

