This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Splitting an XML file based on size
- To: xsl-list at lists dot mulberrytech dot com
- Subject: Re: [xsl] Splitting an XML file based on size
- From: dan mason <dmason at wso dot williams dot edu>
- Date: Wed, 4 Apr 2001 10:30:24 -0400
- Reply-To: xsl-list at lists dot mulberrytech dot com
> Date: Tue, 3 Apr 2001 15:50:04 -0700
> From: Adam Van Den Hoven <Adam.Hoven@bluezone.net>
> Subject: [xsl] Splitting an XML file based on size
>
> Hey guys,
>
> I'm processing an NITF file into HTML. NITF is very much like HTML in
> that it has a body with paragraph tags that have mixed content. The
> HTML that I am creating from my transforms can quickly become several
> tens of KB in size. Since I'm transferring this over a wireless modem
> to a PocketPC at a maximum of 14.4 kbps, an HTML file that is 15 KB is
> entirely too big.
>
> I need some way to keep track of the number of characters I've
> processed and stop when I reach a specific size, stopping at the end of
> the paragraph. I understand that counting characters is not very
> precise, but I am only interested in getting the transfer size to be
> less than 2K or so.
>
I used to work on the development of a mobile applications platform
(NetMorf SiteMorfer) that had to deal with byte size pagination (that's
what we called this problem) in a flexible, automagic way for n
applications and n devices, all of which had different digest sizes
(some mandatory, others suggested, like for the Pocket PC, Palm, RIM,
etc.), numbers of rows, numbers of accesskeys, etc. The short answer
is that it's not easy in general, and especially not in XSLT. Before I
get flamed, let me try to explain why :) and invite people to produce a
pure XSLT solution, because I know it's possible, but I also know that
it's a royal pain in the behind (at least, the way I was trying to do
it).
Solution 1 would be the pure XSLT solution. Like I said, I think it's
possible, your code snippet down below is a start. But I think it's
going to be extremely hard to make a solution like that extensible (you
may end up writing the same code for <p>, <table> and any other tags,
just slightly different). Also, I'll go out on a limb here and make a
blanket statement: XSLT (this version, anyway) is not supposed to be
the end point of a delivery architecture. XSLT is designed for document
transformation, so going from unpaginated NITF to unpaginated HTML is
almost trivial, as you know. But it has no clue what device it's
talking to, which delivery architectures have to know and take into
account. You could make your stylesheet aware of the device and its
capabilities, although the colossal pain of keeping variables for byte
size, number of rows, number of accesskeys (for phones), and linking to
the data you didn't have room for will keep you up nights.
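For illustration, here is a minimal sketch of the running-total pattern a pure XSLT solution would rely on. It is written in Python only so it can run on its own; in XSLT the `used` argument would be the $cursize parameter passed with xsl:with-param to following-sibling::*[1], because XSLT 1.0 variables cannot be updated once bound. The names `take_blocks` and `block_size` are hypothetical, and measuring each block by its serialized UTF-8 size is an assumption that lets <p>, <table> and lists share a single rule instead of per-tag code.

```python
import copy
import xml.etree.ElementTree as ET


def block_size(block):
    """Bytes the block occupies once serialized as UTF-8, tags included.
    The tail text (whitespace after the element) is not counted."""
    c = copy.copy(block)  # shallow copy so the original tail is untouched
    c.tail = None
    return len(ET.tostring(c, encoding="utf-8"))


def take_blocks(blocks, budget, used=0):
    """Recursive walk over sibling blocks, threading the running total
    through the call instead of mutating it (the XSLT idiom). Stops before
    the first block that would push the total past `budget`, but always
    keeps at least one block so an oversized first paragraph is not lost."""
    if not blocks:
        return [], used
    size = block_size(blocks[0])
    if used > 0 and used + size > budget:
        return [], used
    rest, total = take_blocks(blocks[1:], budget, used + size)
    return [blocks[0]] + rest, total


body = ET.fromstring(
    "<body><p>aaaa</p><p>bb</p><table><tr><td>x</td></tr></table></body>"
)
taken, total = take_blocks(list(body), budget=25)
print([b.tag for b in taken], total)  # ['p', 'p'] 20
```

In XSLT 1.0 a template produces output rather than returning a value, so passing the total forward through parameters, as this recursion does, is the usual workaround; it is also why the size of inline content processed inside a paragraph is hard to feed back to the next sibling.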
You could probably use extension functions or calls out to Java classes
to give you more power and a cleaner stylesheet, but it's still a pain
(and I have no idea what the performance implications are). I don't
know much about that stuff; it's possible that a few extension
functions would be able to keep track of where you are and short-circuit
the transformation when you overflow, but I don't remember whether they
can be stateful. If not, Java calls would work: I ended up writing a
Java class to catch and paginate tags as I wrote them, with varying
levels of success.
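The stateful alternative might look roughly like the sketch below: the running count lives in one object instead of being threaded through parameters, and once a block is refused every later block is refused too, which is the short-circuit described above. This is a hypothetical Python class (`PageBudget` and `offer` are made-up names), not MSXML's extension-object API; the serialized chunks are assumed to come from whatever writes the output.

```python
class PageBudget:
    """Hypothetical stateful byte counter for one page of output."""

    def __init__(self, limit):
        self.limit = limit  # maximum bytes for the page
        self.used = 0       # bytes accepted so far
        self.full = False   # set once a chunk has been refused

    def offer(self, chunk):
        """Accept a serialized block if it fits. The first block is always
        accepted so an oversized paragraph still gets a page; after the
        first refusal, every later chunk is refused as well."""
        size = len(chunk.encode("utf-8"))
        if self.full or (self.used > 0 and self.used + size > self.limit):
            self.full = True
            return False
        self.used += size
        return True


budget = PageBudget(25)
for chunk in ["<p>aaaa</p>", "<p>bb</p>", "<p>cccccc</p>", "<br/>"]:
    print(chunk, budget.offer(chunk))
```

Here `<br/>` would still fit in the 5 bytes left after the first two paragraphs, but it is refused because the page is already marked full; stopping at the first overflow keeps the reading order intact, whereas continuing to fill would reorder content across pages.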
Solution 2 would be to use XSLT and build a pagination engine that takes
in the output and chops it down to size. This makes a lot more sense to
me; all you have to do is make sure you're spitting out XHTML, parse it,
and go through and count bytes. You still have to decide what to do
with the data you chop off, and you have to make sure you never chop off
a valid end tag, things like that, but it's doable. I worked on a
prototype of a system like this, but for n devices; instead of spitting
out XHTML, we used our own XML to preserve structure, and then embedded
markup inside it (WML, HDML, HTML, whatever). So, based on universal
rules for how to paginate our XML (in your case, NITF), we could chop
markup for any device down to size using one component. It was spiffy.
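A minimal sketch of such a post-processing pass, assuming the stylesheet emits well-formed XHTML without a default namespace and puts all body content inside block elements (<p>, <table>, <ul>, ...). It is written in Python rather than the C++ or Java mentioned in this message, and `paginate`, `block_bytes` and `render_page` are hypothetical names. Only whole top-level blocks are moved between pages, so no end tag is ever cut off; `reserve` stands for the fixed-size part of each page (the html/body wrapper, a "next page" link and so on) that is subtracted from the limit up front.

```python
import copy
import xml.etree.ElementTree as ET


def block_bytes(block):
    """UTF-8 size of a block, tags included, ignoring trailing whitespace."""
    c = copy.copy(block)
    c.tail = None
    return len(ET.tostring(c, encoding="utf-8"))


def render_page(blocks):
    """Wrap a list of blocks in a fresh, well-formed html/body document."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for block in blocks:
        c = copy.copy(block)  # leave the parsed source tree untouched
        c.tail = None
        body.append(c)
    return ET.tostring(html, encoding="unicode")


def paginate(xhtml, limit, reserve=0):
    """Split the children of <body> into pages whose content is at most
    limit - reserve bytes. A single block larger than that gets its own page."""
    body = ET.fromstring(xhtml).find("body")
    budget = limit - reserve
    pages, current, used = [], [], 0
    for block in body:
        size = block_bytes(block)
        if current and used + size > budget:
            pages.append(current)  # close the page before it overflows
            current, used = [], 0
        current.append(block)
        used += size
    if current:
        pages.append(current)
    return [render_page(page) for page in pages]


doc = ("<html><body><p>aaaa</p><p>bb</p>"
       "<table><tr><td>x</td></tr></table></body></html>")
for page in paginate(doc, limit=60, reserve=35):
    print(page)
```

In the call above, 35 of the 60 bytes are reserved for the 26-byte `<html><body></body></html>` wrapper and a small trailer, leaving 25 bytes for content: the two paragraphs share the first page and the table goes on a second one. Deciding what to do with a block that is larger than a whole page (splitting a long paragraph at sentence boundaries, for example) is left out of this sketch.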
If you can pull off solution 2, it has a bunch of advantages: 1) you can
reuse your pagination engine for multiple apps, and not have to write it
all into each stylesheet (I know you can simplify this by inheriting
XSLT templates, but I dare anyone to do it :), 2) the stylesheet author
(if it's not you) doesn't have to know how to paginate anything, they
can just write XSLT and not worry about it, and 3) your stylesheets are
cleaner, and don't take as long to execute (probably; there are
performance implications for splitting the job like this too, as we have
to reparse the XHTML, etc.). I did all this in C++ and a coworker did the
same thing in Java; I don't know how easy it would be to do in a
scripting environment.
Good luck, I hope this is useful, and more than that, I would love to
hear about experiences other people have had with paginating in XSLT. I
know that at least for mobile apps, this was concern #1, and everybody
had a story on how to do it. Not being an XSLT guru, I didn't know the
answer, but I figure somebody on this list might...
-d
>> I can't be so coarse as counting paragraphs, since I might also have a
>> table (essentially an HTML table) or lists or something. Some
>> paragraphs will be as short as a single sentence, others will be much
>> longer.
>>
>> I also need to do some additional processing after I reach the end of
>> the NITF text (but the size of that part will be much more rigid and
>> can simply be subtracted from the target file size).
>>
>> I had thought about doing something approximately like:
>>
>> <xsl:template match="p" mode="block">
>>   <xsl:param name="cursize" select="0"/>
>>   <xsl:variable name="size" select="$cursize"/>
>>   <p>
>>     <xsl:apply-templates select="child::node()" mode="inline">
>>       <xsl:with-param name="cursize" select="$size + 7"/>
>>       <!-- +7 characters for the <p></p> tags -->
>>     </xsl:apply-templates>
>>   </p>
>>   <xsl:if test="$size &lt;= 400">
>>     <xsl:apply-templates select="following-sibling::p[1]" mode="block">
>>       <xsl:with-param name="cursize" select="$size"/>
>>     </xsl:apply-templates>
>>   </xsl:if>
>> </xsl:template>
>>
>> but clearly that isn't going to work. I also assume that making a
>> global variable called $size wouldn't work either.
>>
>> I am getting the feeling that this isn't strictly possible with XSL. I
>> am using MSXML 3, so scripting might be a solution, but I am loath to
>> use it unless I have to.
>>
>> Adam van den Hoven
>> Internet Application Developer
>> Blue Zone
>> tel. 604.685.4310
>> fax. 604.685.4391
>> Blue Zone makes you interactive.(tm) http://www.bluezone.net/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list