This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



Re: [docbook-apps] Dynamic web serving of large Docbook


Frans,

You mention PDF output in your message... If a commercial solution
is an option, you might want to take a look at RenderX EnMasse:

  http://www.renderx.net/Content/tools/enmasse.html

A brief description:

  EnMasse is a formatting server. It accepts documents locally or
  over the network and formats them with high throughput,
  distributing actual formatting tasks among multiple computers.
  Customers are those who print or deliver electronically
  customized documents in high volumes and various formats.

And there are more details here:

  http://www.renderx.net/Content/support/enmasse/guide.html#d0e10

I actually just started looking at it today, but from what I've
seen and read of it so far, it seems to me like one possible
solution to the kind of performance issue you're seeing (for PDF
output at least).

Beyond the way it can distribute the actual formatting load
among multiple computers, even if you run it as just one
instance on one machine, it seems like it could significantly
reduce the load on your Web server.

At its core, instead of re-launching Java and XEP each time a
certain XML source file needs to be rendered, it runs as a
multi-threaded server in the background, waiting to receive
files from clients, transforming them, and returning the
results -- and, when needed, processing requests from multiple
clients at the same time.
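To illustrate the general idea (this is just a sketch of the pattern,
not EnMasse's actual code, and the class name is made up): in JAXP you
can compile a stylesheet once into a thread-safe Templates object, and
have each request -- possibly on its own thread -- create a cheap
per-request Transformer from it, so neither the JVM nor the stylesheet
compiler is re-launched per document:

```java
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import java.io.*;

// Hypothetical sketch of a "compile once, transform many" server core.
// The stylesheet is parsed a single time into a Templates object, which
// the JAXP spec guarantees is safe to share across threads; each request
// then gets its own lightweight Transformer.
public class FormattingServer {
    private final Templates templates;  // immutable, shareable across threads

    public FormattingServer(File stylesheet) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        this.templates = factory.newTemplates(new StreamSource(stylesheet));
    }

    // Each client request runs this, potentially concurrently.
    public String transform(String xml) throws TransformerException {
        Transformer t = templates.newTransformer();  // cheap per-request object
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toString();
    }
}
```

A long-running server built around this avoids the per-request JVM and
stylesheet-compilation startup cost, which is most of what you pay when
a cron script shells out to a fresh processor for every file.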

As far as that part goes, I would guess that Cocoon might be doing
something similar. One downside of EnMasse compared to Cocoon
is that (as far as I can tell) it currently only handles PDF/PostScript
rendering. But it seems like they could fairly easily take the same
code and extend it to generate HTML output or anything else.

But one big upside of EnMasse is that it uses XEP as its PDF
rendering engine. I guess there might be a way to have Cocoon use
XEP instead of FOP. But if there isn't -- if Cocoon locks you into
FOP -- then Cocoon would seem to me to be worthless for generating
production-quality PDF output, because there are some very basic
parts of the FO spec that aren't implemented in FOP (such as
keep-with-next and percentage values for column widths in tables).
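For reference, those are ordinary XSL-FO properties; a hypothetical
fragment showing both (the content is made up, only the properties
matter):

```xml
<!-- Keep a heading on the same page as the block that follows it -->
<fo:block keep-with-next="always" font-weight="bold">
  Section heading
</fo:block>

<!-- Percentage-style column widths via proportional-column-width() -->
<fo:table table-layout="fixed" width="100%">
  <fo:table-column column-width="proportional-column-width(30)"/>
  <fo:table-column column-width="proportional-column-width(70)"/>
  <fo:table-body>
    <fo:table-row>
      <fo:table-cell><fo:block>Term</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>Definition</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>
```

The DocBook XSL stylesheets emit properties like these for table and
heading formatting, so an FO engine that ignores them produces visibly
worse print output.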

  --Mike

Frans Englich <frans.englich@telia.com> writes:

> 
> Hello all,
> 
> I'm scouting for a solution for a large DocBook/website project, and perhaps 
> someone has had similar problems.
> 
> Here's the situation:
> 
> The sources stretch over several books and more than 500 pages, and multiple 
> authors are working and updating on a daily basis. It's maintained in a CVS 
> repository, and the document's primary use is on a website which 
> occasionally should handle traffic corresponding to a slashdotting without 
> requiring manual intervention (switching to serving true static files, for 
> example). The output would be chunked, with the navigation structure, plus PDF 
> files for each chunk.
> 
> One solution is to do an ordinary transformation, run by a cron/makefile 
> script. But this is inflexible, since other content needs dynamic generation, 
> and it also becomes a performance issue, since it involves many 
> files (largely because it's chunked PDF too) -- especially since the script 
> would have to be run at short intervals in order to avoid long waits between 
> commit and result.
> 
> The perfect solution, AFAICT, would be dynamic, cached generation: when a 
> certain section is requested, only that part is transformed, and it is cached 
> for future deliveries. It sounds nice, and sounds like it would be fast.
> 
> I looked at Cocoon (cocoon.apache.org) to help me with this, and it does 
> many things well; it caches XSLT sheets, the source files, and even 
> CIncludes (basically the same as XIncludes).
> 
> However, AFAICT, DocBook doesn't make this easy:
> 
> * If one section is to be transformed, the stylesheets must parse /all/ sources 
> in order to resolve references and so forth. There's no way to work around 
> this, right? 
> 
> * Cocoon-specific: it cannot cache "a part" of a transformation, which means 
> the point above can't be worked around. Right? Otherwise the 
> transformations of all unchanged sources would be cached.
> 
> This is further complicated by the fact that chunked output doesn't go to 
> standard out, but is written to files. Probably my knowledge of DocBook & 
> Cocoon is too limited, but perhaps it can be worked around, or perhaps it's 
> no problem for Cocoon.
> 
> I tried playing with the rootid parameter, but it made little difference, 
> just as Bob's DocBook XSL: The Complete Guide said. A document that takes 11 
> seconds with xsltproc (fast..) on an idle modern computer still took 7 
> seconds when only a small section was output. Not enough.
> 
> I've asked on the cocoon-users list, but there were no enlightening replies.
> 
> Feel free to clear my confusion. What it basically boils down to is:
> 
> Is there any way to generate parts of a DocBook set quickly, in a way suitable 
> for dynamic web serving?
> 
> 
> Cheers,
> 
> 		Frans


