This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: html to xml



> So the conclusion
> is, I guess, "clean up the HTML minimally even before running tidy".
> I was afraid someone would say that. My problem is that the task is to
> convert our existing web pages (6196 documents, at last count) to (TEI DTD

I wasn't quite sure what your context was.
Surely grabbing floating PCDATA and wrapping it in a paragraph element
is something easily done in the post-Tidy XSL transformation to TEI.
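The wrapping step can be illustrated outside XSLT as well. Below is a minimal sketch in Python using only the standard-library ElementTree, not an XSL stylesheet. It assumes Tidy's output has already been parsed with plain (un-namespaced) tag names. The BLOCK_TAGS set and the wrap_loose_text name are hypothetical choices, not part of Tidy or TEI.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical list of block-level tags; everything else is treated as inline.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "dl", "table", "pre", "blockquote"}


def wrap_loose_text(body):
    """Return a copy of `body` in which each run of loose text and inline
    elements is wrapped in a <p>; block-level children are copied as-is."""
    new_body = ET.Element(body.tag, dict(body.attrib))
    current = None  # the <p> currently collecting inline content

    def add_text(text):
        nonlocal current
        if not text:
            return
        if current is None:
            if not text.strip():
                return  # ignore whitespace that sits between blocks
            current = ET.SubElement(new_body, "p")
        if len(current):
            current[-1].tail = (current[-1].tail or "") + text
        else:
            current.text = (current.text or "") + text

    add_text(body.text)
    for child in body:
        item = copy.deepcopy(child)
        item.tail = None  # the tail is re-added below via add_text
        if child.tag in BLOCK_TAGS:
            current = None  # a block element closes any open paragraph
            new_body.append(item)
        else:
            if current is None:
                current = ET.SubElement(new_body, "p")
            current.append(item)
        add_text(child.tail)
    return new_body


html = ET.fromstring(
    "<body>Loose text <b>bold</b> more<p>Already wrapped</p>"
    "tail text<h2>Head</h2></body>")
print(ET.tostring(wrap_loose_text(html), encoding="unicode"))
```

Here "Loose text <b>bold</b> more" and "tail text" each end up in their own <p>, while the existing <p> and <h2> pass through unchanged.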

Grabbing HTML section heads into TEI/DocBook-style section containers is
always a pain, but you can do it in XSL with the usual "grouping"
techniques. It's made a bit easier if you know that the heading elements
(h1 to h6) all appear in "correct" sequence, not jumping from h1 to h3.
If you use the ISO-HTML DTD, then the SGML parser (e.g. sx) will add any
missing section levels automagically if you set the appropriate parameter
entity.
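The grouping logic itself amounts to a stack of open sections. Each heading closes any open section at its own level or deeper, then opens a new one. What follows is a minimal Python/ElementTree sketch of that logic, not the XSL grouping technique itself. It assumes loose text has already been wrapped in paragraphs and emits TEI-like <div>/<head> elements. The fill_missing flag is a hypothetical stand-in for the missing-level behaviour of the ISO-HTML DTD, not an emulation of sx.

```python
import copy
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"h([1-6])$")


def group_sections(body, fill_missing=False):
    """Nest the flat children of `body` into <div> containers, one per
    heading, with the heading content moved into a TEI-style <head>.
    Assumes loose text has already been wrapped, so text sitting directly
    inside `body` (including text after a heading) is ignored."""
    root = ET.Element(body.tag, dict(body.attrib))
    stack = [(0, root)]  # (heading level, container); level 0 is the body
    for child in body:
        m = HEADING.match(child.tag)
        if m is None:
            # Non-heading content goes into the innermost open section.
            stack[-1][1].append(copy.deepcopy(child))
            continue
        level = int(m.group(1))
        # A heading closes every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        # Optionally add empty sections for skipped levels (e.g. h1 -> h3).
        while fill_missing and stack[-1][0] < level - 1:
            parent_level, parent = stack[-1]
            stack.append((parent_level + 1, ET.SubElement(parent, "div")))
        div = ET.SubElement(stack[-1][1], "div")
        head = ET.SubElement(div, "head")
        head.text = child.text
        for sub in child:  # keep inline markup inside the heading
            head.append(copy.deepcopy(sub))
        stack.append((level, div))
    return root


flat = ET.fromstring("<body><h1>A</h1><p>a</p><h3>B</h3><p>b</p>"
                     "<h2>C</h2><p>c</p></body>")
print(ET.tostring(group_sections(flat), encoding="unicode"))
print(ET.tostring(group_sections(flat, fill_missing=True), encoding="unicode"))
```

Without fill_missing, the h3 section is nested directly inside the h1 section. With it, an empty intermediate <div> stands in for the skipped level, which is roughly what "correct" heading sequence buys you for free.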

David




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
