This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Converting poorly formed HTML into well-formed XML
| Does XSLT have the facilities to directly
| read in the poorly formed HTML?
No built-in features to do this.
I'd recommend JTidy, Andy Quick's excellent open-source Java
implementation of Dave Raggett's HTML "Tidy" utility:
http://www3.sympatico.ca/ac.quick/jtidy.html
It can parse any ill-formed HTML document and expose the "tidied-up"
(that is, well-formed) XML tree through the DOM API. You can then
pass the DOM Document into your XSLT engine for transformation.
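
A minimal sketch of that last step, using only the standard JAXP classes. It assumes the tidied tree is already available as an org.w3c.dom.Document. Here the Document is built from a small well-formed string as a stand-in, because the sketch does not call JTidy itself. The class name, the method names, and the stylesheet are hypothetical and only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomToXslt {

    // Hypothetical stylesheet: copies the text of every <li> into an <item>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<items><xsl:for-each select='//li'>"
      + "<item><xsl:value-of select='.'/></item>"
      + "</xsl:for-each></items>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Stand-in for the tidied tree (assumption: in real use the Document
    // would come from the HTML tidying step, not from this string).
    public static Document tidiedStandIn() {
        String xhtml = "<html><body><ul><li>one</li><li>two</li></ul></body></html>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not build sample DOM", e);
        }
    }

    // Hands an in-memory DOM Document to the XSLT engine via DOMSource
    // and returns the transformation result as a string.
    public static String transform(Document doc, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(tidiedStandIn(), STYLESHEET));
    }
}
```

Wrapping the Document in a DOMSource avoids serializing the tidied tree to text and re-parsing it before the transformation.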
In my about-to-be-released book "Building Oracle XML Applications"
from O'Reilly, I use JTidy to show readers how to take ill-formed
HTML from dynamic web pages, such as stock quote services or other
online sources of information, and use XSLT to "scrape" interesting
data out of the "tidied-up" XML result.
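
To give a flavor of the "scraping" idea, here is a rough sketch rather than the book's actual example. It assumes the tidied page has been serialized to a well-formed string whose elements are in no namespace, and that the quotes sit in a simple two-column table. The page content, the ticker symbols, the prices, and the class and field names are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class QuoteScraper {

    // Hypothetical stylesheet: for each table row that has <td> cells,
    // emit "symbol=price" on its own line. Header rows (<th> only) are skipped.
    static final String SCRAPE_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//tr[td]'>"
      + "<xsl:value-of select='normalize-space(td[1])'/>"
      + "<xsl:text>=</xsl:text>"
      + "<xsl:value-of select='normalize-space(td[2])'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Made-up sample of what a tidied quote page might look like.
    static final String TIDIED_PAGE =
        "<html><body><table>"
      + "<tr><th>Symbol</th><th>Price</th></tr>"
      + "<tr><td>AAA</td><td>10.00</td></tr>"
      + "<tr><td>BBB</td><td>20.50</td></tr>"
      + "</table></body></html>";

    // Applies the scraping stylesheet to the tidied markup.
    public static String scrape(String tidiedXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(SCRAPE_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(tidiedXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("scrape failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(scrape(TIDIED_PAGE));
    }
}
```

If the tidied output places its elements in the XHTML namespace, the XPath expressions would need a namespace prefix bound in the stylesheet.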
______________________________________________________________
Steve Muench, Lead XML Evangelist & Consulting Product Manager
BC4J & XSQL Servlet Development Teams, Oracle Rep to XSL WG
Author "Building Oracle XML Applications", O'Reilly
http://www.oreilly.com/catalog/orxmlapp/
| Does XSLT have the facilities to directly read in the poorly formed HTML?
| And if so, what needs to be done.
|
| Or,
|
| Will designing a custom parser that builds a DOM from the poorly formed HTML
| to then be output to an XML file, or directly processed by an XSLT document,
| be the best solution.
|
| I've already begun developing the latter (custom) solution, but thought I'd
| double check to see if there are any HTML -> XHTML converters available.
|
| Thanks in advance for your help,
|
| Joe Fourness
|
|
| XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list