This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
RE: How to translate HTML to DocBook
- From: "Prikryl,Petr" <PRIKRYLP at skil dot cz>
- To: docbook-apps at lists dot oasis-open dot org
- Date: Tue, 19 Mar 2002 08:59:08 +0100
- Subject: RE: DOCBOOK-APPS: How to translate HTML to DocBook
Dave Brooks wrote...
>At 12:53 19/03/2002 +1100, Andrew Westcombe wrote:
>>At 05:00 PM 12/03/2002 -0600, Patrick Hartling wrote:
>>
>>> It also helps if the source is "good" HTML. Having closing tags such
>>> as </li>, </p>, and </br> helps immensely.
>>
>>I've used DocParse myself, it's not bad, and very good value. As for
>>having "good" HTML, Dreamweaver has a very nice command for stripping out
>>junk, esp. from former MSWord files.
>
>HTML Tidy (see http://www.w3.org/People/Raggett/tidy/) is very good for
>cleaning up HTML.
What I like about HTML Tidy is that it can replace <FONT...> and similar
elements with more standard elements that use CSS classes. It can also
produce XML output (i.e. the differences between the original HTML and the
desired DocBook XML will be even smaller). Then, using a good text editor of
your choice ;-), it is much, much easier to get the result.
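
To illustrate the cleanup step described above, here is a minimal Python
sketch that builds and runs a Tidy command line. It assumes a "tidy"
executable is on the PATH and uses option names from current HTML Tidy
releases (-asxml, -clean, --word-2000, -o); older builds may spell some of
these differently. The function names and file names are hypothetical.

```python
import shutil
import subprocess


def tidy_command(src, dest):
    """Build an HTML Tidy command line that writes XML output to dest.

    Option names are assumed from current HTML Tidy releases.
    """
    return [
        "tidy",
        "-quiet",               # suppress non-essential messages
        "-asxml",               # emit well-formed XML (XHTML)
        "-clean",               # replace FONT, NOBR and CENTER with CSS-based markup
        "--word-2000", "yes",   # strip extra markup from Word "Save as HTML" files
        "-o", dest,             # write the result to dest instead of stdout
        src,
    ]


def run_tidy(src, dest):
    """Run Tidy and return its exit status (0 = ok, 1 = warnings, 2 = errors)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    return subprocess.run(tidy_command(src, dest)).returncode
```

Calling run_tidy("word-export.html", "cleaned.xml") would leave a well-formed
XML file that can then be edited toward DocBook by hand or by script.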

I tried to XMLize the HTML from MS Word 97 earlier, before I knew about
HTML Tidy. It was painful even with Perl in hand. The (free) HTML Tidy can
really save a lot of work.
HTH, Petr
--
Petr Prikryl, Skil, spol. s r.o., (prikrylp@skil.cz)