This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



RE: How to translate HTML to DocBook


Dave Brooks wrote...
>At 12:53 19/03/2002 +1100, Andrew Westcombe wrote:
>>At 05:00 PM 12/03/2002 -0600, Patrick Hartling wrote:
>>
>>>  It also helps if the source is "good" HTML.  Having closing tags such 
>>> as </li>, </p>, and </br> helps immensely.
>>
>>I've used DocParse myself, it's not bad, and very good value. As for 
>>having "good" HTML, Dreamweaver has a very nice command for stripping out 
>>junk, esp. from former MSWord files.
>
>HTML Tidy (see http://www.w3.org/People/Raggett/tidy/) is very good for 
>cleaning up HTML.

What I like about HTML Tidy is that it can replace <FONT...> and similar
presentational markup with more standard elements that carry CSS classes.
It can also produce XML output, so the differences between the original
HTML and the wanted DocBook XML will be even smaller.  Then, using a good
text editor of your choice ;-), it is much, much easier to get the result.

I tried to XMLize the HTML from MS Word 97 earlier, before I knew about
HTML Tidy.  It was painful even with Perl in hand.  The (free) HTML Tidy
can really save a lot of work.
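
For anyone who wants to script the cleanup step, here is a minimal sketch
in Python. It assumes a `tidy` executable is on the PATH and that it accepts
the command-line switches -quiet, -clean (replace FONT and similar tags with
CSS), -asxml (emit well-formed XML) and -numeric (numeric entities). The
function names and the file name in the usage note are only illustrative,
and older Tidy builds may spell the options differently, so check `tidy -help`
on your system first.

```python
import shutil
import subprocess


def build_tidy_command(html_path, tidy_exe="tidy"):
    """Return the argument list for one Tidy run (assumed option names)."""
    return [
        tidy_exe,
        "-quiet",    # suppress the summary banner
        "-clean",    # replace <FONT> and similar markup with CSS classes
        "-asxml",    # write well-formed XML instead of plain HTML
        "-numeric",  # numeric entities, which XML parsers accept
        html_path,
    ]


def tidy_to_xml(html_path, tidy_exe="tidy"):
    """Run Tidy on html_path and return the cleaned markup as bytes."""
    if shutil.which(tidy_exe) is None:
        raise FileNotFoundError(f"{tidy_exe!r} was not found on the PATH")
    result = subprocess.run(
        build_tidy_command(html_path, tidy_exe),
        capture_output=True,
    )
    # Tidy exits with 0 for success, 1 for warnings and 2 for errors.
    if result.returncode >= 2:
        message = result.stderr.decode(errors="replace")
        raise RuntimeError(f"Tidy reported errors:\n{message}")
    return result.stdout
```

Calling tidy_to_xml("exported-from-word.html") would return the cleaned
markup, which can then be saved and edited by hand towards DocBook. The
output is still XHTML, not DocBook; the remaining element mapping is the
manual (or XSLT) part of the job.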

HTH, Petr
-- 
Petr Prikryl, Skil, spol. s r.o., (prikrylp@skil.cz)

