This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
Re: sgml auto-indenter
- To: "docbook-apps at lists dot oasis-open dot org" <docbook-apps at lists dot oasis-open dot org>
- Subject: Re: DOCBOOK-APPS: sgml auto-indenter
- From: "Kevin M. Dunn" <kdunn at hsc dot edu>
- Date: Wed, 29 Nov 2000 10:54:13 -0500
- References: <3A2170D7.1B27E208@hsc.edu>
"Kevin M. Dunn" wrote:
Several people have discussed using tidy to indent SGML and XML sources.
It didn't work for my documents because tidy did not recognize my
entities. Rather than fix tidy, I wrote a Perl script to indent anything
with SGML-type tags. Only non-empty tags are indented, and text is
justified at 80 characters per line (easily changed). Try it out, if you
like, and let me know what needs fixing. I am running Perl under Red Hat
6.1.
Thanks for the feedback. Before I make the changes, this is what I am thinking:
1. Only text with leading whitespace will be processed. All other text
will be passed through unchanged. This way you can protect anything that
needs protecting by ensuring that the first character on the line is not
whitespace. If you actually need whitespace there, use the nbsp entity
as the first character on the line.
2. Only non-empty tags will be indented, and only if at least one of
that tag's closing tags appears first on a line. For example,
<PARA>
Here is some text containing an <EMPHASIS>important</EMPHASIS> word.
But I would like to indent
<ORDEREDLIST>
<LISTITEM><PARA>
Lists</PARA></LISTITEM>
<LISTITEM><PARA>
Paragraphs but not <EMPHASIS>emphasis</EMPHASIS></PARA>
</LISTITEM>
</ORDEREDLIST>
This text will be passed through
unaltered.
</PARA>
Would be processed to:
<PARA>
  Here is some text containing an <EMPHASIS>important</EMPHASIS> word. But I
  would like to indent
  <ORDEREDLIST>
    <LISTITEM>
      <PARA>
        Lists
      </PARA>
    </LISTITEM>
    <LISTITEM>
      <PARA>
        Paragraphs but not <EMPHASIS>emphasis</EMPHASIS>
      </PARA>
    </LISTITEM>
  </ORDEREDLIST>
This text will be passed through
unaltered.
</PARA>
In this example, every <PARA> is indented because at least one </PARA>
appears first on a line. But <EMPHASIS> is not indented because none
of the </EMPHASIS> tags appears first on a line. An author who
didn't want paragraphs indented would make sure that </PARA>
never comes first on a line. Let me know if there is a problem with this
approach before I get too far into it.
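
To make the two rules concrete, here is a minimal sketch of the checks they imply. It is written in Python for illustration, not in the Perl of the actual script, and it is not that script's code. The names tags_to_indent and is_protected are hypothetical. It also assumes that "first on a line" allows leading whitespace before the closing tag, that tag names are compared case-insensitively, and that an empty line does not count as protected.

```python
import re

# A closing tag that is the first non-whitespace item on a line.
# Assumption: leading spaces or tabs before the tag are allowed.
CLOSING_FIRST = re.compile(
    r"^[ \t]*</([A-Za-z][A-Za-z0-9.-]*)\s*>", re.MULTILINE
)


def tags_to_indent(source):
    """Rule 2: names of tags with at least one closing tag first on a line."""
    return {name.upper() for name in CLOSING_FIRST.findall(source)}


def is_protected(line):
    """Rule 1: a line whose first character is not whitespace is passed through.

    Assumption: an empty line is not treated as protected.
    """
    return bool(line) and not line[0].isspace()


sample = """<PARA>
Here is some text containing an <EMPHASIS>important</EMPHASIS> word.
<ORDEREDLIST>
<LISTITEM><PARA>
Lists</PARA></LISTITEM>
<LISTITEM><PARA>
Paragraphs but not <EMPHASIS>emphasis</EMPHASIS></PARA>
</LISTITEM>
</ORDEREDLIST>
</PARA>
"""

# EMPHASIS is absent because no </EMPHASIS> starts a line.
print(sorted(tags_to_indent(sample)))  # ['LISTITEM', 'ORDEREDLIST', 'PARA']
```

Scanning the whole document once before indenting anything means the decision is made per tag name rather than per occurrence, which is why a single </PARA> at the start of a line is enough to indent every paragraph.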
--
Kevin M. Dunn
kdunn@hsc.edu
Department of Chemistry
Hampden-Sydney College
HSC, VA 23943
(804) 223-6181
(804) 223-6374 (Fax)