This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list .



sgml auto-indenter


Several people have discussed using tidy to indent SGML and XML sources. It didn't work for my documents, because
tidy did not recognize my entities. Rather than fix tidy, I wrote a Perl script that indents anything with SGML-style
tags. Only non-empty tags (those with a matching end tag) are indented, and text is justified at 80 characters/line
(easily changed). Try it out, if you like, and let me know what needs fixing. I am running Perl under Red Hat 6.1.

Known problems: the script will break line-specific environments, because all text is rejustified. So far, the script
is quite general: it does not recognize specific tags and so could be used for any XML or SGML, not just DocBook. Is
there any way to recognize literal text independent of the DTD? By leading whitespace, for example? Trailing
whitespace? Or I could indent tags only, and leave all non-tag text unjustified and unindented.
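
To make the whitespace idea concrete, here is a minimal sketch of how such a test might look. It is not part of sb:
looks_literal is a hypothetical helper, and the rule it applies (a chunk is literal if any non-blank line starts or
ends with a space or tab) is only an assumption about what literal text tends to look like, not something a DTD
guarantees.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of sb): guess whether a chunk of non-tag
# text is line-specific. Assumption: literal text has at least one
# non-blank line that starts or ends with a space or tab.
sub looks_literal {
    my ($text) = @_;
    foreach my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;        # ignore blank lines
        return 1 if $line =~ /^[ \t]/;   # leading whitespace
        return 1 if $line =~ /[ \t]$/;   # trailing whitespace
    }
    return 0;
}

# Usage: label each chunk by what an indenter might do with it.
foreach my $chunk ("int main()\n{\n    return 0;\n}\n",
                   "Ordinary prose that\ncan be rewrapped freely.\n") {
    my ($first) = split /\n/, $chunk;    # first line, for display only
    my $action = looks_literal($chunk) ? "keep" : "rewrap";
    print "$action: $first\n";
}
```

One caveat with this guess: text that has already been indented (the output of sb itself, for instance) has leading
whitespace on every line and would be treated as literal, so the test is only useful on unindented sources.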

----Cut Here------

#!/usr/bin/perl -w
#
# sb: the sgml beautifier
# indents non-empty sgml tags
# usage: sb filename or sb < filename or | sb
# author: Kevin M. Dunn (kdunn@hsc.edu)
# license: anyone is free to use this for any purpose whatever
#
$jl = 80; #text will be justified to 80 characters/line
$nl = 0;
$sp = 0;
$newline = ""; # hack to prevent extraneous blank first line
$space[0] = "";
separate_tags();
get_tags();
indent_tags();
unlink ("$$.tmp"); # remove temporary file
print "\n"; # add final newline to output
sub separate_tags {
  open(FILETMP, ">$$.tmp") or die "cannot write $$.tmp: $!\n";
  while (<>){
    $_ =~ s/</\n</g;
    $_ =~ s/>/>\n/g;
    print FILETMP "$_";
  }
  close(FILETMP);
}
sub get_tags {
  open(FILETMP, "$$.tmp");
  while (<FILETMP>){
    $word = $_;
    $word =~ s/[> ].*//;
    chomp($word);
    if ( $word =~ /^<\/.*/ ){ # end tag: record it and its start tag
      $tag2{$word} = 1;
      $word =~ s/\///;
      $tag1{$word} = 1;
    }
  }
  close(FILETMP);
}
sub indent_tags {
  open(FILETMP, "$$.tmp");
  while (<FILETMP>){
    chomp($_);
    $word = $_;
    $word =~ s/[> ].*//;
    if ( $tag1{$word} ){
      print "\n$space[$sp]$_";
      $nl = $jl; # force new line on next line of input
      $sp++;
      if ( ! $space[$sp] ){
        $space[$sp] = $space[$sp-1] . "  ";
      }
    }
    elsif ( $tag2{$word} ){
      $sp-- if $sp > 0; # never go negative on an unbalanced end tag
      print "\n$space[$sp]$_";
      $nl = $jl; # force new line on next line of input
    }
    elsif ( $word =~ /<.*/ ) {
      print "$newline$space[$sp]$_";
      $newline = "\n"; # hack to prevent extraneous blank first line
      $nl = $jl; # force new line on next line of input
    }
    elsif ( length($_) > 0 ) {
      justify();
    }
  }
}
sub justify {
  @words = split;
  $nw = @words;
  for ($i = 0; $i < $nw; $i++ ){
    $ll += length($words[$i]) + 1 + $nl; # line length if this word is added; $nl == $jl forces a break after a tag
    if ($ll < $jl){ # if short enough, print it
      print "$words[$i] ";
      $nl = 0;
    }
    else { # if line is too long, start a new one
      print "\n$space[$sp]$words[$i] ";
      $nl = 0;
      $ll = length($space[$sp] . $words[$i]) + 1;
    }
  }
}
----Cut Here------
-- 
Kevin M. Dunn
kdunn@hsc.edu
Department of Chemistry
Hampden-Sydney College
HSC, VA 23943
(804) 223-6181
(804) 223-6374 (Fax)
 