This is the mail archive of the docbook@lists.oasis-open.org mailing list for the DocBook project.



Re: objection to docbook.dcl


<!-- Resent because the first try didn't get through -->

Adam Di Carlo wrote at 21 Mar 2001 -0500:
 > Shipped with the DocBook DTDs from 2.4.1 and up is 'docbook.dcl', an
 > SGML declaration for use with DocBook documents.  However, this
 > declaration is unnecessarily restrictive, to the level where it is
 > rather cumbersome to implement.

This is a wonderful piece of mail!  Somewhere, in some archive, is a
piece of mail dated several years ago from Eve Maler explaining the
changes that she'd made from the previous DocBook SGML Declaration to
arrive at the current Declaration.  The changes would have ranged from
prudent for the time to some, like NAMELEN 45, that would have pushed
the limit of what you could do with SGML systems of the time.

Since then, of course, XML has standardised the removal of many of the
petty restrictions that we had to put up with (be thankful that you
don't need to worry about SGML's Capacities), Unicode has become
commonplace, and SP has become the default SGML parser by being both
better and cheaper than its now moribund competition (although I still
wish that SP had implemented CONCUR).

This mail, then, reflects the current reality whereas the SGML
Declaration that it rails against pushed the envelope of the SGML
systems of its time.

 > My argument is that the DocBook declaration should diverge from the SP
 > (and OpenSP) implied declarations only where the divergence expresses
 > a real necessity to diverge.  This is based on the principle that

The DocBook SGML Declaration predates current SP implementations.  I'm
a bit hazy about whether sgmls or nsgmls was current when the first
version of the current DocBook SGML Declaration was released, but
DocBook itself is definitely older than nsgmls.  I am certain,
however, that the DocBook SGML Declaration predates Unicode and
multi-byte support being built into the standard SP distribution.

 > software (including SGML parsers) should be tolerant of what they
 > accept.  The unnecessarily broad divergence of the shipped DocBook
 > declaration puts a burden on document engineers using DocBook.

ISO 8879 allows an SGML Declaration to be provided as part of the
document's prolog but, in the absence of a provided SGML Declaration,
an SGML parser can infer its own.  Parsers have to support the
Reference Concrete Syntax (RCS), but a concrete syntax only covers
about half the things that you can declare in an SGML Declaration, and
you really don't want to restrict yourself to the RCS since, among
other things, it has NAMELEN 8.  (Parsers need to support the RCS
since an SGML Declaration conforms to the RCS, including the NAMELEN
restriction.)

A concrete syntax doesn't cover, for example, the CHARSET description
or the OMITTAG parameter.  There is a complete SGML Declaration
provided in ISO 8879 that you could consider as THE standard SGML
Declaration, but many people have complained about that one in much
the same terms as you use to complain about the DocBook SGML
Declaration.

When I taught a tutorial on the SGML Declaration, one of the first
exercises was parsing a document sans SGML Declaration using multiple
SGML parsers to see what surprises you got because different parsers
infer different SGML Declarations.

SP actually has a very permissive inferred SGML Declaration, which of
course is why you expect that every SGML Declaration should be as
permissive.

 > I am considering here only the DocBook SGML DTD, since I presume the
 > Declaration is rather irrelevant for XML files, since all XML files
 > have the same XML declaration applied to them.
 > 
 > I consider here 'docbook.dcl' as shipped with DocBook 4.1.
 > 
 > Major problems:
 > 
 >  OMITTAG is turned off (why?)

The conventional wisdom is or was that different SGML parsers were
likely to infer different combinations of tags if you left off too
many.  In fact, in the bad old days, I had one project where I had to
use a specific parser (not sgmls or nsgmls) because that parser would
infer the tags that I wanted and sgmls/nsgmls would just complain.

 >  NAMELEN is too short

It was permissive for the time.

 >  Document Character set is too restrictive

Allowing more than ASCII was permissive for the time, and Unicode
wasn't even on many people's radar at the time.  Even getting the
right CHARSET identifier was a black art for a time since different
parsers recognised different CHARSET identifiers.

 >  SUBDOC is turned off (why?)

Because not every SGML parser supported it and because the
conventional wisdom was that parsing another DTD for each SUBDOC was
an enormous overhead.

 > 
 > Description:
 > 
 > * OMITTAG is turned off
 > 
 > 'OMITTAG' is turned off in 'docbook.dcl', disallowing markup
 > minimization of any sort.  This is on in the implied declaration of
 > both Jade and OpenJade. This creates problems because documents using
 > the default declaration for their parser will have a valid document,
 > but if the user decides to be more fastidious and use the DocBook SGML
 > declaration, suddenly their document will not be valid.
 > 
 > The major problem is that trying to turn this on will make a large
 > number of existing SGML DocBook instances invalid.

There's always "spam" from the SP distribution for normalising SGML
documents.

 > * NAMELEN is too short
 > 
 > The NAMELEN quantity set in docbook.dcl is set to 45, rather than the
 > default SP NAMELEN of 99999999.
 > 
 > A number of users have complained of problems due to this limitation
 > (do a google search on 'docbook namelen' to see what I mean) in any
 > cases (such as the SUSE Linux distribution) where the declaration is
 > enforced.
 > 
 > Quoting <URL:http://xml.coverpages.org/wlw14.html>:
 > 
 >    Care should be used when changing these since creating a variant
 >    syntax may make it difficult for some SGML systems to process
 >    documents created with that syntax.  The best means of guaranteeing
 >    portability between different SGML systems and applications is to
 >    use the reference concrete syntax as much as possible.
 > 
 > One wonders why we need to diverge from the reference concrete syntax
 > here.

Be careful what you wish for.
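
To make that concrete: the reference concrete syntax has NAMELEN 8, so
a DTD held to it could not even declare some of DocBook's own element
names.  A rough sketch in Python, where names_over_limit and the sample
fragment are made up for illustration and the regex is a crude scan of
start-tags rather than a real SGML parser:

```python
import re

# Quantities mentioned in this thread: NAMELEN is 8 in the reference
# concrete syntax and 45 in docbook.dcl.
RCS_NAMELEN = 8
DOCBOOK_NAMELEN = 45

# Hypothetical helper pattern: matches the name in a start-tag only.
# End-tags ("</...") and declarations ("<!...") are skipped because the
# name must begin immediately after "<".
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.\-]*)")


def names_over_limit(text, namelen):
    """Return the sorted element names in `text` that are longer than `namelen`."""
    names = set(START_TAG.findall(text))
    return sorted(name for name in names if len(name) > namelen)


# Illustrative fragment using real DocBook element names.
sample = (
    "<article><title>T</title>"
    "<informaltable><tgroup></tgroup></informaltable>"
    "<orderedlist></orderedlist></article>"
)

print(names_over_limit(sample, RCS_NAMELEN))
print(names_over_limit(sample, DOCBOOK_NAMELEN))
```

With the limit at 8, "informaltable" and "orderedlist" are both too
long; with the limit at 45, nothing in the fragment is.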

 > 
 > * Document Character set is too restrictive
 > 
 > As an example, to work around limitations in the support of KOI-R SDATA
 > entities in Jade and OpenJade, KOI-R users have to use unicode
 > entities.  With the docbook.dcl file, these entities are disallowed,
 > although they are perfectly valid with the implied SP declaration.
 > Example of being disallowed:
 > 
 >   jade:/usr/share/sgml/entities/sgml-iso-entities-8879.1986/ISOcyr1.ent:1:16:E: \
 >   "1072" is not a character number in the document character set

There's another workaround for KOI-R in my now-dated paper at [1].

Using Unicode wasn't an option even for SP at the time that the
current DocBook SGML Declaration was created.  Now, of course, it is
more of an option.
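
The jade error quoted above is about a numeric character reference
whose number (1072, which is U+0430 CYRILLIC SMALL LETTER A) is not
described in the document character set.  A minimal sketch of that kind
of check, in Python.  The 0-255 range is only an assumption for
illustration and is not taken from docbook.dcl, and the helper name and
sample entity declarations are hypothetical:

```python
import re

# Assumption for illustration only: a document character set that
# describes character numbers 0-255.  The CHARSET section of the real
# docbook.dcl may describe different ranges.
DESCRIBED_RANGES = [(0, 255)]

# Decimal numeric character references such as &#1072;
CHAR_REF = re.compile(r"&#([0-9]+);")


def undeclared_char_refs(text, ranges=DESCRIBED_RANGES):
    """Return the referenced character numbers that fall in no described range."""
    numbers = (int(digits) for digits in CHAR_REF.findall(text))
    return [n for n in numbers if not any(lo <= n <= hi for lo, hi in ranges)]


# Hypothetical entity declarations, in the style of a Unicode-based
# replacement for SDATA entities.
sample = '<!ENTITY acy "&#1072;"> <!ENTITY nbsp "&#160;">'

print(undeclared_char_refs(sample))                # 8-bit set: 1072 is rejected
print(undeclared_char_refs(sample, [(0, 65535)]))  # wider set: nothing rejected
```

A real parser does a good deal more than this when it reads the CHARSET
section, but the principle is the same: a character number is only
usable if the SGML Declaration describes it.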

 > * SUBDOC is turned off
 > 
 > Why is it necessary to disallow SUBDOC in DocBook SGML documents?
 > Seems like some authors may wish to use this, even if it's not fully
 > supported by existing stylesheets.

The problem at the time wasn't stylesheets, since there wasn't a
standard stylesheet (except perhaps for an ArborText stylesheet), but
non-support among SGML parsers.

Regards,


Tony Graham
------------------------------------------------------------------------
Tony Graham                           mailto:tony.graham@ireland.sun.com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708


[1] http://www.mulberrytech.com/papers/docchar.htm


