This is the mail archive of the
mailing list for the DocBook project.
doc domain vs. problem domain semantics (Re: listitem)
- From: "Matt G." <matt_g_ at hotmail dot com>
- To: docbook at lists dot oasis-open dot org
- Date: Sat, 29 Dec 2001 06:45:56 +0000
- Subject: DOCBOOK: doc domain vs. problem domain semantics (Re: listitem)
Again, sorry I let so much time go by, before getting together my response.
I hope you can find a moment to consider some of my concerns.
>Subject: docbook-digest Digest #77
>Date: Mon, 03 Dec 2001 08:33:39 -0500 (EST)
(The following is the header to the proper message. In my previous
message, I quoted the same header, though it was actually a reply to a
different message -- apologies.)
>From: Norman Walsh <email@example.com>
>Subject: DOCBOOK: Re: listitem
>Date: Mon, 03 Dec 2001 06:34:32 -0500
>/ "Matt G." <firstname.lastname@example.org> was heard to say:
>| >You might try using nested variablelists.
>| >Inside each of your main listitems, use a variable
>| >list for this structured information.
>| Well, I have a couple of problems with this approach. First of
>| all, it gets even further from the proper semantics to describe
>| what I'm actually trying to do (though I could deal with that,
>| though hopefully it'd only be a stop-gap measure).
>What are the semantics of your data?
If I used nested variable lists, the top-level one would be fairly
appropriate (which is what I'm doing), since each item is a field in a data
structure, but the nested one would have an entry for each *property* of a
given field, which is pretty far off from the implied semantics of
As a matter of fact, I'd guess that more often than not, variablelist is
used to list things other than variables. This gets to the subject of my
message, and the tangent the thread is getting off to, which is that since
there aren't semantics rich enough to describe the types of formatting
structures people use in documents, the more domain-specific ones are fallen
back upon, as a crutch. This has the effect of ruining the semantics of the
domain-specific markup, particularly if its uses are mixed, within a single
>| More importantly (in the
>| short-term) it doesn't even appear to be nested, at all, in the
>| DSSSL print style-sheets (version 1.74b - the latest).
>Using what backend?
OpenJade 1.3. Is there any other DSSSL implementation as complete and
>| First of all, I feel one needs to be able to nest structural
>| elements in <listitem>. I'd certainly like to hear other points
>| of view, on the matter, but I just think it's imperative to be
>| able to partition a <listitem> into a finer-grained structure.
>There's a great long list of structural elements that you can put
>inside a list item. Section isn't one of them because it would
>make a complete mockery of the document hierarchy.
I don't really care whether the same constructs are available for use within
a <listitem> as elsewhere; my point is that I think there might be valid
reasons to subdivide <listitem>s further, into titled chunks. Do you agree?
>| Secondly, I'm surprised there's no sort of an element with a
>| title on the same line (see my <namedproperty> block element
>| example, in my previous message).
>"Title on the same line" is a presentational, not a semantic or
True -- what I'm really concerned about is the structure of the construct,
which it can be difficult to get into the habit of conceptually separating
from the presentation. However, I think that unless you have semantics for
99% of the problem domain, you need semantics specific to the document
domain, on which to fall back. If those aren't available, then people will
resort to abusing what problem domain constructs you give them that have the
presentational or structural properties similar to what's missing.
>| sort of output, and there you go! HOWEVER, if DocBook is ever
>| to scale to meet the basic needs of a substantial portion of the
>| various technical and scientific documentation sub-domains, it
>| must provide
>"DocBook is an XML/SGML vocabulary particularly well suited to
>books and papers about computer hardware and software (though it
>is by no means limited to these applications)."
So, is there really no desire to augment it to be better suited for more
general documentation tasks and more easily adaptable to other sorts of
problem domains than HW/SW?
IMO, the DocBook DTD (which, admittedly, I haven't really spent much time
dissecting) should be partitioned into document constructs and HW/SW
constructs (in addition to the various other classes of attribute and entity
definitions). Stylesheets, too. This would make it easier for, say, a
biotech publication or the physics department of a major university to use the
core documentation semantics as a foundation for their own field-specific
documentation vocabulary, without carrying extra baggage or suffering with
unnecessary name collisions with semantics foreign to their domain.
Another important development would need to be replacements (which could
co-exist, in conventional DocBook) for things that get abused as fall-backs,
like <variablelist>. Having complete document-domain semantics would allow
users to transform their own specialized vocabularies into this DocBook
subset, as an intermediate stage, and avoid the complexity of going straight
to XSL-FO (which is also less useful than a richer, more structured
vocabulary, like DocBook).
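
As a rough illustration of that intermediate stage, here is a minimal sketch in
Python rather than XSLT. The <struct> and <field> source elements are a
hypothetical problem-domain vocabulary invented for this example; only the
output element names (variablelist, varlistentry, term, structfield, listitem,
para) are DocBook's. It is one possible mapping, not a recommended one.

```python
import xml.etree.ElementTree as ET

# Hypothetical problem-domain markup: a data structure with named fields.
SOURCE = """
<struct name="packet_header">
  <field name="version" type="uint8">Protocol version number.</field>
  <field name="length" type="uint16">Payload length in bytes.</field>
</struct>
"""

def struct_to_variablelist(struct):
    """Map one hypothetical <struct> element to a DocBook <variablelist>."""
    vlist = ET.Element("variablelist")
    ET.SubElement(vlist, "title").text = struct.get("name")
    for field in struct.findall("field"):
        entry = ET.SubElement(vlist, "varlistentry")
        term = ET.SubElement(entry, "term")
        ET.SubElement(term, "structfield").text = field.get("name")
        item = ET.SubElement(entry, "listitem")
        para = ET.SubElement(item, "para")
        # Fold the field's type and description into a single paragraph.
        para.text = f"({field.get('type')}) {(field.text or '').strip()}"
    return vlist

docbook = struct_to_variablelist(ET.fromstring(SOURCE))
print(ET.tostring(docbook, encoding="unicode"))
```

The domain-specific source stays free of presentation concerns, and the
generated fragment can be handed to the ordinary DocBook stylesheets.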
>The target domain of DocBook is computer software and hardware
>documentation. It happens to be suitable for a very wide range of
>other sorts of documentation, but the technical committee has
>historically been reluctant to add new markup specifically for
>features outside the scope of its present domain (DocBook is quite
>large enough :-).
Do you see that what I'm interested in is two things:
1) Preserving the semantics of HW/SW-specific constructs, by
providing suitable fall-backs
2) Allowing DocBook to be more easily adapted to other domains,
either through augmentation or as a richly structured
This has the advantage of allowing other fields to better take advantage of
the effort and refinement that has gone into DocBook and the tools that have
been developed for it. Also, the more people who use DocBook, the better
off those of us are who have expertise in working with it or who have
developed tools for it, as our skills and tools become more marketable.
With regard to the latter point, bear in mind that while there's been lots
of money in the computer HW and SW fields, recently, that may not always be
the case, as commoditization continues and the supply of skilled labor
>| In fact, my opinion is that there should be a layer *between*
>| problem-domain specific semantics and XSL-FO, which would be
>| comprised exclusively of constructs relating to document
>| structure. Then, an
>Given my experience with the way authors write, it's generally
>impractical to separate document structure entirely from
You'll always have high-level structural elements, like <section> and
<book>, but I'd argue that you might even be able to do away with things
like generic sorts of lists, if your semantics are sufficiently rich and
well adapted to the problem domain.
Whether it's worth trying to capture the semantics of the text, so
thoroughly, is highly case-specific, which is why I think flexibility to
easily adopt either approach, and even transition from augmenting to
layering, is of great importance.
>| pursue. If they are fairly unambitious, they can seek to augment
>| the structural vocabulary with some of their own extensions. If
>| they want to promote or enforce more rigid semantics that deal
>| exclusively with their problem-domain, and/or if they want close
>| control over the structure and content of their output documents,
>| they can add a layer on top of the document structure semantics
>| (using XSLT, to do the translation into XML DocBook, for
>| example). Furthermore, there would be a fairly smooth
>| transition path from the former to the latter.
>Uh huh. Been there, done that. Do it often, in fact. Although I
>think I tend to do it from the "other end" so to speak. Usually,
>I have some very specific set of data that doesn't fit nicely
>into my documentation markup, but I want to use it in my
>documentation. So I write an XSLT stylesheet to convert it into
To me, it seems the real question is one of whether there are any other
applications for the data than presenting it in a human-readable form. If
not, and if the structure of a DocBook document isn't a terribly
inconvenient authoring format, then I say just write the document. However,
there are many cases where a document isn't the most desirable repository
format, for the data, due to other processing requirements, authoring
efficiency, or manageability concerns. (IMO, the latter tends to be
underappreciated--most of my design documentation is embedded within my source
code, in structured comment blocks, for example.)
>want. Then with makefiles, I can edit the data and rebuild the
>documentation entirely painlessly.
So, you don't have a tool to generate your dependencies automatically, do
you? I'll soon whip one up, in Python. I probably won't bother to convert
it to C/C++, unless I can get it in the Xerces distribution, though.
Ideally, I think you also want a command-line XPath query tool, for
dependencies that don't use external entities or the external DTD subset.
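
A minimal sketch of what such a dependency generator might look like, assuming
it only needs to find external entity declarations in a document's internal
subset. The function names, file names, and sample document are all made up for
illustration, and the regular expression is a shortcut, not a real XML parser:
it will also match declarations inside comments, and it does not pick up the
external DTD subset named in the DOCTYPE.

```python
import re

# Matches <!ENTITY name SYSTEM "path"> and <!ENTITY % name SYSTEM "path">.
# A PUBLIC identifier is followed by a system id, which is captured as well.
ENTITY_RE = re.compile(
    r'<!ENTITY\s+(?:%\s+)?[\w.:-]+\s+'
    r'(?:SYSTEM|PUBLIC\s+(?:"[^"]*"|\'[^\']*\'))\s+'
    r'(?:"([^"]*)"|\'([^\']*)\')'
)

def external_entities(xml_text):
    """Return the system ids of external entities declared in xml_text."""
    return [dq or sq for dq, sq in ENTITY_RE.findall(xml_text)]

def make_rule(target, source, xml_text):
    """Format a makefile dependency line for one document."""
    deps = [source] + external_entities(xml_text)
    return f"{target}: {' '.join(deps)}"

SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "docbookx.dtd" [
<!ENTITY api-fields SYSTEM "generated/api-fields.xml">
<!ENTITY design SYSTEM 'generated/design.xml'>
<!ENTITY copy "(c) 2001">
]>
<book>&api-fields;&design;</book>
"""

rule = make_rule("book.html", "book.xml", SAMPLE)
print(rule)
# prints: book.html: book.xml generated/api-fields.xml generated/design.xml
```

The internal entity (copy) is skipped because it has no system id, so only
files that can actually change on disk end up in the rule.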
>| As it happens, I'm in the process of doing the former, however
>| I'm generating external parsed XML DocBook entities. So, it's
>| a little like the latter, in that it's partially layered on top
>| of DocBook. I'd hoped to take advantage of DocBook stylesheets
>| to do most of the formatting work, for me. I also wanted to be
>| able to use the core DocBook semantics in portions of the
>| documents that are written by hand (these parts contain the
>| actual references to the external entities, giving the author
>| the ability to change their order and use, within the document,
>| as well as provide additional context).
>Sounds like a fine plan to me.
It suits my needs quite well. My point above is that there are probably
plenty of cases for which this model isn't a good fit.
FYI, I use this approach for both an Interface Definition Language, from
which I generate portions of my API document (and both sides of the
interface), as well as the embedded design documentation I mentioned,
previously. I'm using XSLT all around, and wish XalanC would support XSLT
1.1. I also dearly wish it had a command-line flag for specifying a SYSTEM
id search path (for external entities and DTD subsets), similar to the '-I'
option supported by most C/C++ compilers!!
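
To make the wished-for behaviour concrete, here is a small sketch of '-I'-style
lookup for SYSTEM ids. This is not something XalanC provides; the function name
and directory layout are hypothetical, and it assumes system ids are plain
relative file paths rather than URIs.

```python
import os
import tempfile

def resolve_system_id(system_id, search_path):
    """Return the first existing file for system_id along search_path, else None."""
    if os.path.isabs(system_id):
        return system_id if os.path.isfile(system_id) else None
    for directory in search_path:
        candidate = os.path.join(directory, system_id)
        if os.path.isfile(candidate):
            return candidate
    return None

# Demonstration with a throwaway directory layout.
with tempfile.TemporaryDirectory() as root:
    generated = os.path.join(root, "generated")
    os.makedirs(generated)
    with open(os.path.join(generated, "api-fields.xml"), "w") as handle:
        handle.write("<section/>")
    found = resolve_system_id("api-fields.xml", [root, generated])
    missing = resolve_system_id("no-such-file.xml", [root, generated])
    found_name = os.path.relpath(found, root)

print(found_name, missing)
```

Directories are tried in order, so earlier entries shadow later ones, the same
way include paths behave for a C/C++ compiler.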