This is the mail archive of the
mailing list for the DocBook project.
RE: [docbook] Re: whitespace at the beginning and the end of element content
- From: "Paul Grosso" <pgrosso at arbortext dot com>
- To: "DocBook List" <docbook at lists dot oasis-open dot org>
- Date: Mon, 1 Nov 2004 13:49:51 -0500
- Subject: RE: [docbook] Re: whitespace at the beginning and the end of element content
> -----Original Message-----
> From: Norman Walsh [mailto:email@example.com]
> Sent: Sunday, 31 October, 2004 12:44
> To: Wolfgang Jeltsch
> Cc: DocBook List
> Subject: [docbook] Re: whitespace at the beginning and the end of element content
> / Wolfgang Jeltsch <firstname.lastname@example.org> was heard to say:
> | On Sunday, 31 October 2004 17:45, Norman Walsh wrote:
> | Oh, that's bad news. How do I format the source code of a longer paragraph
> | then? This way?
> | <para>This is an attempt
> |       to format a longer paragraph
> |       without getting problems
> |       with whitespace.</para>
> Well, more like this:
> <para>This is an attempt
> to format a longer paragraph
> without getting problems
> with whitespace.</para>
> | But this looks ugly, IMO. The way I formatted the paragraph in my previous
> | mail (see above) seems much more natural to me.
> More natural, perhaps, but those extra spaces are in your document.
> | And even if I format the paragraph without whitespace after the start tag and
> | before the end tag, how can I be sure that linebreaks and the spaces used for
> | indenting don't appear in the output?
> I don't know of any processing system that doesn't treat a newline like
> a space (outside of verbatim environments, etc.) so they're ok. The indents
> are going to be in your content.
> Now, for HTML, it doesn't matter (extra spaces don't matter in HTML)
> and the same may be true for FO, I haven't gone to check.
> | Section 2.10 of "Extensible Markup Language (XML) 1.0 (Third Edition)" is very
> | vague. It speaks about white space that is used "to set apart the markup for
> | greater readability". It says about this kind of whitespace: "Such white
> | space is typically not intended for inclusion in the delivered version of the
> | document."
> | But who decides which whitespace shall be considered whitespace used to set
> | apart the markup? Is whitespace appearing immediately after a start tag or
> | immediately before an end tag considered such whitespace or not? Does the
> | answer to this question depend on the document type?
> Yes. In "element content" whitespace is insignificant.
I fear this is somewhat misleading.
The XML spec doesn't define "significant", but I would suggest
it's more misleading than not to say that the XML processor
considers some whitespace to be insignificant.
In XML, all whitespace is passed through by the XML processor.
It's just that some (e.g., that in element content) may be marked
specially for the downstream application to handle as it wishes
(e.g., to treat as insignificant). Quoting the spec:

    An XML processor MUST always pass all characters in a
    document that are not markup through to the application.
    A validating XML processor MUST also inform the application
    which of these characters constitute white space appearing
    in element content.
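
To make the pass-through behaviour concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is only an illustration: the sample document and the normalize() helper are hypothetical, not part of DocBook or of any stylesheet. Note that this parser is non-validating, so it does not flag which whitespace appears in element content; it simply hands everything to the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a paragraph indented for source readability.
SOURCE = """<section>
  <para>
    This is an attempt
    to format a longer paragraph.
  </para>
</section>"""

root = ET.fromstring(SOURCE)
para = root.find("para")

# The parser passes every character through, including the newline
# and indentation that follow the <para> start tag.
raw = para.text

# The whitespace-only text between <section> and <para> is passed
# through as well; nothing is silently dropped by the parser.
between = root.text


def normalize(text):
    """Application-level choice: collapse whitespace runs, trim the ends."""
    return " ".join(text.split())


print(repr(raw))
print(repr(between))
print(normalize(raw))
```

Whether to call something like normalize() is a decision for the application (a stylesheet, a formatter), not for the XML processor.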