This is the mail archive of the
docbook@lists.oasis-open.org
mailing list for the DocBook project.
Re: Working with XInclude / xml:base / libxml v2.4.24 and above
- From: Norman Walsh <ndw at nwalsh dot com>
- To: Elliotte Rusty Harold <elharo at metalab dot unc dot edu>
- Cc: veillard at redhat dot com, docbook at lists dot oasis-open dot org
- Date: Wed, 12 Mar 2003 09:29:32 -0500
- Subject: DOCBOOK: Re: Working with XInclude / xml:base / libxml v2.4.24 and above
- References: <F555D7916F890E40AFE69F41E89861A2C7D9E5@NLDNC004PEX1.ubsgs.ubsgroup.net><20030212140216.O29764@redhat.com> <p04330103ba9141c74b87@[192.168.254.4]><87adg0d9y9.fsf@nwalsh.com> <p04330103ba94ead1c525@[192.168.254.4]>
/ Elliotte Rusty Harold <elharo at metalab dot unc dot edu> was heard to say:
| At 7:51 AM -0500 3/12/03, Norman Walsh wrote:
|
|>No. The base URI in both cases is: http://www.example.com/docs/
|>
|>The base URI is not the same as the "document URI".
|
| I agree that would be nicer, and it makes more sense; but it's not my
| reading of either the XML Infoset or the XML Base specification. In
| particular, section 4.1 of the XML Base spec
| <http://www.w3.org/TR/xmlbase/#rfc2396> states,
|
| RFC 2396 [IETF RFC 2396] provides for base URI information to be
| embedded within a document. The rules for determining the base URI can
| be summarized as follows (highest priority to lowest):
|
| 1. The base URI is embedded in the document's content.
| 2. The base URI is that of the encapsulating entity (message,
| document, or none).
| 3. The base URI is the URI used to retrieve the entity.
| 4. The base URI is defined by the context of the application.
|
| Assuming there's no xml:base attribute in scope, then either 2 or 3
| applies. Both use the base URI of the document itself, not the
| directory where the document is found. RFC 2396 seems to say the same
| thing. What am I missing?
This stuff is really confusing. It's especially confusing because of
the rules for constructing an absolute URI from a base URI and a
relative URI.
On further reflection, I think you're right. If you're reading documents from
a filesystem, then the base URIs of
file:///path/to/file1.xml
and
file:///path/to/file2.xml
are different and are the URIs of the respective documents. Note,
however, that if these base URIs are used to construct an absolute URI
from some relative reference inside them, they are each effectively
equivalent to file:///path/to/ (per RFC 2396, Section 5.2, list item 6.a).
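A minimal sketch of that equivalence, assuming Python's standard urllib.parse.urljoin behaves like the RFC 2396 resolution algorithm for these simple cases. The relative references and the resolve_all helper are made up for illustration; only the two file URIs and the directory URI come from the discussion above.

```python
from urllib.parse import urljoin

def resolve_all(base, refs):
    """Resolve each relative reference against the given base URI."""
    return [urljoin(base, ref) for ref in refs]

# Hypothetical relative references that might appear inside either document.
refs = ["chapter1.xml", "images/fig.png", "../shared/common.ent"]

doc1 = "file:///path/to/file1.xml"
doc2 = "file:///path/to/file2.xml"
directory = "file:///path/to/"

# The last path segment of the base is dropped before merging, so all
# three bases produce the same absolute URIs for these references.
from_doc1 = resolve_all(doc1, refs)
from_doc2 = resolve_all(doc2, refs)
from_dir = resolve_all(directory, refs)

print(from_doc1 == from_doc2 == from_dir)
```

The equivalence covers references that carry a path. An empty reference is a different matter: urljoin returns the base itself for it, so the two document URIs remain distinguishable there.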
I think it follows that in another context, the base URI of the two
documents might be the same. In particular, a multi-part MIME message
might encode those files as:
http://example.com/uri/of/the/mime/package
Content-base: http://example.com/path/to/
...
<<Separator>>
Content-location: file1.xml
...
<<Separator>>
Content-location: file2.xml
in which case the two documents do have precisely the same base-URI.
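As a rough illustration of the reading above, the following sketch parses a small multipart message with Python's standard email package and resolves a relative reference from each part against the shared Content-base value. The message text, the boundary string, the part bodies, and the reference "figure.png" are all invented for the example; treating Content-base as the base URI of every part is the interpretation argued here, not something the code proves.

```python
from email import message_from_string
from urllib.parse import urljoin

# Hypothetical multipart message modelled on the example above.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="SEP"
Content-Base: http://example.com/path/to/

--SEP
Content-Type: text/xml
Content-Location: file1.xml

<doc/>
--SEP
Content-Type: text/xml
Content-Location: file2.xml

<doc/>
--SEP--
"""

msg = message_from_string(RAW)
shared_base = msg["Content-Base"]

# Each body part carries its own relative Content-Location.
locations = [part["Content-Location"] for part in msg.get_payload()]

# Under the reading above, a relative reference found in either part
# is resolved against the same base URI.
resolved = {loc: urljoin(shared_base, "figure.png") for loc in locations}

for loc, uri in resolved.items():
    print(loc, "->", uri)
```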
(If there are persuasive arguments that I'm wrong, I'd love to hear
them because this line of argument led to a new property in the
XPath2 data model and a new F&O function last Friday.)
In short, returning to your example:
| Suppose for example,
| http://www.example.com/docs/parent.xml includes
| http://www.example.com/docs/child.xml
1. In the absence of other information, it's impossible to know what
the base URIs of these documents are (since the web server could
send a content-base header). However, in the common case, I concede
that they do have different base URIs and those URIs are the full
URIs of each file.
2. But they certainly could have the same base URI (http://www.example.com/docs/).
3. For the purpose of resolving relative URIs, all three of the
possible base URI values in play are effectively the same, so I'm
not sure that libxml is causing any harm by leaving out what are
essentially redundant base URIs.
4. Any statement made about RFC 2396 that's not *immediately* preceded
by 20 minutes of reading in the RFC is probably wrong.
Be seeing you,
norm
--
Norman Walsh <ndw at nwalsh dot com> | No man is more than another if he
http://www.oasis-open.org/docbook/ | does no more than
Chair, DocBook Technical Committee | another.--Cervantes