This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: & in SGML vs XML
- To: xsl-list at mulberrytech dot com
- Subject: Re: & in SGML vs XML
- From: "Christopher R. Maden" <crism at lexica dot net>
- Date: Sun, 05 Nov 2000 23:47:35 -0800
- Reply-To: xsl-list at mulberrytech dot com
At 12:48 4-11-2000 +0100, Matthias Häußer wrote:
>I have another tricky &-related question:
It's not XSL-related, and is better suited for XML-L (mail
listserv@listserv.heanet.ie, no subject, body "subscribe xml-l") or
comp.text.xml.
>I have SGML documents which can easily be converted to XML by just
>exchanging the declaration in the first line(s),
>except for that they contain &'s standing alone, as in
><line>you & me</line>.
>
>This is legal in SGML, but XML parsers and XT do not accept it.
>Is there a way of getting this right except for string replacement
>(& -> &amp;)? (Which is tricky because "real" entities like &#268;
>must not be destroyed.)
>James Clark's sx does it alright, but I'd prefer a Java solution
>(ideally, one line of declaration either in the stylesheets or the XML).
>
>In other words: Is there a way of treating an XML document like
><line>you & me</line>?
An ampersand is recognized as a "delimiter in context", meaning that it is
treated as the start of an entity reference only when it is followed by a
name start character (see production [59] of ISO 8879). Assuming your SGML
used the reference concrete syntax, you could do something like
s/&\([^a-zA-Z#]\)/\&amp;\1/g   # ampersand followed by innocuous character
                               # (not a letter, and not the # of a
                               # character reference) is replaced by
                               # &amp; and character
s/&$/\&amp;/                   # ampersand at end of line is replaced by
                               # &amp;
See <URL:http://www.oreilly.com/%7Ecrism/sgmldefs.html> for the SGML formal
productions, but they aren't very useful without the text of the Standard.
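
Since the question asks for a Java route, the same substitution can be
sketched with java.util.regex. This is only a sketch under assumptions, not
a tested converter: the names AmpersandEscaper and escapeBareAmpersands are
made up for illustration, the input is assumed to use the reference concrete
syntax with ASCII names, and ampersands inside CDATA marked sections or
comments get no special treatment. A lookahead is used instead of a captured
character, so an ampersand at the end of a line needs no separate rule.

```java
import java.util.regex.Pattern;

class AmpersandEscaper {
    // A "bare" ampersand is one NOT followed by a letter (the start of an
    // entity reference such as &eacute;) or by '#' plus a letter or digit
    // (the start of a character reference such as &#268;).
    // Assumption: names start with an ASCII letter only.
    private static final Pattern BARE_AMPERSAND =
        Pattern.compile("&(?![A-Za-z]|#[A-Za-z0-9])");

    /** Returns the text with every bare ampersand replaced by &amp;. */
    static String escapeBareAmpersands(String text) {
        return BARE_AMPERSAND.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // prints: <line>you &amp; me</line>
        System.out.println(escapeBareAmpersands("<line>you & me</line>"));
        // prints: <line>&#268;apek &amp; &eacute;</line>
        System.out.println(
            escapeBareAmpersands("<line>&#268;apek & &eacute;</line>"));
    }
}
```

This would have to run over the text of the file before it reaches the XML
parser, since the parser rejects the bare ampersand outright. An ampersand
directly followed by a letter, as in R&D, is left alone, because SGML would
also read that as the start of an entity reference.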
-Chris
--
Christopher R. Maden, Senior XML Analyst, Lexica LLC
222 Kearny St., Ste. 202, San Francisco, CA 94108-4510
+1.415.901.3631 tel./+1.415.477.3619 fax
<URL:http://www.lexica.net/> <URL:http://www.oreilly.com/%7Ecrism/>
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list