This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Re: RE: RE: DOM and XML parser
- From: "ashu t" <aashut at rediffmail dot com>
- To: xsl-list at lists dot mulberrytech dot com
- Date: 19 Aug 2002 08:19:46 -0000
- Subject: Re: Re: RE: RE: [xsl] DOM and XML parser
- Reply-to: xsl-list at lists dot mulberrytech dot com
Thanks a lot.
It is really invaluable, in-depth information about XSLT processors
and XML parsers.
Thanks,
ashu
Both. The parser reads the raw file(s) that make up the XML document,
decoding bytes into characters, replacing character references with the
characters they represent (e.g., the 5 characters "&#33;" become the 1
character "!"), normalizing whitespace in attribute values, using the
DTD to fill in default attribute values and resolve entities, and
checking for well-formedness. The parser passes along the 'important'
information about the XML document to the application (the XSLT
processor).
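To make some of those duties concrete, here is a minimal sketch using Python's standard-library xml.dom.minidom. The choice of parser is an assumption for illustration; the message names none. The input document is made up: ISO-8859-1 bytes containing a character reference, the predefined entity &amp;, and a newline inside an attribute value. DTD defaults and external entities are not exercised here.

```python
from xml.dom import minidom

# Hypothetical input: ISO-8859-1 bytes with a character reference (&#33;),
# a predefined entity (&amp;) and a newline inside an attribute value.
RAW = (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
       b'<doc note="line one\nline two">caf\xe9 &#33; &amp;</doc>')


def what_the_parser_reports(raw: bytes):
    """Return the attribute value and text content as the parser delivers them."""
    root = minidom.parseString(raw).documentElement
    # Join all text children; character data may be delivered in pieces.
    text = "".join(n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE)
    return root.getAttribute("note"), text


note, text = what_the_parser_reports(RAW)
print(repr(note))  # 'line one line two'  (newline normalized to a space)
print(repr(text))  # 'café ! &'  (byte 0xE9 decoded; &#33; and &amp; replaced)
```

The application never sees the encoding declaration, the byte 0xE9, or the reference syntax; it only gets the resulting characters.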
The information it passes is pretty much exactly what the processor
needs in order to model the XPath/XSLT node tree. For example, the
parser says things like "there is an element named 'stylesheet' in
namespace 'http://www.w3.org/1999/XSL/Transform', its lexical name is
'xsl:stylesheet', it has an attribute named 'version' with value '1.0',
it contains an element named 'template'..." and so on. SAX and DOM
parsers do this in very different ways, but the idea is the same.
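As a rough illustration, Python's xml.dom.minidom (assumed here only as an example DOM parser) exposes exactly these pieces of information. The two-element stylesheet below is made up for the sketch; the function name describe_root is hypothetical.

```python
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A tiny illustrative stylesheet (not from the original message).
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"/>
</xsl:stylesheet>"""


def describe_root(data: bytes) -> dict:
    """Collect what the parser reported about the document element."""
    root = minidom.parseString(data).documentElement
    child = root.getElementsByTagNameNS(XSL_NS, "template")[0]
    return {
        "namespace": root.namespaceURI,           # 'http://www.w3.org/1999/XSL/Transform'
        "local_name": root.localName,             # 'stylesheet'
        "lexical_name": root.tagName,             # 'xsl:stylesheet'
        "version": root.getAttribute("version"),  # '1.0'
        "contains": child.localName,              # 'template'
    }


print(describe_root(STYLESHEET))
```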
The parser does not report lexical differences. For example,
<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>
and a mess like
<foo
  a1 = "one"
  a2 = "two"
><![CDATA[1 & 2 are < 3]]></foo>
mean exactly the same thing and are reported the same way; the XSLT
processor will never know whether the original looked one way or the
other. It just knows that the following logical information items exist
and have this relationship to each other:
element type 'foo' in no namespace
| \__attribute name 'a1', value character data 'one'
| \__attribute name 'a2', value character data 'two'
|
|__character data '1 & 2 are < 3'
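This is easy to check with a SAX parser. The sketch below uses Python's xml.sax (an assumed library choice) and records only element names, attribute values and character data. Character data is joined before recording because a parser may split it across several callbacks. The class InfoItems and the function info_items are hypothetical names for this sketch.

```python
import xml.sax
from xml.sax.handler import ContentHandler

TIDY = b'<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>'
MESSY = b'<foo\n  a1 = "one"\n  a2 = "two"\n><![CDATA[1 & 2 are < 3]]></foo>'


class InfoItems(ContentHandler):
    """Records the logical information items; lexical details never arrive."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._text = []

    def _flush_text(self):
        # Character data may come in several chunks; record it as one item.
        if self._text:
            self.items.append(("text", "".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.items.append(("element", name, dict(attrs.items())))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.items.append(("end", name))


def info_items(data: bytes) -> list:
    handler = InfoItems()
    xml.sax.parseString(data, handler)
    return handler.items


print(info_items(TIDY) == info_items(MESSY))  # True
```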
The processor is required to treat this information as if it were
structured according to the XPath/XSLT node tree model, like this:
element node named 'foo' in no namespace
| \__namespace node binding prefix 'xml' to name 'http://www.w3.org/XML/1998/namespace'
| \__attribute node named 'a1', value character data 'one'
| \__attribute node named 'a2', value character data 'two'
|
|__text node encapsulating '1 & 2 are < 3'
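One way a processor might build that model from SAX events is sketched below in Python. Node and XPathTreeBuilder are hypothetical names invented for this sketch, not any processor's real API, and comments and processing instructions are ignored. It reproduces two details of the diagram: every element gets a namespace node for the implicitly bound 'xml' prefix, and adjacent pieces of character data are merged into a single text node, since the XPath data model never has two adjacent text nodes.

```python
import io
import xml.sax
from dataclasses import dataclass, field
from xml.sax.handler import ContentHandler, feature_namespaces

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass
class Node:
    """Hypothetical, minimal XPath-style node."""
    kind: str                   # 'root', 'element', 'attribute', 'namespace' or 'text'
    name: tuple = (None, None)  # (namespace URI, local name); (None, prefix) for namespace nodes
    value: str = ""
    children: list = field(default_factory=list)
    attributes: list = field(default_factory=list)
    namespaces: list = field(default_factory=list)


class XPathTreeBuilder(ContentHandler):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self._stack = [self.root]
        self._scopes = [{"xml": XML_NS}]  # in-scope prefix bindings per depth
        self._pending = {}                # bindings declared on the next element

    def startPrefixMapping(self, prefix, uri):
        self._pending[prefix] = uri

    def startElementNS(self, name, qname, attrs):
        scope = dict(self._scopes[-1])
        for prefix, uri in self._pending.items():
            if uri:
                scope[prefix] = uri
            else:  # an empty URI undeclares the binding
                scope.pop(prefix, None)
        self._pending = {}
        self._scopes.append(scope)
        elem = Node("element", name)
        elem.namespaces = [Node("namespace", (None, p), uri) for p, uri in scope.items()]
        elem.attributes = [Node("attribute", key, val) for key, val in attrs.items()]
        self._stack[-1].children.append(elem)
        self._stack.append(elem)

    def characters(self, content):
        parent = self._stack[-1]
        if parent.children and parent.children[-1].kind == "text":
            parent.children[-1].value += content  # merge adjacent character data
        else:
            parent.children.append(Node("text", value=content))

    def endElementNS(self, name, qname):
        self._stack.pop()
        self._scopes.pop()


def build_tree(data: bytes) -> Node:
    handler = XPathTreeBuilder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.root


foo = build_tree(b'<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>').children[0]
print(foo.name, [n.name[1] for n in foo.namespaces], [c.value for c in foo.children])
```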
A DOM parser uses a similar kind of tree of nodes that is implicit in
the interfaces it provides. However, this tree is not entirely
compatible with an XPath/XSLT tree, and it requires more memory than it
should, so AFAIK most XSLT processors that take a DOM document as input
walk the DOM tree and build their own XPath/XSLT tree from it, which
lets them discard the DOM. This conversion is slow, too, so most XSLT
processors prefer to use a SAX parser when possible.
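A rough Python sketch of such a walk follows, using xml.dom.minidom as the DOM (an assumption; real processors differ). It shows one concrete incompatibility: a DOM can keep a CDATA section as its own node next to ordinary text nodes, while the XPath model has only single, merged text nodes. The hypothetical function dom_to_xpath copies elements into plain (name, attributes, children) tuples, merging Text and CDATASection data; comments and processing instructions are skipped.

```python
from xml.dom import Node, minidom


def dom_to_xpath(element):
    """Copy a DOM element into a (name, attributes, children) tuple.

    Adjacent Text and CDATASection nodes are merged into one string, as the
    XPath data model requires. Comments and PIs are skipped in this sketch.
    """
    children = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            if children and isinstance(children[-1], str):
                children[-1] += child.data
            else:
                children.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            children.append(dom_to_xpath(child))
    return (element.tagName, dict(element.attributes.items()), children)


tidy = minidom.parseString(b'<foo a1="one" a2="two">1 &amp; 2 are &lt; 3</foo>')
messy = minidom.parseString(b'<foo a1="one" a2="two"><![CDATA[1 & 2 are]]> &lt; 3</foo>')
# After conversion the DOM objects can be discarded; only the copy is kept.
print(dom_to_xpath(tidy.documentElement) == dom_to_xpath(messy.documentElement))  # True
```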
A SAX parser is event-based and just zips through the document once,
reporting what it finds along the way by calling methods that the
application has implemented to handle the reported events.
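In Python's xml.sax (again only an assumed example library), the application's side of that contract is a ContentHandler subclass whose methods the parser calls. This sketch feeds the document in chunks, as if it were arriving over a network, and keeps only running totals, so no tree is ever built. The Counter class, the count_streaming function and the sample chunks are illustrative.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Counter(ContentHandler):
    """Keeps only running totals; no tree is built."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.chars = 0

    def startElement(self, name, attrs):
        self.elements += 1

    def characters(self, content):
        self.chars += len(content)


def count_streaming(chunks):
    handler = Counter()
    parser = xml.sax.make_parser()  # an incremental (feed/close) parser
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)          # events are reported as input is consumed
    parser.close()
    return handler.elements, handler.chars


# The text "abc" is split across two chunks; the totals are unaffected.
print(count_streaming([b"<list><item>ab", b"c</item><item>de</item></list>"]))
```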
- Mike
____________________________________________________________________________
mike j. brown                   |  xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list