This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: Fwd: Re: element nodes in a string?


At 01:22 AM 5/25/01, Chris wrote:
> > However, it is not at all immediately obvious to this
> > newbie why a file that is already well-formed XML
> > cannot undergo such a simple transformation using
> > XSLT.  This seems to be a limitation to XSLT, not an
> > inherently nonsensical thing to do.

Indeed. On the other hand, given the data model that XSLT works on, it's 
arguable whether the transformation of

this is _underlined_ text

to

this is <u>underlined</u> text

is really that simple. Note I said "given the data model". An XSLT 
transformation "describes rules for transforming a source tree into a 
result tree" (XSLT 1.0). Now, if you have a parser that picks up your input
string and makes a tree out of it, as in

[text] this is
   [element] u
     [text] underlined
[text] text

(this is of course only a representation of the node tree, not the node 
tree itself), then the transformation is trivial. But an XML parser doesn't 
do that. (There are members of this list who could rig up a little parser
to do it, but it wouldn't be an XML parser. Such a parser could be wired to 
an XSLT transformation engine. But if you did that, the work of construing 
your input according to the data model would be done, and that's the only 
hard thing about your task -- you then wouldn't even need a transform 
unless you wanted one for some other reason, such as extensibility.)
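To make the "little parser" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical convention where a pair of underscores marks underlined text (no nesting; a lone underscore stays literal text). The function name `parse_underscores` and the `p` wrapper element are illustrative only, not part of any existing tool. It uses the standard library's `xml.etree.ElementTree` to build the kind of tree diagrammed above. Once serialized as XML, that tree could be handed to an ordinary XSLT processor. The standard library has no XSLT engine, so that step is not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical delimiter convention: _text_ means underlined text.
UNDERLINE = re.compile(r"_([^_]+)_")


def parse_underscores(text: str, wrapper: str = "p") -> ET.Element:
    """Build a node tree from plain text, turning each _run_ into a <u> element.

    An underscore with no closing partner is kept as literal text.
    """
    root = ET.Element(wrapper)
    last = None  # most recently added <u> child, if any
    pos = 0
    for m in UNDERLINE.finditer(text):
        before = text[pos:m.start()]
        # ElementTree stores text before the first child in root.text,
        # and text following a child in that child's .tail.
        if last is None:
            root.text = (root.text or "") + before
        else:
            last.tail = (last.tail or "") + before
        last = ET.SubElement(root, "u")
        last.text = m.group(1)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        root.text = (root.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return root
```

For example, `ET.tostring(parse_underscores("this is _underlined_ text"), encoding="unicode")` gives `<p>this is <u>underlined</u> text</p>`. All the real work is in the pattern matching and the `.text`/`.tail` bookkeeping. Once the tree exists, any transform over it is the easy part.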

The fact that your input happens already to be XML is actually moot here. 
It's not that it's well-formed: it's how and where the information you need 
is expressed in it. I could wrap this email into <email>...</email> tags 
and make it (allowing for escaping a few characters here and there) XML.
But that doesn't mean I could easily write a filter in XSLT that would pick
out, say, all the sentences from it, or all the adjectives, and put them in
alphabetical order. That task is actually not much further beyond XSLT's
reach than what you're trying to do. The bottom line is, if the information
you're trying to find isn't in the XML markup, it's hard for XSLT to see it.

> > I know other tools exist to do this; my goal is to
> > learn about XML and XSLT, and this task was simply
> > chosen to focus my study.  The goal is to learn
> > something about XML and its uses/limitations, not to
> > solve this particular text transformation problem.

And so you are learning! Lesson number one: pick the right tool for the
job. Understand the tools and their capabilities and strengths, so you can
pick the right one. Fall into a trap or two while coming to that
understanding: that's cool, no blame.

> > It is perfectly ok for me to take away from this the
> > conclusion "XSLT is not suited to this kind of
> > transformation," but I don't see how one could be
> > expected to know that in the beginning.

That's fair. No one warned you "XSLT excels at transforms out of
well-formed XML, but really bites at transforms of arbitrary data streams".
Probably too much hype. Nor did anyone explain the difference between
well-formed XML input and arbitrary text input. That difference is
critical, but obscure to many (especially if they're used to handling
not-well-formed markup like HTML, which essentially has to be treated as an
arbitrary text stream, and which XSLT is also no good at). And then there is
the difficulty you're having: being well-formed isn't enough in itself.
The source *markup* has to identify the features you are leveraging for the
transform.

Prediction: many ambitious projects in the next few years will founder
because the input data does not prove to be of high enough quality (meaning
both semantic completeness and correctness -- something a machine cannot 
know!) to drive transformations to get high-quality output.

> > And I don't think I would run into this problem if I
> > were transforming to, say, latex.

Yes you would. It's the nature of your source data that's the problem, not 
your output.

> > It is only because HTML elements are interpreted as
> > XML elements that I have trouble.

Not so. It's because an XML parser doesn't know that you want a "_" or a 
"~" to start an element, and the next one to end it.

> > Can you give me a general statement of the sorts of
> > applications for which XSL *is* well-suited?  It's not
> > a database, but it does have several database-like
> > capabilities.  It's not for text markup, though it can
> > do that, sort of, sometimes...

It is definitely for text markup -- well-formed XML text markup -- but not 
other kinds (unless, as suggested above, you provide your own parser to 
render your non-standard markup into the XSLT data model).

Off-the-shelf XML parsers can only be expected to parse XML input, however: 
that's your problem. It only *seems* like a problem with XSLT.

If you want to try something easy, write a transform that turns

this is <u>underlined</u> text

into

this is _underlined_ text
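This direction is easy because the `<u>` element is already in the markup. In XSLT it comes down to one template matching `u` that writes `_`, then its content, then `_`, while the built-in template rules pass all other text through. As a rough analogue of those rules (a Python sketch, not XSLT itself), here is the same logic using the standard library's `xml.etree.ElementTree`. The function name `to_underscores` is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_underscores(elem: ET.Element) -> str:
    """Flatten an element to plain text, writing each <u> descendant as _content_.

    Like XSLT's built-in rules, any other element just contributes the
    processed text of its children; only <u> gets special treatment.
    """
    parts = [elem.text or ""]
    for child in elem:
        inner = to_underscores(child)  # recurse, so nested <u> is handled too
        parts.append(f"_{inner}_" if child.tag == "u" else inner)
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)
```

For example, `to_underscores(ET.fromstring("<p>this is <u>underlined</u> text</p>"))` returns `"this is _underlined_ text"`. No scanning for delimiters is needed: the walk only asks each element for its tag.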

And don't get discouraged by this immediate problem. XSLT is great -- but 
it really shines when it is combined with other tools that make up for what 
it doesn't do.

Regards,
Wendell




======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

