This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Regular expression functions (Was: Re: comments on December F&O draft)
David,
>> Most regular expression languages don't find overlapping matches,
>> do they? It seems to add a lot of extra complexity if they do.
>
> No, but then they don't return a list of all matches either.
Some do, if it's a global match. From some JScript documentation:
"If the global flag (g) is not set, Element zero of the array
contains the entire match, while elements 1 – n contain any
submatches that have occurred within the match.... If the global
flag is set, elements 0 - n contain all matches that occurred."
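For comparison only (this is Python's `re` module, not XPath or JScript), the same non-global vs. global distinction can be sketched as below. The helper names `first_match` and `all_matches` are hypothetical. Like most engines, `re` scans left to right and never reports overlapping matches.

```python
import re

def first_match(pattern, text):
    """Non-global: the whole first match plus its submatches, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # Element 0 is the entire match; elements 1..n are the submatches.
    return [m.group(0), *m.groups()]

def all_matches(pattern, text):
    """Global: every (non-overlapping) whole match, in order."""
    return [m.group(0) for m in re.finditer(pattern, text)]

print(first_match(r"(\d)(\d)", "12 34 56"))  # ['12', '1', '2']
print(all_matches(r"(\d)(\d)", "12 34 56"))  # ['12', '34', '56']
# "aaaa" contains three overlapping "aa"s, but only two are reported.
print(all_matches("aa", "aaaa"))             # ['aa', 'aa']
```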
> In Xpath you can't do that. So a replace function that only lets you
> replace one set of unstructured input by some more unstructured
> output is not particularly useful.
I agree with your analysis about regexp replace in general, though
it's not altogether useless - when global, at least it goes some way
towards helping with the classic multi-string-replacement problem. For
example, to escape newline characters with "\n", tabs with "\t" and
carriage returns with "\r":
replace(replace(replace($string, '&#xA;', '\n'),
                '&#x9;', '\t'),
        '&#xD;', '\r')
(or more manageably with a simple mapping operator:
$string -> replace(., '&#xA;', '\n')
        -> replace(., '&#x9;', '\t')
        -> replace(., '&#xD;', '\r')
Sorry, couldn't resist.)
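The chained global replacement above translates directly into a Python sketch (an analog, not XPath; the function names are hypothetical). Chaining is safe here only because no replacement introduces a character that a later step matches. The single-pass version with a lookup table avoids depending on that.

```python
import re

def escape_whitespace(s):
    # Each call is a global replacement, mirroring the nested replace() calls.
    return (s.replace("\n", "\\n")
             .replace("\t", "\\t")
             .replace("\r", "\\r"))

# Alternative: one pass over the string, choosing the escape per match.
ESCAPES = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}

def escape_single_pass(s):
    return re.sub(r"[\n\t\r]", lambda m: ESCAPES[m.group(0)], s)

print(escape_whitespace("a\tb\r\nc"))  # a\tb\r\nc  (literal backslashes)
```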
But, as you've illustrated, this doesn't help with the other classic in
this genre, which is replacing &#xA; characters with <br /> elements.
> If however the match function returned the sequence of substrings
> matched or equivalently a sequence of the match positions, then the
> string could be broken up and nodes added as required.
I think that you need a sequence of match positions *and lengths* in
the latter case, to make it possible to pull out the matched string?
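To illustrate why match positions and lengths are enough to break a string up and add nodes, here is a Python sketch that uses `xml.etree.ElementTree` as a stand-in for result-tree construction. The function `newlines_to_br` is hypothetical; it is not an XPath or XSLT function.

```python
import re
import xml.etree.ElementTree as ET

def newlines_to_br(text, parent):
    """Append text and <br/> children to `parent`, one <br/> per newline."""
    pos = 0
    last = None  # most recent <br/>; its .tail holds the text after it
    for m in re.finditer("\n", text):
        start, length = m.start(), m.end() - m.start()
        chunk = text[pos:start]
        if last is None:
            parent.text = (parent.text or "") + chunk
        else:
            last.tail = chunk
        last = ET.SubElement(parent, "br")
        pos = start + length  # skip past the matched substring
    rest = text[pos:]
    if last is None:
        parent.text = (parent.text or "") + rest
    else:
        last.tail = rest
    return parent

p = newlines_to_br("one\ntwo\nthree", ET.Element("p"))
print(ET.tostring(p, encoding="unicode"))  # <p>one<br />two<br />three</p>
```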
Hmm... I can't help thinking that these flat sequences are going to
make processing quite difficult. Extracting a list of the matched
strings from the sequence would mean:
for $i in (1 to count($matches) idiv 2)
return substring($string, $matches[2 * $i - 1], $matches[2 * $i])
or a recursive function, neither of which is particularly practical.
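The index arithmetic for such a flat sequence (position 1, length 1, position 2, length 2, ...) is easy to get wrong. The Python sketch below assumes that hypothetical layout with XPath-style 1-based positions, and it mirrors the `for` expression above. `flat_positions` and `matched_strings` are illustrative names only.

```python
import re

def flat_positions(pattern, string):
    """Build the hypothetical flat sequence: pos1, len1, pos2, len2, ..."""
    return [x for m in re.finditer(pattern, string)
              for x in (m.start() + 1, m.end() - m.start())]

def matched_strings(string, matches):
    result = []
    for i in range(1, len(matches) // 2 + 1):
        pos = matches[2 * i - 2]     # XPath $matches[2 * $i - 1]
        length = matches[2 * i - 1]  # XPath $matches[2 * $i]
        # Equivalent of substring($string, pos, length) with 1-based pos.
        result.append(string[pos - 1:pos - 1 + length])
    return result

flat = flat_positions(r"[a-z]+", "ab cde f")
print(flat)                               # [1, 2, 4, 3, 8, 1]
print(matched_strings("ab cde f", flat))  # ['ab', 'cde', 'f']
```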
On the other hand, I think it's impossible to reliably go from the
matched subexpression string to the location of the subexpression
within the original string.
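A small Python illustration (not XPath) of why the matched string alone cannot recover the location: the same text can occur more than once. Searching for it finds the first occurrence, while only the engine's own span records where the subexpression actually matched.

```python
import re

text = "a-b a"
m = re.search(r"(a)-b (a)", text)
captured = m.group(2)        # 'a'
naive = text.find(captured)  # 0: the first 'a' in the string
actual = m.start(2)          # 4: where group 2 really matched
print(captured, naive, actual)  # a 0 4
```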
> Actually it might be interesting (and more in the xpath style) to
> allow omnimark style named variable binding (the found-text in the
> above) within the search string which would then be accessed by
> normal xpath variable reference, $found-text, in any functions
> triggered by the replacement code.
You *could* do this implicitly by setting the variables $1..$N, since
authors cannot set these variables themselves (invalid names). But
either seems a bit messy to me - how do you define the scope, for one
thing?
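For comparison only, Python answers the scoping question for named bindings by making the binding a property of each match object, which is visible only inside the replacement callback for that match. This is not OmniMark syntax, and it is not a proposal for XPath. Python group names cannot contain hyphens, so `found_text` stands in for `$found-text`, and `tag_found_text` is a hypothetical helper.

```python
import re

def tag_found_text(text):
    def replacement(m):
        # The named group plays the role of $found-text; it exists only
        # for the duration of this call, i.e. for a single match.
        found_text = m.group("found_text")
        return "[" + found_text.upper() + "]"
    return re.sub(r"\{(?P<found_text>[^}]*)\}", replacement, text)

print(tag_found_text("a {b} c {de}"))  # a [B] c [DE]
```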
Cheers,
Jeni
---
Jeni Tennison
http://www.jenitennison.com/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list