This is the mail archive of the xsl-list@mulberrytech.com mailing list.


Re: Latest XSLTMark benchmark

To: xsl-list at lists dot mulberrytech dot com
Subject: Re: [xsl] Latest XSLTMark benchmark
From: Uche Ogbuji <uche dot ogbuji at FourThought dot com>
Date: Wed, 04 Apr 2001 16:51:58 -0600
cc: David_N_Bertoni at lotus dot com, "Xsltmark at Datapower dot Com" <xsltmark at datapower dot com>
Reply-To: xsl-list at lists dot mulberrytech dot com

> > You did not cover 4XSLT then, nor did any one of you contact the
> > 4Suite list to ask about realistic test driver methodology.
> 
> The previous point was about Xalan, in response to David Bertoni's
> concerns. It is true that the 4XSLT driver is a new addition in
> this release, perhaps we should have put it in quarantine. Of
> course, we have also had some folks contact us since the XML.com
> article complaining that particular processors were "excluded" --
> so perhaps including 4XSLT should be viewed as a good thing.

How on earth could it be a good thing?  The people who wrote in are likely
complaining because they expect XSLTMark to be a properly controlled test.
It does them a great disservice to release a test with shoddy and obscure
procedures.

It's better not to release benchmarks at all if the necessary standards of
rigor are unattainable.  There is a good reason why respectable benchmarks
tend to lag technologies by years.

> While I wasn't closely involved, I believe the driver is a
> modified example from the 4XSLT distribution.

The driver bears no resemblance to anything we've written in any example.  
It's quite inscrutable, in fact, and seems rather convoluted.  If you had a 
clear statement of benchmark architecture I might be interested in writing a 
clean match, but I hardly have the time to read the mind of whoever wrote the
current harness.

It's a rather rare event for me to have any difficulty reading someone else's 
Python code, but that benchmark driver is quite an odd bit of code.

> More to the point, in the case of 4XSLT we *did* ask for help
> and, as a matter of fact, in the process one of our engineers
> located three bugs in 4XSLT for which you yourself thanked him!
> 
> http://lists.fourthought.com/pipermail/4suite/2001-February/001444.html

I never said 4XSLT doesn't have bugs.  I'm glad you found no more than 3.  But
none of the messages you point to are particularly relevant to the resulting
benchmark.

In fact, your engineer was asking about omniORB.  OmniORB has nothing 
whatsoever to do with 4XSLT, which just strengthens my impression that the 
benchmark driver was constructed in some ignorance.

Was there any reason not to simply post the driver and ask "is this a fair way 
to harness 4XSLT for a benchmark"?  It's all OSS, right?  Full disclosure and 
all that.

> I am not sure that XSLTMark itself was mentioned -- but presumably
> the advice is the same for normal operation and for benchmarking!
> If there is something we are doing wrong, please let us know. As
> Michael Kay pointed out a few weeks ago on this very list, it is
> not surprising that "one tends to do best in one's own benchmarks".
> 
> (See http://lists.fourthought.com/pipermail/4suite/2001-February/author.html
> for the full exchange of messages between a DataPower engineer and
> 4XSLT folks).

Again, that exchange is simply irrelevant to the results of the benchmarks.  
Perhaps if the queries were more direct, all this mess would have been 
avoided.  We can only answer the questions that are asked.  We don't read 
minds.


> > Your lack of documentation of method, constraints, restrictions,
> > architecture, environment, etc. is *highly* unprofessional.
> 
> Actually, it is documented. And all the source code is open and
> available so anyone can:
> 
> 1. review it for proper behavior
> 2. independently reproduce the results
> 3. use it for internal testing (nice of us, no?)

No.  You have source code.  That's only useful for one who wants to track down
every idiosyncrasy of every driver you have written.  You have no general
statement of principles.  You don't indicate the controls you try to maintain.
You don't point out places where those controls are unmanageable, and how
this might statistically affect your benchmark.  Hell, you don't even have a
statistical confidence level for your results.
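
To make the point about confidence levels concrete, here is a minimal sketch
in Python of how repeated timings could be reported with an approximate 95%
confidence interval.  The function name and the sample timings are
hypothetical, not taken from XSLTMark, and the normal approximation (z = 1.96)
is an assumption; with only a handful of runs a Student's t critical value
would be more appropriate.

```python
import math
import statistics


def mean_confidence_interval(samples, z=1.96):
    """Return (mean, half_width) of an approximate confidence interval.

    z=1.96 corresponds to roughly 95% under a normal approximation
    (an assumption of this sketch, not something XSLTMark documents).
    """
    if len(samples) < 2:
        raise ValueError("need at least two timing samples")
    mean = statistics.mean(samples)
    # Standard error of the mean, from the sample standard deviation.
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * stderr


# Hypothetical wall-clock timings, in seconds, for repeated runs of one
# stylesheet on one processor.
timings = [1.0, 1.2, 0.8, 1.1, 0.9]
mean, half_width = mean_confidence_interval(timings)
print(f"{mean:.3f}s +/- {half_width:.3f}s")
```

Reporting the interval alongside the mean lets a reader judge whether the
difference between two processors is larger than the run-to-run noise.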

This is not the scientific method.  It's pretty much voodoo benchmarking.
Saying "we provide all the code" is not very useful.

> Now, documentation aside, is there actually something wrong with
> the driver? If you have specific problems with the 4XSLT driver,
> please let us know (at xsltmark@datapower.com), and we will set
> it right.

If you can clearly express what it is you are testing, what you need to
control, what environmental constraints you are using (everything from OS to
machine architecture to vendor and language version), then I can perhaps
construct a useful driver for you.
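
As a rough illustration of the kind of environment statement meant here, a
Python driver could record its own platform details next to each set of
results.  This is only a sketch using the standard `platform` module; the
function name and the choice of fields are assumptions, and a real benchmark
would also need to record the processor under test and its version.

```python
import json
import platform


def describe_environment():
    """Collect basic facts about the machine and interpreter in use."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
    }


# Emit the record in a stable, diffable form to publish with the results.
print(json.dumps(describe_environment(), indent=2, sort_keys=True))
```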

I have none of that information, so without digging deeply into your Java test
harnesses, I can't provide a fair driver.  This is because I take such
normalization seriously for benchmarking.  You need to do so as well.

I must note that, as you yourself have admitted, you don't even have
reasonable normalization among drivers within a single environment (Java).


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

References:
- RE: Latest XSLTMark benchmark
  - From: Eugene Kuznetsov
