This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

RE: sort a foreach on a stat value?

From: "Stone, Joshua I" <joshua dot i dot stone at intel dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: <systemtap at sources dot redhat dot com>
Date: Thu, 19 Jan 2006 15:02:40 -0800
Subject: RE: sort a foreach on a stat value?

Frank Ch. Eigler wrote:
>> 	foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
> 
> That sort of thing has some promise at abbreviating that excessive
> duplication hunt made an example of in bug #2115.
> 
> While this does not address sorting, another related syntactical
> possibility is to infer a "[idx1, idx2]" suffix on undecorated
> occurrences of the indexed array within the body of a foreach:
> 
>    foreach ([x,y] in thingie)
>      total += thingie # implied [x,y]
> 
>    foreach ([x,y,z] in mystats)
>      printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))
> 
> The latter could be abbreviated further to "@count, @sum, @min", to
> infer the innermost-looped array itself, plus its index tuple.
>
> A later independent optimization could make sure that the translator
> does not emit duplicate array-lookup operations within loops.

Inferring like this would go a long way toward simplifying the language
(from a user perspective).  And as you mention, it leaves more
opportunity for optimization.

This can be taken even further if you allow special format characters
(as in _stp_stat_print) - Then your second example could be just:

   foreach ([x,y,z] in mystats)
     printf("%C %S %m", mystats)

At this point we're not too far from a printa, but manually doing a
foreach lets you sort it.

> Unless I'm mistaken, the current runtime aggregates the whole pmap for
> loops/sorting, even if you want just the top 20.  This cost will be
> fully reflected in activity count (bug #1885) at some point.  It is
> unlikely to cost much less than the explicit copying loop above.

Yes, it does fully aggregate when you do the foreach.  As for cost, I
suspect that will depend on the keys - if you're indexing by one or more
strings I would expect the savings to be more significant.

> I wonder if this behavior makes sorting on statistical values
> sufficiently inefficient that special syntax is not sufficiently
> justified at this point, given that open-coding is possible.

Perhaps, but if we can find a "nice" way to express the sorting it may
still be worth pursuing...

>>>> Along the same lines, it would be extremely useful to be able to do
>>>> "cascading" sort - i.e. sort by more than one field.
>>> 
>>> [...]
>>>   foreach ([x1+, x2--, y2+++] in array----) { ... }
>> 
>> That's not a bad suggestion, though I think it's not obvious in which
>> order the cascading happens.  [...]
> 
> I guess we'd pick and document one of the two interpretations.

Another possible syntax choice could be:

  foreach([x, y, z] in array sorted [@count-, x+, z-])

Thus the sorting order is explicit.  Ascending (+) could be implied.


Josh

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]