This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
RE: sort a foreach on a stat value?
- From: "Stone, Joshua I" <joshua dot i dot stone at intel dot com>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: <systemtap at sources dot redhat dot com>
- Date: Thu, 19 Jan 2006 15:02:40 -0800
- Subject: RE: sort a foreach on a stat value?
Frank Ch. Eigler wrote:
>> foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
>
> That sort of thing has some promise at abbreviating that excessive
> duplication hunt made an example of in bug #2115.
>
> While this does not address sorting, another related syntactical
> possibility is to infer a "[idx1, idx2]" suffix on undecorated
> occurrences of the indexed array within the body of a foreach:
>
> foreach ([x,y] in thingie)
> total += thingie # implied [x,y]
>
> foreach ([x,y,z] in mystats)
> printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))
>
> The latter could be abbreviated further to "@count, @sum, @min", to
> infer the innermost-looped array itself, plus its index tuple.
>
> A later independent optimization could make sure that the translator
> does not emit duplicate array-lookup operations within loops.
Inferring like this would go a long way toward simplifying the language
(from a user perspective). And as you mention, it leaves more
opportunity for optimization.
This can be taken even further if you allow special format characters
(as in _stp_stat_print) - Then your second example could be just:
foreach ([x,y,z] in mystats)
printf("%C %S %m", mystats)
At this point we're not too far from a printa, but manually doing a
foreach lets you sort it.
> Unless I'm mistaken, the current runtime aggregates the whole pmap for
> loops/sorting, even if you want just the top 20. This cost will be
> fully reflected in activity count (bug #1885) at some point. It is
> unlikely to cost much less than the explicit copying loop above.
Yes, it does fully aggregate when you do the foreach. As for cost, I
suspect that will depend on the keys - if you're indexing by one or more
strings I would expect the savings to be more significant.
> I wonder if this behavior makes sorting on statistical values
> sufficiently inefficient that special syntax is not sufficiently
> justified at this point, given that open-coding is possible.
Perhaps, but if we can find a "nice" way to express the sorting it may
still be worth pursuing...
>>>> Along the same lines, it would be extremely useful to be able to do
>>>> "cascading" sort - i.e. sort by more than one field.
>>>
>>> [...]
>>> foreach ([x1+, x2--, y2+++] in array----) { ... }
>>
>> That's not a bad suggestion, though I think it's not obvious in which
>> order the cascading happens. [...]
>
> I guess we'd pick and document one of the two interpretations.
Another possible syntax choice could be:
foreach([x, y, z] in array sorted [@count-, x+, z-])
Thus the sorting order is explicit. Ascending (+) could be implied.
Josh