This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Language design (was Re: sort a foreach on a stat value?)


May I ask that, before we start adding more and more features to the
language, we pause for a moment and try to determine whether they are
really necessary?  I never did see any justification for "foreach"
at all.  Is it too late to try to make the language small, easy to use,
efficient and ideal for the collection of data from probe points,
without adding in a bunch of general-purpose features that will
encourage programmers to do things they probably shouldn't?

If you really want fancy post-processing, then instead of beefing up
our language, a far better solution would be to pipe the output through
a different interpreter, like perl.
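
As a rough illustration of that kind of external post-processing, here
is a minimal sketch in Python rather than perl.  It assumes the script
prints one "key: count" pair per line, as the printf in the quoted
workaround below does.  The function name top_n, the default of 20 and
the sample lines are all hypothetical, chosen only for this example.

```python
def top_n(lines, n=20):
    """Parse "key: count" lines and return the n largest by count."""
    rows = []
    for line in lines:
        # Split on the last colon so keys may themselves contain colons.
        key, sep, value = line.rpartition(":")
        if not sep:
            continue  # skip lines that are not "key: count"
        try:
            rows.append((key.strip(), int(value)))
        except ValueError:
            continue  # skip lines whose count is not an integer
    # Largest count first; the sort is stable, so ties keep input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


# Hypothetical sample standing in for the script's output.
sample = ["101: 5", "202: 9", "not a data line", "303: 7"]
for key, count in top_n(sample, n=2):
    print(f"{key}: {count}")
```

To use it as a filter, one would pass sys.stdin to top_n instead of the
sample list and pipe the script's output into it.  The sorting and the
"top 20" cut-off then happen entirely outside the probe script.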

On Mon, 2006-01-16 at 15:43 -0500, Frank Ch. Eigler wrote:
> Hi -
> 
> joshua.i.stone wrote:
> 
> > [...]  One thing I've noticed is that our foreach syntax has
> > different semantics than other languages [...]
> 
> Indeed, just like in awk, we iterate over indexes rather than values.
> 
> 
> > [...]
> > 	foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
> > [...]
> 
> That sort of thing has some promise at abbreviating that excessive
> duplication hunt made an example of in bug #2115.  
> 
> While this does not address sorting, another related syntactical
> possibility is to infer a "[idx1, idx2]" suffix on undecorated
> occurrences of the indexed array within the body of a foreach:
> 
>    foreach ([x,y] in thingie)
>      total += thingie # implied [x,y]
> 
>    foreach ([x,y,z] in mystats)
>      printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))
> 
> The latter could be abbreviated further to "@count, @sum, @min", to
> infer the innermost-looped array itself, plus its index tuple.
> 
> A later independent optimization could make sure that the translator
> does not emit duplicate array-lookup operations within loops.
> 
> 
> > [...]
> > >    foreach (tid in stat) // sort by value -> ???
> > >      stat_counts[tid] = @count(stat[tid])
> > >    foreach (tid in stat_counts-)
> > >      printf("%d: %d\n", tid, stat_counts[tid]) # and/or
> > >  @avg(stat[tid])) etc. }
> > [...]
> > This is a passable workaround, yes.  The downside is that if stat were
> > very large, I would have to fudge with the maxaction counter.  If I was
> > only interested in maybe the top 20, then with a single loop construct
> > it's easy to break out after 20 and not hit the MAXACTION boundary.
> 
> Unless I'm mistaken, the current runtime aggregates the whole pmap for
> loops/sorting, even if you want just the top 20.  This cost will be
> fully reflected in activity count (bug #1885) at some point.  It is
> unlikely to cost much less than the explicit copying loop above.
> 
> I wonder if this behavior makes sorting on statistical values
> sufficiently inefficient that special syntax is not sufficiently
> justified at this point, given that open-coding is possible.
> 
> 
> > >> Along the same lines, it would be extremely useful to be able to do
> > >> "cascading" sort - i.e. sort by more than one field.
> > > 
> > > [...]
> > >   foreach ([x1+, x2--, y2+++] in array----) { ... }
> > 
> > That's not a bad suggestion, though I think it's not obvious in which
> > order the cascading happens.  [...]
> 
> I guess we'd pick and document one of the two interpretations.
> 
> 
> - FChE
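
One way to read the cascading-sort ambiguity discussed above is as a
question of which field is the primary key.  The sketch below, in
Python, shows that the two choices give different orders for the same
data.  It is only an illustration: the tuples and the helper name
cascade are hypothetical, it sorts ascending only, and it says nothing
about how the translator would implement the feature.

```python
from operator import itemgetter

# Hypothetical (x1, x2) index tuples; not data from this thread.
keys = [(2, "b"), (1, "b"), (2, "a"), (1, "a")]


def cascade(rows, order):
    """Sort rows by the column positions in `order`, most significant first."""
    result = list(rows)
    # list.sort is stable, so sorting by the least significant column
    # first and the most significant column last gives a cascading sort.
    for col in reversed(order):
        result.sort(key=itemgetter(col))
    return result


x1_primary = cascade(keys, [0, 1])  # x1 decides, x2 breaks ties
x2_primary = cascade(keys, [1, 0])  # x2 decides, x1 breaks ties

print(x1_primary)
print(x2_primary)
```

The two results differ, which is why whichever interpretation is picked
would need to be documented.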

