This is the mail archive of the archer@sourceware.org mailing list for the Archer project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: dwarf name canonicalization


On Tuesday 23 February 2010 23:06:50 Keith Seitz wrote:
> [OT: I would love a test case. I *pleaded* for specific test cases.]

Yes, I remember that. Sorry. I still had it on my TODO list, just had not
found the time to create something that it's easily reproducible without
too much external dependencies. 

Looks like it is time to act now.

All my "real" use cases would require Qt which is probably not acceptable 
here, so let me have a shot at a contrived example that I'd consider
structurally not too far off from reality, a ~1000 function "project", 
structured like this:

----------------------- lib1.h --------------------
#ifndef LIB1_H
#define LIB1_H

#include <string>
#include <vector>

#include <map>

namespace ns {
namespace inner {

struct Foo1
{
    int foo0(std::map<std::string, std::vector<std::string> > &map, 
         const std::string &index, const std::string &x);
   // [...]
    int foo25(std::map<std::string, std::vector<std::string> > &map, 
         const std::string &index, const std::string &x);
    int sum();
};
[...]

----------------------- lib1.cpp --------------------
[...]
int Foo1::foo25(std::map<std::string, std::vector<std::string> > &map,
 const std::string &index, const std::string &x)
{
        return map[index].size() < x.size();
}

int Foo1::sum()
{
        int t = 0;
        std::map<std::string, std::vector<std::string> > m;
        m["key 0"].push_back("value 0");
        t += foo0(m, "key 1", "xxx");
        [...]
        return t;
}

----------------------- main.cpp --------------------
#include "lib1.h"
[...]

using namespace ns::inner;

int main()
{
       int s = 0;
        s += Foo0().sum();
        s += Foo1().sum();
       [...]
       return s;
}


I'll attach a perl script generating the code. Don't look at the actual code
too close, it really does not matter. A quick test also indicates that neither
the number or files nor of functions make a difference for the time ratio.

With 7.0.90 gdb spends 15.48% of its instructions in dwarf2_canonicalize_name
and functions called from there,  with 7.0.1 it is only 0.04%. 

Total instruction count is 429,137,527 vs 516,590,964.
Both versions of gdb are compiled with -O2 -g  using gcc 4.4.1.

I certainly do understand that instruction count does not need to mean
much, but it is fairly reproducible and in this case it correlates indeed with 
wall clock times.

Note that the number will get _much_ worse when it comes to "modern"
C++ like code using template expressions or even 

> [...] Unless the IDE provided a console that accepted generic input (like 
> "normal" gdb), I don't think that much would break, if anything. IDEs 
> really rather rely on linespecs for the most part, no? As long as you're 
> not sending input to gdb that looks like a function name, you should be 
> safe. But I cannot guarantee. I have no first-hand experience with IDEs 
> (in many years).

From my point of view it is a safe assumption that most if not all IDE users
would prefer a 15% startup time gain over an improved parsing of function 
names - especially since they are very unlikely to ever use anyway.

However, it looks like it does not even have to be an either-or here. If 
the canonicalization would be made optional using, say, some 'maint set'
switch, a user could make his own choice, and an IDE could even apply 
some "cleverness" like switching canonicalization off in the beginning
and reload with canonicalization as soon as the user triggers an operation
that needs canonicalization. Or maybe even retrieve a list of uncanonicalized
symbols and match user input against that before bothering gdb with it.

> I would much rather address (fix?) the speed problem first. The idea of 
> multiple paths through the code for the "same" task would seem a high 
> bit rot risk.

I am not sure this will solve the problem. Even if you were able to speed up
canonicalization by, say, 30%, it would still impact startup times by 10%, 
unconditionally, no matter whether the result is ever needed. And 10% are
highly visible when the total time is in the "several dozen seconds" range.

Andre'

Attachment: createit.pl
Description: Perl program


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]