This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.
Re: Function fingerprinting for useful backtraces in absence of debuginfo
On Thu, Sep 15, 2011 at 19:48:31 +0200, Jan Kratochvil wrote:
> Hi Martin,
>
> I see this was more directed at gcc people but I hope I can reply some.
I guess I can try my luck on a gcc mailing list ;)
> On Thu, 15 Sep 2011 14:32:31 +0200, Martin Milata wrote:
> > * The name of the function, if the corresponding binary is compiled
> > with function symbols (as is the case with the libraries) together
> > with offset into the function.
>
> This is not true for static functions in the libraries:
>
> ==> 26.c <==
> extern void f (void (*) (void));
> static void i (void) {}
> int main (void) { f (i); return 0; }
>
> ==> 26l.c <==
> static void g (void (*h) (void)) { h (); }
> void f (void (*h) (void)) { g (h); }
>
> gcc -o 26l.so 26l.c -Wall -shared -fPIC -s; gcc -o 26 26.c -Wall -g ./26l.so; gdb -nx ./26 -ex 'b i' -ex r -ex bt
> #0 i () at 26.c:2
> #1 0x00007ffff7dfc53e in ?? () from ./26l.so
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> #2 0x00007ffff7dfc558 in f () from ./26l.so
> #3 0x00000000004005c8 in main () at 26.c:3
>
> glibc in Fedora packaging is probably the only exception; it has the .symtab
> section in the main rpm. All the other libraries have .symtab only in the
> debuginfo rpm.
Good point ... well, at least we can use the names in .dynsym and fall
back to the other method if the function does not have a symbol table
entry.
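
A minimal sketch of that fallback, assuming the symbol lookup and the
fingerprint computation happen elsewhere. The name frame_key, the "fp:"
prefix and the idea of reducing the fingerprint to a single number are
hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build a textual key for one backtrace frame.
 * dynsym_name is the name found in .dynsym, or NULL if the function has
 * no symbol table entry (as for the static function g in 26l.so).
 * fingerprint is a hypothetical numeric summary of the function body.
 * Returns what snprintf returns: the length of the complete key.
 */
static int frame_key(char *buf, size_t size, const char *dynsym_name,
                     unsigned long offset, unsigned long fingerprint)
{
    assert(buf != NULL && size > 0);

    /* Preferred: symbol name plus offset into the function. */
    if (dynsym_name != NULL)
        return snprintf(buf, size, "%s+0x%lx", dynsym_name, offset);

    /* Fallback: no symbol, so identify the frame by its fingerprint. */
    return snprintf(buf, size, "fp:%08lx", fingerprint);
}
```

With such a helper, frame #2 from the backtrace above would be keyed by
the name f plus an offset, while frame #1 would be keyed by fingerprint
only.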
> > (Call graph properties)
> > * List of the library functions called.
>
> That is the functions called via .plt section - either from different libraries
> or within the same library (if it does not use direct calls like glibc does).
> Hopefully this should not change, I agree.
Great, this is so far the most important component of the signature, as
each of the remaining properties provides only one bit of information.
Unfortunately, this means that a lot of functions that do not call anything
through .plt have the same fingerprint. Can you think of some other
properties that we could use for those functions?
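
To make the collision problem concrete, here is a rough sketch of how
such a fingerprint could be represented and compared. The struct layout
and the names are assumptions made for this example, not an existing
implementation, and the list of .plt callees is assumed to be sorted so
that the comparison does not depend on call order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fingerprint: the .plt callees plus two one-bit properties. */
struct fingerprint {
    const char *const *plt_calls; /* sorted names of functions called via .plt */
    size_t n_plt_calls;           /* number of entries in plt_calls */
    bool calls_local;             /* calls some other function in the file */
    bool calls_self;              /* calls itself */
};

/* Two fingerprints match if all one-bit properties and all callees match. */
static bool fingerprint_equal(const struct fingerprint *a,
                              const struct fingerprint *b)
{
    assert(a != NULL && b != NULL);

    if (a->calls_local != b->calls_local ||
        a->calls_self != b->calls_self ||
        a->n_plt_calls != b->n_plt_calls)
        return false;

    for (size_t i = 0; i < a->n_plt_calls; i++)
        if (strcmp(a->plt_calls[i], b->plt_calls[i]) != 0)
            return false;

    return true;
}
```

Any two functions with an empty callee list and the same two bits compare
equal here, which is exactly the weakness described above.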
> > * Whether the function calls some other functions in the file.
>
> Various functions get inlined at various optimization levels and with compiler
> changes; it also changes with gcc -flto.
>
> > * Whether the function calls itself.
> > (Presence of types of instructions)
>
> Tail call optimizations (call+ret -> jump) change this, so -O0 vs. -O2 code will
> definitely be different; but -O2 compilation of slightly different code
> hopefully should have the same signature.
I see.
> > * Conditional jumps based on equality test/signed comparison/unsigned
> > comparison.
>
> This is the exact target of the gcc -fprofile-* optimizations; AFAIK SuSE uses
> it a lot (I had some negative results trying to apply it for gdb packaging).
> That is to invert the jump conditional and reshuffle the code around so that in
> more than 50% of cases it does not jump, depending on "random" benchmark data
> during each build.
But we can test whether the code contains either the jX or the jnX
instruction, right?
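
Something along these lines is what I have in mind: a condition and its
negation map to the same class, so inverting a jump should not change the
result. This is only a sketch. It assumes the disassembler hands us x86
mnemonics as lowercase strings, the function names are made up, and the
tables leave out the flag-specific jumps (jc, js, jo, jp and so on):

```c
#include <assert.h>
#include <string.h>

/* Classes of conditional jumps, usable as bit flags. */
enum jump_class {
    JUMP_NONE     = 0,
    JUMP_EQUALITY = 1 << 0, /* je/jz and jne/jnz */
    JUMP_SIGNED   = 1 << 1, /* jl, jle, jg, jge and their jnX aliases */
    JUMP_UNSIGNED = 1 << 2  /* jb, jbe, ja, jae and their jnX aliases */
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if mnemonic is listed in table, 0 otherwise. */
static int in_table(const char *mnemonic, const char *const *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(mnemonic, table[i]) == 0)
            return 1;
    return 0;
}

/* Map one mnemonic to its class; jX and jnX land in the same class. */
static unsigned classify_jump(const char *mnemonic)
{
    static const char *const equality[] = { "je", "jz", "jne", "jnz" };
    static const char *const is_signed[] = {
        "jl", "jnge", "jle", "jng", "jg", "jnle", "jge", "jnl"
    };
    static const char *const is_unsigned[] = {
        "jb", "jnae", "jbe", "jna", "ja", "jnbe", "jae", "jnb"
    };

    assert(mnemonic != NULL);

    if (in_table(mnemonic, equality, COUNT(equality)))
        return JUMP_EQUALITY;
    if (in_table(mnemonic, is_signed, COUNT(is_signed)))
        return JUMP_SIGNED;
    if (in_table(mnemonic, is_unsigned, COUNT(is_unsigned)))
        return JUMP_UNSIGNED;
    return JUMP_NONE;
}

/* OR together the classes of all mnemonics in one function body. */
static unsigned jump_class_mask(const char *const *mnemonics, size_t count)
{
    unsigned mask = JUMP_NONE;

    for (size_t i = 0; i < count; i++)
        mask |= classify_jump(mnemonics[i]);
    return mask;
}
```

A build that uses je and jl and a profile-guided build of the same
function that uses jne and jge would then produce the same mask.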
> > So the question is: How to improve this function fingerprinting
> > scheme? Is there a better approach for coredump duplicate detection?
>
> I am a bit skeptical about such function content comparison, but sure, it does
> not have to be perfect.
I don't like it very much either, but I wasn't able to come up with
anything else that would work solely on the core dumps and binaries.
Sure it won't work perfectly, but hopefully it could work well enough to
be useful.
> It may soon be cheap enough to run gdbserver on the local core file, once
> the recent optimization by Paul Pluzhnikov is finished:
> Re: [patch] Implement qXfer:libraries for Linux/gdbserver
> http://sourceware.org/ml/gdb-patches/2011-08/msg00291.html
> But I do not have any benchmark numbers now to support it.
I'll see if we can somehow use gdbserver, though we'd rather have
something that works without transmitting data over the network, because
doing it remotely would require additional infrastructure. Also, if I
understand correctly, the connection has to be initiated from the host
machine, which might be a problem if there are NATs/firewalls in the way.
Anyway, thank you for your response.
Martin