New feature "source-id"

Wed May 21 20:42:00 GMT 2014

I did consider doing a FUSE file system. It could work, and it would minimize the changes needed to gdb. I'm not aware of any showstoppers, but I suspect it would add some extra complexity and might make some problems difficult to avoid. But, as you say, if it gets it done then that counts for a lot. Presumably such a system would use the Python hooks (that I currently use) in order to configure the FUSE file system.

One potential problem with the FUSE file system is loading source files with matching file names. Let's say we have server.so and client.so and both have a file called foo.c. In an ideal world the debug information would contain a full path to foo.c, and we could use "set substitute-path" to remap from the two different build directories to two different FUSE file systems, and life would be good. Unfortunately, the reality is that many projects (including libc6) put incomplete paths into the debug information. If source-id were a first-class feature of gdb, then when the debugger needed "foo.c" for server.so it would look in the source-mapping information in server.so.dbg, find the version control information, retrieve the file, and load it. If source-id is not a first-class feature of gdb, then I see no way to set up source search paths such that the right version would be loaded at the right time. We could try to fix every build system in the world to embed full paths, but that will never happen.
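
To make the foo.c case concrete, here is a minimal sketch in plain Python (not gdb's actual API). The module names come from the example above; the directory names, depot paths, and revision strings are invented for illustration. It contrasts a lookup keyed only by file name with one keyed by module as well.

```python
# Hypothetical per-module source maps, as a source-id section might provide.
# Depot paths and revision strings are invented for illustration.
SOURCE_MAPS = {
    "server.so": {"foo.c": ("//depot/server/foo.c", "rev-1")},
    "client.so": {"foo.c": ("//depot/client/foo.c", "rev-2")},
}


def resolve_by_search_path(filename, search_dirs, available):
    """Search-path lookup: the first directory that has the name wins.

    The requesting module is not an input, so two different foo.c files
    cannot be told apart.
    """
    for directory in search_dirs:
        candidate = directory + "/" + filename
        if candidate in available:
            return candidate
    return None


def resolve_by_module(module, filename):
    """Per-module lookup: the module's own map selects the right file."""
    return SOURCE_MAPS.get(module, {}).get(filename)


available = {"/src/server/foo.c", "/src/client/foo.c"}
search_dirs = ["/src/server", "/src/client"]

# Same answer no matter which module asked for foo.c.
print(resolve_by_search_path("foo.c", search_dirs, available))
# Distinct answers per module.
print(resolve_by_module("server.so", "foo.c"))
print(resolve_by_module("client.so", "foo.c"))
```

With incomplete paths in the debug information, the search-path lookup always returns the server copy here, which is the wrong file whenever the frame belongs to client.so.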

This failure case is real. On the other hand, it is rare enough that a source-id system that ignored it would still be incredibly useful.

> Then when your build completes, you would record a build-id -> source-id mapping.  

Where would the build-id -> source-id mapping be stored? The method we currently use is to have a section in the debug file which contains the mapping from source files to the version control identifiers. This feels like a simple and reliable method to record the mapping. We've been using it for over a year, and similar techniques have been used for many years on other platforms, so it is a proven technique.
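
As a rough illustration of the idea only: the sketch below parses a mapping payload of the kind such a section could hold. The format shown (one "path<TAB>identifier" entry per line, UTF-8) is an assumption made for this example, not the format actually used, and the identifiers are invented.

```python
def parse_source_map(payload: bytes) -> dict:
    """Parse a hypothetical source-mapping payload.

    Assumed format (for illustration only): UTF-8 text, one entry per
    line, source path and version control identifier separated by a tab.
    """
    mapping = {}
    for line in payload.decode("utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        path, _, identifier = line.partition("\t")
        mapping[path] = identifier
    return mapping


# Invented sample payload; a real one would be read out of the debug file.
sample = b"foo.c\t//depot/server/foo.c#12\nbar.c\t//depot/server/bar.c#3\n"
print(parse_source_map(sample))
```

Because the mapping travels inside the debug file, no separate database has to be kept in sync with the builds.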

-----Original Message-----
From: Tom Tromey [mailto:tromey@redhat.com] 
Sent: Wednesday, May 21, 2014 12:30 PM
To: Bruce Dawson
Cc: 'Gerhard Gappmeier'; gdb-patches@sourceware.org
Subject: Re: New feature "source-id"

>>>>> "Bruce" == Bruce Dawson <bruced@valvesoftware.com> writes:

Bruce> I understand that some Linux distributions already make source 
Bruce> packages for each package that they distribute, and this 
Bruce> technique offers some unique advantages.

Bruce> However, this is orthogonal to the source-id proposal. 
Bruce> Source-ids offer different value that is complementary.

Bruce> Our build system spits out dozens of builds a day. Some of these 
Bruce> are run by developers, others by testers, and others by 
Bruce> customers. Any one of them might crash. I might end up debugging 
Bruce> (live debugging or a core file) any one of these builds, perhaps 
Bruce> weeks after it was created. Because we have the source-id system 
Bruce> set up I know that I can walk up and down the stack and have the 
Bruce> source files automatically show up, with *zero* effort on my 
Bruce> part. I don't have to install source packages, I can have 
Bruce> multiple core files from multiple versions loaded simultaneously. 
Bruce> Only the source files that I need are downloaded so it is 
Bruce> *extremely* efficient. Retrieving the needed source files is 
Bruce> essentially instantaneous and requires zero developer effort.

I wonder if you considered an approach based on build-ids.

You'd start with the existing build-id feature.  Then when your build completes, you would record a build-id -> source-id mapping.  Finally you would have a small fuse filesystem that looks up the build-id in the database and fetches the appropriate source tree from git.

One benefit of this approach is that it requires nearly no changes in gdb.
This avoids a lot of bikeshedding ;)

I found a few git/fuse projects on github.

If you considered this & rejected it, I'd be curious to know why.
If it doesn't meet your needs then I probably misunderstood what you are going for.

FWIW the SRPM-based approach we use at Red Hat is pretty good, but not truly great.  It has a hack in the rewriting step and sometimes the source tree layout isn't preserved properly somehow.

So something like the above may be more desirable overall.

Tom