[PATCH v3] Add debuginfod support to GDB

Thu Feb 20 18:02:00 GMT 2020

On 2020-02-10 6:51 p.m., Aaron Merey wrote:
> Hi Simon,
> 
> Thanks for the feedback, I've updated the patch with your suggestions
> (attached) and I've included some additional comments below.
> 
> On Fri, Feb 7, 2020 at 3:39 PM Simon Marchi <simark@simark.ca> wrote:
>> I think the cancellation is not quite right, although I have not
>> tried it yet.  The problem is that we don't seem to differentiate
>> user cancellation from other errors.
>>
>> When long running operations are cancelled in GDB, that is done by
>> throwing an exception.  For example, if you process a lot of data,
>> you can call the QUIT macro in your loop such that an exception will
>> be thrown if the quit flag (set by the SIGINT handler) is set.
>>
>> Here, if the user does ctrl-c during a download, the download will
>> be interrupted and debuginfod_find_debuginfo will return some error
>> code.  We will then simply return an error to the caller.  It will
>> look as if the query failed, which is different from user interruption.
>>
>> If we return a non-zero value in the progress function, we should note
>> it somewhere, such that when debuginfod_find_debuginfo returns, we can
>> throw a "quit" exception.  The usual place to save that would be in some
>> void* user data passed to debuginfod_find_debuginfo which would also be
>> passed to the progress function, but unfortunately the library does not
>> provide that.
> 
> I agree that there's an ambiguity here. To fix this I changed the
> progress function so that it now preserves quit_flag's value after
> calling check_quit_flag and detecting that the user interrupted a
> download. The GDB interpreter will now see that SIGINT was raised,
> process it like usual and print "Quit" by the prompt. Hopefully this
> sufficiently distinguishes a user interrupt from other errors.

I think we still need something just after the debuginfod_find_debuginfo
and debuginfod_find_source calls to throw the "quit" exception, if the quit
flag is set at this point.  Without that, the debuginfod_source_query and
debuginfod_debuginfo_query will return a failure.  The calling code could
therefore conclude that debug info is not available (and cache that value
somehwere), for a file for which debug info is actually available.  When
the execution reaches the event loop and we process the quit flag, it
would be too late, the damage will have been done.

> 
> I do see the point you make about how the debuginfod library
> might handle these things internally. Another possibility would
> be for debuginfod_find_* to return something like -EINTR when
> the progressfn triggers a download cancelation.
> 
>>> +set port [exec sh -c "while true; do PORT=`expr '(' \$RANDOM % 1000 ')' + 9000`; ss -atn | fgrep \":\$PORT\" || break; done; echo \$PORT"]
>>
>> Hmmm, this opens the door to flakiness due to TOCTOU.
>>
>> Would it be feasible instead to ask debuginfod to bind to port 0 (which would make the kernel
>> choose any free port) and then read the port number from debuginfod's stdout?  It seems like
>> it already partly works, when I start debuginfod with `-p 0`, it binds to random ipv4 and ipv6
>> ports.  However it prints:
>>
>>   started http server on IPv4 IPv6 port=0
>>
>> But I guess it would not be too difficult in that case to fetch the effective port and print that.
>> Also, the ipv4 and ipv6 port numbers are different, so you would need to print both.
> 
> To fix this TOCTOU concern I changed the test driver to iterate through
> ports starting at 8000 and attempt to start up a debuginfod instance
> until a successful start up is detected from the server's output via Expect.

Ok, that works.

>> Just wondering, does the server output something when it's done initializing?  would it be
>> possible and simpler to read its stdout and wait for that message?  That's the kind of thing
>> that should be relatively easy with Expect.  You would have to start the process with Expect's
>> `spawn` procedure and await its output using the `expect` procedure.
> 
> Before running the tests we want to make sure the server has found the
> necessary debug and source files. Asynchronous threads scan for these
> files and they do not necessarily find them before debuginfod
> initializes and begins accepting requests. Therefore we have to query
> the server for its metrics (part of the debuginfod web API) which
> eventually indicate that the correct amount of files were found during
> scanning.

Ok.

Formatting: we use a mix of tabs and spaces for indentation (tabs for whole
groups of 8 columns, spaces to align after that).

In dwarf2_get_dwz_file, you verify the build id of the result, but
not in elf_symfile_read.  Is there a reason not to do it in
elf_symfile_read?

If the build id of the thing returned by the server doesn't match,
we should probably display a warning, because that could really be
a symptom of some important problem.

In open_source_file, the end of the two paths are identical.  Use
a single scoped_fd variable (remove src_fd) and just let the code
execute until the "return fd;".

Simon