[Bug debuginfod/27673] [debuginfod] Handle source requests for same buildid more efficiently

Wed Mar 31 16:03:26 GMT 2021

https://sourceware.org/bugzilla/show_bug.cgi?id=27673

--- Comment #4 from Frank Ch. Eigler <fche at redhat dot com> ---
> The time it takes for the client to see the response of the server to the
> request consists of:
> - time for request to travel to server (latency)
> - time for server to react to request
> - time for answer to travel back to client (latency again)
> 
> I've looked at the documentation of the option fdcache-prefetch, and AFAIU
> this improves "time for server to react to request".

That's correct.

> debuginfod:
> - when receiving a source request and ENOENT, send as reply the list of
>   available files for the buildid
> 
> client:
> - when receiving a list of available files for a buildid, store it and
>   use it to reply to source requests related to the buildid. That is,
>   if the file is not in the list, reply with -2.  Otherwise, send a
>   request to debuginfod, and expect it to succeed.

Interesting.  A more first-class solution could be a new webapi to
enumerate source files: a "/buildid/HEXCODE/sourcelist" query that
returns a structured piece of data.  This can be computed by debuginfod
fairly rapidly.  The client could cache that and use it to drive a
negative-cache hit on a subsequent source query.
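
A rough sketch of the client side of that idea follows, in Python for
brevity rather than the client's actual C.  Everything here is
hypothetical: the sourcelist query does not exist yet, and
fetch_sourcelist and fetch_source stand in for whatever would perform
the two HTTP requests.

```python
import errno


class SourceListCache:
    """Negative cache driven by a per-buildid list of source files.

    Both callables are hypothetical stand-ins for real HTTP requests:
      fetch_sourcelist(buildid)   -> iterable of source path strings
      fetch_source(buildid, path) -> file contents
    """

    def __init__(self, fetch_sourcelist, fetch_source):
        self._fetch_sourcelist = fetch_sourcelist
        self._fetch_source = fetch_source
        self._known = {}  # buildid -> frozenset of known source paths

    def find_source(self, buildid, path):
        # Fetch the list once per buildid, then answer from memory.
        if buildid not in self._known:
            self._known[buildid] = frozenset(self._fetch_sourcelist(buildid))
        if path not in self._known[buildid]:
            # Negative-cache hit: no round trip to the server at all.
            return -errno.ENOENT
        return self._fetch_source(buildid, path)
```

With this shape, a miss costs one sourcelist request for the first
query against a buildid and nothing for later ones.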

> Proposal b:
> 
> debuginfod:
> - when receiving a source request, send a package with the sources
>   for that buildid to the client.
> 
> client:
> - when receiving a package with the sources for a buildid, store them
>   and use them to reply to source requests related to the buildid.

So this could be a "/buildid/HEXCODE/sources" query that returns a tarball of
all sources related to a given buildid.  This is challenging in principle
because sources may not live in a single upstream package we can just relay
verbatim.  debuginfod may have to assemble a new one on the fly, kind of like
gitweb's 'archive' buttons ... which are disabled by default for performance
reasons.  Worth considering, I guess, but risky to deploy.
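
To illustrate what "assemble a new one on the fly" would involve, here
is a minimal Python sketch.  It assumes the server has already gathered
the file contents into a dict; the function name and the choice of a
gzip-compressed tar are illustrative assumptions, not a proposed format.

```python
import io
import tarfile


def build_source_tarball(sources):
    """Pack {path: bytes} into an in-memory .tar.gz and return its bytes.

    'sources' is a hypothetical mapping of source path to file contents,
    possibly collected from several upstream packages.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, data in sorted(sources.items()):
            # Store relative names so extraction stays under the target dir.
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Even in this toy form the whole archive is compressed per request,
which is the kind of cost that makes this risky on a busy server.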

By the way, a client also has another option: querying in parallel.  If it
knows all interesting file names, it can fork N threads and make N concurrent
requests to debuginfod.  The poor server may get larger bursts of load but
total elapsed time should be better.
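
As a sketch of the parallel approach, again in Python rather than the
client's C: fetch_source is a hypothetical callable that performs one
source request, and max_workers is an arbitrary cap on the burst size.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_sources_parallel(fetch_source, buildid, paths, max_workers=8):
    """Issue one source request per path concurrently.

    fetch_source(buildid, path) is a hypothetical stand-in for a single
    request to debuginfod.  Returns {path: result}.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with paths.
        results = pool.map(lambda p: fetch_source(buildid, p), paths)
        return dict(zip(paths, results))
```

Capping max_workers bounds the burst the server sees while still
overlapping the per-request latency.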

And another option: if connection establishment / teardown are a big part of
the problem - and they can be with TLS - we could teach the client code to
activate as much curl-level http-keepalive as possible.  So as long as a single
debuginfod_client object is reused, it could avoid the TCP/TLS handshakes.
(It MIGHT already be doing that.)
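
The connection-reuse idea can be illustrated with Python's http.client
as an analogue; this is not the libcurl code the client actually uses.
The function name and the connection_factory parameter are hypothetical
and exist only to make the sketch testable.

```python
import http.client


def fetch_many(host, port, paths,
               connection_factory=http.client.HTTPConnection):
    """GET every path over one persistent connection.

    Returns {path: (status, body)}.  One connection object means one TCP
    (and, with an HTTPS factory, one TLS) handshake for the whole batch,
    provided the server keeps the connection alive.
    """
    conn = connection_factory(host, port)
    results = {}
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body must be read fully before the connection is reused.
            body = resp.read()
            results[path] = (resp.status, body)
    finally:
        conn.close()
    return results
```

The equivalent on the C side would be keeping the same client handle
across queries instead of creating a new one per request.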
