RFCv2: debuginfod debian archive support

Frank Ch. Eigler fche@redhat.com
Sat Dec 7 03:03:00 GMT 2019


Hi -

Thanks for the review!

> I looked at some distros, but the only ones that provide consistent
> debug[info] packages do so in rpm or deb format.

Yeah.

> What is the difference between a .deb and a .ddeb file/archive?

AIUI, .ddeb = debugging .deb ... although the "-dbgsym" substring
already says about the same thing.


> > @@ -851,7 +867,11 @@ handle_buildid_r_match (int64_t b_mtime,
> >        return 0;
> >      }
> >  
> > -  string popen_cmd = string("rpm2cpio " + shell_escape(b_source0));
> > +  string archive_decoder = "/dev/null";
> > +  for (auto&& arch : scan_archives)
> > +    if (string_endswith(b_source0, arch.first))
> > +      archive_decoder = arch.second;
> > +  string popen_cmd = archive_decoder + " " + shell_escape(b_source0);
> >    FILE* fp = popen (popen_cmd.c_str(), "r"); // "e" O_CLOEXEC?
> >    if (fp == NULL)
> >      throw libc_exception (errno, string("popen ") + popen_cmd);
> 
> This seems a lot of work for dealing with non-archives. 

We don't do this for non-archives.  This is in the path where an
archive record already matched in the database.


> > +archive_classify (const string& rps, sqlite_ps& ps_upsert_buildids, sqlite_ps& ps_upsert_files,
> >                sqlite_ps& ps_upsert_de, sqlite_ps& ps_upsert_sref, sqlite_ps& ps_upsert_sdef,
> >                time_t mtime,
> >                unsigned& fts_executable, unsigned& fts_debuginfo, unsigned& fts_sref, unsigned& fts_sdef,
> >                bool& fts_sref_complete_p)
> >  {
> > -  string popen_cmd = string("rpm2cpio " + shell_escape(rps));
> > +  string archive_decoder = "/dev/null";
> > +  for (auto&& arch : scan_archives)
> > +    if (string_endswith(rps, arch.first))
> > +      archive_decoder = arch.second;
> > +  string popen_cmd = archive_decoder + " " + shell_escape(rps);
> >    FILE* fp = popen (popen_cmd.c_str(), "r"); // "e" O_CLOEXEC?
> >    if (fp == NULL)
> >      throw libc_exception (errno, string("popen ") + popen_cmd);
> 
> Likewise as above. Can we skip the whole popen dance if nothing
> matches?

If you check out the caller, this part is not even called if
the extension does not match.


> >    rc = archive_read_support_filter_all(a);
> >    if (rc != ARCHIVE_OK)
> >      throw archive_exception(a, "cannot select all filters");
> 
> In theory you could know the format you are expecting, just like you
> know the decoder above. Might that give slight more accurate error
> messages and maybe be slightly more efficient since libarchive doesn't
> need to try to see if it correctly detect the format itself.

The "filter" part is for different compression algorithms.  Keeping it
this way costs us nothing and lets the code tolerate changes in
compression policy in the future.


> But it does feel like the errors, logs and metrics are a little
> generic (e.g. "cannot select all format").

The way in which specializing the format errors could help if
debuginfod were run against rpms that had a non-cpio payload, or debs
that had a non-tar payload.  This means some sort of corruption, which
contravenes our "trustworthy data" assumption -- or upstream policy
change, which is nothing to worry about.

If you think separate metrics for .deb vs .rpm archives might be
useful, can do.


> Note that in some of the unpatches code there are still references to
> rpms in some logs and comments that should also be "archive".

OK, will recheck, but some of those references may be valid because
RPMs do have some unique characteristics.  Some other mentions were
nuked from the branch version of the patch, so the email is a little
obsolete.


> > +2019-12-02  Frank Ch. Eigler  <fche@redhat.com
> > +
> > +	* debuginfod.8: Add -U (DEB) flag, generalize RPM to "archive".
> > +
> > +2019-11-26  Frank Ch. Eigler  <fche@redhat.com>
> > +	    Aaron Merey  <amerey@redhat.com>
> > +
> > +	* debuginfod.8, find-debuginfo.1, debuginfod_*.3: New files.
> > +
> 
> That second ChangeLog entry looks incorrect.

It adds an entry that was missed earlier.


> Is my understanding that debug debs don't contain the actual source
> correct?

Correct, not at this time.

> If that is the case can't we take advantage of that by never
> indexing source files from debs?

Or specifically, not populating the -sdefs/-srefs data.  That's a
reasonable optimization - OTOH if at some point, deb files start
including proper debugedit-style sources, then this code will just
work.


> [...]
> Already in the original, but should that be "so that not all
> source..."?

Yes.


> > -In summary, if your system can bear a 0.5%-3% index-to-RPM-dataset
> > +In summary, if your system can bear a 0.5%-3% index-to-archive-dataset
> >  size ratio, and slow growth afterwards, you should not need to
> >  worry about disk space.  If a system crash corrupts the database,
> >  or you want to force debuginfod to reset and start over, simply
> 
> Here I think you should leave it at RPM since that is what you
> originally measured. It isn't clear to me that things are the same for
> debs since those don't include the sources?

Actually, if the above optimization is not made, the present code does
consume approximately the same amount of data (outgoing source-refs
plus deb-file-list source-defs).


- FChE



More information about the Elfutils-devel mailing list