patch 2/2 debuginfod server etc.

Frank Ch. Eigler fche@redhat.com
Thu Nov 14 12:30:00 GMT 2019


Hi -

> Like with the client manpages, I think they should go into the doc/ dir
> just because all manpages are there at the moment.

OK.

> > -bin_PROGRAMS = debuginfod-find
> > +bin_PROGRAMS = debuginfod debuginfod-find
> > +debuginfod_SOURCES = debuginfod.cxx
> > +debuginfod_cxx_CXXFLAGS = -Wno-unused-functions
> 
> Should that be debuginfod_CXXFLAGS?
> Why are there unused functions?

I don't think there are any in debuginfod itself.

> > +# g++ 4.8 on rhel7 otherwise errors gthr-default.h / __gthrw2(__gthrw_(__pthread_key_create) undefs
> 
> Could you explain this comment a bit more?

Not sure, these might have been notes from diagnosing
the argp.h bug that messed with function attributes,
or something else.


> > +dist_man8_MANS = debuginfod.8
> 
> As stated above, I think this should be moved to doc/

OK.

> [...]
> So you give it directories with executables and directories with debug
> files. They are indexed if they have a build-id. You never provide a
> sources dir, but if something is indexed as a debug file then any
> source file referenced from it can be requested.

Yes.

> In theory you only need to provide the executable dirs, from that you
> can normally get the separate .debug files.

That's if those debug files are installed in the proper system 
location.  A build tree mid-stripping is not like that.  And
in the case of RPMs, it's not an option at all.

> Why didn't you do that (only providing executable dirs)?
> And why do you allow sources to be found indirectly?

Because there's no need to search/index plain source files.  We can
already tell instantly whether they exist (when we parse the DWARF
file) via a stat(2) call.  We index debuginfo/source RPMs for their
content because that instant check is not possible for files packed
inside an archive.
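
To make the "instant check" concrete, here is a minimal sketch
(hypothetical helper, not lifted from debuginfod itself) of the sort
of test meant: one stat(2) call per candidate source path.

    #include <sys/stat.h>
    #include <string>

    // Hypothetical helper, not debuginfod's own code: does a source
    // file named by a DWARF line table exist on this machine right now?
    static bool
    source_present (const std::string& path)
    {
      struct stat st;
      return stat (path.c_str (), &st) == 0 && S_ISREG (st.st_mode);
    }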

> Would it make sense to restrict the sources (prefix) dir?

I don't see any reason, because ...

> The questions above are simply because it makes me nervous that some,
> but not all, files that can be requested can indirectly be found
> anywhere. If you have an explicit debug dir, then it might make sense
> to also have an explicit sources dir.

If the executables we find are trusted, then the source file paths
inside are trustworthy too.  (We don't have an explicit debug dir.)


> In general I think it would be good to think about a selinux security
> context for debuginfod. This is not urgent, but I think we should see
> if we can get someone to do a security audit.

Will keep it in mind.


> One concern I have is that if a process is run in a restricted security
> context in which it might not be able to access certain files, but it
> might have access to a local socket to debuginfod, then it can get
> access to files it normally wouldn't. 

Note that the systemd service runs under an unprivileged userid.

> And if it could somehow write into any directory that debuginfod
> reads then it could construct a .debug file that points to any file
> on the system by adding it to the DWARF debug_lines file list. All
> this might be unlikely because this is all locally running
> processes. But it would be good to have someone who is a bit
> paranoid take a look if we aren't accidentally "leaking" files.

I see where you're going with this; it's a reasonable concern.  For now,
we can consider it covered by the "debuginfod should be given
trustworthy binaries" general rule.


> > +Each listed PATH also creates a thread to scan for ELF/DWARF/source
> > +files contained in matching RPMs under the given directory.  Duplicate
> > +directories are ignored.  You may use a file name for a PATH, but
> > +source code indexing may be incomplete; prefer using a directory that
> > +contains normal RPMs alongside debuginfo/debugsource RPMs.  Because of
> > +complications such as DWZ-compressed debuginfo, may require \fItwo\fP
> > +scan passes to identify all source code.
> 
> Again, this is convenient. But wouldn't it make sense to just have
> explicit paths for things like debuginfo/sources RPM containers?

Not really - these are intermingled as early as rpmbuild, and are kept
nearby in distro systems such as koji.


> > +If no PATH is listed, then \fBdebuginfod\fP will simply serve content
> > +that it scanned into its index in previous runs.
> 
> Please add a sentence explaining how to reset the index.

OK.  ("remove the database file").

> Also this means that if you change PATHs the contents from the old
> PATHs are still accessible? If so, do mention this explicitly.

OK, the doc says:

    If no PATH is listed, then debuginfod will simply serve content that it
    scanned into its index in previous runs: the data is cumulative.

> > +File names must match extended regular expressions given by the \-I
> > +option and not the \-X option (if any) in order to be considered.
> 
> These are global and cannot be set per PATH?

Yes, they are global.  It wouldn't be hard to make the code support
per-PATH settings, but command-line parsing would have to be more
stateful, as these options would have to be repeated before (or
after?) each PATH, etc.


> > +.SH OPTIONS
> > +
> > +.TP
> > +.B "\-d FILE" "\-\-database=FILE"
> > +Set the path of the SQLITE database used to store the index.  This
> > +file is disposable in the sense that a later rescan will repopulate
> > +data.  It will contain absolute file path names, so it may not be
> > +portable across machines.  It will be frequently read/written, so it
> > +may perform well while sharing across machines or users either, due
> > +to SQLITE locking performance.  The default database file is
> > +$HOME/.debuginfod.sqlite.
> 
> It performs well because it is frequently read/written?
> Is that missing a not?

Ugh, yes.


> > +.TP
> > +.B "\-D SQL" "\-\-ddl=SQL"
> > +Execute given SQLITE statement after the database is opened and
> > +initialized as extra DDL.  This may be useful to tune
> > +performance-related pragmas or indexes.  May be repeated.  The default
> > +is nothing extra.
> 
> You might want to explain what DDL stands for.

OK, SQL lingo (DDL = Data Definition Language).


> > +.B "\-I REGEX"  "\-\-include=REGEX"  "\-X REGEX"  "\-\-exclude=REGEX"
> > +Govern the inclusion and exclusion of file names under the search
> > +paths.  [...]
> 
> Could/Should this default to -I/all/paths/given/* so only files from
> under those paths are included and no files from outside the search
> paths?

realpath(3) canonicalization (and in a later patch, symlink-following
mode) would mess that up.  I know you're probably thinking about the
evil-source-file-escape problem, but I wouldn't handle that with an
index-time constraint like this.  I envisioned using this more for
optimization purposes, if for example one wants to index only
.x86_64.rpm files in a tree that has other architectures.
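
As a sketch of that optimization use (assumed names, not the server's
actual option handling), filtering a candidate file name against the
include/exclude regexes might look roughly like this:

    #include <regex>
    #include <string>

    // Sketch of the -I/-X idea: a file name is considered only if it
    // matches the include regex and does not match the exclude regex,
    // if one was given.
    static bool
    considered (const std::string& fname,
                const std::regex& include_rx,
                const std::regex* exclude_rx)
    {
      if (! std::regex_search (fname, include_rx))
        return false;
      if (exclude_rx != nullptr && std::regex_search (fname, *exclude_rx))
        return false;
      return true;
    }

    // e.g.  std::regex only_x86_64 ("\\.x86_64\\.rpm$", std::regex::extended);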


> > +and means that only one initial scan should performed.  The default
> > +rescan time is 300 seconds.  Receiving a SIGUSR1 signal triggers a new
> > +scan, independent of the rescan time (including if it was zero).
> 
> 300 seconds seem somewhat arbitrary. But I don't have a better number
> :)

I was thinking it's about right for a developer's live build tree.


> > +.TP
> > +.B "\-g SECONDS" "\-\-groom\-time=SECONDS"
> > +Set the groom time for the index database.  This is the amount of time
> > +the grooming thread will wait after finishing a grooming pass before
> > +doing it again.  A groom operation quickly rescans all previously
> > +scanned files, only to see if they are still present and current, so
> > +it can deindex obsolete files.  See also the \fIDATA MANAGEMENT\fP
> > +section.  The default groom time is 86400 seconds (1 day).  A time of
> > +zero is acceptable, and means that only one initial groom should be
> > +performed.  Receiving a SIGUSR2 signal triggers a new grooming pass,
> > +independent of the groom time (including if it was zero).
> 
> So the rescan only adds new files to the index and doesn't purge old
> ones?

Yes, a normal scan only adds, as explained above and in the DATA
MANAGEMENT section below. :-)


> > +.TP
> > +.B "\-G"
> > +Run an extraordinary maximal-grooming pass at debuginfod startup.
> > +This pass can take considerable time, because it tries to remove any
> > +debuginfo-unrelated content from the RPM-related parts of the index.
> > +It should not be run if any recent RPM-related indexing operations
> > +were aborted early.  It can take considerable space, because it
> > +finishes up with an sqlite "vacuum" operation, which repacks the
> > +database file by triplicating it temporarily.  The default is not to
> > +do maximal-grooming.  See also the \fIDATA MANAGEMENT\fP section.
> 
> So basically never, ever use this? :)
> Can you add a hint when you should use it?

See also the DATA MANAGEMENT section. :-)


> > [...]  Relative path names commonly appear in the DWARF
> > +file's source directory, but these paths are relative to
> > +individual compilation unit AT_comp_dir paths, and yet an executable
> > +is made up of multiple CUs.  Therefore, to disambiguate, debuginfod
> > +expects source queries to prefix relative path names with the CU
> > +compilation-directory.
> 
> Note a subtlety with DWARF4 vs DWARF5. In both cases debug line index
> zero refers to the compile unit directory. But in DWARF5 that directory
> should actually be in the line table (so you don't have to look it up
> through the associated CU in the debug_info DIE tree).

Right, but do you think this subtlety should be elaborated to
debuginfod clients or administrators?  As far as a debugger tool is
concerned, the compilation directory is the compilation directory, no
matter which line table and which slot it is in.  Can you suggest a
wording tweak?
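
For concreteness, here is a minimal sketch of the documented rule as a
client would apply it (hypothetical helper, assumed names): a relative
line-table file name gets the owning CU's DW_AT_comp_dir prepended
before being sent as the source query path.

    #include <string>

    // Sketch only, not the client library's code.
    static std::string
    source_query_path (const std::string& comp_dir, const std::string& file_name)
    {
      if (! file_name.empty () && file_name[0] == '/')
        return file_name;                  // already absolute
      return comp_dir + "/" + file_name;   // disambiguate via the CU
    }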


> I agree this is needed because it is how the data/paths are represented
> in DWARF. But it does concern me there is no /../ processing done to
> make sure something isn't requested outside the source paths.

I love your security mindset.  In this case, the /../ stuff doesn't
matter because the resulting string is only used as a search key in
the database, not used verbatim against the filesystem.
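
A minimal sketch of what "search key only" means, with invented
table/column names rather than debuginfod's real schema:

    #include <sqlite3.h>
    #include <string>

    // Sketch only.  The requested path is bound as an SQL parameter and
    // compared against indexed names; it is never handed to open(2), so
    // embedded "/../" components have no way to reach the server's
    // filesystem.
    static bool
    indexed_source (sqlite3 *db, const std::string& srcpath)
    {
      sqlite3_stmt *st = nullptr;
      if (sqlite3_prepare_v2 (db, "select 1 from sources where name = ?;",
                              -1, &st, nullptr) != SQLITE_OK)
        return false;
      sqlite3_bind_text (st, 1, srcpath.c_str (), -1, SQLITE_TRANSIENT);
      bool found = (sqlite3_step (st) == SQLITE_ROW);
      sqlite3_finalize (st);
      return found;
    }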


> > +You should ensure that ample disk space remains available.  (The flood
> > +of error messages on -ENOSPC is ugly and nagging.  But, like for most
> > +other errors, debuginfod will resume when resources permit.)  If
> > +necessary, debuginfod can be stopped, the database file moved or
> > +removed, and debuginfod restarted.
> 
> Do we need and/or can we add some option to limit the disk space used?

We can't really limit it.  The groom-cycle messages include a
database-size report field, so I suppose we could quit when the
database gets large enough, but that wouldn't help anyone.  Or we
could stop indexing early ... but really the proper solution is to
rely on a size-quota'd cache directory, where hitting ENOSPC is
harmless as long as it is brought to the sysadmin's attention by our
error messages and by system monitoring tools.


> Same question as with the client. Does it use the standard trust
> database for certificate verification with HTTPS?

The text says that debuginfod does not provide https service at all,
so this is not relevant.  A front-end https reverse-proxy would have
to deal with this stuff.  I plan to cover this in a blog post later
on, and probably in the form of software baked into a container image.


> > +.B DEBUGINFOD_URLS
> > +.B DEBUGINFOD_TIMEOUT
> > +.B DEBUGINFOD_CACHE_PATH
> 
> In this case that is downloaded files from other, remote, debuginfod
> servers?

Yes.

> Thanks so much for writing so much documentation!

My pleasure, really!

> I haven't had time to read all the code yet. But I already feel pretty
> good about it because the documentation is so extensive.

Thanks.

- FChE


