Bug 25548 - also support canonicalized source-file name lookups in webapi
Summary: also support canonicalized source-file name lookups in webapi
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: debuginfod (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-13 15:02 UTC by Frank Ch. Eigler
Modified: 2020-03-26 14:44 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Ch. Eigler 2020-02-13 15:02:20 UTC
It turns out that some debuginfo consumers canonicalize source-code file names by the time they get the convenient chance to fall back to debuginfod.  This means that path substrings that POSIX defines as no-ops are lost.  debuginfod should extend a gracious embrace to these tools by also permitting such paths to be used during the /buildid/hexcode/source/PATH webapi.

Implementing this should not be too hard.  The dwarf_extract_source_paths could add both waldo and a new canonicalize_posix_path(waldo) to the returned debug_sourcefiles[] set.  That's it.

The canonicalize_posix_path() function would perform these textual rewrites, repeating until convergence:

   /./       ->  /
   /FOO/../  ->  /    # NB: FOO must not be . or .. because   /../../ != /
   //        ->  /

(realpath(3) is not helpful because it looks at the host filesystem to do resolution, which we cannot when dealing with archive-resident file names.)
Comment 1 Jan Kratochvil 2020-02-13 15:19:03 UTC
// g++ -o x x.cpp -Wall $(llvm-config --cxxflags --libs)
#include <llvm/ADT/SmallString.h>
#include <llvm/Support/Path.h>
#include <stdio.h>
int main(int argc, char **argv) {
  while (*++argv) {
    llvm::SmallString<128> path(*argv);
    if (llvm::sys::path::remove_dots(path, true /*remove_dot_dot*/,
                                     llvm::sys::path::Style::posix))
      puts(path.c_str());
  }
}
Comment 2 Konrad Kleine 2020-02-13 17:20:30 UTC
Please hold back with implementing this because from what I heard from LLDB folks, the normalization of paths is not always sound. I wouldn't want to spread this if we're not 100% clear.
Comment 3 Pavel Labath 2020-02-14 12:21:07 UTC
The "soundness" that Konrad mentions in #c2 refers to the fact that "/FOO/../  ->  /" transformation is correct only in case "FOO" is not a symlink. If it is, then "FOO/.." can point to just about anywhere. The only way to know where it will _really_  point to is to fetch the original value of the symlink, but this may not be possible if one is dealing with paths which do not necessarily come from the host system.

So, I think that doing these kinds of transformations in these cases (e.g. when setting a breakpoint) is reasonable -- we don't need to a file to actually exist in order to set a breakpoint, and we don't want to force the user to type the exact path that happened to find its way into the debug info.

The part where this (IMO) becomes unsound is when lldb does these transformations on _all_ paths, even those that definitely do refer to the host filesystem (e.g. the file which we are about to execute).

Anyway, going back to this bug, I don't think it's unreasonable for debuginfod to support queries with "canonicalized" paths, but I can also understand a position which says that this canonicalization is bad, and that one should perform queries with the original path. Indeed, one can envision a world where lldb allows the user to set a breakpoint using the "canonicalized" path, but then performs the file lookups using the "original" path instead. These lookups would not even need to be limited to debuginfod -- using the original path for host filesystem lookups would also improve correctness in certain obscure scenarios. It's just that reaching this world in lldb will be tricky (but not impossible), because this canonicalization happens at a very low layer (but this is something I would want to change anyway).
Comment 4 Frank Ch. Eigler 2020-03-01 19:47:14 UTC
Related to this, Eli Schwartz helped identify two separate outright bugs in the code that touch on this problem.

The debuginfod-client.c curl api code neglects to pass CURLOPT_PATH_AS_IS, so source file references with /../ bits in them get collapsed.  This is the opposite of what the webapi docs say.

The above bug hid a separate bug in the server.  There, an inconsistent canonicalization rule is in effect with archives.  srefs are uncanonicalized, but sdefs are canonicalized.  That makes those source queries unsatisfiable.  Whoops.

The cure should consist of:

- adding CURLOPT_PATH_AS_IS to the client, where I swear it was at one point
- teaching the server to store both canonicalized AND uncanonicalized artifactsrc names in the _f_s and _r_sref tables (so up to two rows instead of just one)

and then bob will in fact be our uncle.
Comment 6 Frank Ch. Eigler 2020-03-26 14:44:46 UTC
commit d63a809da467e646480c273b8eb276401679d2bb