It turns out that some debuginfo consumers canonicalize source-code file names by the time they get the convenient chance to fall back to debuginfod. This means that path substrings that POSIX defines as no-ops are lost. debuginfod should extend a gracious embrace to these tools by also permitting such paths to be used during the /buildid/hexcode/source/PATH webapi. Implementing this should not be too hard. The dwarf_extract_source_paths could add both waldo and a new canonicalize_posix_path(waldo) to the returned debug_sourcefiles[] set. That's it. The canonicalize_posix_path() function would perform these textual rewrites, repeating until convergence: /./ -> / /FOO/../ -> / # NB: FOO must not be . or .. because /../../ != / // -> / (realpath(3) is not helpful because it looks at the host filesystem to do resolution, which we cannot when dealing with archive-resident file names.)
// g++ -o x x.cpp -Wall $(llvm-config --cxxflags --libs) #include <llvm/ADT/SmallString.h> #include <llvm/Support/Path.h> #include <stdio.h> int main(int argc, char **argv) { while (*++argv) { llvm::SmallString<128> path(*argv); if (llvm::sys::path::remove_dots(path, true /*remove_dot_dot*/, llvm::sys::path::Style::posix)) puts(path.c_str()); } }
Please hold back with implementing this because from what I heard from LLDB folks, the normalization of paths is not always sound. I wouldn't want to spread this if we're not 100% clear.
The "soundness" that Konrad mentions in #c2 refers to the fact that "/FOO/../ -> /" transformation is correct only in case "FOO" is not a symlink. If it is, then "FOO/.." can point to just about anywhere. The only way to know where it will _really_ point to is to fetch the original value of the symlink, but this may not be possible if one is dealing with paths which do not necessarily come from the host system. So, I think that doing these kinds of transformations in these cases (e.g. when setting a breakpoint) is reasonable -- we don't need to a file to actually exist in order to set a breakpoint, and we don't want to force the user to type the exact path that happened to find its way into the debug info. The part where this (IMO) becomes unsound is when lldb does these transformations on _all_ paths, even those that definitely do refer to the host filesystem (e.g. the file which we are about to execute). Anyway, going back to this bug, I don't think it's unreasonable for debuginfod to support queries with "canonicalized" paths, but I can also understand a position which says that this canonicalization is bad, and that one should perform queries with the original path. Indeed, one can envision a world where lldb allows the user to set a breakpoint using the "canonicalized" path, but then performs the file lookups using the "original" path instead. These lookups would not even need to be limited to debuginfod -- using the original path for host filesystem lookups would also improve correctness in certain obscure scenarios. It's just that reaching this world in lldb will be tricky (but not impossible), because this canonicalization happens at a very low layer (but this is something I would want to change anyway).
Related to this, Eli Schwartz helped identify two separate outright bugs in the code that touch on this problem. The debuginfod-client.c curl api code neglects to pass CURLOPT_PATH_AS_IS, so source file references with /../ bits in them get collapsed. This is the opposite of what the webapi docs say. The above bug hid a separate bug in the server. There, an inconsistent canonicalization rule is in effect with archives. srefs are uncanonicalized, but sdefs are canonicalized. That makes those source queries unsatisfiable. Whoops. The cure should consist of: - adding CURLOPT_PATH_AS_IS to the client, where I swear it was at one point - teaching the server to store both canonicalized AND uncanonicalized artifactsrc names in the _f_s and _r_sref tables (so up to two rows instead of just one) and then bob will in fact be our uncle.
https://sourceware.org/pipermail/elfutils-devel/2020q1/002522.html
commit d63a809da467e646480c273b8eb276401679d2bb