systemtap currently fails to build with clang. $ clang --version clang version 10.0.1 (Fedora 10.0.1-3.fc32) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin The output is (I tried deduplicating): ./elaborate.h:47:1: error: 'symresolution_info' defined as a struct here but previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags] struct symresolution_info: public traversing_visitor ^ ./session.h:126:1: note: did you mean struct here? class symresolution_info; ^~~~~ struct ./elaborate.h:206:16: error: 'derived_probe::printsig' hides overloaded virtual function [-Werror,-Woverloaded-virtual] virtual void printsig (std::ostream &o, bool nest=true) const; ^ ./staptree.h:913:16: note: hidden overloaded virtual function 'probe::printsig' declared here: different number of parameters (1 vs 2) virtual void printsig (std::ostream &o) const; ^ 2 errors generated. ./dwflpp.h:185:1: error: struct 'location_context' was previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags] struct location_context; ^ ./loc2stap.h:45:7: note: previous use is here class location_context ^ ./dwflpp.h:185:1: note: did you mean class here? struct location_context; ^~~~~~ class ./dwflpp.h:187:1: error: 'dwflpp' defined as a struct here but previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags] struct dwflpp ^ ./loc2stap.h:44:1: note: did you mean struct here? class dwflpp; ^~~~~ struct translate.cxx:303:8: error: 'emit_function' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override] void emit_function (functiondecl* fd); ^ translate.cxx:155:8: note: overridden virtual function is here void emit_function (functiondecl* v); ^ translate.cxx:304:8: error: 'emit_probe' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override] void emit_probe (derived_probe* dp); ^ translate.cxx:161:8: note: overridden virtual function is here void emit_probe (derived_probe* v); ^ util.cxx:1545:28: error: taking the absolute value of unsigned type 'unsigned long' has no effect [-Werror,-Wabsolute-value] unsigned min_score = labs(target.size() - it->size()); ^ util.cxx:1545:28: note: remove the call to 'labs' since unsigned values cannot be negative unsigned min_score = labs(target.size() - it->size()); ^~~~ Fixing the class/struct mixups is easy and shouldn't cause any problems. I'm not so sure about the others though. How do you handle the missing override or the hidden overloaded virtual function in this codebase? I can work on this if it's easier for you. Thanks
Timm, thanks for the report. We'd appreciate your help in fixing these c++ hygiene issues. Regarding class/struct, sure please go ahead. Regarding overrides / hidden overloads ... um, news to me. :-) We may have some dead code, will look into it further. Regarding labs() in util.cxx, we intended to use signed values, so abs() would make sense.
Regarding overrides, I believe we're just seeing a side-effect of the partial gradual c++11-ization of the code base. We haven't added the cxx_override/override specifier systematically, and it appears g++ hasn't been insisting either. I believe that in just about every case, we mean to override rather than hide member functions.
Thanks for the reply, I'll work on this. I changed the labs() line to: unsigned min_score = abs(static_cast<signed>(target.size()) - static_cast<signed>(it->size())); since using the unsigned values generates another error: util.cxx:1545:28: error: call to 'abs' is ambiguous unsigned min_score = abs(target.size() - it->size()); ^~~ /usr/include/stdlib.h:840:12: note: candidate function extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur; ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:56:3: note: candidate function abs(long __i) { return __builtin_labs(__i); } ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:61:3: note: candidate function abs(long long __x) { return __builtin_llabs (__x); } ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:71:3: note: candidate function abs(double __x) ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:75:3: note: candidate function abs(float __x) ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:79:3: note: candidate function abs(long double __x) ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:85:3: note: candidate function abs(__GLIBCXX_TYPE_INT_N_0 __x) { return __x >= 0 ? __x : -__x; } ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:103:3: note: candidate function abs(__float128 __x) ^ After adding the 'override' to emit_function() and emit_probe() in translate.cxx as well as collect_derivation_chain() and collect_derivation_pp_chain() in elaborate.h, I run into a few other issues: 1) set2 in bpf-bitset.h does not return anything from its assignment operator but specifies a set2& return value. I fixed this with a 'return *this', but maybe this is just dead code and should be removed? 2) set1_ref and set1_const_ref have a user-defined copy-assignment operator but use the implicitly defined copy constructor, which clang warns about: ./bpf-bitset.h:108:19: error: definition of implicit copy constructor for 'set1_const_ref' is deprecated because it has a user-declared copy assignment operator [-Werror,-Wdeprecated-copy] set1_const_ref& operator= (const set1_const_ref &); // not present ^ ./bpf-bitset.h:256:12: note: in implicit copy constructor for 'bpf::bitset::set1_const_ref' first required here return set1_const_ref(data + w2 * i, w2); Solved this by just defining those missing copy constructors explicitly, e.g.: + set1_const_ref(const set1_const_ref &o) : data(o.data), words(o.words) { } After this, I'm looking at clang and gcc handling -Wformat-nonliteral differently. Clang warns about this but gcc does not: common.c:691:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral] vsyslog(LOG_ERR, fmt, va); ^~~ common.c:693:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral] vfprintf(stderr, fmt, va); ^~~ And another time in stapbpf.cxx: stapbpf.cxx:257:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral] vfprintf(stderr, str, va); Not quite sure how to best handle these. Adding a #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wformat-nonliteral" // Code #pragma GCC diagnostic pop block works for clang as well and is also done in e.g. stapbpf/bpfinterp.cxx for other code. What do you think? Other than that, the only problem left is the printsig() stuff from above.
How far did you get on these fixes? Do you have any patches to share?
I have four patches locally. I think the only problems remaining are the derived_probe::printsig() and the -Wformat-literal one. Those are up for the maintainers to decide I believe. There are additional problems in the test suite, but I haven't looked at those in detail yet.
Adding the GCC diagnostic pragmas to the printf-like functions was wrong btw and just adding the appropriate __attribute__((format(printf, x, y))) attributes is better, fixes the problems with clang and gives us better error reporting. It also shows a few problems with the current un-annotated functions like this one in staprun/relay.c: --- a/staprun/relay.c +++ b/staprun/relay.c @@ -272,7 +272,7 @@ static void switchfile_handler(int sig) pthread_mutex_lock(&mutex[avail_cpus[i]]); if (reader[avail_cpus[i]] && switch_file[avail_cpus[i]]) { pthread_mutex_unlock(&mutex[avail_cpus[i]]); - dbug(2, "file switching is progressing, signal ignored.\n", sig); + dbug(2, "file switching is progressing, signal %d ignored.\n", sig); return; } pthread_mutex_unlock(&mutex[avail_cpus[i]]); diff --git a/staprun/staprun.h b/staprun/staprun.h Frank, do you have an opinion about the printsig() issue? Or should I just post the patches I already have? Thanks, Timm
Timm, I sent in your patches plus a few others back in November, but forgot to update the bug. See https://sourceware.org/pipermail/systemtap/2020q4/027085.html
Hey Frank do you have any update on this? We've been carrying the patches around for a while now and they need maintenance and start to collide with upstream changes, e.g. https://sourceware.org/git/?p=systemtap.git;a=commit;h=439fb4cc4c08166dccb85ba202a5762c4c46ba42 I think only the probe::printsig() problems need clarification, the other ones are straight forward.
(In reply to Timm Bäder from comment #8) > Hey Frank do you have any update on this? We've been carrying the patches > around for a while now and they need maintenance and start to collide with > upstream changes, e.g. > https://sourceware.org/git/?p=systemtap.git;a=commit; > h=439fb4cc4c08166dccb85ba202a5762c4c46ba42 > > I think only the probe::printsig() problems need clarification, the other > ones are straight forward. Hi Timm. When I merged that commit I didn't realise you had a similar patch pending. Regarding the printsig patch, I would suggest dropping the 'bool nest = true' definitions and instead call printsig_nested unconditionally. Otherwise the patch set looks ok. With it, gcc still compiles systemtap without error and we are closer to being able to compile systemtap with clang. I still get an errors when compiling with clang however. ex. `unsupported argument 'auto' to option 'flto='` when compiling systemtap/python/HelperSDT/_HelperSDT.c
(In reply to Aaron Merey from comment #9) > (In reply to Timm Bäder from comment #8) > > Hey Frank do you have any update on this? We've been carrying the patches > > around for a while now and they need maintenance and start to collide with > > upstream changes, e.g. > > https://sourceware.org/git/?p=systemtap.git;a=commit; > > h=439fb4cc4c08166dccb85ba202a5762c4c46ba42 > > > > I think only the probe::printsig() problems need clarification, the other > > ones are straight forward. > > Hi Timm. When I merged that commit I didn't realise you had a similar patch > pending. Regarding the printsig patch, I would suggest dropping the 'bool > nest = true' definitions and instead call printsig_nested unconditionally. No problem, in the end it's one patch less we carry around. Dropping the nest bool is fine with me of course, I have a patch doing that. I can't seem to find where to send patches however, the mailing list seems to have remarkably few of them and the HACKING document doesn't seem to mention? Where does systemtap development happen? > Otherwise the patch set looks ok. With it, gcc still compiles systemtap > without error and we are closer to being able to compile systemtap with > clang. I still get an errors when compiling with clang however. ex. > `unsupported argument 'auto' to option 'flto='` when compiling > systemtap/python/HelperSDT/_HelperSDT.c I can't seem to find where -flto=auto could come from within the systemtap sources, is that in your env somehow? That shouldn't be used when compiling with clang. If you need a workaround, setting CCC_OVERRIDE_OPTIONS="x-flto=auto" should work. Later versions of clang will also ignore -flto=auto: https://github.com/llvm/llvm-project/commit/1628486548420f85b3467026d54663d1516404f5
Thanks for your patience Timm. I've merged your patch set, see the following commits: 14f04522b5 930b541193 6de815bcaa 0f4bd32197 545535f823 b3a3929751 8e5145ae4c I also merged a patch to fix a couple more -Wformat-nonliteral and -Wmismatched-tags I ran into (commit 3e9bcd7b1fcf). With these patches and CCC_OVERRIDE_OPTIONS="x-flto=auto" I can successfully build systemtap with clang.