Bug 26839 - Systemtap build failures with clang
Summary: Systemtap build failures with clang
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: server (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-04 08:27 UTC by Timm Bäder
Modified: 2021-05-19 21:46 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Timm Bäder 2020-11-04 08:27:02 UTC
systemtap currently fails to build with clang.

$ clang --version
clang version 10.0.1 (Fedora 10.0.1-3.fc32)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin



The output is (I tried deduplicating):

./elaborate.h:47:1: error: 'symresolution_info' defined as a struct here but previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
struct symresolution_info: public traversing_visitor
^
./session.h:126:1: note: did you mean struct here?
class symresolution_info;
^~~~~
struct


./elaborate.h:206:16: error: 'derived_probe::printsig' hides overloaded virtual function [-Werror,-Woverloaded-virtual]
  virtual void printsig (std::ostream &o, bool nest=true) const;
               ^
./staptree.h:913:16: note: hidden overloaded virtual function 'probe::printsig' declared here: different number of parameters (1 vs 2)
  virtual void printsig (std::ostream &o) const;
               ^
2 errors generated.


./dwflpp.h:185:1: error: struct 'location_context' was previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
struct location_context;
^
./loc2stap.h:45:7: note: previous use is here
class location_context
      ^
./dwflpp.h:185:1: note: did you mean class here?
struct location_context;
^~~~~~
class


./dwflpp.h:187:1: error: 'dwflpp' defined as a struct here but previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
struct dwflpp
^
./loc2stap.h:44:1: note: did you mean struct here?
class dwflpp;
^~~~~
struct


translate.cxx:303:8: error: 'emit_function' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
  void emit_function (functiondecl* fd);
       ^
translate.cxx:155:8: note: overridden virtual function is here
  void emit_function (functiondecl* v);
       ^
translate.cxx:304:8: error: 'emit_probe' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
  void emit_probe (derived_probe* dp);
       ^
translate.cxx:161:8: note: overridden virtual function is here
  void emit_probe (derived_probe* v);
       ^


util.cxx:1545:28: error: taking the absolute value of unsigned type 'unsigned long' has no effect [-Werror,-Wabsolute-value]
      unsigned min_score = labs(target.size() - it->size());
                           ^
util.cxx:1545:28: note: remove the call to 'labs' since unsigned values cannot be negative
      unsigned min_score = labs(target.size() - it->size());
                           ^~~~



Fixing the class/struct mixups is easy and shouldn't cause any problems.
I'm not so sure about the others though. How do you handle the missing override or the hidden overloaded virtual function in this codebase? I can work on this if it's easier for you.


Thanks
Comment 1 Frank Ch. Eigler 2020-11-05 01:11:56 UTC
Timm, thanks for the report.
We'd appreciate your help in fixing these c++ hygiene issues.
Regarding class/struct, sure please go ahead.

Regarding overrides / hidden overloads ... um, news to me. :-)
We may have some dead code, will look into it further.

Regarding labs() in util.cxx, we intended to use signed values,
so abs() would make sense.
Comment 2 Frank Ch. Eigler 2020-11-05 01:23:33 UTC
Regarding overrides, I believe we're just seeing a side-effect of the partial gradual c++11-ization of the code base.  We haven't added the cxx_override/override specifier systematically, and it appears g++ hasn't been insisting either.  I believe that in just about every case, we mean to override rather than hide member functions.
Comment 3 Timm Bäder 2020-11-05 09:21:37 UTC
Thanks for the reply, I'll work on this.

I changed the labs() line to:

      unsigned min_score = abs(static_cast<signed>(target.size()) - static_cast<signed>(it->size()));

since using the unsigned values generates another error:

util.cxx:1545:28: error: call to 'abs' is ambiguous
      unsigned min_score = abs(target.size() - it->size());
                           ^~~
/usr/include/stdlib.h:840:12: note: candidate function
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
           ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:56:3: note: candidate function
  abs(long __i) { return __builtin_labs(__i); }
  ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:61:3: note: candidate function
  abs(long long __x) { return __builtin_llabs (__x); }
  ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:71:3: note: candidate function
  abs(double __x)
  ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:75:3: note: candidate function
  abs(float __x)
  ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:79:3: note: candidate function
  abs(long double __x)
  ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:85:3: note: candidate function
  abs(__GLIBCXX_TYPE_INT_N_0 __x) { return __x >= 0 ? __x : -__x; }
  ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_abs.h:103:3: note: candidate function
  abs(__float128 __x)
  ^




After adding the 'override' to emit_function() and emit_probe() in translate.cxx as well as collect_derivation_chain() and collect_derivation_pp_chain() in elaborate.h, I run into a few other issues:

1) set2 in bpf-bitset.h does not return anything from its assignment operator but specifies a set2& return value. I fixed this with a 'return *this', but maybe this is just dead code and should be removed?

2) set1_ref and set1_const_ref have a user-defined copy-assignment operator but use the implicitly defined copy constructor, which clang warns about:

    ./bpf-bitset.h:108:19: error: definition of implicit copy constructor for 'set1_const_ref' is deprecated because it has a user-declared copy assignment operator [-Werror,-Wdeprecated-copy]
      set1_const_ref& operator= (const set1_const_ref &);   // not present
                      ^
    ./bpf-bitset.h:256:12: note: in implicit copy constructor for 'bpf::bitset::set1_const_ref' first required here
        return set1_const_ref(data + w2 * i, w2);

Solved this by just defining those missing copy constructors explicitly, e.g.:

+  set1_const_ref(const set1_const_ref &o) : data(o.data), words(o.words) { }



After this, I'm looking at clang and gcc handling -Wformat-nonliteral differently. Clang warns about this but gcc does not:

common.c:691:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral]
                vsyslog(LOG_ERR, fmt, va);
                                 ^~~
common.c:693:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral]
                vfprintf(stderr, fmt, va);
                                 ^~~

And another time in stapbpf.cxx:

stapbpf.cxx:257:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral]
  vfprintf(stderr, str, va);


Not quite sure how to best handle these. Adding a

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
// Code
#pragma GCC diagnostic pop

block works for clang as well and is also done in e.g. stapbpf/bpfinterp.cxx for other code. What do you think?


Other than that, the only problem left is the printsig() stuff from above.
Comment 4 Tom Stellard 2020-11-23 19:12:39 UTC
How far did you get on these fixes?  Do you have any patches to share?
Comment 5 Timm Bäder 2020-11-24 07:52:37 UTC
I have four patches locally. I think the only problems remaining are the derived_probe::printsig() and the -Wformat-literal one. Those are up for the maintainers to decide I believe.

There are additional problems in the test suite, but I haven't looked at those in detail yet.
Comment 6 Timm Bäder 2021-01-18 12:06:12 UTC
Adding the GCC diagnostic pragmas to the printf-like functions was wrong btw and just adding the appropriate __attribute__((format(printf, x, y))) attributes is better, fixes the problems with clang and gives us better error reporting.

It also shows a few problems with the current un-annotated functions like this one in staprun/relay.c:

--- a/staprun/relay.c
+++ b/staprun/relay.c
@@ -272,7 +272,7 @@ static void switchfile_handler(int sig)
                pthread_mutex_lock(&mutex[avail_cpus[i]]);
                if (reader[avail_cpus[i]] && switch_file[avail_cpus[i]]) {
                        pthread_mutex_unlock(&mutex[avail_cpus[i]]);
-                       dbug(2, "file switching is progressing, signal ignored.\n", sig);
+                       dbug(2, "file switching is progressing, signal %d ignored.\n", sig);
                        return;
                }
                pthread_mutex_unlock(&mutex[avail_cpus[i]]);
diff --git a/staprun/staprun.h b/staprun/staprun.h


Frank, do you have an opinion about the printsig() issue? Or should I just post the patches I already have?


Thanks,
Timm
Comment 7 Tom Stellard 2021-01-18 15:10:27 UTC
Timm, I sent in your patches plus a few others back in November, but forgot to update the bug.  See https://sourceware.org/pipermail/systemtap/2020q4/027085.html
Comment 8 Timm Bäder 2021-05-11 12:00:44 UTC
Hey Frank do you have any update on this? We've been carrying the patches around for a while now and they need maintenance and start to collide with upstream changes, e.g. https://sourceware.org/git/?p=systemtap.git;a=commit;h=439fb4cc4c08166dccb85ba202a5762c4c46ba42

I think only the probe::printsig() problems need clarification, the other ones are straight forward.
Comment 9 Aaron Merey 2021-05-13 02:17:39 UTC
(In reply to Timm Bäder from comment #8)
> Hey Frank do you have any update on this? We've been carrying the patches
> around for a while now and they need maintenance and start to collide with
> upstream changes, e.g.
> https://sourceware.org/git/?p=systemtap.git;a=commit;
> h=439fb4cc4c08166dccb85ba202a5762c4c46ba42
> 
> I think only the probe::printsig() problems need clarification, the other
> ones are straight forward.

Hi Timm. When I merged that commit I didn't realise you had a similar patch pending. Regarding the printsig patch, I would suggest dropping the 'bool nest = true' definitions and instead call printsig_nested unconditionally.

Otherwise the patch set looks ok. With it, gcc still compiles systemtap without error and we are closer to being able to compile systemtap with clang. I still get an errors when compiling with clang however. ex. `unsupported argument 'auto' to option 'flto='` when compiling systemtap/python/HelperSDT/_HelperSDT.c
Comment 10 Timm Bäder 2021-05-14 07:57:58 UTC
(In reply to Aaron Merey from comment #9)
> (In reply to Timm Bäder from comment #8)
> > Hey Frank do you have any update on this? We've been carrying the patches
> > around for a while now and they need maintenance and start to collide with
> > upstream changes, e.g.
> > https://sourceware.org/git/?p=systemtap.git;a=commit;
> > h=439fb4cc4c08166dccb85ba202a5762c4c46ba42
> > 
> > I think only the probe::printsig() problems need clarification, the other
> > ones are straight forward.
> 
> Hi Timm. When I merged that commit I didn't realise you had a similar patch
> pending. Regarding the printsig patch, I would suggest dropping the 'bool
> nest = true' definitions and instead call printsig_nested unconditionally.

No problem, in the end it's one patch less we carry around. Dropping the nest bool is fine with me of course, I have a patch doing that. I can't seem to find where to send patches however, the mailing list seems to have remarkably few of them and the HACKING document doesn't seem to mention? Where does systemtap development happen?

> Otherwise the patch set looks ok. With it, gcc still compiles systemtap
> without error and we are closer to being able to compile systemtap with
> clang. I still get an errors when compiling with clang however. ex.
> `unsupported argument 'auto' to option 'flto='` when compiling
> systemtap/python/HelperSDT/_HelperSDT.c

I can't seem to find where -flto=auto could come from within the systemtap sources, is that in your env somehow? That shouldn't be used when compiling with clang. If you need a workaround, setting CCC_OVERRIDE_OPTIONS="x-flto=auto" should work.

Later versions of clang will also ignore -flto=auto: https://github.com/llvm/llvm-project/commit/1628486548420f85b3467026d54663d1516404f5
Comment 11 Aaron Merey 2021-05-19 21:46:07 UTC
Thanks for your patience Timm. I've merged your patch set, see the following commits:

14f04522b5
930b541193
6de815bcaa
0f4bd32197
545535f823
b3a3929751
8e5145ae4c

I also merged a patch to fix a couple more -Wformat-nonliteral and -Wmismatched-tags I ran into (commit 3e9bcd7b1fcf). 

With these patches and CCC_OVERRIDE_OPTIONS="x-flto=auto" I can successfully build systemtap with clang.