Bug 28641 - const'ness of member functions is not detected in clang++ binaries
Summary: const'ness of member functions is not detected in clang++ binaries
Status: NEW
Alias: None
Product: libabigail
Classification: Unclassified
Component: default (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Dodji Seketeli
URL:
Keywords:
Depends on:
Blocks: 27019
  Show dependency treegraph
 
Reported: 2021-12-01 23:44 UTC by Ben Woodard
Modified: 2021-12-09 11:32 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
clang version 13.0.0 (Fedora 13.0.0-3.fc35) built object (8.05 MB, application/x-gzip)
2021-12-01 23:46 UTC, Ben Woodard
Details
simple reproducer (79 bytes, text/plain)
2021-12-08 19:36 UTC, Ben Woodard
Details
attachment-364165-0.html (314 bytes, text/html)
2021-12-08 23:38 UTC, gprocida
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Woodard 2021-12-01 23:44:41 UTC
Build libabigail with both gcc and clang then compare the resulting library with abidiff

$ abidiff g5/lib/libabigail.so.0.0.0 l5/lib/libabigail.so.0.0.0 | wc -l
5080

The output is quite long however, there appear to be some low hanging fruit which can easily be resolved. They appear similar to other errors when doing abidw --abidiff and I expect they have a similar proximate cause.

A bunch of functions have completely unspecified subtype changes. This make tracking down the root cause of the problem exceedingly difficult and usually when their are unspecified subtype changes, it means that something in the comparison is different but it doesn't end up actually being an ABI artifact that matters.

Here are some examples:
  [C] 'method abigail::ir::class_decl::base_spec_sptr abigail::comparison::base_diff::first_base() const' at abg-comparison.h:1801:1 has some indirect sub-type changes:

  [C] 'method const abigail::comparison::class_diff_sptr abigail::comparison::base_diff::get_underlying_class_diff() const' at abg-comparison.h:1807:1 has some indirect sub-type changes:

  [C] 'method virtual bool abigail::comparison::base_diff::has_changes() const' at abg-comparison.h:1816:1 has some indirect sub-type changes:

  [C] 'method virtual abigail::ir::change_kind abigail::comparison::base_diff::has_local_changes() const' at abg-comparison.h:1819:1 has some indirect sub-type changes:

  [C] 'method virtual void abigail::comparison::base_diff::report(std::ostream&, const std::string&) const' at abg-comparison.h:1822:1 has some indirect sub-type changes:

  [C] 'method abigail::ir::class_decl::base_spec_sptr abigail::comparison::base_diff::second_base() const' at abg-comparison.h:1804:1 has some indirect sub-type changes:

This is was done with the latest libabigail trunk.
Comment 1 Ben Woodard 2021-12-01 23:46:38 UTC
Created attachment 13812 [details]
clang version 13.0.0 (Fedora 13.0.0-3.fc35) built object
Comment 2 Ben Woodard 2021-12-01 23:51:28 UTC
I was not able to attach the GCC built object due to size but I stuffed it here for easy access:

http://ssh.bencoyote.net/~ben/libabigail.so.0.0.0.gcc.bz2
Comment 3 Ben Woodard 2021-12-02 00:01:43 UTC
The request is to either specify what the subtype changes are or if they do not matter for ABI purposes, make the types evaluate as the same.
Comment 4 Ben Woodard 2021-12-06 18:59:24 UTC
The only meaningful difference that I can see in one case is "const"

excerpt from the ABIXMLs:

GCC 

          <member-function access='public' const='yes' vtable-offset='6'>
            <function-decl name='has_changes' mangled-name='_ZNK7abigail10comparison10array_diff11has_changesEv'
                filepath='../../../../Work/libabigail/src/abg-comparison.cc'
                line='3578' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZNK7abigail10comparison10array_diff11has_changesEv'>
              <parameter type-id='type-id-1051' is-artificial='yes'/>
              <return type-id='type-id-7'/>
            </function-decl>
          </member-function>

clang:
          <member-function access='public' vtable-offset='6'>
            <function-decl name='has_changes' mangled-name='_ZNK7abigail10comparison10array_diff11has_changesEv'
                filepath='/home/ben/Shared/test/libabigail-test/l5/../../../Work/libabigail/include/abg-comparison.h'
                line='1374' column='1' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='_ZNK7abigail10comparison10array_diff11has_changesEv'>
              <parameter type-id='type-id-828' is-artificial='yes'/>
              <return type-id='type-id-2'/>
            </function-decl>
          </member-function>

Here is another example:

gcc:
          <member-function access='public' const='yes' vtable-offset='7'>
            <function-decl name='has_local_changes' mangled-name='_ZNK7abigail10comparison14type_diff_base17has_local_changesEv'
                filepath='/home/ben/Shared/test/libabigail-test/g5/../../../Work/libabigail/include/abg-comparison.h' line='1080' column='1' visibility='default' binding='global' size-in-bits='64'>
              <parameter type-id='type-id-1232' is-artificial='yes'/>
              <return type-id='type-id-8859'/>
            </function-decl>
          </member-function>

clang:
          <member-function access='public' vtable-offset='7'>
            <function-decl name='has_local_changes' mangled-name='_ZNK7abigail10comparison14type_diff_base17has_local_changesEv'
                filepath='/home/ben/Shared/test/libabigail-test/l5/../../../Work/libabigail/include/abg-comparison.h' line='1080' column='1' visibility='default' binding='global' size-in-bits='64'>
              <parameter type-id='type-id-1004' is-artificial='yes'/>
              <return type-id='type-id-8200'/>
            </function-decl>
          </member-function>

So arguably this is a clang bug in that it doesn't seem to record the const-ness of the function in the DWARF (or we have a parsing bug where we don't pick it up -- haven't checked that yet). 

I believe that we had a discussion a few years ago about whether const-ness of a function actually amounted to an ABI difference. I can't recall the exact ultimate decision on that point.

At the very least, when the difference between the two functions is "const" then libabigail should print that out rather than emitting nothing.
Comment 5 Jonathan Wakely 2021-12-08 18:52:38 UTC
(In reply to Ben Woodard from comment #4)
> I believe that we had a discussion a few years ago about whether const-ness
> of a function actually amounted to an ABI difference. I can't recall the
> exact ultimate decision on that point.

It absolutely is an ABI property.

> At the very least, when the difference between the two functions is "const"
> then libabigail should print that out rather than emitting nothing.

That case shouldn't really be possible. If one of them isn't const, then the mangled name would change, and it would be a completely different symbol, and you'd never compare them in the first place.

So writing code to handle the case where a const function is not const *should* be unnecessary.
Comment 6 Ben Woodard 2021-12-08 18:54:32 UTC
The functions that I pointed out in the previous comment are all _ZNK functions. You can overload on the const'ness of a function and so it has to be encoded in the mangling of the linkage name.

struct X { void foo(); void foo() const; };

We should try to get this fixed in clang so that it correctly encodes the const'ness of the function in the DWARF. However, we are going to need to deal with all the versions of clang which don't get this right. So I propose a fixup: if the function has the K attribute in its name mangling even if the function's DWARF is not marked const then the function should be labeled as const.

List of needed actions:
1) file a bug with clang
2) when the difference between two functions is const'ness print that out don't just leave people with "...has some indirect sub-type changes:" then print nothing.
3) if the function mangle name has the K in it, then mark it const in the IR.
Comment 7 Ben Woodard 2021-12-08 19:36:56 UTC
Created attachment 13832 [details]
simple reproducer
Comment 8 Ben Woodard 2021-12-08 20:27:10 UTC
Attached a very simple reproducer:

[ben@alien libabigail-test]$ cat llvm-constness.C
int a=0;

struct X {
  void foo();
  void foo() const;
};

void X::foo(){
  a++;
}

void X::foo() const {
  a--;
}
[ben@alien libabigail-test]$ glto/bin/abidiff llvm-constness.o.gcc llvm-constness.o.llvm 
Functions changes summary: 0 Removed, 1 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

1 function with some indirect sub-type change:

  [C] 'method void X::foo() const' at llvm-constness.C:5:1 has some indirect sub-type changes:

Here is the DWARF breakdown:

gcc:
 [    72]      subprogram           abbrev: 10
               external             (flag_present) yes
               name                 (string) "foo"
               decl_file            (data1) llvm-constness.C (1)
               decl_line            (data1) 5
               decl_column          (data1) 8
               linkage_name         (strp) "_ZNK1X3fooEv"
               declaration          (flag_present) yes
               object_pointer       (ref4) [    82]
 [    82]        formal_parameter     abbrev: 2
                 type                 (ref4) [    98]
                 artificial           (flag_present) yes
 [    98]    pointer_type         abbrev: 3
             byte_size            (implicit_const) 8
             type                 (ref4) [    89]
 [    89]    const_type           abbrev: 1
             type                 (ref4) [    4d]
 [    4d]    structure_type       abbrev: 8
             name                 (string) "X"
             byte_size            (data1) 1
             decl_file            (data1) llvm-constness.C (1)
             decl_line            (data1) 3
             decl_column          (data1) 8
             sibling              (ref4) [    89]

clang++:
 [    60]      subprogram           abbrev: 5
               linkage_name         (strp) "_ZNK1X3fooEv"
               name                 (strp) "foo"
               decl_file            (data1) llvm-constness.C (1)
               decl_line            (data1) 5
               declaration          (flag_present) yes
               external             (flag_present) yes
 [    6b]        formal_parameter     abbrev: 6
                 type                 (ref4) [    77]
                 artificial           (flag_present) yes
 [    77]    pointer_type         abbrev: 7
             type                 (ref4) [    7c]
 [    7c]    const_type           abbrev: 8
             type                 (ref4) [    46]
 [    46]    structure_type       abbrev: 4
             calling_convention   (data1) pass_by_value (5)
             name                 (strp) "X"
             byte_size            (data1) 1
             decl_file            (data1) llvm-constness.C (1)
             decl_line            (data1) 3

Which looks pretty much the same and basically correct. I think that the first time I looked at it I mistook the implicit_const on GCC's byte_size as the thing that was different so there may not be a clang bug just a libabigail bug where for some reason it doesn't pick up the fact that the this pointer is const which seems to be the thing that indicates that the member function is const.
Comment 9 gprocida 2021-12-08 23:38:58 UTC
Created attachment 13837 [details]
attachment-364165-0.html

This may already have been fixed in current Clang.

We noticed that we were getting conflicting duplicate type definitions in
ABI XML.

it turned out that Clang emits debug information for functions at some call
sites, but it was an incomplete type skeleton, missing things like
qualifiers.

This was the fix: https://reviews.llvm.org/rG85f612efeb35
Comment 10 gprocida 2021-12-09 11:18:09 UTC
It appears not to be that Clang bug.

On the other hand, I think there is some confusion going on here.

The difference is in the const-ness of the this pointer. It is *not* the const-ness of the function (i.e., the constness of *this).

Given that this itself cannot be modified, I'd say GCC is doing a better job here.

abidiff --harmless /tmp/llcc.o.clang /tmp/llcc.o.gcc 
Functions changes summary: 0 Removed, 2 Changed, 0 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

2 functions with some indirect sub-type change:

  [C] 'method void X::foo()' at llcc.cc:8:1 has some indirect sub-type changes:
    implicit parameter 0 of type 'X*' changed:
      entity changed from 'X*' to 'X* const'
      type size hasn't changed

  [C] 'method void X::foo()' at llcc.cc:12:1 has some indirect sub-type changes:
    implicit parameter 0 of type 'const X*' changed:
      entity changed from 'const X*' to 'const X* const'
      type size hasn't changed
Comment 11 Jonathan Wakely 2021-12-09 11:25:57 UTC
Abigail should just ignore that completely. The implicit parameter is not observable or accessible to user code, and there is no way to modify it whether or not it's "declared" as X* const or X*.

That implicit parameter is not the `this` pointer, it's the _value_ of the `this` pointer, but that's not the same thing.  `this` is a keyword that produces a prvalue of type "pointer to cv X" so it's always non-const (but unmodifiable, because it's a prvalue of fundamental type). The implicit parameter is what the compiler uses to provide the value of that prvalue, but is not actually the prvalue itself.
Comment 12 Jonathan Wakely 2021-12-09 11:32:45 UTC
(In reply to gprocida from comment #10)
> The difference is in the const-ness of the this pointer. It is *not* the
> const-ness of the function (i.e., the constness of *this).

If that's true, this is a very confusing way for abixml to express that:

gcc:
          <member-function access='public' const='yes' vtable-offset='7'>

clang:
          <member-function access='public' vtable-offset='7'>

If the "const" attribute here refers to an unobservable and irrelevant detail of the implicit object parameter, why is it an attribute of the <member-function> element, not its parameter?