Bug 31017 - Flex array conversion suppression
Summary: Flex array conversion suppression
Status: RESOLVED FIXED
Alias: None
Product: libabigail
Classification: Unclassified
Component: default (show other bugs)
Version: unspecified
: P2 enhancement
Target Milestone: ---
Assignee: Dodji Seketeli
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-10-31 18:34 UTC by John Moon
Modified: 2023-11-15 11:48 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2023-11-03 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John Moon 2023-10-31 18:34:13 UTC
In the past, it was common in Linux kernel code to have a "fake flex array" at the end of a structure. Like this:

struct foo {
         int x;
         int y;
         int end[1];
}; 

In recent years, with improved compiler support, real flex arrays have been preferred. It's common to see patches like this: https://github.com/torvalds/linux/commit/c6f2e6b6eaaf883df482cb94f302acad9b80a2a4

Basically, this takes the struct above, and changes it to:

struct foo {
         int x;
         int y;
         int end[];
}; 

abidiff flags this change with:

   [C] 'struct foo' changed:
     type size changed from 96 to 64 (in bits)
     1 data member change:
       type of 'int end[1]' changed:
         type name changed from 'int[1]' to 'int[]'
         array type size changed from 32 to 'unknown'
         array type subrange 1 changed length from 1 to 'unknown' 

It would be good to have a suppression to filter out this kind of change. For example:

     [suppress_type]
       type_kind = struct
       has_size_change = true
       has_strict_flexible_array_data_member_conversion = true
Comment 1 Dodji Seketeli 2023-11-03 20:58:55 UTC
I have started a patch in the branch https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/users/dodji/PR31017 to support this.

It's still dry at the moment, but I believe it should support the example provided in this feature request.  I'll be polishing it and prepare it for submission. Please feel free to try it and tell me what rough edges you think I should polish more.
Comment 2 John Moon 2023-11-04 00:49:42 UTC
Thanks for the implementation! I think this looks great. I tested it and it seems to be working properly for our use case in the kernel!

One question about this block:

+      // Support for the
+      // "has_strict_flexible_array_data_member_conversion = true"
+      // clause.
+      if (has_strict_fam_conversion())
+       {
+         // Let's detect if the first class of the diff has a fake
+         // flexible array data member that got turned into a real
+         // flexible array data member.
+         if (!(
+               (has_fake_flexible_array_data_member(first_class)
+                && has_flexible_array_data_member(second_class))
+               // A fake flexible array member has been changed into
+               // a real flexible array ...
+               &&
+               ((first_class->get_size_in_bits()
+                 == second_class->get_size_in_bits())
+                || get_has_size_change())
+               // There was no size change or the suppression has a
+               // "has_size_change = true" clause.
+               ))
+           return false;
+       }

Is it possible for a structure to meet the first condition (fake flex -> flex) *without* a size change? I'd think not, but may be missing something.

Basically, I think you can get rid of the first_class->get_size_in_bits() == second_class->get_size_in_bits() check.

Also, if we know a size change is a tautology, you could move the get_has_size_change() check to be first and save a few CPU cycles.

Other than that, LGTM!

I went ahead and made that change and added (at least a start) on the documentation/tests. I don't have access to make a branch, so I just sent a patch separately. We can continue discussion there as needed.
Comment 3 Dodji Seketeli 2023-11-05 09:43:16 UTC
"quic_johmoo at quicinc dot com" <sourceware-bugzilla@sourceware.org> a
écrit:

> https://sourceware.org/bugzilla/show_bug.cgi?id=31017
>
> --- Comment #2 from John Moon <quic_johmoo at quicinc dot com> ---
>
> Thanks for the implementation! I think this looks great. I tested it and it
> seems to be working properly for our use case in the kernel!
>
> One question about this block:
>
> +      // Support for the
> +      // "has_strict_flexible_array_data_member_conversion = true"
> +      // clause.
> +      if (has_strict_fam_conversion())
> +       {
> +         // Let's detect if the first class of the diff has a fake
> +         // flexible array data member that got turned into a real
> +         // flexible array data member.
> +         if (!(
> +               (has_fake_flexible_array_data_member(first_class)
> +                && has_flexible_array_data_member(second_class))
> +               // A fake flexible array member has been changed into
> +               // a real flexible array ...
> +               &&
> +               ((first_class->get_size_in_bits()
> +                 == second_class->get_size_in_bits())
> +                || get_has_size_change())
> +               // There was no size change or the suppression has a
> +               // "has_size_change = true" clause.
> +               ))
> +           return false;
> +       }
>
> Is it possible for a structure to meet the first condition (fake flex -> flex)
> *without* a size change?

As a general rule, suppression specifications (aka supprspecs) don't apply to a type
which size has changed, unless the user /really/ wants the supprspec to
apply.  If she really wants it, then she has to explicitly say
"has_size_change = yes".  This is prevents "too eager" supprspecs to be
applied without the user noticing the supprspecs is too eager.

Basically, if a type's size changed, more often than not, we don't want
to suppress its change report, for obvious reasons.

That is why, throughout type_suppression::suppresses_diff you see the
careful attention to the size change condition, when evaluating
supprspecs.

In this particular case, I think that we can have "fake flex -> flex"
changes without a size change because there can other /additional/
changes that counter the size change we would have expected.  That
change could have been introduced, on purpose, to keep the ABI stable.
For instance:

    $ diff -u test-PR31017-2-v0.c test-PR31017-2-v1.c 
    --- test-PR31017-2-v0.c	2023-11-05 10:04:36.433000539 +0100
    +++ test-PR31017-2-v1.c	2023-11-05 10:07:00.579810832 +0100
    @@ -2,7 +2,8 @@
     {
       int x;
       int y;
    -  int end[1];
    +  int padding;
    +  int end[];
     };

    $ /home/dodji/git/libabigail/PR31017/build/tools/abidiff test-PR31017-2-v0.o test-PR31017-2-v1.o 
    Functions changes summary: 0 Removed, 1 Changed, 0 Added function
    Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

    1 function with some indirect sub-type change:

      [C] 'function void fun(foo*)' at test-PR31017-2-v0.c:9:1 has some indirect sub-type changes:
        parameter 1 of type 'foo*' has sub-type changes:
          in pointed to type 'struct foo' at test-PR31017-2-v1.c:1:1:
            type size hasn't changed
            1 data member insertion:
              'int padding', at offset 64 (in bits) at test-PR31017-2-v1.c:5:1
            1 data member change:
              type of 'int end[1]' changed:
                type name changed from 'int[1]' to 'int[]'
                array type size changed from 32 to 'unknown'
                array type subrange 1 changed length from 1 to 'unknown'
              and offset changed from 64 to 96 (in bits) (by +32 bits)

    $ 

One reason why I think it's important to keep this "rigour" with the
type size change thing is that abidiff actually returns a code that is a
bit field that tells callers about the categories of the changes it
encountered.  From
https://sourceware.org/libabigail/manual/abidiff.html#return-values, we
see that if the ABIDIFF_ABI_CHANGE bit is set, it means there were some
ABI changes.  But then if the ABIDIFF_ABI_INCOMPATIBLE_CHANGE bit is
set, it means abidiff is 100% sure that at least of the changes causes
an ABI incompatibility.  In a continuous integration context, for
instance, if the ABIDIFF_ABI_INCOMPATIBLE_CHANGE bit is set, it means we
are sure the change is incompatible, whereas if only the
ABIDIFF_ABI_CHANGE bit is set, it means the change might or might not
incompatible and thus needs a human review to decide.

To wraps this all up, I'd say that only changes that would NOT set the
ABIDIFF_ABI_INCOMPATIBLE_CHANGE bit should be able to be suppressed,
unless the user really knows what she is doing.

Thinking about this, maybe check-uapi.sh could use the return code of
abidiff rather than grepping its output.  check-uapi.sh would then only
reject changes categorically only if the ABIDIFF_ABI_INCOMPATIBLE_CHANGE
bit is set.  If only ABIDIFF_ABI_CHANGE bit is set, check-uapi.sh would
just propose the user to review the changes detected and possibly waive
them.  One result of the waiving process would thus be a new supprspec
written and added to the stock of supprspecs to make sure the same kind
of reviews is not requested in the future.

With time, if a supprspec is recognized to be needed for this particular
project, libabigail can even integrate it and install it by default for
that project to use.  We do this for various projects and their default
supprspecs are included in the default.abignore file that is installed
libabigail.  You can browse it at
https://sourceware.org/git/?p=libabigail.git;a=blob;f=default.abignore
and learn about what project requested default supprspecs.

> I'd think not, but may be missing something.

I hope my explanation above helps shed some light in this apparently
weird way of doing things.

> Basically, I think you can get rid of the first_class->get_size_in_bits() ==
> second_class->get_size_in_bits() check.

I would rather keep it, at least for the sake of consistency in the
behaviour supprspecs evaluation in general, especially with the
unwritten rule:

    "only changes that would NOT set the ABIDIFF_ABI_INCOMPATIBLE_CHANGE
    bit should be able to be suppressed, unless the user really knows
    what she is doing."

I guess we need (better) documentation about all this :-(

> Also, if we know a size change is a tautology, you could move the
> get_has_size_change() check to be first and save a few CPU cycles.
>
> Other than that, LGTM!
>
> I went ahead and made that change and added (at least a start) on the
> documentation/tests. I don't have access to make a branch, so I just sent a
> patch separately. We can continue discussion there as needed.

Thanks a lot for moving forward on this!

I'll wait for your feedback on these comments and we can proceed with
merging your patch accordingly.  In the mean time, if I have comments,
I'll follow-up on the patch thread, indeed.
Comment 4 John Moon 2023-11-06 17:48:18 UTC
On 11/5/2023 1:43 AM, Dodji Seketeli wrote:
> "quic_johmoo at quicinc dot com" <sourceware-bugzilla@sourceware.org> a
> écrit:
> 
>> https://sourceware.org/bugzilla/show_bug.cgi?id=31017
>>
>> --- Comment #2 from John Moon <quic_johmoo at quicinc dot com> ---
>>
>> Thanks for the implementation! I think this looks great. I tested it and it
>> seems to be working properly for our use case in the kernel!
>>
>> One question about this block:
>>
>> +      // Support for the
>> +      // "has_strict_flexible_array_data_member_conversion = true"
>> +      // clause.
>> +      if (has_strict_fam_conversion())
>> +       {
>> +         // Let's detect if the first class of the diff has a fake
>> +         // flexible array data member that got turned into a real
>> +         // flexible array data member.
>> +         if (!(
>> +               (has_fake_flexible_array_data_member(first_class)
>> +                && has_flexible_array_data_member(second_class))
>> +               // A fake flexible array member has been changed into
>> +               // a real flexible array ...
>> +               &&
>> +               ((first_class->get_size_in_bits()
>> +                 == second_class->get_size_in_bits())
>> +                || get_has_size_change())
>> +               // There was no size change or the suppression has a
>> +               // "has_size_change = true" clause.
>> +               ))
>> +           return false;
>> +       }
>>
>> Is it possible for a structure to meet the first condition (fake flex -> flex)
>> *without* a size change?
> 
> As a general rule, suppression specifications (aka supprspecs) don't apply to a type
> which size has changed, unless the user /really/ wants the supprspec to
> apply.  If she really wants it, then she has to explicitly say
> "has_size_change = yes".  This is prevents "too eager" supprspecs to be
> applied without the user noticing the supprspecs is too eager.
> 
> Basically, if a type's size changed, more often than not, we don't want
> to suppress its change report, for obvious reasons.
> 
> That is why, throughout type_suppression::suppresses_diff you see the
> careful attention to the size change condition, when evaluating
> supprspecs.

Right, I wasn't suggesting to apply the suppression even without the 
"has_size_change = true" clause. I just thought we could avoid the check 
as it was always true.

> 
> In this particular case, I think that we can have "fake flex -> flex"
> changes without a size change because there can other /additional/
> changes that counter the size change we would have expected.  That
> change could have been introduced, on purpose, to keep the ABI stable.
> For instance:
> 
>      $ diff -u test-PR31017-2-v0.c test-PR31017-2-v1.c
>      --- test-PR31017-2-v0.c	2023-11-05 10:04:36.433000539 +0100
>      +++ test-PR31017-2-v1.c	2023-11-05 10:07:00.579810832 +0100
>      @@ -2,7 +2,8 @@
>       {
>         int x;
>         int y;
>      -  int end[1];
>      +  int padding;
>      +  int end[];
>       };
> 
>      $ /home/dodji/git/libabigail/PR31017/build/tools/abidiff test-PR31017-2-v0.o test-PR31017-2-v1.o
>      Functions changes summary: 0 Removed, 1 Changed, 0 Added function
>      Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> 
>      1 function with some indirect sub-type change:
> 
>        [C] 'function void fun(foo*)' at test-PR31017-2-v0.c:9:1 has some indirect sub-type changes:
>          parameter 1 of type 'foo*' has sub-type changes:
>            in pointed to type 'struct foo' at test-PR31017-2-v1.c:1:1:
>              type size hasn't changed
>              1 data member insertion:
>                'int padding', at offset 64 (in bits) at test-PR31017-2-v1.c:5:1
>              1 data member change:
>                type of 'int end[1]' changed:
>                  type name changed from 'int[1]' to 'int[]'
>                  array type size changed from 32 to 'unknown'
>                  array type subrange 1 changed length from 1 to 'unknown'
>                and offset changed from 64 to 96 (in bits) (by +32 bits)
> 
>      $
> 

And this proves I was wrong! :)

I thought libabigail would consider the whole structure size as 
"unknown" if there was a flex array at the end, but this is clearly not 
the case. It just takes the size of the known structs (as the compiler 
does).

> One reason why I think it's important to keep this "rigour" with the
> type size change thing is that abidiff actually returns a code that is a
> bit field that tells callers about the categories of the changes it
> encountered.  From
> https://sourceware.org/libabigail/manual/abidiff.html#return-values, we
> see that if the ABIDIFF_ABI_CHANGE bit is set, it means there were some
> ABI changes.  But then if the ABIDIFF_ABI_INCOMPATIBLE_CHANGE bit is
> set, it means abidiff is 100% sure that at least of the changes causes
> an ABI incompatibility.  In a continuous integration context, for
> instance, if the ABIDIFF_ABI_INCOMPATIBLE_CHANGE bit is set, it means we
> are sure the change is incompatible, whereas if only the
> ABIDIFF_ABI_CHANGE bit is set, it means the change might or might not
> incompatible and thus needs a human review to decide.
> 
> To wraps this all up, I'd say that only changes that would NOT set the
> ABIDIFF_ABI_INCOMPATIBLE_CHANGE bit should be able to be suppressed,
> unless the user really knows what she is doing.
> 
> Thinking about this, maybe check-uapi.sh could use the return code of
> abidiff rather than grepping its output.  check-uapi.sh would then only
> reject changes categorically only if the ABIDIFF_ABI_INCOMPATIBLE_CHANGE
> bit is set.  If only ABIDIFF_ABI_CHANGE bit is set, check-uapi.sh would
> just propose the user to review the changes detected and possibly waive
> them.  One result of the waiving process would thus be a new supprspec
> written and added to the stock of supprspecs to make sure the same kind
> of reviews is not requested in the future.


Agreed, and in v6 of the script, we do this! If you pass the flag "-i" 
to the script, it will ignore abidiff results when return code is 4 
(ABIDIFF_ABI_CHANGE, but not ABIDIFF_ABI_INCOMPATIBLE_CHANGE).

> 
> With time, if a supprspec is recognized to be needed for this particular
> project, libabigail can even integrate it and install it by default for
> that project to use.  We do this for various projects and their default
> supprspecs are included in the default.abignore file that is installed
> libabigail.  You can browse it at
> https://sourceware.org/git/?p=libabigail.git;a=blob;f=default.abignore
> and learn about what project requested default supprspecs.
> 
>> I'd think not, but may be missing something.
> 
> I hope my explanation above helps shed some light in this apparently
> weird way of doing things.

It certainly did, thank you!

> 
>> Basically, I think you can get rid of the first_class->get_size_in_bits() ==
>> second_class->get_size_in_bits() check.
> 
> I would rather keep it, at least for the sake of consistency in the
> behaviour supprspecs evaluation in general, especially with the
> unwritten rule:
> 
>      "only changes that would NOT set the ABIDIFF_ABI_INCOMPATIBLE_CHANGE
>      bit should be able to be suppressed, unless the user really knows
>      what she is doing."
> 
> I guess we need (better) documentation about all this :-(

I think the documentation is clear, I just made an incorrect assumption.

> 
>> Also, if we know a size change is a tautology, you could move the
>> get_has_size_change() check to be first and save a few CPU cycles.
>>
>> Other than that, LGTM!
>>
>> I went ahead and made that change and added (at least a start) on the
>> documentation/tests. I don't have access to make a branch, so I just sent a
>> patch separately. We can continue discussion there as needed.
> 
> Thanks a lot for moving forward on this!
> 
> I'll wait for your feedback on these comments and we can proceed with
> merging your patch accordingly.  In the mean time, if I have comments,
> I'll follow-up on the patch thread, indeed.
> 

Sounds good, thank you!

- John
Comment 5 Dodji Seketeli 2023-11-13 13:28:38 UTC
I have reviewed and amended the patch at
https://inbox.sourceware.org/libabigail/8734xh3j1o.fsf@seketeli.org/.

It works for me, but I am waiting for your feedback to apply it to the
master branch of the Git repository.

Many thanks!
Comment 6 John Moon 2023-11-15 04:58:33 UTC
Thanks for pushing it across the finish line! I just tested it and seems to be working correctly for me. 👍
Comment 7 Dodji Seketeli 2023-11-15 11:48:54 UTC
The patch was applied to the master branch of the git repository at https://sourceware.org/git/?p=libabigail.git;a=commit;h=89ab39de785db5a06e050a3e62bd88e980ab3346 and should be available in libabigail 2.5.

Thank you for filling this enhancement request!