Bug 18838

Summary: Normalize the output of abidw
Product: libabigail Reporter: Dodji Seketeli <dodji>
Component: default Assignee: Dodji Seketeli <dodji>
Status: RESOLVED FIXED    
Severity: enhancement CC: andrew.c.morrow, dodji, libabigail, roland
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on:    
Bug Blocks: 19843    

Description Dodji Seketeli 2015-08-17 09:29:57 UTC
Normalize the native XML representation of ABIs that libabigail emits (what abidw emits, basically) so that two binaries with the same ABI always (mostly) end up having the same ABI XML.

This basically means sorting decls sensibly within their contexts, normalizing spaces (okay, this is already done by the XML emitter), and things like that.
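
As a rough illustration of the sorting idea (not libabigail's actual implementation), the Python sketch below orders the children of every element by tag and then by `name` attribute, so that two documents differing only in declaration order serialize identically. The `scope` and `decl` element names are made up for the example.

```python
import xml.etree.ElementTree as ET


def sort_key(elem):
    # Order siblings by tag, then by their 'name' attribute (empty if absent).
    return (elem.tag, elem.get("name", ""))


def normalize(xml_text):
    """Return xml_text with the children of every element sorted."""
    root = ET.fromstring(xml_text)
    # Snapshot the traversal first, then reorder each element's children.
    for parent in list(root.iter()):
        parent[:] = sorted(parent, key=sort_key)
    return ET.tostring(root, encoding="unicode")


# Hypothetical inputs: same declarations, different order.
a = "<scope><decl name='g'/><decl name='f'/></scope>"
b = "<scope><decl name='f'/><decl name='g'/></scope>"
print(normalize(a) == normalize(b))
```

A real normalizer would need a richer sort key (for instance, overloaded functions share a name), which is part of what "reasonably" hides here.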

Also, maybe we should start emitting versions of the ABI XML format ...

This is important for upstream projects wishing to store ABI XML as ABI baselines, so that they can review baseline changes by looking at diffs of the XML.
Comment 1 andrew.c.morrow 2015-08-17 14:36:09 UTC
This is also important when using the output of abidw to prune unnecessary dynamic relinks in a build system.

It might also be useful for that use case to have an 'abihash' utility, which would traverse the libabigail data structures, but instead of emitting an XML representation, would just emit a summary MD5 or SHA1 of the relevant info.
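
A minimal sketch of what such a utility could look like, assuming it simply hashes the XML that `abidw <binary>` prints on standard output rather than traversing libabigail's data structures as suggested above. The name `abihash` is hypothetical, and the digest is only meaningful if the abidw output is normalized.

```python
import hashlib
import subprocess


def abi_digest(abixml: bytes) -> str:
    """Return the SHA-1 hex digest of an ABI XML document."""
    return hashlib.sha1(abixml).hexdigest()


def abihash(binary_path: str) -> str:
    """Hypothetical helper: hash what `abidw <binary>` prints on stdout."""
    result = subprocess.run(
        ["abidw", binary_path], check=True, capture_output=True
    )
    return abi_digest(result.stdout)
```

Calling `abihash("libb.so")` requires abidw to be installed and on the PATH; `abi_digest` can be used on its own with XML obtained some other way.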
Comment 2 Dodji Seketeli 2015-08-20 15:34:03 UTC
(In reply to andrew.c.morrow from comment #1)
> This is also important when using the output of abidw to prune unnecessary
> dynamic relinks in a build system.
> 
> It might also be useful for that use case to have an 'abihash' utility,
> which would traverse the libabigail data structures, but instead of emitting
> an XML representation, would just emit a summary MD5 or SHA1 of the relevant
> info.

I have been thinking about this.

I think that if comparing two versions of a library L is fast enough, it might be better to do that, and if the ABI changed in an incompatible way, then trigger the re-linking of applications that linked against the older version of L.

I am saying this because I think that the ABIs of the two versions of L being different doesn't necessarily imply that an application using L should be re-linked against the newer version of L. For instance, if a new entry point got added to L and the application doesn't use that new entry point, then it doesn't need to be re-linked.  Heck, even if an entry point got *removed* from L and the application doesn't use that entry point, no re-linking should be necessary.

This is actually what the tool 'abicompat' is for.  If it detects that the application is impacted by an ABI change that happened in L, then the application ought to be re-linked.
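
The reasoning above can be sketched as a toy model in Python. This is not how abicompat is implemented; it only treats an ABI as a set of entry-point names, and all names below are invented for the example.

```python
def is_impacted(used_by_app, old_exports, new_exports, changed=frozenset()):
    """Toy model: the app is impacted only if it uses something that was
    removed from the library or whose meaning changed."""
    removed = set(old_exports) - set(new_exports)
    return bool(set(used_by_app) & (removed | set(changed)))


# Hypothetical library L: one entry point added, one removed.
old = {"init", "run", "legacy_dump"}
new = {"init", "run", "run_async"}

app_a = {"init", "run"}          # uses neither the added nor the removed one
app_b = {"init", "legacy_dump"}  # uses the removed entry point

print(is_impacted(app_a, old, new))
print(is_impacted(app_b, old, new))
```

In this model only the second application is reported as impacted, even though the ABI of L differs between the two versions in both cases.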
Comment 3 Dodji Seketeli 2015-08-20 15:40:20 UTC
(In reply to dodji from comment #2)

[...]

> This is actually what the tool 'abicompat' is for.  If it detects that the
> application is impacted by an ABI change that happened in L, then the
> application ought to be re-linked.

I forgot to mention that abicompat is documented at https://sourceware.org/libabigail/manual/abicompat.html.
Comment 4 andrew.c.morrow 2015-08-20 16:34:42 UTC
I agree with what you are saying in the sense of deciding manually if an application should re-link. Not all ABI changes in a dependent library require a re-link.

However, my use case is more narrow and less procedurally flexible.

I'm interested in the ABI hash as part of a best-effort elimination of needless re-links in a build system that generates lots of shared libraries. In other words, if liba.so depends on libb.so, and libb.so was just rebuilt, I'd like to avoid rebuilding liba.so - if possible. If the ABI hash of libb.so didn't change, then I definitely don't need to re-link liba.so. If it did change then I'll just re-link liba.so. Perhaps that wasn't strictly necessary as you point out, but at worst that is a missed build time optimization.
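
One way to wire this into a build, sketched here under the assumption that dependents are made to depend on a small stamp file instead of on libb.so itself: the stamp is rewritten only when the hash of the ABI XML changes. The function and file names are hypothetical.

```python
import hashlib
import os
import tempfile


def update_abi_stamp(abixml: bytes, stamp_path: str) -> bool:
    """Rewrite stamp_path only when the ABI digest changes.

    Returns True if the stamp was (re)written, i.e. dependents should re-link.
    """
    digest = hashlib.sha1(abixml).hexdigest()
    if os.path.exists(stamp_path):
        with open(stamp_path, encoding="ascii") as f:
            if f.read() == digest:
                return False  # same ABI text: leave the stamp untouched
    with open(stamp_path, "w", encoding="ascii") as f:
        f.write(digest)
    return True


# Demo with made-up ABI XML standing in for abidw output.
with tempfile.TemporaryDirectory() as d:
    stamp = os.path.join(d, "libb.abi-stamp")
    first = update_abi_stamp(b"<abi v='1'/>", stamp)
    second = update_abi_stamp(b"<abi v='1'/>", stamp)
    third = update_abi_stamp(b"<abi v='2'/>", stamp)
print(first, second, third)
```

Only the current build's output is needed; the previous version of libb.so is represented solely by the digest left in the stamp file.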

Needing to have access to two versions of libb.so is also problematic in the context of a build system like SCons, because there typically is no way to retain the prior version of libb.so, or name such a thing in the dependency graph.
Comment 5 Dodji Seketeli 2015-08-20 17:13:32 UTC
(In reply to andrew.c.morrow from comment #4)

> Needing to have access to two versions of libb.so is also problematic in the
> context of a build system like SCons, because there typically is no way to
> retain the prior version of libb.so, or name such a thing in the dependency
> graph.

Ah, I see.

abicompat has a --weak-mode option, with which you only need access to the later version of libb.so, i.e.:

  abicompat --weak-mode liba.so libb.so

But in that case, abicompat won't tell you if a symbol which liba.so consumes from libb.so disappears from libb.so.  It'll just tell you if the types consumed by liba.so from libb.so stop meaning the "same thing".

So close, and yet so far.  So yeah, I agree with you. 

There is another thing to keep in mind.  Right now, even if two versions of libb.so have the *same* ABI, their corresponding abidw output might be different.

This is mainly because the output of abidw is not normalized.  Hence this enhancement request.  Also, even when the output of abidw becomes normalized, a new version of libabigail might make that output change slightly, for instance, because we add support for new ABI artifacts.

But then, that is just a missed optimization, I agree with you. For the sake of intellectual correctness, I thought I'd mention it nonetheless.

Thank you for following up on this.
Comment 6 Dodji Seketeli 2015-08-31 09:45:34 UTC
Another thing I think should be added to the xml output (I am calling the xml format abixml now) is annotations to make the output easier to read and understand.

For instance, if in abixml you have this:

   <type-decl name='char' size-in-bits='8' id='type-id-7'/>
    (...)
   <qualified-type-def type-id='type-id-7' const='yes' id='type-id-8'/>

The second line becomes easier to read if it's accompanied by a comment like:

   <type-decl name='char' size-in-bits='8' id='type-id-7'/>
    (...)

   <qualified-type-def type-id='type-id-7' const='yes' id='type-id-8'/>
Comment 7 Dodji Seketeli 2015-08-31 10:08:55 UTC
Whoops, I hit the wrong button too soon.  Here is what I wanted to say:

Another thing I think should be added to the xml output (I am calling the xml format abixml now) is annotations to make the output easier to read and understand.

For instance, if in abixml you have this:

   <type-decl name='char' size-in-bits='8' id='type-id-7'/>
    (...)
   <qualified-type-def type-id='type-id-7' const='yes' id='type-id-8'/>
    (...)
   <pointer-type-def type-id='type-id-8' id='type-id-9'/>

The second and third lines become easier to read when accompanied by comments like:

   <type-decl name='char' size-in-bits='8' id='type-id-7'/>
    (...)
   <!-- const char -->
   <qualified-type-def type-id='type-id-7' const='yes' id='type-id-8'/>
    (...)
   <!-- const char* -->
   <pointer-type-def type-id='type-id-8' id='type-id-9'/>

This eases reviews of changes to abixml files.
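
A minimal sketch of how such annotations could be derived, assuming only the three element kinds shown above and a made-up wrapper element to keep the snippet well-formed. It follows each `type-id` reference back to a named type; it is an illustration, not libabigail's code.

```python
import xml.etree.ElementTree as ET

# The <abi-instr> wrapper is only there so the fragment parses as one document.
ABIXML = """
<abi-instr>
  <type-decl name='char' size-in-bits='8' id='type-id-7'/>
  <qualified-type-def type-id='type-id-7' const='yes' id='type-id-8'/>
  <pointer-type-def type-id='type-id-8' id='type-id-9'/>
</abi-instr>
"""


def readable_names(xml_text):
    """Map each type id to a C-like name by following type-id references."""
    root = ET.fromstring(xml_text)
    by_id = {e.get("id"): e for e in root.iter() if e.get("id")}

    def name_of(type_id):
        elem = by_id[type_id]
        if elem.tag == "type-decl":
            return elem.get("name")
        inner = name_of(elem.get("type-id"))
        if elem.tag == "qualified-type-def":
            return ("const " if elem.get("const") == "yes" else "") + inner
        if elem.tag == "pointer-type-def":
            return inner + "*"
        return inner  # other kinds are out of scope for this sketch

    return {type_id: name_of(type_id) for type_id in by_id}


names = readable_names(ABIXML)
for type_id, name in names.items():
    print(f"<!-- {name} -->  ({type_id})")
```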
Comment 8 Roland McGrath 2016-03-18 17:37:50 UTC
This is another thing we'll need for glibc to start using abidw as part of its build process.  We want abixml files we can commit to the source tree as reference, and changes get manually reviewed by humans.  So normalization is useful to make changes "patch-friendly", i.e. such that the patch between the normal abixml of libfoo v1 and the normal abixml of libfoo v2 is easily recognized by a human as "just adding the new stuff".
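
To make "patch-friendly" concrete, here is a small Python sketch that diffs two made-up, already sorted abixml-like documents with the standard difflib module. The file names and element contents are invented for the example; the point is that with normalized input, adding one function shows up as a single added line.

```python
import difflib

v1 = """<abi>
  <function-decl name='foo_init'/>
  <function-decl name='foo_run'/>
</abi>
""".splitlines(keepends=True)

v2 = """<abi>
  <function-decl name='foo_init'/>
  <function-decl name='foo_reset'/>
  <function-decl name='foo_run'/>
</abi>
""".splitlines(keepends=True)

diff = list(difflib.unified_diff(v1, v2, "libfoo-v1.abixml", "libfoo-v2.abixml"))

# Keep only real additions/removals, not the '---'/'+++' file header lines.
changes = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("".join(diff), end="")
```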
Comment 9 Dodji Seketeli 2017-01-24 15:10:13 UTC
These days, the abixml format is "quite" normalized.

That means a given version of libabigail will always emit the same abixml output for the same binary.

This feature is used by libabigail itself in its regression test suite.

There are abidw options to help with some useful cases.  For instance, one may want to avoid emitting some absolute paths in some attributes of the abixml documents, because those might depend on where the binary was built.

There is also the new --annotate option now that emits (human readable) comments (aka annotations) describing the types and decls that are formally defined in the abixml output.  

The idea is to help humans decipher the abixml output, and also, whenever some "details" of the abixml change without changing the overall meaning of a type or decl, we expect the annotation to stay the same.

So I guess this bug can now be closed.

Thank you for filing this!