Bug 19434 - invalid character in attribute value
Status: RESOLVED FIXED
Alias: None
Product: libabigail
Classification: Unclassified
Component: default
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Dodji Seketeli
Reported: 2016-01-06 21:29 UTC by Ben Woodard
Modified: 2016-01-19 09:31 UTC


Attachments
reproducing elf file (3.68 MB, application/x-gzip)
2016-01-06 21:30 UTC, Ben Woodard

Description Ben Woodard 2016-01-06 21:29:11 UTC
bash-4.1$ ~/bin/abidw --abidiff /collab/usr/global/tools/totalview/r/toolworks/totalview.8.12.0-1/linux-x86-64/bin/tvdsvrmain_mic 
/tmp/libabigail-tmp-file-HC4EVK:21019: parser error : invalid character in attribute value
      <parameter type-id='type-id-481' name='$5'/>
                                              ^
/tmp/libabigail-tmp-file-HC4EVK:21019: parser error : attributes construct error
      <parameter type-id='type-id-481' name='$5'/>
                                              ^
/tmp/libabigail-tmp-file-HC4EVK:21019: parser error : Couldn't find end of Start Tag parameter
      <parameter type-id='type-id-481' name='$5'/>
                                              ^
Could not read temporary XML representation of elf file back

This looks like it is a new one.
Comment 1 Ben Woodard 2016-01-06 21:30:56 UTC
Created attachment 8886
reproducing elf file
Comment 2 Ben Woodard 2016-01-06 21:31:40 UTC
This was with 1.0 RC1 from the git tree.
Comment 3 Dodji Seketeli 2016-01-18 17:27:02 UTC
So this is due to some function parameter names which contain ASCII *control* characters.  I am not sure why this would happen.  Maybe the source code file was encoded in something that is not proper ASCII?  Unfortunately, I am not aware of any way to detect the encoding of the source file from the DWARF information, so I am assuming it should be ASCII.

The fix involves detecting characters that are not simple ASCII identifier characters in parameter names.  If there are any, the parameter name is dropped on the floor.

The fix has landed in the master branch at https://sourceware.org/git/?p=libabigail.git;a=commit;h=c3869ecc7bbd6f8370ca29446afdcc1d2631e33d.
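
As an illustration of the approach described above, here is a minimal sketch. It is not the code from the commit: the function names `is_simple_ascii_identifier` and `sanitize_parameter_name` are hypothetical, and the exact set of accepted characters (ASCII letters, digits, and underscore) is an assumption.

```cpp
#include <string>

// Hypothetical helper, not libabigail's actual code.  Returns true only
// when NAME is non-empty and every byte is an ASCII letter, an ASCII
// digit, or an underscore.  The accepted set is an assumption.
bool
is_simple_ascii_identifier(const std::string& name)
{
  if (name.empty())
    return false;

  for (unsigned char c : name)
    {
      bool ok = (c >= 'a' && c <= 'z')
	|| (c >= 'A' && c <= 'Z')
	|| (c >= '0' && c <= '9')
	|| c == '_';
      if (!ok)
	return false;
    }
  return true;
}

// Hypothetical helper: keep NAME when it is a simple ASCII identifier,
// otherwise drop it by returning an empty string.
std::string
sanitize_parameter_name(const std::string& name)
{
  return is_simple_ascii_identifier(name) ? name : std::string();
}
```

With this sketch, a name such as `argc` is kept, while a name containing a control byte comes back empty, so nothing unrepresentable reaches the XML writer.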
Comment 4 Ben Woodard 2016-01-18 19:08:51 UTC
Is dropping the name on the floor the best thing to do? Wouldn't it be better to encode the non-ASCII parameter name into 7-bit clean ASCII, sort of like uuencode does?
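
For illustration only, one way such an encoding could look is sketched below. It uses `\xHH` escapes rather than the uuencode format, and the function name `escape_non_printable` is hypothetical; nothing like it is claimed to exist in libabigail.

```cpp
#include <string>

// Hypothetical helper: rewrite NAME so that the result contains only
// printable 7-bit ASCII.  Every byte outside 0x20..0x7E, and the
// backslash itself, is written as \xHH with two uppercase hex digits.
std::string
escape_non_printable(const std::string& name)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;

  for (unsigned char c : name)
    {
      if (c >= 0x20 && c < 0x7F && c != '\\')
	out += static_cast<char>(c);
      else
	{
	  out += "\\x";
	  out += hex[c >> 4];
	  out += hex[c & 0x0F];
	}
    }
  return out;
}
```

The output would still need the usual XML attribute escaping for characters such as `'`, `<`, and `&` before being written out.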
Comment 5 Dodji Seketeli 2016-01-19 09:31:37 UTC
> Is dropping the name on the floor the best thing to do? Wouldn't it be
> better to encode the non-ASCII parameter name into 7-bit clean ASCII,
> sort of like uuencode does?

For now, we don't use the parameter name anyway.  In change reports,
function parameters are referred to using their position.

Furthermore, since we don't know the actual encoding of the
characters, if we are sure they are not ASCII (which is the case here),
I don't think trying to encode each byte value can provide us
with any usable information.  It's just as if we had garbage.  We
won't be able to show any usable information to the user anyway.  Hence
my inclination to drop the name altogether.

But if one day we know the actual encoding of the parameter names, then
we can decode them.  At that point we'll change the code again and avoid
dropping the name if it's not ASCII.  If it's, say, UTF-8, then we'll be
able to decode the byte stream, knowing that it's a UTF-8 stream.
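
As a rough sketch of what that later step could involve, the function below checks whether a byte string is well-formed UTF-8 before anything tries to decode it. The name `is_valid_utf8` is hypothetical, and whether libabigail would take this route is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when S is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, UTF-16 surrogates, and code points above U+10FFFF.
bool
is_valid_utf8(const std::string& s)
{
  // Smallest code point that legitimately needs a sequence of LEN bytes.
  static const unsigned min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

  std::size_t i = 0, n = s.size();
  while (i < n)
    {
      unsigned char c = static_cast<unsigned char>(s[i]);
      std::size_t len;
      unsigned cp;

      if (c < 0x80)
	{ len = 1; cp = c; }
      else if ((c & 0xE0) == 0xC0)
	{ len = 2; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
	{ len = 3; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
	{ len = 4; cp = c & 0x07; }
      else
	return false;		// stray continuation or invalid lead byte

      if (i + len > n)
	return false;		// truncated sequence

      for (std::size_t k = 1; k < len; ++k)
	{
	  unsigned char cc = static_cast<unsigned char>(s[i + k]);
	  if ((cc & 0xC0) != 0x80)
	    return false;	// not a continuation byte
	  cp = (cp << 6) | (cc & 0x3F);
	}

      if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
	return false;

      i += len;
    }
  return true;
}
```

A name that passes this check could be kept and written out as UTF-8, provided it contains no characters that XML itself forbids; one that fails would still be dropped as today.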