| Summary: | invalid character in attribute value | | |
|---|---|---|---|
| Product: | libabigail | Reporter: | Ben Woodard <woodard> |
| Component: | default | Assignee: | Dodji Seketeli <dodji> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | libabigail |
| Priority: | P2 | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Host: | | Target: | |
| Build: | | Last reconfirmed: | |
| Attachments: | reproducing elf file | | |
Description
Ben Woodard
2016-01-06 21:29:11 UTC
Created attachment 8886
reproducing elf file
This was with 1.0 RC1 from the git tree.

So this is due to some function parameter names which contain ASCII *control* characters. I am not sure why this would happen. Maybe this is because the source code file was encoded in something that is not proper ASCII? Unfortunately, I am not aware of any way to detect the encoding of the source file from the DWARF information, so I am assuming it should be ASCII.

The fix involves detecting characters that are not simple ASCII identifier characters in parameter names. If there are any, the parameter name is dropped on the floor.

The fix has landed in the master branch at https://sourceware.org/git/?p=libabigail.git;a=commit;h=c3869ecc7bbd6f8370ca29446afdcc1d2631e33d.

Is dropping the name on the floor the best thing to do? Wouldn't it be better to encode the non-ASCII parameter name into 7-bit clean ASCII, sort of like uuencode does?

> Is dropping the name on the floor the best thing to do? Wouldn't it be
> better to encode the non-ASCII parameter name into 7-bit clean ASCII, sort
> of like uuencode does?
For now, we don't use the parameter name anyway. In change reports,
function parameters are referred to by their position.

Furthermore, since we don't know the actual encoding of the
characters, and we are sure they are not ASCII (which is the case here),
I don't think trying to encode each of the byte values can provide us
with any usable information. It's just as if we had garbage. We
won't be able to show any usable information to the user anyway. Hence
my inclination to drop the name altogether.
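
To make the check concrete, here is a minimal sketch of the kind of test described above: keep a parameter name only if every byte is a plain ASCII identifier character, otherwise drop it. The function names `is_simple_ascii_identifier` and `sanitize_parameter_name` are made up for illustration; this is not the code from the commit, which may well differ in detail.

```cpp
#include <string>

// Hypothetical helper, not libabigail's actual code.
// Returns true if every byte of `name` is an ASCII letter, an ASCII
// digit, or an underscore. An empty name trivially passes.
bool is_simple_ascii_identifier(const std::string& name)
{
  for (unsigned char c : name)
    {
      bool ok = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '_';
      if (!ok)
        return false;
    }
  return true;
}

// Hypothetical helper: keeps the name when it is clean, otherwise
// drops it by returning an empty string.
std::string sanitize_parameter_name(const std::string& name)
{
  return is_simple_ascii_identifier(name) ? name : std::string();
}
```

With this sketch, a name such as `argc` comes back unchanged, while a name containing a control byte or any byte above 0x7F comes back empty.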
But if one day we know the actual encoding of the parameter names, then
we can decode them. At that point we'll change the code again and avoid
dropping the name if it's not ASCII. If it's, say, UTF-8, then we'll be
able to decode the byte stream, knowing that it's a UTF-8 stream.
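
As a rough illustration of what that later step could look like, the sketch below checks whether a byte string is well-formed UTF-8. The name `is_well_formed_utf8` is hypothetical and nothing like it is claimed to exist in libabigail; it only shows the decoding rules that would apply once the encoding is known to be UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of libabigail.
// Returns true if `bytes` is a well-formed UTF-8 sequence.
bool is_well_formed_utf8(const std::string& bytes)
{
  // Smallest code point that legitimately needs 0, 1, 2 or 3
  // continuation bytes; anything smaller is an overlong encoding.
  static const unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};

  std::size_t i = 0;
  const std::size_t n = bytes.size();
  while (i < n)
    {
      unsigned char lead = static_cast<unsigned char>(bytes[i]);
      std::size_t extra = 0;  // continuation bytes expected
      unsigned long cp = 0;   // code point being assembled

      if (lead < 0x80)                { extra = 0; cp = lead; }
      else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
      else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
      else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
      else
        return false;  // stray continuation byte or invalid lead byte

      if (extra > n - i - 1)
        return false;  // sequence is truncated

      for (std::size_t k = 1; k <= extra; ++k)
        {
          unsigned char c = static_cast<unsigned char>(bytes[i + k]);
          if ((c & 0xC0) != 0x80)
            return false;  // not a continuation byte
          cp = (cp << 6) | (c & 0x3F);
        }

      // Reject overlong forms, UTF-16 surrogates and out-of-range values.
      if (cp < min_cp[extra] || cp > 0x10FFFF
          || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += extra + 1;
    }
  return true;
}
```

Note that passing this check does not prove the producer actually used UTF-8, and bytes below 0x80, including control characters, are valid UTF-8 on their own, so a check like this would complement the identifier-character test rather than replace it.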