Bug 2768 - readelf and segmented addresses in DWARF2/3 aranges
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils
Version: 2.16
Importance: P2 normal
Target Milestone: ---
Assignee: unassigned
Reported: 2006-06-14 12:46 UTC by Stephane Chauveau
Modified: 2006-08-08 09:42 UTC

Attachments
reference elf file (not segmented) (2.39 KB, application/octet-stream)
2006-06-15 07:46 UTC, Stephane Chauveau
elf file with address_size=3 bytes and segment_size=1 byte (2.40 KB, application/octet-stream)
2006-06-15 07:53 UTC, Stephane Chauveau
elf file with address_size=4 and segment_size=4 (2.40 KB, application/octet-stream)
2006-06-15 08:04 UTC, Stephane Chauveau
Proper version of a_4_4.elf (2.40 KB, application/octet-stream)
2006-06-15 08:33 UTC, Stephane Chauveau
Compute address size as sum of pointer size and segment size (791 bytes, patch)
2006-06-22 14:38 UTC, Nick Clifton
Display 8 byte addresses as 8 byte values (402 bytes, patch)
2006-08-08 09:41 UTC, Nick Clifton

Description Stephane Chauveau 2006-06-14 12:46:22 UTC
I am using readelf to dump the DWARF2/3 information for an architecture with
segmented memories (not an arch supported by gcc, sorry!) and I believe that
readelf does not exactly follow the specifications when dumping the
.debug_aranges sections in the function display_debug_aranges(). 

The DWARF2/3 specs define the format of an address (segmented or not) with two
bytes in the header of an aranges block.

The first byte is defined as "the size in bytes of an address (or the offset
portion of an address for segmented addressing)". 

The second byte is defined as "the size in bytes of a segment descriptor on the
target system." 

My interpretation is that the total size of an address is the sum of both
bytes. For example, a 32bit address composed of an 8bit segment and a 24bit
offset should be described by the pair (3,1).

The current implementation assumes that the size of an address is fully
specified by the first byte, so it is likely to crash (or to produce
inconsistent results) when the second byte is non-zero.
Comment 1 Nick Clifton 2006-06-14 16:36:52 UTC
Hi Stephane,

  Can you provide an example object file containing debug information for a
segmented address architecture?

Cheers
  Nick

Comment 2 Stephane Chauveau 2006-06-15 07:41:00 UTC
I am currently trying to implement DWARF2 support for our architectures so I can
only provide samples that show my current interpretation of the specs.
More precisely, our architectures are not exactly segmented, at least not like
the x86 was. They are DSPs in which the PROGRAM memory is distinct from the DATA
memory. Some models also support two distinct DATA memories (typically named X
and Y) and a shared version (XY). To make things even more complex, parts of the
PROGRAM and DATA memories can be paged (i.e. multiple versions selectable via
control registers).
The addresses themselves are 16bit or 32bit. In DWARF1, we used to encode the
memory and page information in the unused bits of the 32bit addresses. In
DWARF2, we want to encode that information in the segment fields (e.g.
AT_segment and AT_address_class).

I will provide samples for an architecture with the following characteristics:
  - an address is 32bit in which only 24bit are currently used. 
  - two memories PROGRAM & DATA encoded as segment numbers 0 and 1.
Comment 3 Stephane Chauveau 2006-06-15 07:46:51 UTC
Created attachment 1091 [details]
reference elf file (not segmented)

In that version, the aranges section is not explicitly segmented. The 'segment'
is implicitly encoded in the most significant 8 bits of each 32bit address. So,
for example, 010b0000 describes a range at address 0x0b0000 in segment 0x01
(DATA).

# readelf --debug-dump=aranges a_4_0.elf 
The section .debug_aranges contains:

  Length:		    52
  Version:		    2
  Offset into .debug_info:  0
  Pointer Size: 	    4
  Segment Size: 	    0

    Address  Length
    010b0000 2
    01050000 9
    00000200 20
    00000214 536
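
The implicit packing used in this reference file can be sketched as follows.
The function names are hypothetical; the 8/24 bit split is the one described
above for this sample only and is not a general DWARF convention:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing for this sample: top 8 bits = segment, low 24 = offset. */
static uint8_t packed_segment(uint32_t addr)
{
    return (uint8_t)(addr >> 24);
}

static uint32_t packed_offset(uint32_t addr)
{
    return addr & 0x00FFFFFFu;
}

/* Inverse operation: build the 32bit value from its two parts. */
static uint32_t pack_address(uint8_t segment, uint32_t offset)
{
    assert(offset <= 0x00FFFFFFu);  /* only 24 bits are addressable */
    return ((uint32_t)segment << 24) | offset;
}
```

For the first line of the dump, packed_segment(0x010b0000) is 0x01 (DATA) and
packed_offset(0x010b0000) is 0x0b0000.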
Comment 4 Stephane Chauveau 2006-06-15 07:53:40 UTC
Created attachment 1092 [details]
elf file with address_size=3 bytes and segment_size=1 byte

In that version, I indicate explicitly that an address is composed of a 3 byte
offset and a 1 byte segment, so still a total of 4 bytes.
Readelf does not like it at all:
 
# readelf a_3_1.elf
The section .debug_aranges contains:

  Length:		    52
  Version:		    2
  Offset into .debug_info:  0
  Pointer Size: 	    3
  Segment Size: 	    1

    Address  Length
readelf: Error: Unhandled data length: 3
Aborted
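
The error message suggests that the byte reader only accepts a fixed set of
field widths. As an illustration only (read_le is a hypothetical helper, not a
readelf function), a reader that accepts any width from 1 to 8 bytes could look
like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read an unsigned little-endian value of any width from 1 to 8 bytes. */
static uint64_t read_le(const unsigned char *p, size_t size)
{
    uint64_t value = 0;

    assert(size >= 1 && size <= 8);
    /* Walk from the most significant (last) byte down to the first. */
    while (size-- > 0)
        value = (value << 8) | p[size];
    return value;
}
```

A 3 byte offset stored as 00 00 0b would then be read as 0x0b0000 instead of
aborting.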
Comment 5 Stephane Chauveau 2006-06-15 08:04:17 UTC
Created attachment 1093 [details]
elf file with address_size=4 and segment_size=4

In the two previous samples, I rely on the fact that only 24 of the 32 bits are
addressable. This is a limitation of the current implementation and not an
intrinsic property of the architecture itself. 
In this sample ELF file, the segmented address is encoded in 8 bytes: 4 for the
address itself and 4 for the segment.

Readelf does not fail on that one but misinterprets the number of ranges:

# readelf --debug-dump=aranges a_4_4.elf 
The section .debug_aranges contains:

  Length:		    84
  Version:		    2
  Offset into .debug_info:  0
  Pointer Size: 	    4
  Segment Size: 	    4

    Address  Length
    000b0000 1
    00000002 1
    00050000 1
    00000009 1
    00000200 0
    00000014 0
    00000214 0
    00000218 0
Comment 6 Stephane Chauveau 2006-06-15 08:24:25 UTC
Comment on attachment 1093 [details]
elf file with address_size=4 and segment_size=4

The ELF file is incorrect (the end marker in the .debug_aranges section is
2*4 bytes instead of 2*8 bytes)
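
The Length values reported in this thread can be cross-checked with a small
calculation. This is a sketch under stated assumptions: 32-bit DWARF with a
12 byte header (4 byte length, 2 byte version, 4 byte .debug_info offset and
the two size bytes), a first tuple aligned to a multiple of the tuple size, and
a length field as wide as the full segmented address. The function name is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Expected value of the Length field for a set with n_ranges entries
   plus the terminating all-zero tuple. */
static size_t aranges_unit_length(size_t addr_size, size_t seg_size,
                                  size_t n_ranges)
{
    const size_t header = 12;  /* includes the 4 byte length field itself */
    size_t tuple = 2 * (addr_size + seg_size);

    assert(tuple > 0);
    /* Round the header up to the next multiple of the tuple size. */
    size_t padded = (header + tuple - 1) / tuple * tuple;
    /* The Length field does not count its own 4 bytes. */
    return padded + (n_ranges + 1) * tuple - 4;
}
```

For four ranges this gives 52 for the (4,0) and (3,1) files and 92 for a (4,4)
file, which is 8 more than the 84 reported for the faulty attachment, matching
the end marker being 8 bytes too short.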
Comment 7 Stephane Chauveau 2006-06-15 08:33:47 UTC
Created attachment 1094 [details]
Proper version of a_4_4.elf

A proper version of the previous sample a_4_4.elf.
Readelf does not crash but interprets each offset and segment as an
address and length pair.

# readelf --debug-dump=aranges a_4_4.elf 
The section .debug_aranges contains:

  Length:		    92
  Version:		    2
  Offset into .debug_info:  0
  Pointer Size: 	    4
  Segment Size: 	    4

    Address  Length
    000b0000 1
    00000002 1
    00050000 1
    00000009 1
    00000200 0
    00000014 0
    00000214 0
    00000218 0
Comment 8 Stephane Chauveau 2006-06-15 09:21:11 UTC
A few more remarks:

(1) Because of the alignment constraints, it is probably safe to assume that the
total size of each address (including the segment) will be a power of two.
Readelf should be safe if it covers the total address sizes of 2, 4 & 8.
The case address_size=2 and segment_size=2 is likely to happen since DSPs with
16bit addresses and complex memory features (paging, PROG/DATA, ...) are quite
common. I do not have any samples yet.

(2) The DWARF specs do not specify how the segment is encoded in the address so
readelf should simply dump the overall value.

(3) Cases such as address_size=4 & segment_size=4 are problematic when
readelf is built without 64bit support.
The functions get_byte_little_endian() and get_byte_big_endian() then only
provide 4 of the required 8 bytes.
Displaying only part of the values is a minor issue.
What is more problematic is that the detection of the end marker (2 times ZERO)
could be incorrect.

A simple way to have a very generic implementation could be to process the
address & length not using get_byte() but using two new functions:
   - one to dump an arbitrary sequence of bytes as a little or big endian
hexadecimal number. 
   - one to test if a sequence of bytes is only composed of zeros.
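
A minimal sketch of these two helpers (hypothetical names and signatures, not
part of readelf). The first writes into a caller-supplied buffer rather than
printing directly, which is a choice made here so that the result is easy to
check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `size` bytes as one hexadecimal number into `out`, most
   significant byte first. `out` must hold at least 2*size + 1 chars. */
static void format_hex_bytes(char *out, const unsigned char *p,
                             size_t size, int little_endian)
{
    size_t i;

    assert(out != NULL && p != NULL);
    for (i = 0; i < size; i++) {
        /* The most significant byte is the last one for little endian
           data and the first one for big endian data. */
        unsigned char b = little_endian ? p[size - 1 - i] : p[i];
        sprintf(out + 2 * i, "%02x", b);
    }
    out[2 * size] = '\0';
}

/* Return 1 if all `size` bytes are zero (end marker test), else 0. */
static int all_zero_bytes(const unsigned char *p, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

Because neither helper converts the bytes to an integer, both work the same
way whether or not the host has 64bit support.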
    
Comment 9 Nick Clifton 2006-06-22 14:36:17 UTC
Hi Stephane,

  Thanks for supplying the test cases.  I have now been able to reproduce the
problem and I am going to upload a patch that fixes it.  If you would care to
try it out and let me know if you encounter any problems I would be most grateful.

Cheers
  Nick
Comment 10 Nick Clifton 2006-06-22 14:38:00 UTC
Created attachment 1111 [details]
Compute address size as sum of pointer size and segment size
Comment 11 Stephane Chauveau 2006-06-28 10:01:27 UTC
I checked the patch and it seems to work fine.
The only remaining problem is the one I described in comment 8.
The overall size when both the pointer and segment size are 4 bytes is 64 bit but
only 32bit are displayed. More generally, DWARF2 does not fully specify how to
interpret a segmented address so it would make sense to display each segmented
address (and the length too) as a byte sequence instead of converting it to a
numerical value.

The current output for a_4_4.elf looks relatively good because the file is
little endian:  

    Address  Length
    000b0000 2
    00050000 9
    00000200 20
    00000214 536
    00000000 0

If the file was big endian (or if the relative order of the segment and offset
part of each address were swapped), the other 32 bit would be used and the
output would be:

    Address  Length
    00000001 0
    00000001 0
    00000000 0
    00000000 0
    00000000 0

The output obtained with printing the raw sequences of bytes would be: 
  
    Address            Length
    00000001000b0000 0x0000000000000002
    0000000100050000 0x0000000000000009
    0000000000000200 0x0000000000000014
    0000000000000214 0x0000000000000218
    0000000000000000 0x0000000000000000

I prefixed the length with '0x' to avoid a potential ambiguity because it is
currently expressed in decimal for non segmented addresses.
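
The truncation effect described above can be reproduced with a short sketch.
The helper names are hypothetical, and the byte layout (offset word first,
segment word second) is an assumption about the sample file:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a field of 1 to 8 bytes as an unsigned number in the given
   byte order. */
static uint64_t read_field(const unsigned char *p, size_t size,
                           int little_endian)
{
    uint64_t v = 0;
    size_t i;

    assert(size >= 1 && size <= 8);
    for (i = 0; i < size; i++)
        v = (v << 8) | (little_endian ? p[size - 1 - i] : p[i]);
    return v;
}

/* What a reader limited to 32bit values would keep: the low 32 bits. */
static uint32_t low32(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}
```

With the little endian bytes 00 00 0b 00 01 00 00 00 the low 32 bits are
0x000b0000 (the offset), while the big endian bytes 00 0b 00 00 00 00 00 01
leave only 0x00000001 (the segment), which is the difference between the two
truncated listings above.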
Comment 12 Nick Clifton 2006-08-08 09:41:13 UTC
Created attachment 1211 [details]
Display 8 byte addresses as 8 byte values
Comment 13 Nick Clifton 2006-08-08 09:42:31 UTC
Hi Stephane,

  My apologies for taking so long to get back to this issue.

  You are right - the addresses and lengths are being incorrectly truncated.  I
am going to apply the newly uploaded patch to fix this.

Cheers
  Nick

binutils/ChangeLog
2006-08-08  Nick Clifton  <nickc@redhat.com>

	PR binutils/2768
	* dwarf.c (display_debug_aranges): When the address size is
	greater than 4 display addresses and lengths as 16 hex digits,
	otherwise use 8 hex digits.
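
The rule stated in this ChangeLog entry can be illustrated as follows. This is
not the code of the patch; format_address is a hypothetical helper that only
shows the width selection:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format `value` as 16 hex digits when the address size exceeds 4 bytes,
   otherwise as 8 hex digits. Returns the number of characters written. */
static int format_address(char *out, size_t out_len, uint64_t value,
                          unsigned address_size)
{
    assert(out != NULL && out_len >= 17);  /* room for 16 digits + NUL */
    if (address_size > 4)
        return snprintf(out, out_len, "%016llx", (unsigned long long)value);
    return snprintf(out, out_len, "%08lx",
                    (unsigned long)(value & 0xFFFFFFFFu));
}
```

For example, the value 0x214 is rendered as 00000214 with an address size of 4
and as 0000000000000214 with an address size of 8.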