This is the mail archive of the
dwarf2@corp.sgi.com
mailing list for the dwarf2 project.
Line Number Table Issue
- To: "DWARF 2 Revision mailgroup" <dwarf2 at corp dot sgi dot com>
- Subject: Line Number Table Issue
- From: "Brian Nettleton" <brian dot nettleton at windriver dot com>
- Date: Wed, 17 May 2000 10:58:06 -0700
- Reply-To: "Brian Nettleton" <brian dot nettleton at windriver dot com>
Hello,
I'm new to the group, so bear with me if I break protocol. My name is Brian
Nettleton and I work for Wind River Systems, Hardware Software Integration
Division (used to be EST Corp). I'm responsible for the symbol reading
portion of the visionCLICK debugger.
Line Number Table Is_Stmt Issue:
-------------------------------
Recently I came across a situation where a compiler vendor was generating
something I thought unusual for DWARF 2 line number information. This
particular compiler set the is_stmt boolean to false for every entry
in the line number table. The visionCLICK symbol reader ignored any
entries whose is_stmt was not true, and so threw away every entry from this
compiler. After reading the DWARF 2.0.0 spec for the is_stmt boolean, it
seemed to me that the meaning of this boolean is not very clear, and I'm
hoping this group can help shed some light on it in the upcoming revision.
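A minimal sketch of how a strict reader loses everything from such a compiler. The `Row` type, `strict_statement_rows`, and the sample addresses are hypothetical illustrations, not visionCLICK code; each row is reduced to (address, line, is_stmt).

```python
from typing import NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool


def strict_statement_rows(rows):
    """Keep only rows whose is_stmt flag is true (the strict policy)."""
    return [r for r in rows if r.is_stmt]


# Hypothetical rows from a compiler that clears is_stmt on every entry.
all_false = [Row(0x1000, 10, False), Row(0x1004, 11, False), Row(0x1008, 12, False)]
kept = strict_statement_rows(all_false)  # nothing survives the filter
```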
Here's what the DWARF 2.0.0 spec says about the is_stmt boolean.
6.2 Line Number Information
...
If space were not a consideration, the
information provided in the .debug_line
section could be represented as a large
matrix, with one row for each
instruction in the emitted object code.
The matrix would have columns for:
...
- whether this instruction is the
beginning of a source statement
...
6.2.2 State Machine Registers
...
is_stmt A boolean indicating that the
current instruction is the
beginning of a statement.
...
At the beginning of each sequence within
a statement program, the state of the
registers is:
...
is_stmt determined by default_is_stmt
in the statement program
prologue
...
6.2.4 The Statement Program Prologue
...
5. default_is_stmt (ubyte)
The initial value of the is_stmt
register.
A simple code generator that emits
machine instructions in the order
implied by the source program would
set this to "true," and every entry
in the matrix would represent a
statement boundary. A pipeline
scheduling code generator would set
this to "false" and emit a specific
statement program opcode for each
instruction that represented a
statement boundary.
6.2.5.2 Standard Opcodes
...
6. DW_LNS_negate_stmt
Takes no arguments. Set the is_stmt
register of the state machine to the
logical negation of its current
value.
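To make the quoted rules concrete, here is a simplified sketch of the state machine's handling of is_stmt: it starts from default_is_stmt, DW_LNS_negate_stmt flips it, and DW_LNS_copy appends a row. The pre-decoded tuple opcodes (`"advance_pc"`, `"copy"`, and so on) are a hypothetical stand-in for the real byte encoding; minimum_instruction_length is assumed to be 1, and registers other than address, line, and is_stmt are omitted.

```python
from typing import NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool


def run_line_program(ops, default_is_stmt):
    """Execute a hypothetical, pre-decoded line program for one sequence."""
    address, line = 0, 1
    is_stmt = default_is_stmt          # initial value comes from the prologue
    rows = []
    for op, *args in ops:
        if op == "advance_pc":         # like DW_LNS_advance_pc (assumes min_inst_length 1)
            address += args[0]
        elif op == "advance_line":     # like DW_LNS_advance_line
            line += args[0]
        elif op == "negate_stmt":      # like DW_LNS_negate_stmt: flip the register
            is_stmt = not is_stmt
        elif op == "copy":             # like DW_LNS_copy: append a row to the matrix
            rows.append(Row(address, line, is_stmt))
        else:
            raise ValueError(f"unsupported op: {op}")
    return rows


# A pipeline-scheduler style program: default_is_stmt is false, and only
# the first row is explicitly marked as a statement boundary.
ops = [("negate_stmt",), ("copy",), ("negate_stmt",),
       ("advance_pc", 4), ("advance_line", 1), ("copy",)]
```

The same opcode list yields opposite is_stmt values depending on default_is_stmt, which is why the prologue byte matters so much to a reader.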
This seems straightforward enough except for the part in section 6.2.4 about a
pipeline scheduling code generator. This is where the problem gets
interesting (without this case the boolean would be unnecessary anyway).
A writer of a pipeline-optimizing compiler could even argue that no
instruction is a statement boundary!
While I understand the meaning of the theoretical boolean in section 6.2, it
seems less clear in the context of the state machine and the actual is_stmt
boolean. What would one expect a debugger to do with entries where the
is_stmt boolean is false? After a proposal for change, this note discusses
the issue further, ad nauseam.
Proposal for DWARF 2.1 change:
------------------------------
This change modifies the is_stmt register of the state machine to have an
initial value of "true", and clarifies the responsibility of a pipeline
scheduling code generator to identify some instruction as the "beginning" of
a source line.
Textual changes to the specification:
6.2 Line Number Information
...
Such a matrix, however, would be impractically large. We shrink it with two
techniques. First, we delete from the matrix each row whose file, line and
source column information is identical with that of its predecessors. [new
text] Any deleted rows would never be the beginning of a source statement.
[end new text]
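A small sketch of the row-deletion technique described above: a row is dropped when its file, line, and column match the row just before it. The `FullRow` type and sample values are hypothetical. Under the proposed text, none of the dropped rows could be the start of a statement.

```python
from typing import NamedTuple


class FullRow(NamedTuple):
    address: int
    file: int
    line: int
    column: int


def compress(rows):
    """Drop each row whose (file, line, column) matches the previous row's."""
    kept = []
    prev_key = None
    for r in rows:
        key = (r.file, r.line, r.column)
        if key != prev_key:
            kept.append(r)
        prev_key = key
    return kept


rows = [FullRow(0, 1, 10, 1), FullRow(2, 1, 10, 1),
        FullRow(4, 1, 11, 1), FullRow(6, 1, 10, 1)]
```

Line 10 survives twice because its second appearance does not immediately follow an identical row.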
...
6.2.2 State Machine Registers
...
is_stmt A boolean indicating that the current
instruction is the beginning of a
statement.
[new text] Every distinct line number
should have one and only one
instruction for which this boolean
is true. The exception is inlining
or template expansion, where a line
number is semantically repeated in a
source file: each expansion of that
line number should then have one and
only one instruction for which this
boolean is true.
A simple code generator that emits
machine instructions in the order
implied by the source program would
never modify this register and every
entry in the matrix would represent
a statement boundary. A pipeline
scheduling code generator might set
this register to false for some
instructions when instructions from
several source statements are
intermixed.[end new text]
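The proposed rule can be checked mechanically. This sketch assumes each row carries a hypothetical `expansion` tag identifying the inline or template instance it came from; DWARF 2 line rows have no such register, so a real checker would have to derive it from other debugging information. Rows with line 0 are skipped because they are not associated with any source line.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool
    expansion: Optional[str] = None  # hypothetical inline/template instance tag


def one_stmt_violations(rows):
    """Return the (line, expansion) keys that do not have exactly one is_stmt row."""
    stmt_counts = Counter((r.line, r.expansion) for r in rows if r.is_stmt)
    keys = {(r.line, r.expansion) for r in rows if r.line != 0}
    return {k for k in keys if stmt_counts[k] != 1}


good = [Row(0, 10, True), Row(4, 11, True), Row(8, 10, False)]
all_false = [Row(0, 10, False), Row(4, 11, False)]
inlined_twice = [Row(0, 5, True, "call1"), Row(4, 5, True, "call2")]
```

The all-false table from the compiler described earlier violates the rule for every line, while an inline body expanded twice passes because each expansion has its own statement row.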
...
At the beginning of each sequence within a statement program, the state of
the registers is:
...
is_stmt [modified text] "true" [end modified text]
basic_block ...
6.2.4 The Statement Program Prologue
...
5. [modified text] unused (ubyte)
This byte is currently unused. [end modified text]
6. line_base (sbyte)
...
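A sketch of the register values at the start of each sequence, as listed in section 6.2.2 of DWARF 2, with a switch for the proposed change. The function and the `proposed_dwarf21` flag are hypothetical names used only for illustration.

```python
def initial_registers(default_is_stmt, proposed_dwarf21=False):
    """Register values at the start of each sequence (DWARF 2, section 6.2.2).

    With proposed_dwarf21=True, is_stmt starts as True and the prologue
    byte is ignored, as in the proposal above (hypothetical flag name).
    """
    return {
        "address": 0,
        "file": 1,
        "line": 1,
        "column": 0,
        "is_stmt": True if proposed_dwarf21 else bool(default_is_stmt),
        "basic_block": False,
        "end_sequence": False,
    }
```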
Further Discussion:
-------------------
The current spec allows, and in fact says, that a pipeline scheduling code
generator should default the is_stmt boolean to "false". This is wrong in
that the first instruction of any sequence would seem by definition to be
the beginning of a source line! A compiler is allowed to generate
instructions that aren't associated with any line number, in which case the
line number is given as 0. A debugger would largely ignore these
instructions anyway (especially their is_stmt boolean). So even if
an optimizing compiler generated instructions that aren't associated with a
line number, the first instruction generated for an actual
source line would still seem to be the first instruction for that source
line.
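The argument above can be sketched as a lookup that finds the first row for each real source line while skipping line 0 rows. It works even when every is_stmt flag is false. The `Row` type, function name, and sample addresses are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool


def first_address_per_line(rows):
    """Map each real source line to the address of its first row.

    Rows with line 0 (code not attributed to any source line) are skipped.
    """
    first = {}
    for r in sorted(rows, key=lambda r: r.address):
        if r.line != 0 and r.line not in first:
            first[r.line] = r.address
    return first


rows = [Row(0x300, 0, False), Row(0x304, 7, False),
        Row(0x308, 8, False), Row(0x30C, 7, False)]
```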
So what might a debugger do with entries in the table where is_stmt is
false? Debuggers use the line number tables for basically four things:
1 - To set a breakpoint at the beginning of a source line.
2 - When stepping at the source level, to identify when a new source line has
been encountered.
3 - When displaying disassembled machine code interspersed with source code,
the line number tables are used to identify where to insert source code into
the disassembly listing.
4 - When a hardware exception occurs, or when displaying a stack traceback,
the tables are used to identify the particular source line associated with
an instruction address.
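Uses 1 and 4 can be sketched side by side: breakpoints draw only on is_stmt rows, while address-to-line lookup searches all rows for the one with the greatest address not exceeding the PC. The names are hypothetical, and the sketch assumes the rows are sorted by address and come from a single sequence.

```python
import bisect
from typing import NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool


def breakpoint_addresses(rows, line):
    """Use 1: candidate breakpoint addresses are the is_stmt rows for a line."""
    return [r.address for r in rows if r.line == line and r.is_stmt]


def line_for_address(rows, pc):
    """Use 4: map any pc to the row with the greatest address <= pc."""
    addrs = [r.address for r in rows]
    i = bisect.bisect_right(addrs, pc) - 1
    return rows[i].line if i >= 0 else None


rows = [Row(0x100, 10, True), Row(0x104, 11, True),
        Row(0x108, 10, False), Row(0x10C, 12, True)]
```

Here a fault at 0x10A is reported on line 10 through the "false" row, yet a breakpoint on line 10 is placed only at 0x100.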
Number 4 is probably the main situation where instructions with both "true"
and "false" is_stmt values are useful. Certainly for number 1 only the "true"
is_stmt instructions are interesting. It isn't clear whether the "false"
is_stmt instructions would or should be used for items 2 and 3. Using them
might be more technically accurate, but it would also add significantly to
the "noise" when debugging; stepping back and forth over several lines is
distracting.
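A simplified sketch of the stepping noise described above. It replays a hypothetical trace of executed PCs and reports where a source-level stepper would announce a new line, either honoring only is_stmt rows or using every row. Real debuggers step with breakpoints or single-stepping; the trace list and all names are assumptions.

```python
from typing import NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool


def stepping_stops(trace, rows, stmt_only):
    """Return the sequence of lines a stepper would stop at for this trace."""
    by_addr = {r.address: r for r in rows}
    stops = []
    current = None
    for pc in trace:
        r = by_addr.get(pc)
        if r is None or (stmt_only and not r.is_stmt):
            continue                      # not a candidate stopping point
        if r.line != current:
            stops.append(r.line)
            current = r.line
    return stops


# Scheduled code that interleaves instructions from lines 10 and 11.
rows = [Row(0, 10, True), Row(4, 11, True), Row(8, 10, False),
        Row(12, 11, False), Row(16, 12, True)]
trace = [0, 4, 8, 12, 16]
```

Honoring is_stmt gives the stops 10, 11, 12; using every row gives 10, 11, 10, 11, 12, which is the back-and-forth the note calls distracting.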
One might ask, "Do we need an is_stmt boolean at all? Can't a debugger
simply identify the first instruction associated with a line number, use
this for setting breakpoints, and then deal with the other situations as
needed?" The answer is that yes, we do need the is_stmt boolean to handle
situations where a source line is expanded multiple times in a file. For
example, an inline subroutine that was called twice would have its source
lines "begin" twice in the instruction sequence. It's not clear that this
is why the DWARF 2 spec originally included this boolean, but it probably
does justify its existence.
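A sketch of the inline case: lines 5 and 6 of a hypothetical inline body are expanded on either side of caller line 20. A "first instruction per line" map can hold only one address per line, while the is_stmt rows expose both expansions as breakpoint locations. All names and addresses are illustrative assumptions, and the file register is ignored for brevity.

```python
from typing import NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool


def first_address_per_line(rows):
    """One address per line: the lowest address seen for it."""
    first = {}
    for r in sorted(rows, key=lambda r: r.address):
        first.setdefault(r.line, r.address)
    return first


def stmt_addresses(rows, line):
    """Every statement start recorded for a line."""
    return [r.address for r in rows if r.line == line and r.is_stmt]


rows = [Row(0x200, 5, True), Row(0x204, 6, True), Row(0x208, 20, True),
        Row(0x20C, 5, True), Row(0x210, 6, True)]
```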
-Brian Nettleton