Fix the .align bug with unwind info

H. J. Lu hjl@lucon.org
Tue Jan 6 19:36:00 GMT 2004


On Tue, Dec 30, 2003 at 07:14:56PM -0800, Jim Wilson wrote:
> In case it isn't clear to others, we need to defer emitting unwind info
> until after relaxation.  Otherwise, we cannot compute it correctly in
> some cases, e.g., in the presence of .align directives in the code stream
> for aligning branches.
> 
> HJ is proposing putting the unwind info into a frag, which seems to work
> nicely, except for the problem of estimating the size of the unwind
> info.  We can't compute the size until after relaxation, and the
> worst-case size before relaxation is the size of the address space,
> which is not a useful estimate for a frag size.  So we need to make a
> good practical estimate of the maximum size, or we need to look for
> another solution.
> 
> Using a variant frag is how the dwarf2 line number info solves the exact
> same problem, which is why we are looking at it here.
> 
> On Mon, 2003-12-22 at 12:16, H. J. Lu wrote:
> > If we have to estimate the size anyway, we don't need to add a bunch
> > of new variant frags. We can just initialize the variant frag with a
> > reasonable size. The only thing that needs a limit is imask. What is
> > a reasonable limit on the prologue rlen?
> 
> I was thinking we could compute a theoretical limit because there is a
> limit on how many registers we can save.  In the worst case, we save 100
> grs, 20 frs, 5 branch regs, and some misc regs like pfs, rp, lc, unat,
> and the predicate registers.  Call that 130 registers, double it to
> account for worst-case inefficient address arithmetic and/or padding
> nops, and we have 260 instructions.
> 
> However, it seems that instruction scheduling makes this more
> complicated.  Running readelf -u on all files in /usr/bin on a Debian
> system, I see that the largest prologue is 288 instructions.  This one
> does not save very many registers and does not use an imask, but it
> appears that instruction scheduling moved the rp register save into the
> following block, making the prologue appear much larger than it is.
> The largest one that uses an imask is 160 instructions, and again, the
> instruction scheduler had moved instructions into the prologue.  We
> might need to limit scheduling of the prologue to make this work, which
> would be unfortunate.  GCC doesn't provide any good way to limit
> prologue scheduling without effectively disallowing any scheduling at
> all.  Maybe we can be a bit more intelligent about the unwind info that
> GCC emits?  I haven't looked into this.
> 
> I see quite a few examples that save all registers other than the 96
> local grs, which means 34 registers.  This is probably due to the setjmp
> register saving problem that I recently fixed.  This can be done in as
> few as 65 instructions and is almost always done in fewer than 80
> instructions, so a factor of 2 seems a reasonable margin.  The minor
> differences here are presumably the result of instruction scheduling
> moving a few instructions into the prologue.
> 
> Maybe we should ask whether we ever need to estimate the size of a
> prologue region at all.  Instruction scheduling will never move a branch
> into a prologue region, so it is probably the case that we will never
> have to defer an imask size calculation until after relaxation.  To be
> safe, we could give an error if we detect such a case.  How about
> passing the unwind record type to slot_index as another argument, and
> giving an error if the type is prologue or prologue_gr and there is any
> kind of variable space allocation?
> 
> Then we only have to worry about estimating the size of body region
> lengths.  These are LEB128-encoded slot counts, bounded by (address
> space / 16 * 3), i.e. three instruction slots per 16-byte bundle, which
> I believe is 9 bytes in the worst case.  There are only a small number
> of unwind records that are variable size in this scenario, so it
> shouldn't be a problem to assume worst-case sizes for them.  This would
> allow us to handle rs_space and rs_org properly, though I doubt that
> this is very important.
> 
> I think this can fail if we have a second prologue region that occurs
> after the first body.  There might be legitimate reasons for this, for
> instance describing optimized tail calls.  We wouldn't need any records
> with an imask in this case, though; we would only need the prologue
> record.  We would need to handle this.  Maybe we need something a little
> more complicated, where slot_index sets a status indicator if it sees a
> variable space allocation, and then we give an error only if we try to
> emit a record that needs an imask field after a variable space
> allocation has been seen.

My current approach works fine. The only things that don't work are
variable space allocation and variable location counter advance (i.e.
rs_space and rs_org), and the assembler issues an error when it sees
them. Since both approaches have limitations, which one should we
prefer?


H.J.


