This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
[PATCH] MIPS/GAS: Correct microMIPS branch swapping assertion
- From: "Maciej W. Rozycki" <macro at codesourcery dot com>
- To: <binutils at sourceware dot org>
- Cc: Richard Sandiford <rdsandiford at googlemail dot com>
- Date: Thu, 2 Aug 2012 20:32:03 +0100
- Subject: [PATCH] MIPS/GAS: Correct microMIPS branch swapping assertion
Hi,
A bug in GAS causes the assembly of some files to fail where the branch
swapping optimisation is enabled (which is the default) and branch delay
slots have not been explicitly scheduled, either by the compiler or
manually in handcoded assembly. If that happens, then a message like:
foo.s:1234: Error: internal error: fixup not contained within frag
is produced.
The cause is the internal structure of the assembler, generated machine
code is stored in dynamically allocated memory chunks called frags. These
chunks have a fixed size, when enough machine code has been generated to
fill a frag completely, then another one is allocated.
Frequently these chunks are not completely filled, because some assembly
code sequences result in two variants of machine code produced, to handle
different scenarios that the assembler cannot resolve until all of the
input file has been consumed. For example such variants are produced in
some code models where a different sequence is needed to access data
depending on whether it is associated with a local or an external symbol
and the symbol's definition has not been seen yet. Such variant code has
to be put last in a frag and therefore a new frag always follows
immediately even though the previous frag has only been partially filled.
The particular scenario that triggers this bug is where a branch
instruction is first in a frag and GAS decides it can fill its delay slot
by the preceding instruction (effectively swapping the order of the two
instructions), that comes from the previous frag.
This is performed differently depending on whether the branch is relaxed
-- that is it can be transformed to a different sequence of code later on
-- or not. In the former case the last instruction is deleted from the
previous frag, shrinking the frag by the length of that instruction, and
moved into the current one just after the branch. In the latter case the
branch is simply exchanged with the last instruction from the previous
frag and nothing else is changed.
It is the latter case that triggers the bug. It happens in the microMIPS
mode only, where the branch is a 32-bit instruction and the other
instruction is a 16-bit instruction. As a result after the swap the
branch ends up sticking out half-way beyond the earlier frag. This causes
the assertion noted at the beginning to trigger as code later on notices a
relocation (fixup) applied to the branch reaching beyond the frag. I
haven't tried a scenario where a 16-bit branch is swapped with a 32-bit
instruction, but I'd expect other interesting observations.
As variant code cannot be placed in a branch delay slot this bug only
triggers where the previous frag has been completely filled which makes it
happen every once in a while only. Also this bug does not happen for
MIPS16 code (where a 32-bit branch can be swapped with a 16-bit other
instruction, but not the other way round) that is handled by a different
execution path.
Here's a one-line fix that enables code already used for relaxed branches
for use with fixed branches as well. I have verified it to work correctly
with the piece of code that originally triggered it, but I wanted to make
sure this is covered by our test suite as well.
This turned out to be a bit of a challenge as the test suite cannot
predict exactly where the end of the first frag is going to be. There are
two reasons for that.
First, frags are placed in memory allocated dynamically with
obstack_alloc. This is a GNU extension and is implemented in (lib)iberty
and glibc; the former is bundled with binutils and is used where not
available in the system C library, otherwise the system-supplied
(presumably glibc) version is used.
Obstack memory is allocated in chunks and a frag can be freely grown up
to the amount of memory available in the obstack it has been put in.
Once the frag cannot be grown any further it has to be closed and a new
frag created, that will be placed in a newly allocated obstack.
The size of the chunk and therefore any obstack allocated, by default,
depends on the programming model used on the host. GAS starts by using
the default size. This size is calculated as 4096 bytes minus a
programming-model specific amount calculated based on sizeof (union
fooround) defined internally, that I believe can be for *nix systems, that
we may be potentially concerned about, one of 4, 8, 12 or 16, and the
corresponding amount comes out as a value between 16 and 32, so the
remaining space is between 4064 and 4080.
Second, by the time the first instruction is assembled, some space in the
obstack has already been consumed. In practice for the configuration I
used the value came out as 3988. So I have figured out this can only be
tested with the aid of some heuristics that can adapt for system
specifics, and ended up with the test case included below that sweeps a
range of possible offsets up to 4096 in hope the end of the first frag
will be somewhere within.
I have figured out it makes no sense to start from zero as that would be
an unnecessary waste of time and disk space and decided it will be enough
if the last 256 bytes are scanned. Also there is no need to scan every
byte boundary as any instruction that does not entirely fit in the
remaining space in a frag is automatically carried over to a newly-created
frag and the preceding frag is closed.
As I have written above, GAS starts by using the default chunk size, but
interestingly enough, this size is applied to the first obstack only. As
soon as another obstack is needed GAS switches to using about the minimum
size possible and starts requesting further obstacks in amounts that are
double the instruction size only, that for the MIPS target means either 4
or 8 bytes. How miserable! Obstack allocation code uses some arbitrary
look-ahead heuristics and actually provides some more, but that is is
still 84 bytes only, so beyond the first obstack GAS starts thrashing with
requests for new obstacks. This is I believe an unnecessary memory and
performance waste.
By how the offending code looks I gather the intent was to provide for
large obstacks where an excessively big chunk of data needs to be placed
in a single frag (the associated comment quotes 2GB!), but inadvertently
the code does not check for the allocation size getting smaller than the
reasonable default. There is normally no penalty from using the default
even for the smallest frags possible, because as many individual frags can
be placed in a single obstack as will fit there. Therefore I believe this
is a missed optimisation and I will be posting a proposed fix separately.
So, to recap, here is the final change and the test results for the test
case are as follows:
Progressions and removed failures:
FAIL -> PASS: gas.sum:MIPS branch swapping (1018)
FAIL -> PASS: gas.sum:MIPS branch swapping (996)
The "1018" failure goes away as expected even without the fix below when
the obstack allocation pessimisation is eliminated.
I have regression-tested it with my usual set of targets including but
not limited to 23 MIPS configurations and spotted no problems. OK to
apply?
2012-08-02 Maciej W. Rozycki <macro@codesourcery.com>
gas/
* config/tc-mips.c (append_insn): Also handle moving delay-slot
instruction across frags for fixed branches.
gas/testsuite/
* gas/mips/branch-swap-2.l: New list test.
* gas/mips/branch-swap-2.s: New test source.
* gas/mips/mips.exp: Run the new test.
Maciej
binutils-gas-mips-branch-swap.diff
Index: binutils-fsf-trunk-quilt/gas/config/tc-mips.c
===================================================================
--- binutils-fsf-trunk-quilt.orig/gas/config/tc-mips.c 2012-08-02 15:17:48.000000000 +0100
+++ binutils-fsf-trunk-quilt/gas/config/tc-mips.c 2012-08-02 18:55:31.610534816 +0100
@@ -4488,7 +4488,7 @@ append_insn (struct mips_cl_insn *ip, ex
move_insn (ip, delay.frag, delay.where);
move_insn (&delay, ip->frag, ip->where + insn_length (ip));
}
- else if (relaxed_branch)
+ else if (relaxed_branch || delay.frag != ip->frag)
{
/* Add the delay slot instruction to the end of the
current frag and shrink the fixed part of the
Index: binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.l
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.l 2012-08-02 18:55:31.620517233 +0100
@@ -0,0 +1 @@
+# No warnings or errors expected!
Index: binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.s
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.s 2012-08-02 18:55:31.630534624 +0100
@@ -0,0 +1,8 @@
+ .set micromips
+ .text
+foo:
+ .rept count
+ ori $2, $3, (. - foo) >> 2
+ .endr
+ addu $2, $3, $4
+ j ext
Index: binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/mips.exp
===================================================================
--- binutils-fsf-trunk-quilt.orig/gas/testsuite/gas/mips/mips.exp 2012-08-02 15:42:25.000000000 +0100
+++ binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/mips.exp 2012-08-02 19:21:46.340718916 +0100
@@ -505,6 +505,20 @@ if { [istarget mips*-*-vxworks*] } {
run_dump_test_arches "branch-misc-2pic-64" [mips_arch_list_matching mips3]
run_dump_test "branch-misc-3"
run_dump_test "branch-swap"
+
+ if $elf {
+ # Sweep a range of branch offsets so that it hits a position where
+ # it is at the beginning of a frag and then swapped with a 16-bit
+ # instruction from the preceding frag. The offset will be somewhere
+ # close below 4096 as this is the default obstack size limit that
+ # we use and some space will have been already consumed. The exact
+ # amount depends on the host's programming model.
+ for { set count 960 } { $count <= 1024 } { incr count } {
+ run_list_test "branch-swap-2" "--defsym count=$count" \
+ "MIPS branch swapping ($count)"
+ }
+ }
+
run_dump_test "div"
if { !$addr32 } {