Bug 24373 - Arm Cortex-A53 Errata 843419 workaround inserts 4K stub even when unnecessary
Summary: Arm Cortex-A53 Errata 843419 workaround inserts 4K stub even when unnecessary
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.30
: P2 normal
Target Milestone: 2.33
Assignee: Tamar Christina
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-21 21:28 UTC by Julius Werner
Modified: 2019-05-22 10:03 UTC (History)
3 users (show)

See Also:
Host:
Target: aarch64-linux-gnu
Build:
Last reconfirmed: 2019-04-08 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Julius Werner 2019-03-21 21:28:09 UTC
The --fix-cortex-a53-843419 flag causes the BFD linker to work around a hardware issue on Arm Cortex-A53 CPUs when an ADRP instruction is placed within the last 8 bytes of a 4K page. It implements two possible workarounds: 1. if possible, it just replaces the ADRP instruction with an ADR instruction (which cannot trigger the incorrect behavior); 2. if the target address of the ADRP is not within +/- 1MB of the PC (so it cannot be replaced by an ADR), the linker will insert a two-instruction veneer function to replace possibly affected code after the ADRP. The veneer is extended to 4K size to make sure it doesn't shift other ADRP instructions around in a way that could create further hazards.

However, for some reason ld will also insert this veneer even if it chooses the first workaround (replace with ADR). In that case, the extra 4K in the output image are just dead code that is not referenced by anything and will never be executed. Is this an intentional limitation of the workaround implementation? Bloating a binary by 4K for no reason can be a significant problem for certain use cases, e.g. in boot firmware that executes from ROM or SRAM before main memory is initialized. This issue is exacerbated by the fact that this workaround is required very rarely and unpredictably based on code shifting around, which means that binaries which used to fit with kilobytes to spare may suddenly no longer fit due to a tiny and completely unrelated change. Since the kind of programs that need to worry about 4K size changes are usually smaller than 1MB, it should always be possible to translate ADRP instructions to ADR for them.

From my (very limited) understanding of the code I gather that the veneer section must be added before ld can determine which of the two workarounds it will apply so that correct relocations can be applied to it... but couldn't you then just delete that section again at the point where you decide to apply the easier workaround (translate into ADR), before you output it into the final binary?

Minimal test case:

$ cat errata.S 
.balign 0x1000
_start: 
        .skip 0xffc
        adrp x0, _start + 0x8000
        str x2, [x2]
        mov x2, #0
        ldr x1, [x0, #0x40]

$(CROSS_COMPILE)-as errata.S -o errata.o
$(CROSS_COMPILE)-ld -o errata errata.o --fix-cortex-a53-843419
$(CROSS_COMPILE)-objdump -xFD errata

(Note how the linker emitted a 4K veneer section called e843419@0002_00000010_1008, even though the instruction at address 400ffc is an ADR and the veneer section is not referenced by any code.)
Comment 1 Tamar Christina 2019-04-01 12:23:43 UTC
Hi Julius,

I'm not sure if we can drop it, the problem is that once the space is reserved and the size of the output section is determined, then the output_offsets are fixed.

if the padding is dropped after sizing then these have to be recalculated. But we currently (if I remember correctly) determine which one to use quite late and so relocations may have been processed already so it would be quite expensive to recompute everything.

I will take a look and see if there's anything we can do about it.
Comment 2 Julius Werner 2019-04-01 19:59:49 UTC
Hi Tamar,

Thanks for looking into this. I really have no idea how the linker internals work here so I just hope you can find a way to make it work.

When you say it may get quite expensive to recompute things, you just mean linker execution time, right? I think many users would be happy to pay even double link times if that gains them runtime memory pressure improvements. For my project at least, link time is not really an issue, and I'd expect the same to be true for plenty of others.

If you're too worried about increasing link time, maybe this is an optimization that could be hidden behind the -O flag?
Comment 3 Tamar Christina 2019-04-08 09:18:01 UTC
Hi Julius,

Hmm well if linking speed isn't an issue then I should be able to come up with a solution.  I will still try to see if it can't be done as part of the normal workflow but if not I will add an new option for an aggressive mode.
Comment 4 Sourceware Commits 2019-05-21 12:12:12 UTC
The master branch has been updated by Tamar Christina <tnfchris@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=739b5c9c778dee9e2f54d864f83a81ecb0639535

commit 739b5c9c778dee9e2f54d864f83a81ecb0639535
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue May 21 13:04:08 2019 +0100

    AArch64: Implement choice between Cortex-A53 erratum workarounds. (PR ld/24373)
    
    The Cortex-A53 erratum currently has two ways it can resolve the erratum when
    using the flag --fix-cortex-a53-843419:
    
    1) If the address is within the range of an ADR instruction it rewrites the ADRP
       into an ADR, and those doesn't need the use of a veneer.
    
    2) If the address is not within range, it adds a branch to a veneer which will
       execute the final bit of the erratum workaround and branch back to the call
       site.
    
    When we do this we always generate the veneers and we always align the size of
    the text section to 4KB.  This is because we only know which workaround we can
    use after all linking has finished and all addresses are known.  This means even
    though the veneers are not used, we still generate the section and we still
    change the size of the input section.
    
    This is problematic for small memory devices as this would require the user to
    take about a ~4KB hit in memory even though it's not even used.
    
    Since there's no real way to restart the linking process from the final write
    phase this patch solves the issue by allowing the user more control over which
    erratum workaround gets used.
    
    Concretely this changes the option --fix-cortex-a53-843419 to take optional
    arguments --fix-cortex-a53-843419[=full|adr|adrp]
    
    - full (default): Use both ADRP and ADR workaround. This is equivalent to not
    		  specifying any options and is the default behavior before this
    		  patch.
    
    - adr: Only use the ADR workaround, this will not cause any increase in binary
           size but linking will fail if the referenced address is out of range of
           an ADR instruction.
    
    - adrp: Use only the ADRP workaround, this will never rewrite your ADRP.
    
    In the cases where the user knows how big their binaries are the `adr` option
    would prevent the unneeded overhead.
    
    bfd/ChangeLog:
    
    	PR ld/24373
    	* bfd-in.h (enum erratum_84319_opts): New
    	(bfd_elf64_aarch64_set_options, bfd_elf32_aarch64_set_options): Change
    	int to enum erratum_84319_opts.
    	* bfd-in2.h: Regenerate.
    	* elfnn-aarch64.c (struct elf_aarch64_link_hash_table): Change
    	fix_erratum_843419 to use new enum, remove fix_erratum_843419_adr.
    	(_bfd_aarch64_add_stub_entry_after): Conditionally create erratum stub.
    	(aarch64_size_one_stub): Conditionally size erratum 843419 stubs.
    	(_bfd_aarch64_resize_stubs): Amend comment.
    	(elfNN_aarch64_size_stubs): Don't generate stubs when no workaround
    	requested.
    	(bfd_elfNN_aarch64_set_options): Use new fix_erratum_843419 enum.
    	(_bfd_aarch64_erratum_843419_branch_to_stub): Implement selection of
    	erratum workaround.
    	(clear_erratum_843419_entry): Update erratum conditional.
    
    ld/ChangeLog:
    
    	PR ld/24373
    	* emultempl/aarch64elf.em (PARSE_AND_LIST_LONGOPTS): Add optional args
    	to flags.
    	* NEWS: Add changes to flag.
    	(PARSE_AND_LIST_OPTIONS): Update help descriptions.
    	(PARSE_AND_LIST_ARGS_CASES): Add new options to parser.
    	* testsuite/ld-aarch64/aarch64-elf.exp: Add new run_dump_tests.
    	* testsuite/ld-aarch64/erratum843419-adr.d: New test.
    	* testsuite/ld-aarch64/erratum843419-adrp.d: New test.
    	* testsuite/ld-aarch64/erratum843419-far-adr.d: New test.
    	* testsuite/ld-aarch64/erratum843419-far-full.d: New test.
    	* testsuite/ld-aarch64/erratum843419-far.s: New test.
    	* testsuite/ld-aarch64/erratum843419-full.d: New test.
    	* testsuite/ld-aarch64/erratum843419-near.s: New test.
    	* testsuite/ld-aarch64/erratum843419-no-args.d: New test.
Comment 5 Tamar Christina 2019-05-21 12:15:54 UTC
Hi Julius,

Sorry for the delay, I couldn't find a reasonably feasible way to restart the linking, so instead I implemented a way to allow users to pick which workaround to use.

In your case if you use `--fix-cortex-a53-843419=adr` it'll do what you want.  If your application ends up being too big for this workaround linking will fail.

Is this sufficient for your use case?
Comment 6 Christophe Lyon 2019-05-21 13:21:57 UTC
Hi Tamar,

It seems your patch does not build on 32-bits hosts:
../../bfd/elfnn-aarch64.c: In function ‘_bfd_aarch64_erratum_843419_branch_to_stub’:
../../bfd/elfnn-aarch64.c:5312:2: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘bfd_signed_vma’ [-Werror=format=]
  (_("%pB: error: erratum 843419 immediate 0x%lx "
  ^

(on x86_32)
Comment 7 Tamar Christina 2019-05-21 13:42:46 UTC
Hmm interesting, didn't receive the error when building inside 32-bit schroot on AArch64 machine. I'll take a look, thanks!
Comment 8 Sourceware Commits 2019-05-21 16:17:14 UTC
The master branch has been updated by Tamar Christina <tnfchris@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6467207116c66ff2c58f8bc35cb15b2596f5c457

commit 6467207116c66ff2c58f8bc35cb15b2596f5c457
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue May 21 17:15:36 2019 +0100

    AArch64: Fix -Werror on build
    
    This patch fixes a hardcoded `l` specifier on a `bfd_signed_vma`.
    Instead this now uses BFD_VMA_FMT which fixes the build on 32 bit
    hosts.
    
    Committed under the obvious rule.
    
    bfd/ChangeLog:
    
    	PR ld/24373
    	* elfnn-aarch64.c (_bfd_aarch64_erratum_843419_branch_to_stub):
    	Fix print formatter.
Comment 9 Julius Werner 2019-05-21 18:48:15 UTC
Hi Tamar,

Thanks for taking care of this! I think it's a good solution that should be sufficient in practice. The kinds of binaries where an extra 4KB is a big deal should not be larger than 1MB anyway. For our use case, this should work just fine.
Comment 10 Tamar Christina 2019-05-22 10:03:00 UTC
Cheers,

I'll mark this as resolved then. Please re-open if not.