Bug 31179

Summary: RISC-V: The SET/ADD/SUB fix breaks ABI compatibility with 2.41 objects
Product: binutils Reporter: Palmer Dabbelt <palmer>
Component: ldAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: charlie, david.abdurachmanov, nelsonc1225, rjones, sam
Priority: P2    
Version: 2.42   
Target Milestone: ---   
See Also: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/414
Host: Target:
Build: Last reconfirmed:
Attachments: proposed solution v1
proposed solution with the tag to keep compatible

Description Palmer Dabbelt 2023-12-18 16:59:42 UTC
Nelson, David, and Aurelian have been poking this.  There's no bug and everyone's in different time zones, though, so I'm just starting with something -- it's a bit rushed, but I wanted to get a big filed so we at least had a place to talk.

2029e13917d ("RISC-V: Clarify the behaviors of SET/ADD/SUB relocations.") fixes two bugs.  This was released in 2.41 and those bugs counteract each other, so an LD with the fix doesn't work with binaries produced with a GAS that doesn't have the fix.  So this is a defacto user-visible ABI break.

We can tell everyone "just rebuild the old binaries", but since the bug is in a released version that's a headache for the distros.  The best I can come up with is to add some tag to the new binaries saying "these don't have the bug", and then figure it out from there -- that'd have some implications on the LLVM side of things, I'll poke some folks over there to see what's up.

Unless anyone has a better idea?
Comment 1 Palmer Dabbelt 2023-12-18 19:18:38 UTC
I opened <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/414> for the psABI, but happy to hear if anyone has a better idea -- trying to keep around the bug tag is going to be a headache.
Comment 2 David Abdurachmanov 2023-12-18 19:38:45 UTC
To add some context here. Debian is using pre-2.42 bintuils, and GCC 13 failed to build:

[..]
riscv64-linux-gnu-gdc-12 -no-pie -lstdc++   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings   -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc  -static-libphobos -o d21 \
	d/access.o d/aggregate.o d/aliasthis.o d/apply.o d/arrayop.o d/arraytypes.o d/attrib.o d/ast_node.o d/astcodegen.o d/astenums.o d/blockexit.o d/builtin.o d/canthrow.o d/chkformat.o d/clone.o d/common-bitfields.o d/common-file.o d/common-outbuffer.o d/common-string.o d/compiler.o d/cond.o d/constfold.o d/cparse.o d/cppmangle.o d/ctfeexpr.o d/ctorflow.o d/dcast.o d/dclass.o d/declaration.o d/delegatize.o d/denum.o d/dimport.o d/dinterpret.o d/dmacro.o d/dmangle.o d/dmodule.o d/doc.o d/dscope.o d/dstruct.o d/dsymbol.o d/dsymbolsem.o d/dtemplate.o d/dtoh.o d/dversion.o d/entity.o d/errors.o d/errorsink.o d/escape.o d/expression.o d/expressionsem.o d/file_manager.o d/foreachvar.o d/func.o d/globals.o d/gluelayer.o d/hdrgen.o d/iasm.o d/iasmgcc.o d/id.o d/identifier.o d/impcnvtab.o d/imphint.o d/importc.o d/init.o d/initsem.o d/inline.o d/intrange.o d/json.o d/lambdacomp.o d/lexer.o d/location.o d/mtype.o d/mustuse.o d/nogc.o d/nspace.o d/ob.o d/objc.o d/opover.o d/optimize.o d/parse.o d/parsetimevisitor.o d/permissivevisitor.o d/printast.o d/root-aav.o d/root-array.o d/root-bitarray.o d/root-complex.o d/root-ctfloat.o d/root-file.o d/root-filename.o d/root-hash.o d/root-longdouble.o d/root-optional.o d/root-port.o d/root-region.o d/root-rmem.o d/root-rootobject.o d/root-speller.o d/root-string.o d/root-stringtable.o d/root-utf.o d/safe.o d/sapply.o d/semantic2.o d/semantic3.o d/sideeffect.o d/statement.o d/statement_rewrite_walker.o d/statementsem.o d/staticassert.o d/staticcond.o d/stmtstate.o d/target.o d/templateparamsem.o d/tokens.o d/traits.o d/transitivevisitor.o d/typesem.o d/typinf.o d/utils.o d/visitor.o d/d-attribs.o d/d-builtins.o d/d-codegen.o d/d-compiler.o d/d-convert.o d/d-ctfloat.o d/d-diagnostic.o d/d-frontend.o d/d-gimplify.o d/d-incpath.o d/d-lang.o d/d-longdouble.o d/d-port.o d/d-target.o d/decl.o d/expr.o d/imports.o d/intrinsics.o d/modules.o d/runtime.o d/toir.o d/typeinfo.o d/types.o riscv-d.o linux-d.o attribs.o libbackend.a main.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a  -lisl -lmpc -lmpfr -lgmp -rdynamic  -lz -lzstd 
/usr/bin/ld: final size of uleb128 value at offset 0x1491 in .debug_loclists from /usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(aaA.o) exceeds available space
/usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(aaA.o): in function `.LLST1':
aaA.d:(.debug_loclists+0x1491): dangerous relocation: dangerous relocation error
/usr/bin/ld: final size of uleb128 value at offset 0xe92 in .debug_loclists from /usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(deh.o) exceeds available space
/usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(deh.o): in function `.LLST1':
deh.d:(.debug_loclists+0xe92): dangerous relocation: dangerous relocation error
collect2: error: ld returned 1 exit status
make[5]: *** [../../src/gcc/d/Make-lang.in:236: d21] Error 1
make[5]: *** Waiting for unfinished jobs....
[..]
Comment 3 Palmer Dabbelt 2023-12-19 15:14:00 UTC
(In reply to David Abdurachmanov from comment #2)
> To add some context here. Debian is using pre-2.42 bintuils, and GCC 13
> failed to build:
> 
> [..]
> riscv64-linux-gnu-gdc-12 -no-pie -lstdc++   -g -DIN_GCC     -fno-exceptions
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing
> -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute
> -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long
> -Wno-variadic-macros -Wno-overlength-strings   -DHAVE_CONFIG_H
> -static-libstdc++ -static-libgcc  -static-libphobos -o d21 \
> 	d/access.o d/aggregate.o d/aliasthis.o d/apply.o d/arrayop.o d/arraytypes.o
> d/attrib.o d/ast_node.o d/astcodegen.o d/astenums.o d/blockexit.o
> d/builtin.o d/canthrow.o d/chkformat.o d/clone.o d/common-bitfields.o
> d/common-file.o d/common-outbuffer.o d/common-string.o d/compiler.o d/cond.o
> d/constfold.o d/cparse.o d/cppmangle.o d/ctfeexpr.o d/ctorflow.o d/dcast.o
> d/dclass.o d/declaration.o d/delegatize.o d/denum.o d/dimport.o
> d/dinterpret.o d/dmacro.o d/dmangle.o d/dmodule.o d/doc.o d/dscope.o
> d/dstruct.o d/dsymbol.o d/dsymbolsem.o d/dtemplate.o d/dtoh.o d/dversion.o
> d/entity.o d/errors.o d/errorsink.o d/escape.o d/expression.o
> d/expressionsem.o d/file_manager.o d/foreachvar.o d/func.o d/globals.o
> d/gluelayer.o d/hdrgen.o d/iasm.o d/iasmgcc.o d/id.o d/identifier.o
> d/impcnvtab.o d/imphint.o d/importc.o d/init.o d/initsem.o d/inline.o
> d/intrange.o d/json.o d/lambdacomp.o d/lexer.o d/location.o d/mtype.o
> d/mustuse.o d/nogc.o d/nspace.o d/ob.o d/objc.o d/opover.o d/optimize.o
> d/parse.o d/parsetimevisitor.o d/permissivevisitor.o d/printast.o
> d/root-aav.o d/root-array.o d/root-bitarray.o d/root-complex.o
> d/root-ctfloat.o d/root-file.o d/root-filename.o d/root-hash.o
> d/root-longdouble.o d/root-optional.o d/root-port.o d/root-region.o
> d/root-rmem.o d/root-rootobject.o d/root-speller.o d/root-string.o
> d/root-stringtable.o d/root-utf.o d/safe.o d/sapply.o d/semantic2.o
> d/semantic3.o d/sideeffect.o d/statement.o d/statement_rewrite_walker.o
> d/statementsem.o d/staticassert.o d/staticcond.o d/stmtstate.o d/target.o
> d/templateparamsem.o d/tokens.o d/traits.o d/transitivevisitor.o d/typesem.o
> d/typinf.o d/utils.o d/visitor.o d/d-attribs.o d/d-builtins.o d/d-codegen.o
> d/d-compiler.o d/d-convert.o d/d-ctfloat.o d/d-diagnostic.o d/d-frontend.o
> d/d-gimplify.o d/d-incpath.o d/d-lang.o d/d-longdouble.o d/d-port.o
> d/d-target.o d/decl.o d/expr.o d/imports.o d/intrinsics.o d/modules.o
> d/runtime.o d/toir.o d/typeinfo.o d/types.o riscv-d.o linux-d.o attribs.o
> libbackend.a main.o libcommon-target.a libcommon.a ../libcpp/libcpp.a
> ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a  
> ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a
> ../libdecnumber/libdecnumber.a  -lisl -lmpc -lmpfr -lgmp -rdynamic  -lz
> -lzstd 
> /usr/bin/ld: final size of uleb128 value at offset 0x1491 in .debug_loclists
> from /usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(aaA.o) exceeds available
> space
> /usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(aaA.o): in function `.LLST1':
> aaA.d:(.debug_loclists+0x1491): dangerous relocation: dangerous relocation
> error
> /usr/bin/ld: final size of uleb128 value at offset 0xe92 in .debug_loclists
> from /usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(deh.o) exceeds available
> space
> /usr/lib/gcc/riscv64-linux-gnu/12/libgphobos.a(deh.o): in function `.LLST1':
> deh.d:(.debug_loclists+0xe92): dangerous relocation: dangerous relocation
> error
> collect2: error: ld returned 1 exit status
> make[5]: *** [../../src/gcc/d/Make-lang.in:236: d21] Error 1
> make[5]: *** Waiting for unfinished jobs....
> [..]

Can you point me at that libgphobos.a?  I'd like to poke around in objdump...
Comment 4 Palmer Dabbelt 2023-12-19 15:28:15 UTC
(In reply to Palmer Dabbelt from comment #0)
> 2029e13917d ("RISC-V: Clarify the behaviors of SET/ADD/SUB relocations.")

I guess also a question for the distro folks, but are we sure this commit is actually what's breaking things?  Nelson said that patch only changes the behavior of .reloc, but as far as I can tell it also changes .uleb128.  Maybe I'm just missing something?

It reverts cleanly, so if someone still has a setup that's failing then I think it should be easy to check.
Comment 5 Andreas Schwab 2023-12-19 16:01:50 UTC
There are a lot of SUB_ULEB128 relocations with a non-zero addend in installed static archives that will change behaviour with this commit.

$ readelf -Wr /usr/lib64/libc.a | grep -c 'SUB_ULEB128.*[+-] [^0]'
17043
$ readelf -Wr /usr/lib64/gcc/riscv64-suse-linux/13/libgphobos.a | grep -c 'SUB_ULEB128.*[+-] [^0]'
55318
Comment 6 Palmer Dabbelt 2023-12-20 00:11:57 UTC
(In reply to Andreas Schwab from comment #5)
> There are a lot of SUB_ULEB128 relocations with a non-zero addend in
> installed static archives that will change behaviour with this commit.
> 
> $ readelf -Wr /usr/lib64/libc.a | grep -c 'SUB_ULEB128.*[+-] [^0]'
> 17043
> $ readelf -Wr /usr/lib64/gcc/riscv64-suse-linux/13/libgphobos.a | grep -c
> 'SUB_ULEB128.*[+-] [^0]'
> 55318

Awesome, thanks.  Nelson and I were getting some confusing error reports, but this matches what I was trying to fix at the beginning.  We might have another bug, but I think this one we can try to fix.

So I think we can do something like the following:
* Check for binaries with the bug -- I think we want a tag here, but I guess we could also try and pattern match the +N/+N vs +M/+0.
* If the binary has the bug, then revert to the old linker behavior for calculating the addresses from the input.
* Always emit correct addends into the output.

Then we can add some some warning when we detect the old inputs, along with some commandline/autoconf arguments to turn the warnings into errors.
Comment 7 Nelson Chu 2023-12-20 03:36:46 UTC
Created attachment 15267 [details]
proposed solution v1

I'm running the regressions of riscv-gnu-toolchain, so will send the patch if everything looks well.  Basically the root cause of the problem is same as what Andreas mentioned, and the proposed solution is also same as what Palmer said, so just make sure that what the proposed patch plan to do in the short-term,

For 2.42 assembler,
- Keep generating the right zero addend of SUB_ULEB128 relocation.

For 2.42 linker,
- Ignore the addend of SUB_ULEB128 relocation when relocating since it should always be zero by using .ueb128 directive.
- Add ld target option, --[no-]check-uleb128, to enable/disable checking if SUB_ULEB128 with non-zero addend.  If people get warnings by enabling the option, then,
1. they should plan to rebuild their stuff to get the right behavior in assembler.
2. they probably are using .reloc *SUB* with non-zero addend, then they will get troubles temporarily.  Since the .reloc usage is rarely to use in RISC-V, so there is much room for discussion in terms of use.  That means maybe we should just encourage people don't use that, or add some limitation for it.
Comment 8 Nelson Chu 2023-12-22 05:52:04 UTC
Created attachment 15270 [details]
proposed solution with the tag to keep compatible

Updated to have a tag, this patch should be applied after the proposed v1 patch.

The object which marked by the tag means it won't have any non-zero ddend in SUB_ULEB128 from .uleb128 directives, so we don't need to ignore them to get the
correct calculation in linker.

Since the tag is non-standard in riscv psabi, so need use elf_other_obj_attributes_proc rather than elf_known_obj_attributes_proc api to access.
Comment 9 Palmer Dabbelt 2023-12-22 16:25:36 UTC
(In reply to Nelson Chu from comment #8)
> Created attachment 15270 [details]
> proposed solution with the tag to keep compatible
> 
> Updated to have a tag, this patch should be applied after the proposed v1
> patch.
> 
> The object which marked by the tag means it won't have any non-zero ddend in
> SUB_ULEB128 from .uleb128 directives, so we don't need to ignore them to get
> the
> correct calculation in linker.
> 
> Since the tag is non-standard in riscv psabi, so need use
> elf_other_obj_attributes_proc rather than elf_known_obj_attributes_proc api
> to access.

Thanks Nelson, that was super fast ;)

So I think this is a roughly workable solution (maybe we should cache those tag lookups for performance, not sure if it matters).  It's going to be hard to tell for sure without some distro testing, though.

Does this pass the toolchain regressions?  If so, I think we should poke the distro folks (many of whom I guess who are here) to see if it's actually resolving their issues.
Comment 10 Charlie Jenkins 2023-12-22 21:25:06 UTC
>From a9eb35c3ee0eb265674d7858a8b4ec24928eae6e Mon Sep 17 00:00:00 2001
>From: Nelson Chu <nelson@rivosinc.com>
>Date: Wed, 20 Dec 2023 10:37:41 +0800
>Subject: [PATCH] RISC-V: PR31179, The SET/ADD/SUB fix breaks ABI compatibility
> with 2.41 objects
>
>* Problematic fix commit,
>2029e13917d53d2289d3ebb390c4f40bd2112d21
>RISC-V: Clarify the behaviors of SET/ADD/SUB relocations

...

>
>+		_bfd_error_handler (_("%pB: warning: R_RISCV_SUB_ULEB128 with"
>+				      " non-zero addend, please rebuild by new"
>+				      " binutils"), input_bfd);

Is it possible to provide a version number instead of saying "new"? Then you could say something like "please rebuild using a binutils that is at least version 2.42".

...

> PARSE_AND_LIST_OPTIONS=${PARSE_AND_LIST_OPTIONS}'
>   fprintf (file, _("  --relax-gp                  Perform GP relaxation\n"));
>   fprintf (file, _("  --no-relax-gp               Don'\''t perform GP relaxation\n"));
>+  fprintf (file, _("  --check-uleb128             Check if SUB_ULEB128 with non-zero addend\n"));

I think it is more clear here to replace "with" with "has".

>+  fprintf (file, _("  --no-check-uleb128          Don'\''t check if SUB_ULEB128 with non-zero addend\n"));

Same thing here.
...
Comment 11 Nelson Chu 2023-12-25 09:03:06 UTC
> So I think this is a roughly workable solution (maybe we should cache those
> tag lookups for performance, not sure if it matters).  It's going to be hard
> to tell for sure without some distro testing, though.
> 
> Does this pass the toolchain regressions?  If so, I think we should poke the
> distro folks (many of whom I guess who are here) to see if it's actually
> resolving their issues.

I get segment fault since forgot to define obj_attrs_handle_unknown.  But that causes another problem for the tag - when we archiving multiple objects into one by -r, then how to deal with that if some of them with the tag, and others without the tag?
Comment 12 Nelson Chu 2024-01-05 02:08:20 UTC
commit 73d931e560059a87d76f528fafbb4270a98746bc
Refs: gdb-14-branchpoint-932-g73d931e5600
Author:     Nelson Chu <nelson@rivosinc.com>
AuthorDate: Wed Dec 20 10:37:41 2023 +0800
Commit:     Nelson Chu <nelson@rivosinc.com>
CommitDate: Thu Dec 28 14:51:50 2023 +0800

    RISC-V: PR31179, The SET/ADD/SUB fix breaks ABI compatibility with 2.41 objects

    * Problematic fix commit,
    2029e13917d53d2289d3ebb390c4f40bd2112d21
    RISC-V: Clarify the behaviors of SET/ADD/SUB relocations

    * Bugzilla,
    https://sourceware.org/bugzilla/show_bug.cgi?id=31179#c5

    The addend of SUB_ULEB128 should be zero if using .uleb128, but we make it
    non-zero by accident in assembler before.  This causes troubles by applying
    the above commit, since the calculation is changed to support .reloc *SUB*
    relocations with non-zero addend.

    We encourage people to rebuild their stuff to get the non-zero addend of
    SUB_ULEB128, but that might need some times, so report warnings to inform
    people need to rebuild their stuff if --check-uleb128 is enabled.

    Since the failed .reloc cases for ADD/SET/SUB/ULEB128 are rarely to use,
    it may acceptable that stop supproting them until people rebuld their stuff,
    maybe half-year or a year later.  Or maybe we should teach people that don't
    write the .reloc R_RISCV_SUB* with non-zero constant, and then report
    warnings/errors in assembler.

    bfd/
            * elfnn-riscv.c (perform_relocation): Ignore the non-zero addend of
            R_RISCV_SUB_ULEB128.
            (riscv_elf_relocate_section): Report warnings to inform people need
            to rebuild their stuff if --check-uleb128 is enabled.  So that can
            get the right non-zero addend of R_RISCV_SUB_ULEB128.
            * elfxx-riscv.h (struct riscv_elf_params): Added bool check_uleb128.
    ld/
            * NEWS: Updated.
            * emultempl/riscvelf.em: Added linker risc-v target options,
            --[no-]check-uleb128, to enable/disable checking if the addend of
            uleb128 is non-zero or not.  So that people will know they need to
            rebuild the objects with binutils 2.42 and up, to get the right zero
            addend of SUB_ULEB128 relocation, or they may get troubles if using
            .reloc.
            * ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp: Updated.
            * ld/testsuite/ld-riscv-elf/pr31179*: New test cases.