With gdb 11.1, I run into:
(gdb) target remote localhost:2346^M
Remote debugging using localhost:2346^M
Remote connection closed^M
(gdb) PASS: gdb.server/multi-ui-errors.exp: connect to gdbserver
Remote debugging from host 127.0.0.1, port 36588^M
../../gdbserver/regcache.cc:257: A problem internal to GDBserver has been detected.^M
Unknown register tag_ctl requested^M
ERROR: GDB process no longer exists
GDB process exited with wait status 29567 exp12 0 1
UNRESOLVED: gdb.server/multi-ui-errors.exp: ensure inferior is running
This triggers fairly often across the entire test suite:
$ grep -a -c "Unknown register tag_ctl requested" binaries-testsuite.openSUSE_Factory_ARM.aarch64/gdb-testresults/*.log
I'm assuming your test environment doesn't support MTE.
I find it odd that find_regno crashes gdbserver. It seems best to not return anything when it isn't found.
PAC has the same code. I'm assuming it might run into the same situation on a machine that doesn't support PAC.
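To make the suggestion concrete, here is a rough sketch of the two lookup behaviours being contrasted. It is not the gdbserver code; `find_regno_strict` and `find_regno_optional` are hypothetical names used only for illustration, and the register list is abbreviated.

```python
from typing import Optional, Sequence


def find_regno_strict(reg_names: Sequence[str], name: str) -> int:
    """Abort-style lookup: an unknown name is treated as an internal error."""
    for index, reg in enumerate(reg_names):
        if reg == name:
            return index
    raise RuntimeError(f"Unknown register {name} requested")


def find_regno_optional(reg_names: Sequence[str], name: str) -> Optional[int]:
    """Lenient lookup: an unknown name yields None and the caller decides."""
    for index, reg in enumerate(reg_names):
        if reg == name:
            return index
    return None
```

With the lenient variant, a caller could skip a register that the target description does not contain instead of taking the whole process down.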
I can't reproduce this, even on a machine without MTE support. Could you please provide more detailed information about the environment in which you ran into this?
(In reply to Luis Machado from comment #2)
> I can't reproduce this, even on a machine without MTE support. Could you
> please provide more detailed information about the environment in which you
> ran into this?
This triggered in OBS (Open Build Service).
The log did indicate that this was on a machine without MTE support.
I've tried to reproduce it outside of OBS, but have not managed so far, unfortunately.
Would it be possible to have more logs from the OBS run?
If the machine doesn't have MTE support (and thus no HWCAP bit set), we shouldn't be generating target descriptions with the tag_ctl register.
I'll keep trying to reproduce this.
You may have to create an account on obs to be able to access this, I'm not sure.
Created attachment 13707
Thanks. I'll take a look.
Created attachment 13708
I'm running gdbserver on archlinuxarm generic aarch64 rootfs:
$ gdbserver --version
GNU gdbserver (GDB) 11.1
And aarch64-linux-gnu-gdb on x86 archlinux host:
$ aarch64-linux-gnu-gdb --version
GNU gdb (GDB) 11.1
The command on aarch64 target:
$ gdbserver :1234 ./hello
Process ./hello created; pid = 1942
Listening on port 1234
The commands on x86 host:
$ aarch64-linux-gnu-gdb hello <-- ARM executable 'hello' in pwd
Reading symbols from hello...
(gdb) target remote zcu102-arch:1234
Remote debugging using zcu102-arch:1234
Remote connection closed
This leads to an immediate crash on the aarch64 target:
Remote debugging from host ::ffff:192.168.1.201, port 44372
../../gdbserver/regcache.cc:257: A problem internal to GDBserver has been detected.
Unknown register tag_ctl requested
If it's meaningful, the aarch64 target does have MTE enabled in kernel:
$ zcat /proc/config.gz | grep -i mte
This is 100% reproducible for me (and 100% impeding my development).
If I can help by testing a patch or anything else, I am happy to do so.
The aarch64 application under debug is built "native" on the ARM target:
$ gcc -g -o hello hello.c
The aarch64 kernel:
Linux zcu102-arch 5.10.0-xilinx-v2021.1 #1 SMP Fri Jun 4 15:57:16 UTC 2021 aarch64 GNU/Linux
The x86 host kernel:
Linux host 5.14.2-arch1-2 #1 SMP PREEMPT Thu, 09 Sep 2021 09:42:35 +0000 x86_64 GNU/Linux
Thanks. It's good to know this is reproducible reliably. Is this system running within QEMU?
I have a fix in mind for this, but was postponing it until I could reproduce this reliably.
Also, when you mentioned the hello ARM executable, you meant an AArch64 executable and not a 32-bit ARM executable. Is that correct?
Thank you Luis!
I'm glad to hear you have a patch in mind.
I'm currently working on actual hardware, a xilinx zcu102 dev board.
I'll try now in QEMU and report results.
And yes, the 'hello' executable on the x86 host is aarch64.
file command executed on the x86 host:
$ file hello
hello: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=4ee6eadc116d54aff8c7e180f8622bbdfaad4ad6, for GNU/Linux 3.7.0, with debug_info, not stripped
I built this on the aarch64 target, and copied the 'hello' executable to the x86 host for access to symbols.
Thanks. That's great information. Let me try a cross-debugging scenario since I can't reproduce this under a native debugging one with QEMU.
Could you please share what you see with the following command on your target?
Also, could you please share all the output you see when connecting GDB to GDBserver using the following switches?
set debug remote 1
set debug remote-packet-max-chars -1
Please pack the output and attach it to the ticket.
I'm guessing something between detecting MTE and sending the target description is going off the rails. But I haven't managed to reproduce it on my end, neither with a native setup nor with the cross setup.
Created attachment 13740
aarch64-linux-gnu-gdb output capture for archlinuxarm gdbserver 11.1 remote target FAILED
Created attachment 13741
aarch64-linux-gnu-gdb output capture for petalinux gdbserver 9.2 remote target PASSED
OK, one slight twist. The failure occurs on an archlinuxarm rootfs, with gdbserver 11.1.
When I boot into Xilinx's petalinux rootfs, remote debug works correctly, with gdbserver 9.2.
In both cases the x86 host runs aarch64-linux-gnu-gdb 11.1.
The data asked for, from both platforms:
From aarch64 petalinux command line:
# LD_SHOW_AUXV=1 /bin/true
AT_??? (0x33): 0x1270
From aarch64 archlinuxarm command line:
# LD_SHOW_AUXV=1 /bin/true
AT_??? (0x33): 0x1270
The gdb output was captured by setting:
(gdb) set logging debugredirect on
(gdb) set logging on
(gdb) set debug remote 1
(gdb) set debug remote-packet-max-chars -1
(gdb) target remote zcu102:1234
When using the archlinuxarm rootfs, gdbserver 11.1, the (gdb) target remote command led to an immediate crash of gdbserver on the target. Data attached as:
When using the petalinux rootfs, gdbserver 9.2, files and libraries were read as expected, and I continued to set breakpoints, single step, and the hello application ran and terminated successfully. Data attached as:
Don't be fooled by the filenames: both of these captures occurred on the same x86 host, running aarch64-linux-gnu-gdb 11.1. The names represent the rootfs and gdbserver version on the remote target.
Don't know if it's related, but I was also unable to get tcf-agent working on the archlinuxarm rootfs, whereas it works on the petalinux rootfs.
I'd actually prefer to run the archlinux aarch64 rootfs, and I hope this data capture can help illuminate the bug, and lead to a fix.
Please let me know if I can capture more data, or test updates...
Thanks for the data.
Some clarifications. I expect any gdbserver that is not 11.1 to work correctly on either rootfs. The failure for gdbserver 11.1 is related to new code to support MTE.
So if you don't need MTE support, you can replace gdbserver 11.1 with another older gdbserver and have a working setup while we address this particular problem.
With that said, I noticed a couple funny things in your AUXV output so far. The HWCAP2 value is 0x0 on both dumps. The presence of MTE is determined by the HWCAP2 values from the AUXV.
Here's what I see in system QEMU, for example:
The MTE bit is number 18.
So that might be related to the problem you're seeing.
I'll go through the logs next.
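For anyone who wants to check this on their own target, here is a minimal sketch of the HWCAP2 test described above. It is not taken from gdbserver; the function names `parse_auxv` and `has_mte` are made up for illustration, and it assumes a 64-bit little-endian auxv layout (pairs of 8-byte words), as found in /proc/self/auxv on AArch64 Linux.

```python
import struct

AT_NULL = 0            # end-of-vector marker
AT_HWCAP2 = 26         # auxv tag carrying the second hardware-capability word
HWCAP2_MTE = 1 << 18   # MTE is bit number 18 of HWCAP2


def parse_auxv(raw: bytes) -> dict:
    """Decode a 64-bit little-endian auxv blob into a {tag: value} dict."""
    usable = len(raw) - (len(raw) % 16)  # ignore any trailing partial entry
    entries = {}
    for tag, value in struct.iter_unpack("<QQ", raw[:usable]):
        if tag == AT_NULL:
            break
        entries[tag] = value
    return entries


def has_mte(auxv: dict) -> bool:
    """True if the MTE bit is set in HWCAP2; a missing HWCAP2 counts as 0."""
    return bool(auxv.get(AT_HWCAP2, 0) & HWCAP2_MTE)
```

On the target, passing the contents of /proc/self/auxv (read in binary mode) through `parse_auxv` and then `has_mte` should give False on a machine whose HWCAP2 is 0x0.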
Ok. I finally managed to reproduce this. I'll share more details soon.
Just confirming that gdbserver 9.2 on archlinux aarch64 rootfs does work correctly, and the problem is reproducible on gdbserver 11.1.
Thanks for that tip!
So, if one has a kernel new enough that it has MTE awareness, and uses a new enough gdbserver that knows about the tag_ctl register, it will run into this situation.
The catch being that the Linux kernel doesn't error out when you request the tag_ctl register without MTE support. Some bits are not MTE-specific.
I'm working on a fix for this.
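As a rough illustration of the situation described above, here is a toy model of the mismatch. It is only a sketch under stated assumptions: the register-set names are abbreviated, every function name is hypothetical, and the kernel behaviour is reduced to "a tag_ctl read succeeds whenever the tagged address ABI is supported".

```python
HWCAP2_MTE = 1 << 18  # MTE capability bit in HWCAP2

# Illustrative, abbreviated list of statically probed register sets.
STATIC_REGSETS = ["gpr", "fpr", "tag_ctl"]


def tdesc_regsets(hwcap2: int) -> list:
    """Register sets in the target description, derived from HWCAP2."""
    sets = ["gpr", "fpr"]
    if hwcap2 & HWCAP2_MTE:
        sets.append("tag_ctl")
    return sets


def kernel_read_ok(regset: str, tagged_addr_abi: bool) -> bool:
    """Assumed kernel model: tag_ctl reads succeed with the tagged address
    ABI alone, whether or not the CPU has MTE."""
    if regset == "tag_ctl":
        return tagged_addr_abi
    return True


def fetch_registers(hwcap2: int, tagged_addr_abi: bool) -> list:
    """Probe every statically listed set, regardless of detected features."""
    described = tdesc_regsets(hwcap2)
    fetched = []
    for regset in STATIC_REGSETS:
        if not kernel_read_ok(regset, tagged_addr_abi):
            continue  # a kernel error makes the extra probe harmless
        if regset not in described:
            raise RuntimeError(f"Unknown register {regset} requested")
        fetched.append(regset)
    return fetched
```

In this model, `fetch_registers(0, True)` (no MTE bit, but a kernel with the tagged address ABI) is the only combination that raises, which matches the failing setups reported here.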
Created attachment 13747
I've attached a tentative patch. Could you please try it (it is based on current binutils-gdb-master, but should apply cleanly to gdb-11) and let us know how that works?
I've tried it myself and it fixes the crash for my environment.
If the patch works fine, I'll pursue it upstream.
OK, took a while to build on the target.
I modified the ArchLinux gdb PKGBUILD file to apply the patch:
patching file gdb/arch/aarch64.h
patching file gdbserver/linux-aarch64-low.cc
After compilation and testing I can confirm that the patch does correct the error.
Looks good here on gdbserver 11.1!
Patch submitted to the mailing list for review. My plan is to fix this on both master and gdb 11 branches.
The gdb-11-branch branch has been updated by Luis Machado <firstname.lastname@example.org>:
Author: Luis Machado <email@example.com>
Date: Fri Oct 29 14:54:36 2021 -0300
[AArch64] Make gdbserver register set selection dynamic
The current register set selection mechanism for AArch64 is static, based
on a pre-populated array of register sets.
This means that we might potentially probe register sets that are not
available. This is OK if the kernel errors out during ptrace, but probing the
tag_ctl register, for example, does not result in a ptrace error if the kernel
supports the tagged address ABI but not MTE (PR 28355).
Making the register set selection dynamic, based on feature checks, solves
this and simplifies the code a bit. It allows us to list all of the register
sets only once, and pick and choose based on HWCAP/HWCAP2 or other properties.
2021-11-03 Luis Machado <firstname.lastname@example.org>
* arch/aarch64.h (struct aarch64_features): New struct.
2021-11-03 Luis Machado <email@example.com>
* linux-aarch64-low.cc (is_sve_tdesc): Remove.
(aarch64_target::low_arch_setup): Rework to adjust the register sets.
(aarch64_regsets): Update to list all register sets.
(aarch64_regsets_info, regs_info_aarch64): Replace NULL with nullptr.
(aarch64_target::get_regs_info): Remove references to removed structs.
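The idea in the commit message can be sketched as follows. This is a simplified illustration in Python, not the actual C++ change: `Features`, `Regset`, `ALL_REGSETS` and `select_regsets` are hypothetical names, and the availability rules are assumptions chosen to show the "list once, pick by feature" approach.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Features:
    """Hypothetical stand-in for the detected-feature flags."""
    sve: bool = False
    pauth: bool = False
    mte: bool = False


@dataclass(frozen=True)
class Regset:
    name: str
    available: Callable[[Features], bool]


# Every register set is listed exactly once, each with its own feature check.
ALL_REGSETS = [
    Regset("gpr", lambda f: True),
    Regset("fpr", lambda f: True),
    Regset("sve", lambda f: f.sve),
    Regset("pauth", lambda f: f.pauth),
    Regset("tag_ctl", lambda f: f.mte),
]


def select_regsets(features: Features) -> List[str]:
    """Keep only the register sets whose feature check passes."""
    return [r.name for r in ALL_REGSETS if r.available(features)]
```

With no MTE detected, `tag_ctl` never makes it into the selected list, so it is never probed in the first place.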
Fix pushed to both binutils-gdb-master and gdb-11.
Thanks for the reproducer!
Thank you for the quick fix Luis!