Bug 28355 - [aarch64] regcache.cc:257: A problem internal to GDBserver has been detected. Unknown register tag_ctl requested
Summary: [aarch64] regcache.cc:257: A problem internal to GDBserver has been detected....
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep
Version: 11.1
Importance: P2 normal
Target Milestone: ---
Assignee: Luis Machado
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-20 13:28 UTC by Tom de Vries
Modified: 2021-11-03 14:46 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
gdb-aarch64-suse-linux.log.gz (2.74 MB, application/gzip)
2021-10-06 12:06 UTC, Tom de Vries
build log (326.15 KB, application/gzip)
2021-10-06 12:07 UTC, Tom de Vries
aarch64-linux-gnu-gdb output capture for archlinuxarm gdbserver 11.1 remote target FAILED (2.66 KB, application/gzip)
2021-10-28 22:04 UTC, sourceware
aarch64-linux-gnu-gdb output capture for petalinux gdbserver 9.2 remote target PASSED (406.42 KB, application/gzip)
2021-10-28 22:06 UTC, sourceware
Tentative patch (3.04 KB, patch)
2021-10-29 18:00 UTC, Luis Machado

Description Tom de Vries 2021-09-20 13:28:03 UTC
With gdb 11.1, I run into:
...
(gdb) target remote localhost:2346^M
Remote debugging using localhost:2346^M
Remote connection closed^M
(gdb) PASS: gdb.server/multi-ui-errors.exp: connect to gdbserver
Remote debugging from host 127.0.0.1, port 36588^M
../../gdbserver/regcache.cc:257: A problem internal to GDBserver has been detected.^M
Unknown register tag_ctl requested^M
ERROR: GDB process no longer exists
GDB process exited with wait status 29567 exp12 0 1
UNRESOLVED: gdb.server/multi-ui-errors.exp: ensure inferior is running
testcase /home/abuild/rpmbuild/BUILD/gdb-11.1/gdb/testsuite/gdb.server/multi-ui-errors.exp
...

Triggers a fair amount of times in the entire test suite:
...
$ grep -a -c "Unknown register tag_ctl requested" binaries-testsuite.openSUSE_Factory_ARM.aarch64/gdb-testresults/*.log
binaries-testsuite.openSUSE_Factory_ARM.aarch64/gdb-testresults/gdb-aarch64-suse-linux.-fno-PIE.-no-pie.log:63
binaries-testsuite.openSUSE_Factory_ARM.aarch64/gdb-testresults/gdb-aarch64-suse-linux.log:63
...
Comment 1 Luis Machado 2021-09-20 14:00:36 UTC
I'm assuming your test environment doesn't support MTE.

I find it odd that find_regno crashes gdbserver. It seems best to not return anything when it isn't found.

PAC has the same code. I'm assuming it might run into the same situation on a machine that doesn't support PAC.
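
A non-fatal lookup along the lines suggested above could look roughly like the following sketch. This is not gdbserver's code: `reg_def`, `lookup_regno`, and `supply_if_known` are hypothetical stand-ins, and the real `find_regno` works on a target description rather than a plain vector.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one register entry in a target description.
struct reg_def
{
  std::string name;
};

// Return the index of NAME in REGS, or std::nullopt when it is absent,
// leaving the caller to decide whether a missing register is fatal.
std::optional<int>
lookup_regno (const std::vector<reg_def> &regs, const std::string &name)
{
  for (std::size_t i = 0; i < regs.size (); ++i)
    if (regs[i].name == name)
      return static_cast<int> (i);
  return std::nullopt;
}

// Example caller (also hypothetical): skip an optional register instead
// of aborting when the description does not contain it.
bool
supply_if_known (const std::vector<reg_def> &regs, const std::string &name)
{
  std::optional<int> regno = lookup_regno (regs, name);
  if (!regno.has_value ())
    return false;  // Register not in this description; nothing to supply.
  // A real caller would copy the register contents into the cache here.
  return true;
}
```

With this shape, a request for `tag_ctl` against a description that lacks it would be reported to the caller rather than ending the process. Whether silently skipping is the right policy for every caller is a separate question.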
Comment 2 Luis Machado 2021-09-20 14:27:06 UTC
I can't reproduce this, even on a machine without MTE support. Could you please provide more detailed information about the environment in which you ran into this?
Comment 3 Tom de Vries 2021-09-27 10:00:05 UTC
(In reply to Luis Machado from comment #2)
> I can't reproduce this, even on a machine without MTE support. Could you
> please provide more detailed information about the environment in which you
> ran into this?

This triggered in OBS (Open Build Service). 

The log did indicate that this was on a machine without MTE support.

I've tried to reproduce it outside of OBS, but have not managed so far, unfortunately.
Comment 4 Luis Machado 2021-10-01 11:44:03 UTC
Would it be possible to have more logs from the OBS run?

If the machine doesn't have MTE support (and thus no HWCAP bit set), we shouldn't be generating target descriptions with the tag_ctl register.

I'll keep trying to reproduce this.
Comment 6 Tom de Vries 2021-10-06 12:06:37 UTC
Created attachment 13707
gdb-aarch64-suse-linux.log.gz
Comment 7 Luis Machado 2021-10-06 12:07:20 UTC
Thanks. I'll take a look.
Comment 8 Tom de Vries 2021-10-06 12:07:22 UTC
Created attachment 13708
build log
Comment 9 sourceware 2021-10-28 17:05:03 UTC
Me Too!

I'm running gdbserver on archlinuxarm generic aarch64 rootfs:
$ gdbserver --version
GNU gdbserver (GDB) 11.1

And aarch64-linux-gnu-gdb on x86 archlinux host:
$ aarch64-linux-gnu-gdb --version
GNU gdb (GDB) 11.1

The command on aarch64 target:
$ gdbserver :1234 ./hello
Process ./hello created; pid = 1942
Listening on port 1234

The commands on x86 host:
$ aarch64-linux-gnu-gdb hello    <-- ARM executable 'hello' in pwd 
Reading symbols from hello...
(gdb) target remote zcu102-arch:1234
Remote debugging using zcu102-arch:1234
Remote connection closed

This leads to immediate crash on aarch64 target:
Remote debugging from host ::ffff:192.168.1.201, port 44372
../../gdbserver/regcache.cc:257: A problem internal to GDBserver has been detected.
Unknown register tag_ctl requested

If it's meaningful, the aarch64 target does have MTE enabled in kernel:
$ zcat /proc/config.gz | grep -i mte
CONFIG_ARM64_AS_HAS_MTE=y
CONFIG_ARM64_MTE=y

This is 100% reproducible for me. (and 100% impeding my development)

If I may offer patch or other testing, I am happy to do so.

The aarch64 application under debug is built "native" on the ARM target:
$ gcc -g -o hello hello.c

The aarch64 kernel: 
Linux zcu102-arch 5.10.0-xilinx-v2021.1 #1 SMP Fri Jun 4 15:57:16 UTC 2021 aarch64 GNU/Linux

The x86 host kernel:
Linux host 5.14.2-arch1-2 #1 SMP PREEMPT Thu, 09 Sep 2021 09:42:35 +0000 x86_64 GNU/Linux
Comment 10 Luis Machado 2021-10-28 17:10:37 UTC
Hi,

Thanks. It's good to know this is reproducible reliably. Is this system running within QEMU?

I have a fix in mind for this, but was postponing it until I could reproduce this reliably.
Comment 11 Luis Machado 2021-10-28 17:24:39 UTC
Also, when you mentioned the hello ARM executable, you meant AArch64 executable and not a 32-bit ARM executable. Is that correct?
Comment 12 sourceware 2021-10-28 17:31:31 UTC
Thank you Luis!

I'm glad to hear you have a patch in mind.

I'm currently working on actual hardware, a xilinx zcu102 dev board.

I'll try now in QEMU and report results.

And Yes, the 'hello' executable on the x86 host is aarch64.
file command executed on the x86 host:
$ file hello
hello: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=4ee6eadc116d54aff8c7e180f8622bbdfaad4ad6, for GNU/Linux 3.7.0, with debug_info, not stripped

I built this on the aarch64 target, and copied the 'hello' executable to the x86 host for access to symbols.
Comment 13 Luis Machado 2021-10-28 17:59:04 UTC
Thanks. That's great information. Let me try a cross-debugging scenario since I can't reproduce this under a native debugging one with QEMU.
Comment 14 Luis Machado 2021-10-28 19:46:49 UTC
Could you please share what you see with the following command on your target?

LD_SHOW_AUXV=1 /bin/true

Also, could you please share all the output you see when connecting GDB to GDBserver using the following switches?

set debug remote 1
set debug remote-packet-max-chars -1

Please pack the output and attach it to the ticket.

I'm guessing something between detecting MTE and sending the target description is going off the rails. But I haven't managed to reproduce it on my end, either with a native setup or with the cross setup.
Comment 15 sourceware 2021-10-28 22:04:46 UTC
Created attachment 13740
aarch64-linux-gnu-gdb output capture for archlinuxarm gdbserver 11.1 remote target FAILED
Comment 16 sourceware 2021-10-28 22:06:05 UTC
Created attachment 13741
aarch64-linux-gnu-gdb output capture for petalinux gdbserver 9.2 remote target PASSED
Comment 17 sourceware 2021-10-28 22:09:30 UTC
OK, one slight twist. The failure occurs on an archlinuxarm rootfs, with gdbserver 11.1.

When I boot into Xilinx's petalinux rootfs, remote debug works correctly, with gdbserver 9.2.

In both cases the x86 host runs aarch64-linux-gnu-gdb 11.1.

The data asked for, from both platforms:

From aarch64 petalinux command line:
# LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR:      0xffffaf62c000
AT_??? (0x33): 0x1270
AT_HWCAP:             8fb
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0xaaaaddbb0040
AT_PHENT:             56
AT_PHNUM:             9
AT_BASE:              0xffffaf5fb000
AT_FLAGS:             0x0
AT_ENTRY:             0xaaaaddbb1740
AT_UID:               0
AT_EUID:              0
AT_GID:               0
AT_EGID:              0
AT_SECURE:            0
AT_RANDOM:            0xffffd9b84c38
AT_HWCAP2:            0x0
AT_EXECFN:            /bin/true
AT_PLATFORM:          aarch64


From aarch64 archlinuxarm command line:
# LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR:      0xffffa2f68000
AT_??? (0x33): 0x1270
AT_HWCAP:             8fb
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0xaaaae9850040
AT_PHENT:             56
AT_PHNUM:             9
AT_BASE:              0xffffa2f37000
AT_FLAGS:             0x0
AT_ENTRY:             0xaaaae98516c8
AT_UID:               0
AT_EUID:              0
AT_GID:               0
AT_EGID:              0
AT_SECURE:            0
AT_RANDOM:            0xffffc9115218
AT_HWCAP2:            0x0
AT_EXECFN:            /bin/true
AT_PLATFORM:          aarch64

The gdb output was captured by setting:
(gdb) set logging debugredirect on
(gdb) set logging on
(gdb) set debug remote 1
(gdb) set debug remote-packet-max-chars -1
(gdb) target remote zcu102:1234

When using the archlinuxarm rootfs, gdbserver 11.1, the (gdb) target remote command led to immediate crash of gdbserver on the target. Data attached as:
gdbserver-11.1-archlinuxarm-failed.txt.gz

When using the petalinux rootfs, gdbserver 9.2, files and libraries were read as expected, and I continued to set breakpoints, single step, and the hello application ran and terminated successfully. Data attached as:
gdbserver-9.2-petalinux-pass.txt.gz

Don't be fooled by the filenames, both of these captures occurred on the same x86 host, running aarch64-linux-gnu-gdb 11.1. The names represent the rootfs and gdbserver version on the remote target.

Don't know if it's related, but I was also unable to get tcf-agent working on the archlinuxarm rootfs, whereas it works on the petalinux rootfs.

I'd actually prefer to run the archlinux aarch64 rootfs, and I hope this data capture can help illuminate the bug, and lead to a fix.

Please let me know if I can capture more data, or test updates...
Comment 18 Luis Machado 2021-10-29 00:34:10 UTC
Thanks for the data.

Some clarifications. I expect any gdbserver that is not 11.1 to work correctly on any of the rootfs'. The failure for gdbserver 11.1 is related to new code to support MTE.

So if you don't need MTE support, you can replace gdbserver 11.1 with another older gdbserver and have a working setup while we address this particular problem.

With that said, I noticed a couple funny things in your AUXV output so far. The HWCAP2 value is 0x0 on both dumps. The presence of MTE is determined by the HWCAP2 values from the AUXV.

Here's what I see in system QEMU, for example:

AT_HWCAP2:       0x75fff

The MTE bit is number 18.

So that might be related to the problem you're seeing.

I'll go through the logs next.
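
For reference, the bit test described above can be written as a small sketch. The names `hwcap2_mte_bit` and `hwcap2_has_mte` are made up for illustration; only the bit position (18) and the two AT_HWCAP2 values come from this report.

```cpp
#include <cstdint>

// Bit 18 of AT_HWCAP2 is the MTE bit, as stated in the comment above.
constexpr std::uint64_t hwcap2_mte_bit = std::uint64_t{1} << 18;

// Return true when the given AT_HWCAP2 value advertises MTE.
constexpr bool
hwcap2_has_mte (std::uint64_t hwcap2)
{
  return (hwcap2 & hwcap2_mte_bit) != 0;
}

// The two values quoted in this report, checked at compile time.
static_assert (hwcap2_has_mte (0x75fff), "system QEMU value has bit 18 set");
static_assert (!hwcap2_has_mte (0x0), "zcu102 value has no MTE bit");
```

On a Linux target the value itself could be read with `getauxval (AT_HWCAP2)` from `<sys/auxv.h>`; that call is left out here so the sketch builds on any host.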
Comment 19 Luis Machado 2021-10-29 02:12:24 UTC
Ok. I finally managed to reproduce this. I'll share more details soon.
Comment 20 sourceware 2021-10-29 06:25:28 UTC
Just confirming that gdbserver 9.2 on archlinux aarch64 rootfs does work correctly, and the problem is reproducible on gdbserver 11.1.

Thanks for that tip!
Comment 21 Luis Machado 2021-10-29 12:47:40 UTC
So, if one has a kernel new enough that it has MTE awareness, and uses a new enough gdbserver that knows about the tag_ctl register, one will run into this situation.

The catch being that the Linux kernel doesn't error out when you request the tag_ctl register without MTE support. Some bits are not MTE-specific.

I'm working on a fix for this.
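
The combination described above can be put into a toy model. This is only a sketch under assumptions: `probe_result` and both functions are hypothetical, and they model the described behaviour (a successful kernel request treated as proof that the register exists) rather than quoting gdbserver.

```cpp
// Toy model of the mismatch; all names here are hypothetical.
struct probe_result
{
  bool kernel_accepts_request;   // The register set request did not fail.
  bool register_in_description;  // tag_ctl is in the target description.
};

// With a static list, a register set stays active whenever the kernel
// accepts the request, regardless of what the description contains.
constexpr bool
static_probe_keeps_regset (const probe_result &p)
{
  return p.kernel_accepts_request;
}

// The crash window: the set is kept, but its register is unknown.
constexpr bool
hits_unknown_register (const probe_result &p)
{
  return static_probe_keeps_regset (p) && !p.register_in_description;
}

// MTE-aware kernel without MTE hardware: request succeeds, no tag_ctl.
static_assert (hits_unknown_register ({true, false}), "crash case");
// Kernel that rejects the request: the set is dropped, no crash.
static_assert (!hits_unknown_register ({false, false}), "older kernel");
// Machine with MTE: tag_ctl is in the description, no crash.
static_assert (!hits_unknown_register ({true, true}), "MTE present");
```

Only the first combination fails, which would be consistent with the problem not reproducing on machines that either have MTE or run a kernel that rejects the request.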
Comment 22 Luis Machado 2021-10-29 18:00:17 UTC
Created attachment 13747
Tentative patch

I've attached a tentative patch. Could you please try it (it is based on current binutils-gdb-master, but should apply cleanly to gdb-11) and let us know how that works?

I've tried it myself and it fixes the crash for my environment.

If the patch works fine, I'll pursue it upstream.
Comment 23 sourceware 2021-10-29 21:45:45 UTC
OK, took a while to build on the target.

I modified the ArchLinux gdb PKGBUILD file to apply the patch:

patching file gdb/arch/aarch64.h
patching file gdbserver/linux-aarch64-low.cc

After compilation and testing I can confirm that the patch does correct the error.

Looks good here on gdbserver 11.1!
Comment 24 Luis Machado 2021-11-01 13:34:40 UTC
Patch submitted to the mailing list for review. My plan is to fix this on both master and gdb 11 branches.
Comment 26 Sourceware Commits 2021-11-03 13:04:37 UTC
The gdb-11-branch branch has been updated by Luis Machado <luisgpm@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=eb79b2318066cafb75ffdce310e3bbd44f7c79e3

commit eb79b2318066cafb75ffdce310e3bbd44f7c79e3
Author: Luis Machado <luis.machado@linaro.org>
Date:   Fri Oct 29 14:54:36 2021 -0300

    [AArch64] Make gdbserver register set selection dynamic
    
    The current register set selection mechanism for AArch64 is static, based
    on a pre-populated array of register sets.
    
    This means that we might potentially probe register sets that are not
    available. This is OK if the kernel errors out during ptrace, but probing the
    tag_ctl register, for example, does not result in a ptrace error if the kernel
    supports the tagged address ABI but not MTE (PR 28355).
    
    Making the register set selection dynamic, based on feature checks, solves
    this and simplifies the code a bit. It allows us to list all of the register
    sets only once, and pick and choose based on HWCAP/HWCAP2 or other properties.
    
    gdb/ChangeLog:
    
    2021-11-03  Luis Machado  <luis.machado@linaro.org>
    
            PR gdb/28355
    
            * arch/aarch64.h (struct aarch64_features): New struct.
    
    gdbserver/ChangeLog:
    
    2021-11-03  Luis Machado  <luis.machado@linaro.org>
    
            PR gdb/28355
    
            * linux-aarch64-low.cc (is_sve_tdesc): Remove.
            (aarch64_target::low_arch_setup): Rework to adjust the register sets.
            (aarch64_regsets): Update to list all register sets.
            (aarch64_regsets_info, regs_info_aarch64): Replace NULL with nullptr.
            (aarch64_sve_regsets, aarch64_sve_regsets_info)
            (regs_info_aarch64_sve): Remove.
            (aarch64_adjust_register_sets): New.
            (aarch64_target::get_regs_info): Remove references to removed structs.
            (initialize_low_arch): Likewise.
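
The idea in the commit message, choosing register sets from feature checks instead of a fixed array, might be sketched as below. This is an illustration, not the committed code: `features_sketch` and `select_regsets` are hypothetical names, the set labels are plain strings, and the assumption that an SVE set takes the place of the plain FP/SIMD set is mine.

```cpp
#include <string>
#include <vector>

// Hypothetical feature flags; the real struct aarch64_features may differ.
struct features_sketch
{
  bool sve = false;
  bool pauth = false;
  bool mte = false;
};

// List every register set once and include it only when the matching
// feature was detected (for example from HWCAP/HWCAP2).
std::vector<std::string>
select_regsets (const features_sketch &f)
{
  std::vector<std::string> sets = {"general"};
  // Assumption: the SVE set takes the place of the plain FP/SIMD set.
  sets.push_back (f.sve ? "sve" : "fpsimd");
  if (f.pauth)
    sets.push_back ("pauth");
  if (f.mte)
    sets.push_back ("mte");  // Never selected without the MTE feature.
  return sets;
}
```

With no MTE feature detected, the MTE set is never probed, so it no longer matters whether the kernel would have accepted the request.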
Comment 27 Luis Machado 2021-11-03 13:05:21 UTC
Fix pushed to both binutils-gdb-master and gdb-11.

Thanks for the reproducer!
Comment 28 sourceware 2021-11-03 14:46:57 UTC
Thank you for the quick fix Luis!