Bug 30271 - Addresses of static thread_local fields are badly calculated sometimes
Summary: Addresses of static thread_local fields are badly calculated sometimes
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: exp (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 13.2
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 26909 29366
  Show dependency treegraph
 
Reported: 2023-03-24 18:44 UTC by Michal Chojnowski
Modified: 2023-05-15 15:17 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2023-03-25 00:00:00


Attachments
A patch adding thread_local field handling to value_static_field (possibly in a wrong manner). (634 bytes, patch)
2023-03-24 18:44 UTC, Michal Chojnowski
Details | Diff
test case (1.09 KB, patch)
2023-03-25 16:37 UTC, Tom Tromey
Details | Diff
A minimal reproducer for the bug (1.40 KB, patch)
2023-03-26 00:56 UTC, Michal Chojnowski
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michal Chojnowski 2023-03-24 18:44:23 UTC
Created attachment 14779 [details]
A patch adding thread_local field handling to value_static_field (possibly in a wrong manner).

Bottom line:

value_static_field() isn't aware of thread_local fields. It just uses the address returned by lookup_minimum_symbol() as-is, and for thread_local variables that's wrong, because in their case the returned value is a TLS section offset, not the address.

Context:

After we upgraded gdb from 12.1 to 13.1, we noticed that some of our gdb scripts stopped working because gdb started to evaluate `&a::b` (where `a::b` is an important `static thread_local` C++ variable) to the TLS offset of `a::b` instead of its runtime address.
We bisected the problem to commit 3d20b8d99a54382e6e1a6c433e71e0775c6856c6 (`Enable the new DWARF indexer`). When observing the execution of GDB before and after this patch, we learned that before it the variable would be successfully looked up via `lookup_symbol_via_quick_fns()`, but after it `lookup_symbol_via_quick_fns()` doesn't succeed in looking the variable up (could this be a problem of its own?) and GDB progresses to `evaluate_expression` -> `value_aggregate_elt` -> `value_static_field` -> `lookup_minimal_symbol`, and this secondary code path gives the wrong result because it just can't handle thread_local variables.

I have attached a patch (which I based on some code found in findvars.c) which fixes our problem. (But I'm clueless about GDB internals, so I can't say anything about the quality of this patch other than that it solves our particular case.)
Comment 1 Tom Tromey 2023-03-25 16:37:40 UTC
Created attachment 14781 [details]
test case

I tried writing a test case to investigate this.
I've attached the test I came up with, based on your
description.  Unfortunately it works for me.
Could you modify the test so that it shows your bug?
Comment 2 Michal Chojnowski 2023-03-26 00:56:39 UTC
Created attachment 14782 [details]
A minimal reproducer for the bug

Fails only when the test is compiled with clang.
Comment 3 Michal Chojnowski 2023-03-26 01:21:20 UTC
I managed to boil it down to a minimal example. I've added the test case as an attachment.

As it turns out, the problem happens because `cooked_index_functions::expand_symtabs_matching` fails to expand the symtabs which contain the definitions of the static fields, because cooked index entries corresponding to static fields don't have parent_entry set. (For this part of the bug, thread_local is not relevant — this happens for any static field, but only thread_local messes up the fallback code path which happens after the lookup fails).

This only happens when the test is compiled with clang, though. (I tested that with clang++ 15.0.7). It doesn't seem to happen with gcc. (I tested that with g++ 12.2.1).

If the relevant symtab is expanded beforehand, e.g. by another lookup, then the lookup of the static field won't fail. This makes the bug quite hard to hit. In many cases, the relevant symtab is expanded by the lookup of the class which occurs immediately before the lookup of the field. That's why two files are needed in the test — the class lookup expands one symtab, and so the lookup of one field doesn't fail, but it doesn't expand the other symtab and the lookup of the other field fails.

Note that this problem with symtabs is orthogonal to the problem with `value_static_field` which I described in earlier comments. If this turns out to be a clang bug, `value_static_field` may still deserve a fix.
Comment 4 Tom Tromey 2023-03-31 17:12:30 UTC
Kind of weird DWARF here.  The only location for this
is given by:

 <1><82>: Abbrev Number: 2 (DW_TAG_variable)
    <83>   DW_AT_specification: <0x9f>
    <87>   DW_AT_location    : 10 byte block: e 0 0 0 0 0 0 0 0 e0 	(DW_OP_const8u: 0 0; DW_OP_GNU_push_tls_address or DW_OP_HP_unknown)
    <92>   DW_AT_linkage_name: (indirect string, offset: 0x156): _ZN9container8tlsvar_0E

It has a forward reference to its defining scope...

gdb currently deals with *backward* references like this by tracking
the DWARF scopes for DIE ranges.  This is needed because gdb's fast
scanner is "forward only", and DWARF provides no way to move up the DIE
tree.

Maybe we can also defer the processing of forward-referencing specifications.

Gosh I hate DWARF.  This stuff is all so needless -- it makes reader
implementation very difficult and provides zero benefit whatsoever.
Comment 5 Tom Tromey 2023-03-31 17:54:08 UTC
Actually it's kind of weird that the existing defer mechanism
doesn't work here.
Comment 6 Tom Tromey 2023-03-31 18:00:12 UTC
I found out that deferred_entry::spec_offset is set but not used... ugh.
Changing the defer code to use this makes the test work.
So I guess maybe the deferral stuff was all working by accident somehow?
What a mess.
Comment 7 Tom Tromey 2023-03-31 18:03:01 UTC
Thank you, btw, for updating the test case.  That was very helpful.

This patch works but I think I want to rewrite the test to use the
DWARF assembler, so it's compiler-independent.

diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 8f35b973f3e..609fc44190d 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -16577,9 +16577,8 @@ cooked_indexer::make_index (cutu_reader *reader)
 
   for (const auto &entry : m_deferred_entries)
     {
-      CORE_ADDR key = form_addr (entry.die_offset, m_per_cu->is_dwz);
-      void *obj = m_die_range_map.find (key);
-      cooked_index_entry *parent = static_cast <cooked_index_entry *> (obj);
+      void *obj = m_die_range_map.find (entry.spec_offset);
+      cooked_index_entry *parent = static_cast<cooked_index_entry *> (obj);
       m_index_storage->add (entry.die_offset, entry.tag, entry.flags,
 			    entry.name, parent, m_per_cu);
     }
Comment 8 Tom Tromey 2023-05-10 23:28:03 UTC
I think this is a decent candidate for 13.2
Comment 9 Sourceware Commits 2023-05-15 14:51:48 UTC
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b10f2cd3f3c3b25c71e50a342fb46f9eb9eba792

commit b10f2cd3f3c3b25c71e50a342fb46f9eb9eba792
Author: Tom Tromey <tom@tromey.com>
Date:   Fri Apr 28 13:04:15 2023 -0600

    Correctly handle forward DIE references in scanner
    
    The cooked index scanner has special code to handle forward DIE
    references.  However, a bug report lead to the discovery that this
    code does not work -- the "deferred_entry::spec_offset" field is
    written to but never used, i.e., the lookup is done using the wrong
    key.
    
    This patch fixes the bug and adds a regression test.
    
    The test in the bug itself used a thread_local variable, which
    provoked a failure at runtime.  This test instead uses "maint print
    objfiles" and then inspects to ensure that the entry in question has a
    parent.  This lets us avoid a clang dependency in the test.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30271
Comment 10 Michal Chojnowski 2023-05-15 15:06:07 UTC
Thanks for the fix!
Comment 11 Sourceware Commits 2023-05-15 15:16:36 UTC
The gdb-13-branch branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=49664141e1446e5f403a716bf8e50375842934b1

commit 49664141e1446e5f403a716bf8e50375842934b1
Author: Tom Tromey <tom@tromey.com>
Date:   Fri Apr 28 13:04:15 2023 -0600

    Correctly handle forward DIE references in scanner
    
    The cooked index scanner has special code to handle forward DIE
    references.  However, a bug report lead to the discovery that this
    code does not work -- the "deferred_entry::spec_offset" field is
    written to but never used, i.e., the lookup is done using the wrong
    key.
    
    This patch fixes the bug and adds a regression test.
    
    The test in the bug itself used a thread_local variable, which
    provoked a failure at runtime.  This test instead uses "maint print
    objfiles" and then inspects to ensure that the entry in question has a
    parent.  This lets us avoid a clang dependency in the test.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30271
    
    (cherry picked from commit b10f2cd3f3c3b25c71e50a342fb46f9eb9eba792)
Comment 12 Tom Tromey 2023-05-15 15:17:04 UTC
Fixed.