[PATCH] Introduce struct packed template, fix -fsanitize=thread for per_cu fields
Tom de Vries
tdevries@suse.de
Thu Jul 7 10:18:14 GMT 2022
On 7/6/22 21:20, Pedro Alves wrote:
> On 2022-07-04 8:45 p.m., Tom de Vries via Gdb-patches wrote:
>> On 7/4/22 20:32, Tom Tromey wrote:
>>>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
>>>
>>> Tom> /* The number of bits needed to represent all languages, with enough
>>> Tom> padding to allow for reasonable growth. */
>>> Tom> -#define LANGUAGE_BITS 5
>>> Tom> +#define LANGUAGE_BITS 8
>>>
>>> This will negatively affect the size of symbols and so I think it should
>>> be avoided.
>>>
>>
>> Ack, Pedro suggested a way to avoid this:
>> ...
>> + struct {
>> + /* The language of this CU. */
>> + ENUM_BITFIELD (language) m_lang : LANGUAGE_BITS;
>> + };
>> ...
>>
>
> It actually doesn't avoid it in this case,
We were merely discussing the usage of LANGUAGE_BITS for
general_symbol_info::m_language, and indeed using the "struct { ... };"
approach avoids changing the LANGUAGE_BITS and introducing a penalty on
symbol size (which is a more numerous entity than CUs).
Still, of course it's also good to keep the dwarf2_per_cu_data struct as
small as possible, so thanks for looking into this.
> as the following field will end up
> moved to the next byte, so if LANGUAGE_BITS is 5, we'll end up with 3 bits gap.
>
> Actually, it's worse than that -- it will align m_lang to ENUM_BITFIELD(language)'s
> natural alignment, so it can introduce byte padding before and after too. :-/ :-(
>
> We can see it with "ptype /o" after applying your patch using the struct{}
> trick. Note the "3-byte padding" below:
>
> ...
> /* 13: 5 | 1 */ bool m_header_read_in : 1;
> /* XXX 2-bit hole */
> /* 14 | 1 */ struct {
> /* 14: 0 | 1 */ bool addresses_seen : 1;
> /* XXX 7-bit padding */
>
> /* total size (bytes): 1 */
> };
> /* 15: 0 | 4 */ unsigned int mark : 1;
> /* 15: 1 | 1 */ bool files_read : 1;
> /* XXX 6-bit hole */
> /* 16 | 4 */ struct {
> /* 16: 0 | 4 */ dwarf_unit_type unit_type : 8;
> /* XXX 3-byte padding */ <<<<<< 3 bytes
>
> /* total size (bytes): 4 */
> };
> /* 20 | 4 */ struct {
> /* 20: 0 | 4 */ language lang : 5;
> /* XXX 3-bit padding */
> /* XXX 3-byte padding */ <<<<<< 3 bytes
>
> /* total size (bytes): 4 */
> };
> ...
>
>
>
> So, maybe we really want something else... How about this alternative patch below?
>
> I wrote the new struct packed template using the array for storage before I
> wrote the alternative to use __attribute__((packed)), so I left the array
> version in there, as a pure standards-conforming implementation. We can just
> remove that array implementation completely, and use the ATTRIBUTE_PACKED macro
> if you don't think it's worth it. All compilers worth their salt support attribute
> packed, or something like it, I believe. The resulting gdbsupport/pack.h would
> be quite smaller.
>
> I tested this on x86-64 Ubuntu 20.04, both attribute packed no attribute
> versions, saw no regressions. Also smoke-tested with Clang (which uses the
> attribute packed implementation too, as it defines __GNUC__).
>
> I have not actually tested this with -fsanitize=thread, though. Would you
> be up for testing that, Tom, if this approach looks reasonable?
>
Yes, of course.
I've applied the patch and then started with my latest approach which
avoid locks and uses atomics:
...
diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
index f98d8b27649..bc1af0ec2d3 100644
--- a/gdb/dwarf2/read.h
+++ b/gdb/dwarf2/read.h
@@ -108,6 +108,7 @@ struct dwarf2_per_cu_data
m_header_read_in (false),
mark (false),
files_read (false),
+ m_lang (language_unknown),
scanned (false)
{
}
@@ -180,7 +181,7 @@ struct dwarf2_per_cu_data
packed<dwarf_unit_type, 1> m_unit_type = (dwarf_unit_type) 0;
/* The language of this CU. */
- packed<language, LANGUAGE_BYTES> m_lang = language_unknown;
+ std::atomic<language> m_lang __attribute__((packed));
public:
/* True if this CU has been scanned by the indexer; false if
@@ -332,11 +333,13 @@ struct dwarf2_per_cu_data
void set_lang (enum language lang)
{
- /* We'd like to be more strict here, similar to what is done in
- set_unit_type, but currently a partial unit can go from unknown to
- minimal to ada to c. */
- if (m_lang != lang)
- m_lang = lang;
+ enum language nope = language_unknown;
+ if (m_lang.compare_exchange_strong (nope, lang))
+ return;
+ nope = lang;
+ if (m_lang.compare_exchange_strong (nope, lang))
+ return;
+ gdb_assert_not_reached ();
}
/* Free any cached file names. */
...
I've tried both:
...
packed<std::atomic<language>, LANGUAGE_BYTES> m_lang
= language_unknown;
...
and:
...
std::atomic<packed<language, LANGUAGE_BYTES>> m_lang
= language_unknown;
...
and both give compilation errors:
...
src/gdb/dwarf2/read.h:184:58: error: could not convert
‘language_unknown’ from ‘language’ to ‘std::atomic<packed<language, 1> >’
std::atomic<packed<language, LANGUAGE_BYTES>> m_lang = language_unknown;
^~~~~~~~~~~~~~~~
...
and:
...
src/gdb/../gdbsupport/packed.h:84:47: error: bit-field
‘std::atomic<language> packed<std::atomic<language>, 1>::m_val’ with
non-integral type
...
Maybe one of the two should work and the pack template needs further
changes, I'm not sure.
Note btw that the attribute packed works here:
...
+ std::atomic<language> m_lang __attribute__((packed));
...
in the sense that it's got alignment 1:
...
struct atomic<language> m_lang \
__attribute__((__aligned__(1))); /* 16 4 */
...
but given that there's no LANGUAGE_BITS/BYTES, we're back to size 4 for
the m_lang field, and size 128 overall.
So for now I've settled for:
...
+ std::atomic<LANGUAGE_CONTAINER> m_lang;
...
which does get me back to size 120.
WIP patch attached.
Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-fix.patch
Type: text/x-patch
Size: 2202 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/gdb-patches/attachments/20220707/b890e5fd/attachment.bin>
More information about the Gdb-patches
mailing list