Bug 23743 - GDB index file mmapping broken, ubsan => runtime error: load of misaligned address
Summary: GDB index file mmapping broken, ubsan => runtime error: load of misaligned ad...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: unknown
: P2 normal
Target Milestone: 11.1
Assignee: Tom Tromey
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-07 16:45 UTC by Pedro Alves
Modified: 2021-04-17 19:58 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Pedro Alves 2018-10-07 16:45:51 UTC
UBSAN-by-default is now tripping on a bug that was latent on x86 and other archs that can do unaligned loads.

Debugging today's (8.2.50.20181007-git) GDB under itself I'm seeing this:

(top-gdb) start
Temporary breakpoint 3 at 0x464926: main. (24 locations)
Starting program: build/gdb/gdb 
....
src/gdb/dwarf2read.c:3443:15: runtime error: load of misaligned address 0x7f8beafcc5f7 for type 'offset_type', which requires 4 byte alignment
0x7f8beafcc5f7: note: pointer points here
 00 00 00 00 08  00 00 00 18 00 00 00 98  09 00 00 98 09 00 00 1c  12 00 00 1c 52 00 00 00  00 00 00
             ^ 

The code in question is this:

  /* Version check.  */
  offset_type version = MAYBE_SWAP (*(offset_type *) addr);

'addr' above is a pointer to the start of the .gdb_index section, which contains a v8 index:

 (top-gdb) p addr
 $1 = (const gdb_byte *) 0x7ffff7fc45f7 "\b"
 (top-gdb) x/4b addr
 0x7ffff7fc45f7: 8       0       0       0

The problem is that the gdb index reading code throughout is assuming that the section contents buffer starts at a sufficiently aligned address.  That is evident from the fact that MAYBE_SWAP takes an offset_type as argument.

But, in this case the buffer's start isn't aligned, because the buffer wasn't heap-allocated -- instead the section was mmapped in place with gdb_bfd_map_section, and the .gdb_index section at an unaligned address in the file.  From objdump -h:

 31 .gdb_index    00007bd8  0000000000000000  0000000000000000  000955f7  2**0
                  CONTENTS, READONLY, DEBUGGING
                                                                ^^^^^^^^
The mmapping is here:

(top-gdb) bt
#0  0x00000000009418a0 in gdb_bfd_map_section(bfd_section*, unsigned long*) (sectp=0x5a1e7a0, size=0x59f9820)
    at ....src/gdb/gdb_bfd.c:682
#1  0x0000000000817b64 in dwarf2_read_section(objfile*, dwarf2_section_info*) (objfile=0x381f630, info=0x59f9810)
    at ....src/gdb/dwarf2read.c:2508
#2  0x000000000082dc86 in get_gdb_index_contents_from_section<dwarf2_per_objfile>(objfile*, dwarf2_per_objfile*) (obj=0x381f630, section_owner=0x59f95e0) at ....src/gdb/dwarf2read.c:6188
#3  0x00000000008b5a6f in void gdb::function_view<gdb::array_view<unsigned char const> (objfile*, dwarf2_per_objfile*)>::bind<gdb::array_view<unsigned char const>, objfile*, dwarf2_per_objfile*>(gdb::array_view<unsigned char const> (*)(objfile*, dwarf2_per_objfile*))::{lambda(gdb::fv_detail::erased_callable, objfile*, dwarf2_per_objfile*)#1}::operator()(gdb::fv_detail::erased_callable, objfile*, dwarf2_per_objfile*) const (__closure=0x0, ecall=..., args#0=0x381f630, args#1=0x59f95e0) at ....src/gdb/common/function-view.h:284
#4  0x00000000008b5aa3 in void gdb::function_view<gdb::array_view<unsigned char const> (objfile*, dwarf2_per_objfile*)>::bind<gdb::array_view<unsigned char const>, objfile*, dwarf2_per_objfile*>(gdb::array_view<unsigned char const> (*)(objfile*, dwarf2_per_objfile*))::{lambda(gdb::fv_detail::erased_callable, objfile*, dwarf2_per_objfile*)#1}::_FUN(gdb::fv_detail::erased_callable, objfile*, dwarf2_per_objfile*) () at ....src/gdb/common/function-view.h:278
#5  0x00000000008a727b in gdb::function_view<gdb::array_view<unsigned char const> (objfile*, dwarf2_per_objfile*)>::operator()(objfile*, dwarf2_per_objfile*) const (this=0x7fffffffc760, args#0=0x381f630, args#1=0x59f95e0) at ....src/gdb/common/function-view.h:247
During symbol reading, cannot get low and high bounds for subprogram DIE at 16490919.
#6  0x000000000081d752 in dwarf2_read_gdb_index(dwarf2_per_objfile*, get_gdb_index_contents_ftype, get_gdb_index_contents_dwz_ftype) (dwarf2_per_objfile=0x59f95e0, get_gdb_index_contents=..., get_gdb_index_contents_dwz=...) at ....src/gdb/dwarf2read.c:3553
#7  0x000000000082e139 in dwarf2_initialize_objfile(objfile*, dw_index_kind*) (objfile=0x381f630, index_kind=0x7fffffffc994)
    at ....src/gdb/dwarf2read.c:6262
#8  0x00000000008e9540 in elf_symfile_read(objfile*, symfile_add_flags) (objfile=0x381f630, symfile_flags=...)
    at ....src/gdb/elfread.c:1255
#9  0x0000000000dca21d in read_symbols(objfile*, symfile_add_flags) (objfile=0x381f630, add_flags=...)
    at ....src/gdb/symfile.c:794

The objfile in question is /usr/lib/debug/usr/lib64/libncursesw.so.6.0-6.0-14.20170722.fc27.x86_64.debug on Fedora 27.
Comment 1 Pedro Alves 2018-10-07 16:53:28 UTC
I don't think fixing that one MAYBE_SWAP instance would be sufficient.  I think we have to fix them all somehow.

E.g., mapped_index::symbol_table is likely going to be unaligned as well.  Used as:

  vec = (offset_type *) (index.constant_pool
			 + MAYBE_SWAP (index.symbol_table[idx].vec));

initialized as:

  const gdb_byte *symbol_table = addr + MAYBE_SWAP (metadata[i]);
  const gdb_byte *symbol_table_end = addr + MAYBE_SWAP (metadata[i + 1]);
  map->symbol_table
    = gdb::array_view<mapped_index::symbol_table_slot>
       ((mapped_index::symbol_table_slot *) symbol_table,
	(mapped_index::symbol_table_slot *) symbol_table_end);
Comment 2 Tom Tromey 2018-10-07 20:54:00 UTC
I think this code always assumed that the index section
would be aligned.  So I tend to think that the non-alignment
there is the bug.

MAYBE_SWAP could easily be changed to work around this, though.
Comment 3 Pedro Alves 2018-10-07 22:24:53 UTC
> I think this code always assumed that the index section
> would be aligned.  So I tend to think that the non-alignment
> there is the bug.

You're saying that the section should have been somehow forced to an aligned offset in the file?  Or that mmap wasn't originally considered?

I think with minimal changes throughout, MAYBE_SWAP could be changed to work with the address of its argument
instead of expecting a value.  Like:

 static inline offset_type
- byte_swap (offset_type value)
+ byte_swap (offset_type *ptr)
 {
    // work a byte at a time.
 }

- #define MAYBE_SWAP(V)  byte_swap (V)
+ #define MAYBE_SWAP(V)  byte_swap (&(V))

But, I'm not sure whether that makes expressions like:

  MAYBE_SWAP (index.symbol_table[idx].vec)

actually well defined, if symbol_table ends up being pointing to an array of unaligned elements.  I'd think not.
Comment 4 Simon Marchi 2018-10-07 22:53:32 UTC
> You're saying that the section should have been somehow forced to an aligned offset in the file?  Or that mmap wasn't originally considered?

The GDB doc says:

> The mapped index file format is designed to be directly mmapable on any architecture. In most cases, a datum is represented using a little-endian 32-bit integer value, called an offset_type. Big endian machines must byte-swap the values before using them. Exceptions to this rule are noted. The data is laid out such that alignment is always respected.

src: https://sourceware.org/gdb//onlinedocs/gdb/Index-Section-Format.html

So if all fields all placed so that alignment is respected but the linker ends up putting the section at an unaligned address (in the ELF file), then this is all for nothing.  So I believe the linker should force the alignment of that section.
Comment 5 Tom Tromey 2018-10-07 22:55:22 UTC
> You're saying that the section should have been somehow forced to an aligned
> offset in the file?  Or that mmap wasn't originally considered?

That it should be aligned in the file.
I guess it's an oversight in how gdb documents the use of objdump.
.gdb_index was designed so that the contents would be properly aligned.

I guess more work is needed to make MAYBE_SWAP work, but it seems
like it can't be too hard.
Comment 6 Tom Tromey 2018-10-07 22:56:13 UTC
Sorry, objcopy, not objdump.
Comment 7 Pedro Alves 2018-10-08 09:44:49 UTC
(In reply to Tom Tromey from comment #5)

> I guess more work is needed to make MAYBE_SWAP work, but it seems
> like it can't be too hard.

Again, my comment isn't mainly about MAYBE_SWAP, but about
expressions like these:

  index.symbol_table[idx].vec

Here, symbol_table (an array_view) references an unaligned array of unaligned offset_type objects.  That should make the operator[] and the vec element access undefined behavior, I'd think.
Comment 8 Pedro Alves 2018-10-08 09:50:03 UTC
(In reply to Simon Marchi from comment #4)
> > You're saying that the section should have been somehow forced to an aligned offset in the file?  Or that mmap wasn't originally considered?
> 
> The GDB doc says:
> 
> > The mapped index file format is designed to be directly mmapable on any architecture. In most cases, a datum is represented using a little-endian 32-bit integer value, called an offset_type. Big endian machines must byte-swap the values before using them. Exceptions to this rule are noted. The data is laid out such that alignment is always respected.
> 
> src: https://sourceware.org/gdb//onlinedocs/gdb/Index-Section-Format.html
> 
> So if all fields all placed so that alignment is respected but the linker
> ends up putting the section at an unaligned address (in the ELF file), then
> this is all for nothing.  So I believe the linker should force the alignment
> of that section.

Thanks for the reference.  Then it sounds like an option would be to check whether the mapped-in index was indeed aligned and discard the index if it wasn't.  And fix the producer side as well, of course.  Don't know off hand what produced it (gdb, gold, etc.), if it hasn't already [if you guys aren't seeing this, I guess it may have been fixed in newer Fedora].
Comment 9 Mark Wielaard 2018-10-08 10:01:26 UTC
Looking at some files (on my RHEL 7 setup) that have a .gdb_index section I see they all have an sh_addralign of 1, which means there are no alignment constraints. So either they were created with sh_addralign already set to 1 or some tool discarded that setting and the section could end up anywhere in the file without any alignment requirements.
Comment 10 Pedro Alves 2018-10-08 10:14:06 UTC

(In reply to Mark Wielaard from comment #9)
> Looking at some files (on my RHEL 7 setup) that have a .gdb_index section I
> see they all have an sh_addralign of 1, which means there are no alignment
> constraints. So either they were created with sh_addralign already set to 1
> or some tool discarded that setting and the section could end up anywhere in
> the file without any alignment requirements.

Thanks.  That's what I see too:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ objdump -h /usr/lib/debug/usr/lib64/libncursesw.so.6.0-6.0-14.20170722.fc27.x86_64.debug                        

/usr/lib/debug/usr/lib64/libncursesw.so.6.0-6.0-14.20170722.fc27.x86_64.debug:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
[....]
 31 .gdb_index    00007bd8  0000000000000000  0000000000000000  000955f7  2**0
                  CONTENTS, READONLY, DEBUGGING
                                                                          ^^^^
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Actually, I just ran the gdb.dwarf2/gdb-index.exp index testcase, and it produces gdb indexes with no enforced alignment as well:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ objdump -h testsuite/outputs/gdb.dwarf2/gdb-index/* | grep gdb_index
objdump: testsuite/outputs/gdb.dwarf2/gdb-index/gdb-index.gdb-index: File format not recognized
 28 .gdb_index    00002055  0000000000000000  0000000000000000  00001275  2**0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Comment 11 Pedro Alves 2018-10-08 10:26:58 UTC
And the reason that gdb.dwarf2/gdb-index.exp actually passes, even though the section's offset in the file is unaligned, is that we skip mmapping if the section isn't large enough.  With this patchlet disabling that optimization:

diff --git c/gdb/gdb_bfd.c w/gdb/gdb_bfd.c
index 8fedeb438d..c614af8a22 100644
--- c/gdb/gdb_bfd.c
+++ w/gdb/gdb_bfd.c
@@ -671,7 +671,7 @@ gdb_bfd_map_section (asection *sectp, bfd_size_type *size)
       /* Only try to mmap sections which are large enough: we don't want
         to waste space due to fragmentation.  */
 
-      if (bfd_get_section_size (sectp) > 4 * pagesize)
+      if (1 || bfd_get_section_size (sectp) > 4 * pagesize)
        {
          descriptor->size = bfd_get_section_size (sectp);
          descriptor->data = bfd_mmap (abfd, 0, descriptor->size, PROT_READ,

... the test starts failing as expected:

(gdb) file build/gdb/testsuite/outputs/gdb.dwarf2/gdb-index/gdb-index.with-index
Reading symbols from gdb/binutils-gdb-2/build/gdb/testsuite/outputs/gdb.dwarf2/gdb-index/gdb-index.with-index...
src/gdb/dwarf2read.c:3443:15: runtime error: load of misaligned address 0x7f7af37de275 for type 'offset_type', which requires 4 byte alignment
0x7f7af37de275: note: pointer points here
 61 69 6e 00 08 00 00  00 18 00 00 00 28 00 00  00 28 00 00 00 3c 00 00  00 3c 20 00 00 00 00 00  00
             ^ 
ERROR: Couldn't load build/gdb/testsuite/outputs/gdb.dwarf2/gdb-index/gdb-index.with-index into build/gdb/testsuite/../../gdb/gdb (eof).
Comment 12 Pedro Alves 2018-10-08 10:28:03 UTC
So I guess that "Only try to mmap sections which are large enough" check could be extended to also check if the section is sufficiently aligned.
Comment 13 Simon Marchi 2018-10-08 10:52:31 UTC
Is sh_addralign relevant for the position of the section data in the binary file, or is it only relevant for the address once loaded in memory?
Comment 14 Pedro Alves 2018-10-08 10:53:22 UTC
So I tried this:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From bf04c485da7417e98d83a0f13d4a80d65d7e8190 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 8 Oct 2018 11:22:41 +0100
Subject: [PATCH] mmap

---
 gdb/gdb_bfd.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gdb/gdb_bfd.c b/gdb/gdb_bfd.c
index 8fedeb438d..9da2d3f084 100644
--- a/gdb/gdb_bfd.c
+++ b/gdb/gdb_bfd.c
@@ -33,6 +33,7 @@
 #include "target.h"
 #include "gdb/fileio.h"
 #include "inferior.h"
+#include "complaints.h"
 
 /* An object of this type is stored in the section's user data when
    mapping a section.  */
@@ -668,10 +669,25 @@ gdb_bfd_map_section (asection *sectp, bfd_size_type *size)
       if (pagesize == 0)
 	pagesize = getpagesize ();
 
-      /* Only try to mmap sections which are large enough: we don't want
-	 to waste space due to fragmentation.  */
+      /* Only try to mmap sections which are large enough: we don't
+	 want to waste space due to fragmentation.
 
-      if (bfd_get_section_size (sectp) > 4 * pagesize)
+	 Also skip mmaping if the section isn't sufficiently aligned
+	 in the file, as the reader code assumes it is (aligned).
+	 While the index format is designed to be directly mmapable on
+	 any architecture with the data laid out such that alignment
+	 is always respected, for a long while (up till XXX)
+	 gdb/objcopy-produced index sections missed enforcing section
+	 alignment, so the section could end up at an unaligned
+	 address in the file.  */
+
+      if (sectp->filepos % 4 != 0)
+	{
+	  complaint (_("Index section '%s' in file '%s' not aligned on file"),
+		     bfd_get_section_name (abfd, sectp),
+		     bfd_get_filename (abfd));
+	}
+      else if (bfd_get_section_size (sectp) > 4 * pagesize)
 	{
 	  descriptor->size = bfd_get_section_size (sectp);
 	  descriptor->data = bfd_mmap (abfd, 0, descriptor->size, PROT_READ,
-- 
2.14.4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And now I can debug gdb, however we end up skipping mmap a lot it seems....

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(top-gdb) start
Temporary breakpoint 3 at 0x464926: main. (24 locations)
Starting program: build/gdb/gdb 
During symbol reading: Index section '.debug_abbrev' in file '/usr/lib/debug/usr/lib64/ld-2.26.so.debug' not aligned on file
During symbol reading: cannot get low and high bounds for subprogram DIE at 0x4392
During symbol reading: Multiple children of DIE 0x50e9 refer to DIE 0x8262 as their abstract origin
During symbol reading: Child DIE 0x610a and its abstract origin 0x8360 have different parents
During symbol reading: DIE 0x8570 and its abstract origin 0x7613 have different tags
During symbol reading: Child DIE 0x8570 and its abstract origin 0x7613 have different tags
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libncursesw.so.6.0-6.0-14.20170722.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libtinfo.so.6.0-6.0-14.20170722.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.debug_str' in file '/usr/lib/debug/usr/lib64/libdl-2.26.so.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libgc.so.1.0.3-7.6.0-7.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libpython2.7.so.1.0-2.7.15-2.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.debug_abbrev' in file '/usr/lib/debug/usr/lib64/libpthread-2.26.so.debug' not aligned on file
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
During symbol reading: Index section '.debug_abbrev' in file '/usr/lib/debug/usr/lib64/libutil-2.26.so.debug' not aligned on file
During symbol reading: Index section '.debug_str' in file '/usr/lib/debug/usr/lib64/libm-2.26.so.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libexpat.so.1.6.7-2.2.5-1.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/liblzma.so.5.2.3-5.2.3-4.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libbabeltrace.so.1.0.0-1.5.3-1.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libipt.so.1.6.1-1.6.1-4.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libgmp.so.10.3.2-6.1.2-6.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.debug_abbrev' in file '/usr/lib/debug/usr/lib64/libc-2.26.so.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libffi.so.6.0.2-3.1-14.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libunistring.so.2.1.0-0.9.10-1.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libltdl.so.7.3.1-2.4.6-20.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.debug_line' in file '/usr/lib/debug/usr/lib64/libcrypt-nss-2.26.so.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libatomic_ops.so.1.0.4-7.4.6-3.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/../../.dwz/elfutils-0.170-10.fc27.x86_64' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libelf-0.170.so-0.170-10.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libpopt.so.0.0.0-1.16-12.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libuuid.so.1.3.0-2.30.2-3.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libgmodule-2.0.so.0.5400.3-2.54.3-3.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libglib-2.0.so.0.5400.3-2.54.3-3.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.debug_str' in file '/usr/lib/debug/usr/lib64/librt-2.26.so.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/../../.dwz/nss-softokn-3.38.0-1.0.fc27.x86_64' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libz.so.1.2.11-1.2.11-4.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/libbz2.so.1.0.6-1.0.6-24.fc27.x86_64.debug' not aligned on file
During symbol reading: Index section '.gdb_index' in file '/usr/lib/debug/usr/lib64/../../.dwz/pcre-8.42-3.fc27.x86_64' not aligned on file

Temporary breakpoint 3, main (During symbol reading: incomplete CFI data; unspecified registers (e.g., rax) at 0x46495f
During symbol reading: Index section '.debug_loc' in file '/usr/lib/debug/usr/lib64/libc-2.26.so.debug' not aligned on file
During symbol reading: cannot get low and high bounds for subprogram DIE at 0x8e29
argc=1, argv=0x7fffffffd7e8) at /home/pedro/gdb/binutils-gdb-2/src/gdb/gdb.c:28
28        memset (&args, 0, sizeof args);
(top-gdb) 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I wonder what you guys would see with that patch.
Comment 15 Mark Wielaard 2018-10-08 11:36:26 UTC
(In reply to Simon Marchi from comment #13)
> Is sh_addralign relevant for the position of the section data in the binary
> file, or is it only relevant for the address once loaded in memory?

I seems strictly speaking the spec only requires sh_addr to be aligned to sh_addralign. But it doesn't really make sense for sh_offset to not also be aligned to sh_addralign. Precisely because otherwise you wouldn't be able to mmap the data in with correct alignment.
Comment 16 Tom Tromey 2018-10-08 15:28:19 UTC
I don't really know whether it is better to skip mmap or to
just change the code to work unaligned.  I suppose I would
lean toward the latter, but I don't really have a great reason
for that.

In the mmap case, if it's only the gdb index section that needs
this (and I think it is), then the special case should probably
be more limited.

> Again, my comment isn't mainly about MAYBE_SWAP, but about
> expressions like these:
> 
>   index.symbol_table[idx].vec
> 
> Here, symbol_table (an array_view) references an unaligned array of
> unaligned offset_type objects.  That should make the operator[] and the vec
> element access undefined behavior, I'd think.

Yes, sorry, I didn't spell it out but in comment #5 what I was
really thinking was either working solely in byte types and
complicating expressions like that, or writing a wrapper class
to hide the difficulties.
Comment 17 Pedro Alves 2018-10-09 11:59:05 UTC
(In reply to Tom Tromey from comment #16)
> I don't really know whether it is better to skip mmap or to
> just change the code to work unaligned.  I suppose I would
> lean toward the latter, but I don't really have a great reason
> for that.

I thinking I'm leaning on making the code work unligned as well.
My reason is that it seems like it has always worked that way (it's not a real problem for x86, at least with current compilers, so far), and there are seemingly many index files like that in the wild.  It if weren't for ubsan, we wouldn't think of breaking compatibility with the current indexes in the wild, so it kind of feels like a tool-imposed unnecessary break.

I'm surprised we haven't seen GDB crash with SIGBUS on most other archs that can't do unaligned accesses unlike x86, though.  In principle, fixing this issue should be fixing such crashes.

Also, still curious on why this doesn't happen for you guys.
Comment 18 Mark Wielaard 2018-10-09 12:46:59 UTC
(In reply to Pedro Alves from comment #17)
> (In reply to Tom Tromey from comment #16)
> > I don't really know whether it is better to skip mmap or to
> > just change the code to work unaligned.  I suppose I would
> > lean toward the latter, but I don't really have a great reason
> > for that.
> 
> I thinking I'm leaning on making the code work unligned as well.
> My reason is that it seems like it has always worked that way (it's not a
> real problem for x86, at least with current compilers, so far).

Be careful though on x86 vectorization might use sse or avx extensions will depend on reading correctly aligned data. So you might even see SEGVs on x86 in case the compiler tries to be smart and relies on the types to be correctly aligned.
Comment 19 Tom Tromey 2021-03-05 20:09:52 UTC
I have a patch for this.
Comment 20 Sourceware Commits 2021-04-17 19:56:49 UTC
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=42c2c69462fd83db2e0532ee57c44091bc1032f9

commit 42c2c69462fd83db2e0532ee57c44091bc1032f9
Author: Tom Tromey <tom@tromey.com>
Date:   Sat Apr 17 13:56:36 2021 -0600

    Handle unaligned mapping of .gdb_index
    
    The .gdb_index was designed such that all data would be aligned.
    Unfortunately, we neglected to require this alignment in the objcopy
    instructions in the manual.  As a result, in many cases, a .gdb_index
    in the wild will not be properly aligned by mmap.  This yields
    undefined behavior, which is PR gdb/23743.
    
    This patch fixes the bug by always assuming that the mapping is
    unaligned, and using extract_unsigned_integer when needed.  A new
    helper class is introduced to make this less painful.
    
    gdb/ChangeLog
    2021-04-17  Tom Tromey  <tom@tromey.com>
    
            PR gdb/23743:
            * dwarf2/read.c (class offset_view): New.
            (struct symbol_table_slot): Remove.
            (struct mapped_index) <symbol_table, constant_pool>: Change type.
            <symbol_name_index, symbol_vec_index>: New methods.
            <symbol_name_slot_invalid, symbol_name_at, symbol_name_count>:
            Rewrite.
            (read_gdb_index_from_buffer): Update.
            (struct dw2_symtab_iterator) <vec>: Change type.
            (dw2_symtab_iter_init_common, dw2_symtab_iter_init)
            (dw2_symtab_iter_next, dw2_expand_marked_cus): Update.
            * dwarf2/index-write.c (class data_buf) <append_data>: Remove.
            <append_array, append_offset>: New methods.
            (write_hash_table, add_address_entry, write_gdbindex_1)
            (write_debug_names): Update.
            * dwarf2/index-common.h (byte_swap, MAYBE_SWAP): Remove.
Comment 21 Tom Tromey 2021-04-17 19:58:53 UTC
Fixed.