Bug 17520 - structure offset wrong when 1/4 GB or greater
Summary: structure offset wrong when 1/4 GB or greater
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: 7.8
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-29 14:36 UTC by David Taylor
Modified: 2016-06-27 18:28 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Taylor 2014-10-29 14:36:53 UTC
When a structure member's offset is 1/4 GB or greater from the start
of the structure, GDB computes the wrong value.

When computing the offset of a structure member, GDB computes first
the BITS from the start of the structure, returns that as a 32 bit
signed quantity and then divides the result by 8.

While this can show up on 32 bit systems, it is especially bad on
64 bit systems.  We have systems with over 500 GB of memory and have
run into the problem.

Here's a short program that illustrates the bug.  The actual struct
where the problem was observed has over 250 members and is over 1/2 GB
in size.

#include <stdio.h>

struct big_struct
{
  char first[0x10000000 + 16];
  long second;
} big_struct;

int
main (int argc, char *argv[])
{
  printf ("sizeof (big_struct) 0x%lx,\n&big_struct %p,\n&big_struct.second %p\noffset %lx\n",
	  sizeof (big_struct),
	  (void *)&big_struct,
	  (void *)&big_struct.second,
	  __builtin_offsetof(struct big_struct, second));
  return (0);
}

If I run it, the output is:

    sizeof (big_struct) 0x10000018,
    &big_struct 0x601040,
    &big_struct.second 0x10601050
    offset 10000010

But, in GDB 7.8:

    (gdb) p &big_struct
    $1 = (struct big_struct *) 0x601040 <big_struct>
    (gdb) p &big_struct.second
    $2 = (long *) 0xfffffffff0601050
Comment 1 Jan Kratochvil 2014-11-23 19:39:35 UTC
I Fedora carries for it the patchset:

http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-rhbz795424-bitpos-20of25.patch
http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-rhbz795424-bitpos-21of25.patch
http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-rhbz795424-bitpos-22of25.patch
http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-rhbz795424-bitpos-23of25.patch
http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-rhbz795424-bitpos-25of25-test.patch
http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-rhbz795424-bitpos-25of25.patch
http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-rhbz795424-bitpos-lazyvalue.patch

It was concluded that patching it before GDB is switched to C++ would be fragile as there cannot be done easy sanity checking if the inferior types width has not regressed.  With C++ one can wrap the offset types into sanity-checked classes with overriden operators.

But the GDB switch to C++ takes too many years.
Comment 2 Sergio Durigan Junior 2014-11-24 20:41:54 UTC
Fedora has been carrying those patches for quite some time now, and I don't remember seeing regressions because of them.  Maybe we could try to revive the patches upstream...
Comment 3 Jan Kratochvil 2014-11-24 20:49:37 UTC
I do not know how to verify they are correct.  That is there is no accidental 32bitvar=64bitvar; assignment or similar.  While that may be possible with some LLVM static analysis I have been waiting for C++ which would make it easier.  But then we are (were) all waiting on that approval of C++ switch.
Comment 4 dtaylor 2014-11-24 21:32:16 UTC
jan.kratochvil at redhat dot com <sourceware-bugzilla@sourceware.org> wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=17520
> 
> --- Comment #3 from Jan Kratochvil <jan.kratochvil at redhat dot com> ---
> I do not know how to verify they are correct.  That is there is no accidental
> 32bitvar=64bitvar; assignment or similar.  While that may be possible with some
> LLVM static analysis I have been waiting for C++ which would make it easier. 
> But then we are (were) all waiting on that approval of C++ switch.
> 
> -- 
> You are receiving this mail because:
> You reported the bug.

I don't care whether it's my patch or Red Hat's or someone else's, so
long as the bug is fixed.

The bug affects us and we want it fixed.  I would rather not have to
reapply the patch for each release.

If problems are found with the patch, generally they are easy to fix --
find the ooffending variable and change it from int to LONGEST.

I configured GDB with --enable-werror.  I started by changing everything
that I cou8ld find that was going to cause problems and then iterated
until everything built and ran with no regressions.

There a couple of places where I wasn't certain that it needed to be
converted, but I figured better safe than sorry.

If people are truly worried about it destabilizing GDB, then the right
approach is to put it in early -- to get as much mileage as possible on
it before the next release.

With regard to GDB moving to C++ -- it might happen, it has certainly
been talked about on and off for enough years, but as whether it will
ever happen -- who knows.  But, I am not going hold my breath.
Comment 5 Sourceware Commits 2016-06-25 01:08:31 UTC
The master branch has been updated by David Taylor <taylor@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6b8505468e64c2be8d0eea1f2b8db86fa3897600

commit 6b8505468e64c2be8d0eea1f2b8db86fa3897600
Author: David Taylor <dtaylor@emc.com>
Date:   Tue Apr 12 15:02:57 2016 -0400

    Support structure offsets that are 512K or larger.
    
    GDB computes structure byte offsets using a 32 bit integer.  And,
    first it computes the offset in bits and then converts to bytes.  The
    result is that any offset that if 512K bytes or larger overflows.
    This patch changes GDB to use LONGEST for such calculations.
    
    	PR gdb/17520 Structure offset wrong when 1/4 GB or greater.
    	* c-lang.h: Change all parameters, variables, and struct or union
    	members used as struct or union fie3ld offsets from int to
    	LONGEST.
    	* c-valprint.c: Likewise.
    	* cp-abi.c: Likewise.
    	* cp-abi.h: Likewise.
    	* cp-valprint.c: Likewise.
    	* d-valprint.c: Likewise.
    	* dwarf2loc.c: Likewise.
    	* eval.c: Likewise.
    	* extension-priv.h: Likewise.
    	* extension.c: Likewise.
    	* extension.h: Likewise.
    	* findvar.c: Likewise.
    	* gdbtypes.h: Likewise.
    	* gnu-v2-abi.c: Likewise.
    	* gnu-v3-abi.c: Likewise.
    	* go-valprint.c: Likewise.
    	* guile/guile-internal.h: Likewise.
    	* guile/scm-pretty-print.c: Likewise.
    	* jv-valprint.c Likewise.
    	* opencl-lang.c: Likewise.
    	* p-lang.h: Likewise.
    	* python/py-prettyprint.c: Likewise.
    	* python/python-internal.h: Likewise.
    	* spu-tdep.c: Likewise.
    	* typeprint.c: Likewise.
    	* valarith.c: Likewise.
    	* valops.c: Likewise.
    	* valprint.c: Likewise.
    	* valprint.h: Likewise.
    	* value.c: Likewise.
    	* value.h: Likewise.
    	* p-valprint.c: Likewise.
    	* c-typeprint.c (c_type_print_base): When printing offset, use
    	plongest, not %d.
    	* gdbtypes.c (recursive_dump_type): Ditto.
Comment 6 David Taylor 2016-06-27 18:28:06 UTC
Fixed by commit 6b8505468e64c2be8d0eea1f2b8db86fa3897600 on Jun 24th, 2016.