This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Re: RFC: partial symbol table address range generalization

To: Daniel Jacobowitz <drow at mvista dot com>
Subject: Re: RFC: partial symbol table address range generalization
From: Jim Blandy <jimb at zwingli dot cygnus dot com>
Date: 24 Oct 2001 19:09:25 -0500
Cc: gdb-patches at sources dot redhat dot com
References: <20011023233450.09F855E9D8@zwingli.cygnus.com><20011023195430.A15242@nevyn.them.org><npsnc965cl.fsf@zwingli.cygnus.com><20011024004201.A22007@nevyn.them.org>

Daniel Jacobowitz <drow@mvista.com> writes:
> > > This particular problem should be avoidable anyway.  I would appreciate
> > > it if you would look at:
> > >  <http://sources.redhat.com/ml/gdb/2001-09/msg00068.html>
> > 
> > Well, okay.  But I'd much prefer to see the problem fixed by making
> > the symbol table contents more accurate than by adding another
> > heuristic for recognizing insane data.  When checks like that get into
> > the code, they never go away, because nobody's ever really sure
> > whether the circumstances it was meant to cope with happen any more.
> 
> Well, I wouldn't call it a heuristic in this case.  If we don't have
> debugging info, isn't it a little bit insane to try to find a line
> number?

Yes, certainly.  But the reasoning I see going on is this: we can't
trust our address range data, and our source line search screws up if
the address range data isn't accurate, so we'll prop up the process by
pulling a name search into the process, too.  The more checks we add,
the more likely we are to detect that something's screwey.

This works, but I'd really rather just make the address range data
accurate in the first place.  The part that strikes me as a
`heuristic' is the assumption that the address ranges are not quite
trustworthy, and that therefore we should cross-check them against the
function table before we proceed.

If you think it's going to be too hard to get reliable address ranges
for symtabs, then it'd certainly be better to put Peter's change in;
at least something will get fixed.  But if it's practical, I think
it's better policy to work hard at filling your data structures with
reliable data than to sprinkle your query code with sanity checks.

> For performance reasons, it might be better to steal the iterator
> concept and have an opaque cookie separate from the start/end
> addresses.  Otherwise, addrset_next_range is not going to be constant
> time, and that could get annoying in an objfile with a large number of
> sections... and C++ is notorious for absurd numbers of sections.

It is (usually) constant time.  :)  The representation maintains a
`current position' in the sorted linked list of ranges, and starts its
searches from that point.  So in a series of operations where each
address is near the last one (like a sequential scan), each operation
takes place in constant time.

This is somewhat like having a single iterator object hidden inside
the addrset object; that is, at any given time, there's only one range
of addresses which can be operated on quickly.  If instead we had an
explicit iterator type, one could do multiple traversals of different
areas efficiently.  However, mutations can delete nodes from the list,
and change their values; it seems to me that keeping iterators valid
across mutation operations would add complexity.

Anyway, it's an opaque type.  If performance is a problem, or if we
want to add iterators, we can revamp the whole thing without breaking
its users.

Here's the implementation:


/* addrset.c --- implementation of the address set datatype  
   Copyright 2001 Free Software Foundation, Inc.

   This file is part of GDB.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include "defs.h"
#include "obstack.h"
#include "gdb_assert.h"
#include "addrset.h"

struct addrset {

  /* Two singly-linked lists of nodes; each node represents a range of
     addresses in the addrset.  None of these nodes' address ranges
     overlap or abut each other.  `before' is sorted in order of
     decreasing addresses, `after' in order of increasing addresses.
     All nodes in `before' have lower addresses than all the nodes in
     `after'.

     Generally, whenever we operate on the list, we try split it
     between `before' and `after' so that the division comes directly
     after the last node we operated on.  That way, sequential access
     happens in constant time.  Since we split it *after* the last
     node we operate on, we're slightly biased towards traversal
     towards increasing address, but it's not too bad if we want to go
     the other way.  */
  struct node *before, *after;

  /* If this is non-zero, allocate new nodes from this obstack.
     Otherwise, allocate nodes using xmalloc.  */
  struct obstack *obstack;

  /* We can't free nodes allocated from an obstack, so we might as
     well keep them in a free list, and reuse them.  This list can't
     be global, as different obstacks have different lifetimes.
     Ideally, it would be per-obstack, so a node could be freed from
     one addrset and then re-used by another addrset in the same
     obstack.  But we don't have any convenient way to store
     per-obstack info.  So the free list is per-addrset.

     For addrsets allocated using xmalloc, we simply xfree nodes, and
     hope malloc manages small blocks well; this field is always zero.  */
  struct node *free;

};

/* One contiguous range of addresses in an addrset.  */
struct node {
  struct node *next;

  /* The start and end addresses, inclusive, of the address range.
     Obviously, start <= end.  If start == end, that's a one-byte
     range.  */
  CORE_ADDR start, end;       
};


struct addrset *
new_addrset (struct obstack *obstack)
{
  struct addrset *new;

  if (obstack)
    new = obstack_alloc (obstack, sizeof (*new));
  else
    new = xmalloc (sizeof (*new));

  memset (new, 0, sizeof (*new));

  new->obstack = obstack;

  return new;
}


static void
xfree_node_list (struct node *n)
{
  while (n)
    {
      struct node *next = n->next;

      xfree (n);
      n = next;
    }
}


void
addrset_free (struct addrset *addrset)
{
  /* You can't free an addrset allocated in an obstack.  */
  gdb_assert (! addrset->obstack);

  /* If it wasn't allocated on an obstack, it had better not have a
     free list.  */
  gdb_assert (! addrset->free);

  xfree_node_list (addrset->before);
  xfree_node_list (addrset->after);
}


static struct node *
new_node (struct addrset *addrset,
          CORE_ADDR start, CORE_ADDR end)
{
  struct node *new;

  if (addrset->obstack)
    {
      /* If we have any nodes on the free list, use them.  */
      if (addrset->free)
        {
          new = addrset->free;
          addrset->free = new->next;
        }
      else
        new = obstack_alloc (addrset->obstack, sizeof (*new));
    }
  else
    new = xmalloc (sizeof (*new));

  new->next = 0;
  new->start = start;
  new->end = end;

  return new;
}


static void
free_node (struct addrset *addrset, struct node *node)
{
  if (addrset->obstack)
    {
      node->next = addrset->free;
      addrset->free = node;
    }
  else
    xfree (node);
}


/* Return true if A is less than B, and there is at least one
   address between them.  */
static int
less_and_separate (CORE_ADDR a, CORE_ADDR b)
{
  /* We need both tests; think about what happens, for example, when
     B is zero and A is (CORE_ADDR) -1.  */ 
  return (a < b && b - a >= 2);
}


/* Shift nodes from `before' to `after', or in the other direction,
   until the split falls at the right place.  By `the right place', we
   mean:
   - any nodes on `after' are after, and do not abut, the range from
     START to END, and
   - any nodes on `before' are either completely before, abut, or overlap
     the range from START to END.
   Once this is done, any nodes on `after' are irrelevant to our
   operation, and any nodes that abut or overlap our range are at the
   head of `before'.  */
static void
resplit (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
  struct node *node;

  /* We know the `before' and `after' lists are each sorted in the
     proper direction, and contain non-overlapping, non-adjacent
     ranges.

     The first step is to move all ranges from `after' to `before'
     that are before, overlap, or abut the range we're operating on.
     Be careful of nodes abutting either end of the address space.  */
  while ((node = addrset->after) != 0
         && ! less_and_separate (end, node->start))
    {
      addrset->after = node->next;
      node->next = addrset->before;
      addrset->before = node;
    }

  /* Now, any ranges in `after' are after and do not abut the range
     we're operating on.  This means that all those nodes are
     irrelevant to this operation, and we can ignore them.

     Thus, any nodes we care about are somewhere in `before'.  Bring
     them to the head of that list by shifting any nodes from `before'
     to `after' that are completely after the range we're operating
     on.

     The astute reader will note that the condition for moving a node
     in this loop is the complement of that for the previous loop.
     That is, the condition for moving a node in one direction is the
     opposite of moving it in the other direction.  This means that
     only one of these two loops' bodies will ever be executed in a
     given call.  */
  while ((node = addrset->before) != 0
         && less_and_separate (end, node->start))
    {
      addrset->before = node->next;
      node->next = addrset->after;
      addrset->after = node;
    }
}


void
addrset_add (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
  struct node *node, *save;

  gdb_assert (start <= end);

  resplit (addrset, start, end);

  /* Now we know that any ranges in `after' are irrelevant to this
     operation, and that zero or more nodes at the head of `before'
     abut or overlap the range we're adding.  Remove any such nodes
     from `before', and absorb their ranges into ours.  */
  save = 0;
  while ((node = addrset->before) != 0
         && ! less_and_separate (node->end, start))
    {
      if (node->start < start)
        start = node->start;
      if (node->end > end)
        end = node->end;
      addrset->before = node->next;

      /* We want to remove this node from the list, but we know we're
         going to insert a new one in a bit, so hold onto the first
         node we delete, just to save a call to xfree and xmalloc.  */
      if (save)
        free_node (addrset, node);
      else
        save = node;
    }

  /* Create a node for the new range.  */
  if (save)
    {
      save->start = start;
      save->end = end;
    }
  else
    save = new_node (addrset, start, end);

  /* Stick it on the head of the `before' list.  */
  save->next = addrset->before;
  addrset->before = save;
}


void
addrset_remove (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
  struct node *node;

  gdb_assert (start <= end);

  /* This could end up moving a bunch of nodes to `before' that we're
     just going to delete anyway.  But our running time is linear in
     the number of ranges we're going to delete anyway, and this lets
     us deal with the `before' list only; otherwise, we'd have to
     write out the code twice, once for each list.  */
  resplit (addrset, start, end);

  /* Now we know that any ranges in `after' are irrelevant to this
     operation, and that zero or more nodes at the head of `before'
     abut or overlap the range we're adding.

     If the node at the head of the `before' list extends after our
     range, either truncate it and shift it to `after', or split it in
     two.  */
  if ((node = addrset->before) != 0
      && end < node->end)
    {
      /* If it also extends before the range we're deleting, we need
         to split it into two nodes: one for the part before and one
         for the part after.  And we know we don't need to do anything
         else to the `before' list.  */
      if (node->start < start)
        {
          struct node *right_part = new_node (addrset, end + 1, node->end);
          right_part->next = addrset->after;
          addrset->after = right_part;
          node->end = start - 1;
          return;
        }

      /* Just shift the node onto `after', and adjust its size.  */
      addrset->before = node->next;
      node->next = addrset->after;
      addrset->after = node;
      node->start = end + 1;

      /* There may be more nodes on `before' that we need to adjust.  */
    }

  /* Now we know that the head of `before' doesn't extend beyond the
     end of the range we're deleting.  Nothing will need to be added
     to `after'.

     Remove any nodes completely contained in the region we're
     deleting.  */
  while ((node = addrset->before) != 0
         && start <= node->start)
    {
      addrset->before = node->next;
      free_node (addrset, node);
    }

  /* The head of `before' (if any) isn't completely contained in the
     range we're deleting, but it may partially overlap it at the low
     end.  If that's the case, adjust its endpoint.  */
  if ((node = addrset->before) != 0
      && start <= node->end)
    node->end = start - 1;
}


/* Shift *ADDR by OFFSET, and check that the direction of change from
   OLD to NEW is consistent with *DIR.  If it isn't, raise an internal
   error.  OFFSET must be non-zero.  See offset_node_list for the
   meaning of *DIR.  */
static void
shift_addr (CORE_ADDR *addr, CORE_ADDR offset, int *dir)
{
  CORE_ADDR old = *addr;
  CORE_ADDR new = (*addr += offset);

  if (new < old)
    {
      if (! *dir)
        *dir = -1;
      else
        gdb_assert (*dir == -1);
    }
  else
    {
      if (! *dir)
        *dir = 1;
      else
        gdb_assert (*dir == 1);
    }
}


/* Offset addresses in NODE and its successors by OFFSET.
   OFFSET must be non-zero.
   *DIR is the direction of the shift, used for error checking.
   - If *DIR is zero, then we don't know which direction things will be
     shifted yet.
   - If *DIR is -1 or +1, then OFFSET shifted other addresses in other nodes
     downwards or upwards, and we should raise an internal_error if any
     of our addresses go in the other direction.  */
static void
offset_node_list (struct node *node, CORE_ADDR offset, int *dir)
{
  while (node)
    {
      shift_addr (&node->start, offset, dir);
      shift_addr (&node->end, offset, dir);

      node = node->next;
    }
}


void
addrset_offset (struct addrset *addrset, CORE_ADDR offset)
{
  int dir = 0;

  if (offset == 0)
    return;

  offset_node_list (addrset->before, offset, &dir);
  offset_node_list (addrset->after, offset, &dir);
}


int
addrset_in (struct addrset *addrset, CORE_ADDR addr)
{
  struct node *node;

  resplit (addrset, addr, addr);

  /* Now any node containing addr would have to be on `before'.  */
  return ((node = addrset->before) != 0
          && node->start <= addr && addr <= node->end);
}


int
addrset_span (struct addrset *addrset,
              CORE_ADDR addr,
              CORE_ADDR *start,
              CORE_ADDR *end)
{
  struct node *node;

  resplit (addrset, addr, addr);

  /* Now any node containing addr would have to be on `before'.  */
  if ((node = addrset->before) != 0
      && node->start <= addr && addr <= node->end)
    {
      *start = node->start;
      *end = node->end;
      return 1;
    }
  else
    return 0;
}


void
addrset_first_range (struct addrset *addrset,
                     CORE_ADDR *start,
                     CORE_ADDR *end)
{
  struct node *node;
  
  resplit (addrset, 0, 0);
  
  /* If there's anything on `before', that's our first range.
     Otherwise, if there's anything on after, that's it.  */
  if ((node = addrset->before) != 0
      || (node = addrset->after) != 0)
    {
      *start = node->start;
      *end = node->end;
    }
  else
    {
      /* ADDRSET is empty.  */
      *start = 1;
      *end = 0;
    }
}


void
addrset_next_range (struct addrset *addrset,
                    CORE_ADDR after,
                    CORE_ADDR *start,
                    CORE_ADDR *end)
{
  struct node *node;

  resplit (addrset, after, after);

  /* The first node on `before' might extend beyond our range.  */
  if ((node = addrset->before) != 0
      && after < node->end)
    {
      *start = after + 1;
      *end = node->end;
    }

  /* Otherwise, the first node on `after' is the one we want.  */
  else if ((node = addrset->after) != 0)
    {
      *start = node->start;
      *end = node->end;
    }

  /* Otherwise, there are no more addresses in ADDRSET after AFTER.  */
  else
    {
      *start = 1;
      *end = 0;
    }
}


void
addrset_print (struct addrset *addrset, struct ui_file *outfile)
{
  CORE_ADDR start, end;
  int first = 1;

  for (addrset_first_range (addrset, &start, &end);
       start <= end;
       addrset_next_range (addrset, end, &start, &end))
    {
      if (! first)
        fprintf_filtered (outfile, ", ");
      print_address_numeric (start, 1, outfile);
      fprintf_filtered (outfile, "-");
      print_address_numeric (end, 1, outfile);
      first = 0;
    }
}
Follow-Ups:
- Re: RFC: partial symbol table address range generalization
  - From: Daniel Jacobowitz
References:
- RFC: partial symbol table address range generalization
  - From: Jim Blandy
- Re: RFC: partial symbol table address range generalization
  - From: Daniel Jacobowitz
- Re: RFC: partial symbol table address range generalization
  - From: Jim Blandy
- Re: RFC: partial symbol table address range generalization
  - From: Daniel Jacobowitz
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]