This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Re: RFC: partial symbol table address range generalization
Daniel Jacobowitz <drow@mvista.com> writes:
> > > This particular problem should be avoidable anyway. I would appreciate
> > > it if you would look at:
> > > <http://sources.redhat.com/ml/gdb/2001-09/msg00068.html>
> >
> > Well, okay. But I'd much prefer to see the problem fixed by making
> > the symbol table contents more accurate than by adding another
> > heuristic for recognizing insane data. When checks like that get into
> > the code, they never go away, because nobody's ever really sure
> > whether the circumstances it was meant to cope with happen any more.
>
> Well, I wouldn't call it a heuristic in this case. If we don't have
> debugging info, isn't it a little bit insane to try to find a line
> number?
Yes, certainly. But the reasoning I see going on is this: we can't
trust our address range data, and our source line search screws up if
the address range data isn't accurate, so we'll prop up the process by
pulling a name search into the process, too. The more checks we add,
the more likely we are to detect that something's screwey.
This works, but I'd really rather just make the address range data
accurate in the first place. The part that strikes me as a
`heuristic' is the assumption that the address ranges are not quite
trustworthy, and that therefore we should cross-check them against the
function table before we proceed.
If you think it's going to be too hard to get reliable address ranges
for symtabs, then it'd certainly be better to put Peter's change in;
at least something will get fixed. But if it's practical, I think
it's better policy to work hard at filling your data structures with
reliable data than to sprinkle your query code with sanity checks.
> For performance reasons, it might be better to steal the iterator
> concept and have an opaque cookie separate from the start/end
> addresses. Otherwise, addrset_next_range is not going to be constant
> time, and that could get annoying in an objfile with a large number of
> sections... and C++ is notorious for absurd numbers of sections.
It is (usually) constant time. :) The representation maintains a
`current position' in the sorted linked list of ranges, and starts its
searches from that point. So in a series of operations where each
address is near the last one (like a sequential scan), each operation
takes place in constant time.
This is somewhat like having a single iterator object hidden inside
the addrset object; that is, at any given time, there's only one range
of addresses which can be operated on quickly. If instead we had an
explicit iterator type, one could do multiple traversals of different
areas efficiently. However, mutations can delete nodes from the list,
and change their values; it seems to me that keeping iterators valid
across mutation operations would add complexity.
Anyway, it's an opaque type. If performance is a problem, or if we
want to add iterators, we can revamp the whole thing without breaking
its users.
Here's the implementation:
/* addrset.c --- implementation of the address set datatype
Copyright 2001 Free Software Foundation, Inc.
This file is part of GDB.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#include "defs.h"
#include "obstack.h"
#include "gdb_assert.h"
#include "addrset.h"
struct addrset {
/* Two singly-linked lists of nodes; each node represents a range of
addresses in the addrset. None of these nodes' address ranges
overlap or abut each other. `before' is sorted in order of
decreasing addresses, `after' in order of increasing addresses.
All nodes in `before' have lower addresses than all the nodes in
`after'.
Generally, whenever we operate on the list, we try split it
between `before' and `after' so that the division comes directly
after the last node we operated on. That way, sequential access
happens in constant time. Since we split it *after* the last
node we operate on, we're slightly biased towards traversal
towards increasing address, but it's not too bad if we want to go
the other way. */
struct node *before, *after;
/* If this is non-zero, allocate new nodes from this obstack.
Otherwise, allocate nodes using xmalloc. */
struct obstack *obstack;
/* We can't free nodes allocated from an obstack, so we might as
well keep them in a free list, and reuse them. This list can't
be global, as different obstacks have different lifetimes.
Ideally, it would be per-obstack, so a node could be freed from
one addrset and then re-used by another addrset in the same
obstack. But we don't have any convenient way to store
per-obstack info. So the free list is per-addrset.
For addrsets allocated using xmalloc, we simply xfree nodes, and
hope malloc manages small blocks well; this field is always zero. */
struct node *free;
};
/* One contiguous range of addresses in an addrset. */
struct node {
struct node *next;
/* The start and end addresses, inclusive, of the address range.
Obviously, start <= end. If start == end, that's a one-byte
range. */
CORE_ADDR start, end;
};
struct addrset *
new_addrset (struct obstack *obstack)
{
struct addrset *new;
if (obstack)
new = obstack_alloc (obstack, sizeof (*new));
else
new = xmalloc (sizeof (*new));
memset (new, 0, sizeof (*new));
new->obstack = obstack;
return new;
}
static void
xfree_node_list (struct node *n)
{
while (n)
{
struct node *next = n->next;
xfree (n);
n = next;
}
}
void
addrset_free (struct addrset *addrset)
{
/* You can't free an addrset allocated in an obstack. */
gdb_assert (! addrset->obstack);
/* If it wasn't allocated on an obstack, it had better not have a
free list. */
gdb_assert (! addrset->free);
xfree_node_list (addrset->before);
xfree_node_list (addrset->after);
}
static struct node *
new_node (struct addrset *addrset,
CORE_ADDR start, CORE_ADDR end)
{
struct node *new;
if (addrset->obstack)
{
/* If we have any nodes on the free list, use them. */
if (addrset->free)
{
new = addrset->free;
addrset->free = new->next;
}
else
new = obstack_alloc (addrset->obstack, sizeof (*new));
}
else
new = xmalloc (sizeof (*new));
new->next = 0;
new->start = start;
new->end = end;
return new;
}
static void
free_node (struct addrset *addrset, struct node *node)
{
if (addrset->obstack)
{
node->next = addrset->free;
addrset->free = node;
}
else
xfree (node);
}
/* Return true if A is less than B, and there is at least one
address between them. */
static int
less_and_separate (CORE_ADDR a, CORE_ADDR b)
{
/* We need both tests; think about what happens, for example, when
B is zero and A is (CORE_ADDR) -1. */
return (a < b && b - a >= 2);
}
/* Shift nodes from `before' to `after', or in the other direction,
until the split falls at the right place. By `the right place', we
mean:
- any nodes on `after' are after, and do not abut, the range from
START to END, and
- any nodes on `before' are either completely before, abut, or overlap
the range from START to END.
Once this is done, any nodes on `after' are irrelevant to our
operation, and any nodes that abut or overlap our range are at the
head of `before'. */
static void
resplit (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
struct node *node;
/* We know the `before' and `after' lists are each sorted in the
proper direction, and contain non-overlapping, non-adjacent
ranges.
The first step is to move all ranges from `after' to `before'
that are before, overlap, or abut the range we're operating on.
Be careful of nodes abutting either end of the address space. */
while ((node = addrset->after) != 0
&& ! less_and_separate (end, node->start))
{
addrset->after = node->next;
node->next = addrset->before;
addrset->before = node;
}
/* Now, any ranges in `after' are after and do not abut the range
we're operating on. This means that all those nodes are
irrelevant to this operation, and we can ignore them.
Thus, any nodes we care about are somewhere in `before'. Bring
them to the head of that list by shifting any nodes from `before'
to `after' that are completely after the range we're operating
on.
The astute reader will note that the condition for moving a node
in this loop is the complement of that for the previous loop.
That is, the condition for moving a node in one direction is the
opposite of moving it in the other direction. This means that
only one of these two loops' bodies will ever be executed in a
given call. */
while ((node = addrset->before) != 0
&& less_and_separate (end, node->start))
{
addrset->before = node->next;
node->next = addrset->after;
addrset->after = node;
}
}
void
addrset_add (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
struct node *node, *save;
gdb_assert (start <= end);
resplit (addrset, start, end);
/* Now we know that any ranges in `after' are irrelevant to this
operation, and that zero or more nodes at the head of `before'
abut or overlap the range we're adding. Remove any such nodes
from `before', and absorb their ranges into ours. */
save = 0;
while ((node = addrset->before) != 0
&& ! less_and_separate (node->end, start))
{
if (node->start < start)
start = node->start;
if (node->end > end)
end = node->end;
addrset->before = node->next;
/* We want to remove this node from the list, but we know we're
going to insert a new one in a bit, so hold onto the first
node we delete, just to save a call to xfree and xmalloc. */
if (save)
free_node (addrset, node);
else
save = node;
}
/* Create a node for the new range. */
if (save)
{
save->start = start;
save->end = end;
}
else
save = new_node (addrset, start, end);
/* Stick it on the head of the `before' list. */
save->next = addrset->before;
addrset->before = save;
}
void
addrset_remove (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
struct node *node;
gdb_assert (start <= end);
/* This could end up moving a bunch of nodes to `before' that we're
just going to delete anyway. But our running time is linear in
the number of ranges we're going to delete anyway, and this lets
us deal with the `before' list only; otherwise, we'd have to
write out the code twice, once for each list. */
resplit (addrset, start, end);
/* Now we know that any ranges in `after' are irrelevant to this
operation, and that zero or more nodes at the head of `before'
abut or overlap the range we're adding.
If the node at the head of the `before' list extends after our
range, either truncate it and shift it to `after', or split it in
two. */
if ((node = addrset->before) != 0
&& end < node->end)
{
/* If it also extends before the range we're deleting, we need
to split it into two nodes: one for the part before and one
for the part after. And we know we don't need to do anything
else to the `before' list. */
if (node->start < start)
{
struct node *right_part = new_node (addrset, end + 1, node->end);
right_part->next = addrset->after;
addrset->after = right_part;
node->end = start - 1;
return;
}
/* Just shift the node onto `after', and adjust its size. */
addrset->before = node->next;
node->next = addrset->after;
addrset->after = node;
node->start = end + 1;
/* There may be more nodes on `before' that we need to adjust. */
}
/* Now we know that the head of `before' doesn't extend beyond the
end of the range we're deleting. Nothing will need to be added
to `after'.
Remove any nodes completely contained in the region we're
deleting. */
while ((node = addrset->before) != 0
&& start <= node->start)
{
addrset->before = node->next;
free_node (addrset, node);
}
/* The head of `before' (if any) isn't completely contained in the
range we're deleting, but it may partially overlap it at the low
end. If that's the case, adjust its endpoint. */
if ((node = addrset->before) != 0
&& start <= node->end)
node->end = start - 1;
}
/* Shift *ADDR by OFFSET, and check that the direction of change from
OLD to NEW is consistent with *DIR. If it isn't, raise an internal
error. OFFSET must be non-zero. See offset_node_list for the
meaning of *DIR. */
static void
shift_addr (CORE_ADDR *addr, CORE_ADDR offset, int *dir)
{
CORE_ADDR old = *addr;
CORE_ADDR new = (*addr += offset);
if (new < old)
{
if (! *dir)
*dir = -1;
else
gdb_assert (*dir == -1);
}
else
{
if (! *dir)
*dir = 1;
else
gdb_assert (*dir == 1);
}
}
/* Offset addresses in NODE and its successors by OFFSET.
OFFSET must be non-zero.
*DIR is the direction of the shift, used for error checking.
- If *DIR is zero, then we don't know which direction things will be
shifted yet.
- If *DIR is -1 or +1, then OFFSET shifted other addresses in other nodes
downwards or upwards, and we should raise an internal_error if any
of our addresses go in the other direction. */
static void
offset_node_list (struct node *node, CORE_ADDR offset, int *dir)
{
while (node)
{
shift_addr (&node->start, offset, dir);
shift_addr (&node->end, offset, dir);
node = node->next;
}
}
void
addrset_offset (struct addrset *addrset, CORE_ADDR offset)
{
int dir = 0;
if (offset == 0)
return;
offset_node_list (addrset->before, offset, &dir);
offset_node_list (addrset->after, offset, &dir);
}
int
addrset_in (struct addrset *addrset, CORE_ADDR addr)
{
struct node *node;
resplit (addrset, addr, addr);
/* Now any node containing addr would have to be on `before'. */
return ((node = addrset->before) != 0
&& node->start <= addr && addr <= node->end);
}
int
addrset_span (struct addrset *addrset,
CORE_ADDR addr,
CORE_ADDR *start,
CORE_ADDR *end)
{
struct node *node;
resplit (addrset, addr, addr);
/* Now any node containing addr would have to be on `before'. */
if ((node = addrset->before) != 0
&& node->start <= addr && addr <= node->end)
{
*start = node->start;
*end = node->end;
return 1;
}
else
return 0;
}
void
addrset_first_range (struct addrset *addrset,
CORE_ADDR *start,
CORE_ADDR *end)
{
struct node *node;
resplit (addrset, 0, 0);
/* If there's anything on `before', that's our first range.
Otherwise, if there's anything on after, that's it. */
if ((node = addrset->before) != 0
|| (node = addrset->after) != 0)
{
*start = node->start;
*end = node->end;
}
else
{
/* ADDRSET is empty. */
*start = 1;
*end = 0;
}
}
void
addrset_next_range (struct addrset *addrset,
CORE_ADDR after,
CORE_ADDR *start,
CORE_ADDR *end)
{
struct node *node;
resplit (addrset, after, after);
/* The first node on `before' might extend beyond our range. */
if ((node = addrset->before) != 0
&& after < node->end)
{
*start = after + 1;
*end = node->end;
}
/* Otherwise, the first node on `after' is the one we want. */
else if ((node = addrset->after) != 0)
{
*start = node->start;
*end = node->end;
}
/* Otherwise, there are no more addresses in ADDRSET after AFTER. */
else
{
*start = 1;
*end = 0;
}
}
void
addrset_print (struct addrset *addrset, struct ui_file *outfile)
{
CORE_ADDR start, end;
int first = 1;
for (addrset_first_range (addrset, &start, &end);
start <= end;
addrset_next_range (addrset, end, &start, &end))
{
if (! first)
fprintf_filtered (outfile, ", ");
print_address_numeric (start, 1, outfile);
fprintf_filtered (outfile, "-");
print_address_numeric (end, 1, outfile);
first = 0;
}
}