This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
iconv's find_derivation
- To: libc-alpha at sources dot redhat dot com
- Subject: iconv's find_derivation
- From: Bruno Haible <haible at ilog dot fr>
- Date: Mon, 4 Sep 2000 14:54:43 +0200 (CEST)
The find_derivation function which finds intermediate encodings in order to
convert from one encoding to another one is buggy: it does not always return
the best path (i.e. the one with the least cost sum).
It would be correct if there was no distinction between `toset' and
`toset_expand'; in this case all solution paths would end in `toset' and
only the best path to `toset' would have to be kept in the `solution'
variable.
But the current code can, through the
solution = step;
assignment, produce a non-optimal solution if:
1. a path P1 to `toset' is found,
2. a path P2 to `toset_expand' is found,
then the `solution' list contains P1 followed by P2,
3. a better path to an intermediate node in P2 is found, such
that P2 becomes better than P1, then P1 is removed off the list,
4. a better path to an intermediate node N in P1 is found.
P1 has already been removed off the `solution' list and won't be re-added
(because the algorithm searches all paths outgoing from N only once).
Two other things that don't make sense about the current code are:
- New solutions are always added to the `solution' list, thus making the
list grow more than needed. The same trick as for the `first' list
could be applied: if the same charset has already been added to the
`solution' list, its derivation_step can be reused.
- It always adds new solutions at the end of the list, but the `solution'
list can be considered an ordered one - thus adding to the front is faster.
Here is a patch which
- fixes the algorithm,
- adds comments about how the algorithm works and why the `solution' list
contains at most 2 elements,
- at the end, chooses the cheaper among the 2 elements.
Bruno
2000-09-03 Bruno Haible <haible@clisp.cons.org>
* iconv/gconv_db.c (find_derivation): Always use the least-cost
solution.
*** glibc-20000831/iconv/gconv_db.c.bak Sat Sep 2 19:35:55 2000
--- glibc-20000831/iconv/gconv_db.c Sun Sep 3 02:11:33 2000
***************
*** 333,341 ****
int best_cost_lo = INT_MAX;
int result;
! /* There is a small chance that this derivation is meanwhile found. This
! can happen if in `find_derivation' we look for this derivation, didn't
! find it but at the same time another thread looked for this derivation. */
result = derivation_lookup (fromset_expand ?: fromset, toset_expand ?: toset,
handle, nsteps);
if (result == __GCONV_OK)
--- 358,365 ----
int best_cost_lo = INT_MAX;
int result;
! /* Look whether an earlier call to `find_derivation' has already
! computed a possible derivation. If so, return it immediately. */
result = derivation_lookup (fromset_expand ?: fromset, toset_expand ?: toset,
handle, nsteps);
if (result == __GCONV_OK)
***************
*** 346,354 ****
return result;
}
! /* For now we use a simple algorithm with quadratic runtime behaviour.
! The task is to match the `toset' with any of the available rules,
! starting from FROMSET. */
if (fromset_expand != NULL)
{
first = NEW_STEP (fromset_expand, 0, 0, NULL, NULL);
--- 370,401 ----
return result;
}
! /* The task is to find a sequence of transformations, backed by the
! existing modules - whether builtin or dynamically loadable -,
! starting at `fromset' (or `fromset_expand') and ending at `toset'
! (or `toset_expand'), and with minimal cost.
!
! For computer scientists, this is a shortest path search in the
! graph where the nodes are all possible charsets and the edges are
! the transformations listed in __gconv_modules_db.
!
! For now we use a simple algorithm with quadratic runtime behaviour.
! A breadth-first search, starting at `fromset' and `fromset_expand'.
! The list starting at `first' contains all nodes that have been
! visited up to now, in the order in which they have been visited --
! excluding the goal nodes `toset' and `toset_expand' which get
! managed in the list starting at `solution'.
! `current' walks through the list starting at `first' and looks
! which nodes are reachable from the current node, adding them to
! the end of the list [`first' or `solution' respectively] (if
! they are visited the first time) or updating them in place (if
! they have have already been visited).
! In each node of either list, cost_lo and cost_hi contain the
! minimum cost over any paths found up to now, starting at `fromset'
! or `fromset_expand', ending at that node. best_cost_lo and
! best_cost_hi represent the minimum over the elements of the
! `solution' list. */
!
if (fromset_expand != NULL)
{
first = NEW_STEP (fromset_expand, 0, 0, NULL, NULL);
***************
*** 373,388 ****
searching for prefixes. So we search for the first entry with a
matching prefix and any other matching entry can be found from
this place. */
! struct gconv_module *node = __gconv_modules_db;
/* Maybe it is not necessary anymore to look for a solution for
! this entry since the cost is already as high (or heigher) as
the cost for the best solution so far. */
if (current->cost_hi > best_cost_hi
|| (current->cost_hi == best_cost_hi
&& current->cost_lo >= best_cost_lo))
continue;
while (node != NULL)
{
int cmpres = strcmp (current->result_set, node->from_string);
--- 420,436 ----
searching for prefixes. So we search for the first entry with a
matching prefix and any other matching entry can be found from
this place. */
! struct gconv_module *node;
/* Maybe it is not necessary anymore to look for a solution for
! this entry since the cost is already as high (or higher) as
the cost for the best solution so far. */
if (current->cost_hi > best_cost_hi
|| (current->cost_hi == best_cost_hi
&& current->cost_lo >= best_cost_lo))
continue;
+ node = __gconv_modules_db;
while (node != NULL)
{
int cmpres = strcmp (current->result_set, node->from_string);
***************
*** 404,440 ****
struct derivation_step *step;
/* We managed to find a derivation. First see whether
! this is what we are looking for. */
if (strcmp (result_set, toset) == 0
|| (toset_expand != NULL
&& strcmp (result_set, toset_expand) == 0))
{
! if (solution == NULL || cost_hi < best_cost_hi
|| (cost_hi == best_cost_hi
&& cost_lo < best_cost_lo))
{
best_cost_hi = cost_hi;
best_cost_lo = cost_lo;
}
-
- /* Append this solution to list. */
- if (solution == NULL)
- solution = NEW_STEP (result_set, 0, 0, runp, current);
- else
- {
- while (solution->next != NULL)
- solution = solution->next;
-
- solution->next = NEW_STEP (result_set, 0, 0,
- runp, current);
- }
}
else if (cost_hi < best_cost_hi
|| (cost_hi == best_cost_hi
&& cost_lo < best_cost_lo))
{
! /* Append at the end if there is no entry with
! this name. */
for (step = first; step != NULL; step = step->next)
if (strcmp (result_set, step->result_set) == 0)
break;
--- 452,503 ----
struct derivation_step *step;
/* We managed to find a derivation. First see whether
! we have reached one of the goal nodes. */
if (strcmp (result_set, toset) == 0
|| (toset_expand != NULL
&& strcmp (result_set, toset_expand) == 0))
{
! /* Append to the `solution' list if there
! is no entry with this name. */
! for (step = solution; step != NULL; step = step->next)
! if (strcmp (result_set, step->result_set) == 0)
! break;
!
! if (step == NULL)
! {
! step = NEW_STEP (result_set,
! cost_hi, cost_lo,
! runp, current);
! step->next = solution;
! solution = step;
! }
! else if (step->cost_hi > cost_hi
! || (step->cost_hi == cost_hi
! && step->cost_lo > cost_lo))
! {
! /* A better path was found for the node,
! on the `solution' list. */
! step->code = runp;
! step->last = current;
! step->cost_hi = cost_hi;
! step->cost_lo = cost_lo;
! }
!
! /* Update best_cost accordingly. */
! if (cost_hi < best_cost_hi
|| (cost_hi == best_cost_hi
&& cost_lo < best_cost_lo))
{
best_cost_hi = cost_hi;
best_cost_lo = cost_lo;
}
}
else if (cost_hi < best_cost_hi
|| (cost_hi == best_cost_hi
&& cost_lo < best_cost_lo))
{
! /* Append at the end of the `first' list if there
! is no entry with this name. */
for (step = first; step != NULL; step = step->next)
if (strcmp (result_set, step->result_set) == 0)
break;
***************
*** 450,480 ****
|| (step->cost_hi == cost_hi
&& step->cost_lo > cost_lo))
{
step->code = runp;
step->last = current;
/* Update the cost for all steps. */
for (step = first; step != NULL;
step = step->next)
! {
! struct derivation_step *back;
!
! if (step->code == NULL)
! /* This is one of the entries we started
! from. */
! continue;
!
! step->cost_hi = step->code->cost_hi;
! step->cost_lo = step->code->cost_lo;
!
! for (back = step->last; back->code != NULL;
! back = back->last)
! {
! step->cost_hi += back->code->cost_hi;
! step->cost_lo += back->code->cost_lo;
! }
! }
for (step = solution; step != NULL;
step = step->next)
{
--- 513,548 ----
|| (step->cost_hi == cost_hi
&& step->cost_lo > cost_lo))
{
+ /* A better path was found for the node,
+ on the `first' list. */
step->code = runp;
step->last = current;
/* Update the cost for all steps. */
for (step = first; step != NULL;
step = step->next)
! /* But don't update the start nodes. */
! if (step->code != NULL)
! {
! struct derivation_step *back;
! int hi, lo;
!
! hi = step->code->cost_hi;
! lo = step->code->cost_lo;
!
! for (back = step->last; back->code != NULL;
! back = back->last)
! {
! hi += back->code->cost_hi;
! lo += back->code->cost_lo;
! }
!
! step->cost_hi = hi;
! step->cost_lo = lo;
! }
+ /* Likewise for the nodes on the solution list.
+ Also update best_cost accordingly. */
for (step = solution; step != NULL;
step = step->next)
{
***************
*** 487,493 ****
|| (step->cost_hi == best_cost_hi
&& step->cost_lo < best_cost_lo))
{
- solution = step;
best_cost_hi = step->cost_hi;
best_cost_lo = step->cost_lo;
}
--- 555,560 ----
***************
*** 509,518 ****
}
if (solution != NULL)
! /* We really found a way to do the transformation. Now build a data
! structure describing the transformation steps.*/
! result = gen_steps (solution, toset_expand ?: toset,
! fromset_expand ?: fromset, handle, nsteps);
else
{
/* We haven't found a transformation. Clear the result values. */
--- 576,601 ----
}
if (solution != NULL)
! {
! /* We really found a way to do the transformation. */
!
! /* Choose the best solution. This is easy because we know that
! the solution list has at most length 2 (one for every possible
! goal node). */
! if (solution->next != NULL)
! {
! struct derivation_step *solution2 = solution->next;
!
! if (solution2->cost_hi < solution->cost_hi
! || (solution2->cost_hi == solution->cost_hi
! && solution2->cost_lo < solution->cost_lo))
! solution = solution2;
! }
!
! /* Now build a data structure describing the transformation steps. */
! result = gen_steps (solution, toset_expand ?: toset,
! fromset_expand ?: fromset, handle, nsteps);
! }
else
{
/* We haven't found a transformation. Clear the result values. */