This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

iconv's find_derivation



The find_derivation function which finds intermediate encodings in order to
convert from one encoding to another one is buggy: it does not always return
the best path (i.e. the one with the least cost sum).

It would be correct if there was no distinction between `toset' and
`toset_expand'; in this case all solution paths would end in `toset' and
only the best path to `toset' would have to be kept in the `solution'
variable.

But the current code can, through the
         solution = step;
assignment, produce a non-optimal solution if:
   1. a path P1 to `toset' is found,
   2. a path P2 to `toset_expand' is found,
   then the `solution' list contains P1 followed by P2,
   3. a better path to an intermediate node in P2 is found, such
      that P2 becomes better than P1, then P1 is removed off the list,
   4. a better path to an intermediate node N in P1 is found.
P1 has already been removed off the `solution' list and won't be re-added
(because the algorithm searches all paths outgoing from N only once).

Two other things that don't make sense about the current code are:

- New solutions are always added to the `solution' list, thus making the
  list grow more than needed. The same trick as for the `first' list
  could be applied: if the same charset has already been added to the
  `solution' list, its derivation_step can be reused.

- It always adds new solutions at the end of the list, but the `solution'
  list can be considered an ordered one - thus adding to the front is faster.

Here is a patch which

- fixes the algorithm,

- adds comments about how the algorithm works and why the `solution' list
  contains at most 2 elements,

- at the end, chooses the cheaper among the 2 elements.

Bruno


2000-09-03  Bruno Haible  <haible@clisp.cons.org>

	* iconv/gconv_db.c (find_derivation): Always use the least-cost
	solution.

*** glibc-20000831/iconv/gconv_db.c.bak	Sat Sep  2 19:35:55 2000
--- glibc-20000831/iconv/gconv_db.c	Sun Sep  3 02:11:33 2000
***************
*** 333,341 ****
    int best_cost_lo = INT_MAX;
    int result;
  
!   /* There is a small chance that this derivation is meanwhile found.  This
!      can happen if in `find_derivation' we look for this derivation, didn't
!      find it but at the same time another thread looked for this derivation. */
    result = derivation_lookup (fromset_expand ?: fromset, toset_expand ?: toset,
  			      handle, nsteps);
    if (result == __GCONV_OK)
--- 358,365 ----
    int best_cost_lo = INT_MAX;
    int result;
  
!   /* Look whether an earlier call to `find_derivation' has already
!      computed a possible derivation.  If so, return it immediately.  */
    result = derivation_lookup (fromset_expand ?: fromset, toset_expand ?: toset,
  			      handle, nsteps);
    if (result == __GCONV_OK)
***************
*** 346,354 ****
        return result;
      }
  
!   /* For now we use a simple algorithm with quadratic runtime behaviour.
!      The task is to match the `toset' with any of the available rules,
!      starting from FROMSET.  */
    if (fromset_expand != NULL)
      {
        first = NEW_STEP (fromset_expand, 0, 0, NULL, NULL);
--- 370,401 ----
        return result;
      }
  
!   /* The task is to find a sequence of transformations, backed by the
!      existing modules - whether builtin or dynamically loadable -,
!      starting at `fromset' (or `fromset_expand') and ending at `toset'
!      (or `toset_expand'), and with minimal cost.
! 
!      For computer scientists, this is a shortest path search in the
!      graph where the nodes are all possible charsets and the edges are
!      the transformations listed in __gconv_modules_db.
! 
!      For now we use a simple algorithm with quadratic runtime behaviour.
!      A breadth-first search, starting at `fromset' and `fromset_expand'.
!      The list starting at `first' contains all nodes that have been
!      visited up to now, in the order in which they have been visited --
!      excluding the goal nodes `toset' and `toset_expand' which get
!      managed in the list starting at `solution'.
!      `current' walks through the list starting at `first' and looks
!      which nodes are reachable from the current node, adding them to
!      the end of the list [`first' or `solution' respectively] (if
!      they are visited the first time) or updating them in place (if
!      they have have already been visited).
!      In each node of either list, cost_lo and cost_hi contain the
!      minimum cost over any paths found up to now, starting at `fromset'
!      or `fromset_expand', ending at that node.  best_cost_lo and
!      best_cost_hi represent the minimum over the elements of the
!      `solution' list.  */
! 
    if (fromset_expand != NULL)
      {
        first = NEW_STEP (fromset_expand, 0, 0, NULL, NULL);
***************
*** 373,388 ****
  	 searching for prefixes.  So we search for the first entry with a
  	 matching prefix and any other matching entry can be found from
  	 this place.  */
!       struct gconv_module *node = __gconv_modules_db;
  
        /* Maybe it is not necessary anymore to look for a solution for
! 	 this entry since the cost is already as high (or heigher) as
  	 the cost for the best solution so far.  */
        if (current->cost_hi > best_cost_hi
  	  || (current->cost_hi == best_cost_hi
  	      && current->cost_lo >= best_cost_lo))
  	continue;
  
        while (node != NULL)
  	{
  	  int cmpres = strcmp (current->result_set, node->from_string);
--- 420,436 ----
  	 searching for prefixes.  So we search for the first entry with a
  	 matching prefix and any other matching entry can be found from
  	 this place.  */
!       struct gconv_module *node;
  
        /* Maybe it is not necessary anymore to look for a solution for
! 	 this entry since the cost is already as high (or higher) as
  	 the cost for the best solution so far.  */
        if (current->cost_hi > best_cost_hi
  	  || (current->cost_hi == best_cost_hi
  	      && current->cost_lo >= best_cost_lo))
  	continue;
  
+       node = __gconv_modules_db;
        while (node != NULL)
  	{
  	  int cmpres = strcmp (current->result_set, node->from_string);
***************
*** 404,440 ****
  		  struct derivation_step *step;
  
  		  /* We managed to find a derivation.  First see whether
! 		     this is what we are looking for.  */
  		  if (strcmp (result_set, toset) == 0
  		      || (toset_expand != NULL
  			  && strcmp (result_set, toset_expand) == 0))
  		    {
! 		      if (solution == NULL || cost_hi < best_cost_hi
  			  || (cost_hi == best_cost_hi
  			      && cost_lo < best_cost_lo))
  			{
  			  best_cost_hi = cost_hi;
  			  best_cost_lo = cost_lo;
  			}
- 
- 		      /* Append this solution to list.  */
- 		      if (solution == NULL)
- 			solution = NEW_STEP (result_set, 0, 0, runp, current);
- 		      else
- 			{
- 			  while (solution->next != NULL)
- 			    solution = solution->next;
- 
- 			  solution->next = NEW_STEP (result_set, 0, 0,
- 						     runp, current);
- 			}
  		    }
  		  else if (cost_hi < best_cost_hi
  			   || (cost_hi == best_cost_hi
  			       && cost_lo < best_cost_lo))
  		    {
! 		      /* Append at the end if there is no entry with
! 			 this name.  */
  		      for (step = first; step != NULL; step = step->next)
  			if (strcmp (result_set, step->result_set) == 0)
  			  break;
--- 452,503 ----
  		  struct derivation_step *step;
  
  		  /* We managed to find a derivation.  First see whether
! 		     we have reached one of the goal nodes.  */
  		  if (strcmp (result_set, toset) == 0
  		      || (toset_expand != NULL
  			  && strcmp (result_set, toset_expand) == 0))
  		    {
! 		      /* Append to the `solution' list if there
! 			 is no entry with this name.  */
! 		      for (step = solution; step != NULL; step = step->next)
! 			if (strcmp (result_set, step->result_set) == 0)
! 			  break;
! 
! 		      if (step == NULL)
! 			{
! 			  step = NEW_STEP (result_set,
! 					   cost_hi, cost_lo,
! 					   runp, current);
! 			  step->next = solution;
! 			  solution = step;
! 			}
! 		      else if (step->cost_hi > cost_hi
! 			       || (step->cost_hi == cost_hi
! 				   && step->cost_lo > cost_lo))
! 			{
! 			  /* A better path was found for the node,
! 			     on the `solution' list.  */
! 			  step->code = runp;
! 			  step->last = current;
! 			  step->cost_hi = cost_hi;
! 			  step->cost_lo = cost_lo;
! 			}
! 
! 		      /* Update best_cost accordingly.  */
! 		      if (cost_hi < best_cost_hi
  			  || (cost_hi == best_cost_hi
  			      && cost_lo < best_cost_lo))
  			{
  			  best_cost_hi = cost_hi;
  			  best_cost_lo = cost_lo;
  			}
  		    }
  		  else if (cost_hi < best_cost_hi
  			   || (cost_hi == best_cost_hi
  			       && cost_lo < best_cost_lo))
  		    {
! 		      /* Append at the end of the `first' list if there
! 			 is no entry with this name.  */
  		      for (step = first; step != NULL; step = step->next)
  			if (strcmp (result_set, step->result_set) == 0)
  			  break;
***************
*** 450,480 ****
  			       || (step->cost_hi == cost_hi
  				   && step->cost_lo > cost_lo))
  			{
  			  step->code = runp;
  			  step->last = current;
  
  			  /* Update the cost for all steps.  */
  			  for (step = first; step != NULL;
  			       step = step->next)
! 			    {
! 			      struct derivation_step *back;
! 
! 			      if (step->code == NULL)
! 				/* This is one of the entries we started
! 				   from.  */
! 				continue;
! 
! 			      step->cost_hi = step->code->cost_hi;
! 			      step->cost_lo = step->code->cost_lo;
! 
! 			      for (back = step->last; back->code != NULL;
! 				   back = back->last)
! 				{
! 				  step->cost_hi += back->code->cost_hi;
! 				  step->cost_lo += back->code->cost_lo;
! 				}
! 			    }
  
  			  for (step = solution; step != NULL;
  			       step = step->next)
  			    {
--- 513,548 ----
  			       || (step->cost_hi == cost_hi
  				   && step->cost_lo > cost_lo))
  			{
+ 			  /* A better path was found for the node,
+ 			     on the `first' list.  */
  			  step->code = runp;
  			  step->last = current;
  
  			  /* Update the cost for all steps.  */
  			  for (step = first; step != NULL;
  			       step = step->next)
! 			    /* But don't update the start nodes.  */
! 			    if (step->code != NULL)
! 			      {
! 				struct derivation_step *back;
! 				int hi, lo;
! 
! 				hi = step->code->cost_hi;
! 				lo = step->code->cost_lo;
! 
! 				for (back = step->last; back->code != NULL;
! 				     back = back->last)
! 				  {
! 				    hi += back->code->cost_hi;
! 				    lo += back->code->cost_lo;
! 				  }
! 
! 				step->cost_hi = hi;
! 				step->cost_lo = lo;
! 			      }
  
+ 			  /* Likewise for the nodes on the solution list.
+ 			     Also update best_cost accordingly.  */
  			  for (step = solution; step != NULL;
  			       step = step->next)
  			    {
***************
*** 487,493 ****
  				  || (step->cost_hi == best_cost_hi
  				      && step->cost_lo < best_cost_lo))
  				{
- 				  solution = step;
  				  best_cost_hi = step->cost_hi;
  				  best_cost_lo = step->cost_lo;
  				}
--- 555,560 ----
***************
*** 509,518 ****
      }
  
    if (solution != NULL)
!     /* We really found a way to do the transformation.  Now build a data
!        structure describing the transformation steps.*/
!     result = gen_steps (solution, toset_expand ?: toset,
! 			fromset_expand ?: fromset, handle, nsteps);
    else
      {
        /* We haven't found a transformation.  Clear the result values.  */
--- 576,601 ----
      }
  
    if (solution != NULL)
!     {
!       /* We really found a way to do the transformation.  */
! 
!       /* Choose the best solution.  This is easy because we know that
! 	 the solution list has at most length 2 (one for every possible
! 	 goal node).  */
!       if (solution->next != NULL)
! 	{
! 	  struct derivation_step *solution2 = solution->next;
! 
! 	  if (solution2->cost_hi < solution->cost_hi
! 	      || (solution2->cost_hi == solution->cost_hi
! 		  && solution2->cost_lo < solution->cost_lo))
! 	    solution = solution2;
! 	}
! 
!       /* Now build a data structure describing the transformation steps.  */
!       result = gen_steps (solution, toset_expand ?: toset,
! 			  fromset_expand ?: fromset, handle, nsteps);
!     }
    else
      {
        /* We haven't found a transformation.  Clear the result values.  */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]