This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC/RFA] redesign construction of the automaton for a regex


This is a very big patch that redesigns regcomp.c's automaton creation. The patch makes the parse tree a substantially autonomous entity, and delays the creation of the automaton until after the parsing is complete.

I was careful to limit recursion, since the parse tree is completely unbalanced. It could be possible to balance trees because CONCAT and OP_ALT are associative. However, the parse tree has parent pointers so recursion on CONCAT nodes can be completely removed from regcomp.c. My regcomp.c includes two generic preorder and postorder visitors and uses them to perform the various phases of building the automaton. I also rewrote duplicate_tree to avoid recursion, since using the generic visitors was unnecessarily complicated in that case.

As an example of the cleanups that are enabled by this representation, Jakub's subexpression elimination optimization can be done in 20 lines of code using the visitors and the new parse tree design. Now, subexpressions are represented as a single SUBEXP node in the parse tree, and are lowered later to concatenations with OP_OPEN_SUBEXP and OP_CLOSE_SUBEXP.

More efficient construction of the NFA automaton is also possible. An example of a pass that can be implemented (I did not do this yet) is "star-normal form", eliminating cycles made entirely of epsilon transitions which are handled specially in regexec.c.

While I was working on automaton construction, I removed OP_DUP_QUESTION from trees and automata, since "A?" and "A|" are actually the same regular expression and they were being treated equivalently by regcomp.c.

This patch relies on the "speed up OP_PERIOD patch", because it included a cleanup that is very useful now (avoiding to muck with already created DFA nodes). I can backport that part, but would of course prefer that that patch be included in the master version.

This patch is pretty hard to split, also because it has a lot of code that is written from scratch. I've tested it on powerpc-linux (glibc+sed testsuites) and powerpc-darwin (sed only). I can test it on i686-linux and sparc-solaris if you want.

Please comment on the possibility that this patch is applied, as the possibility of contributing to glibc some of my (paid) work on regex depends on this.

Regards,

Paolo

2004-12-13  Paolo Bonzini  <bonzini@gnu.org>

	Separate parsing and creation of the NFA.  Avoided recursion on
	the (very unbalanced) parse tree.
	
	* regcomp.c (struct subexp_optimize, analyze_tree, calc_epsdest,
	re_dfa_add_tree_node, mark_opt_subexp_iter): Removed.
	(optimize_subexps, duplicate_tree, calc_first, calc_next,
	mark_opt_subexp): Rewritten.
	(preorder, postorder, lower_subexps, lower_subexp, link_nfa_nodes,
	create_token_tree, free_tree, free_token): New.
	(analyze): Accept a regex_t *.  Invoke the passes via the preorder and
	postorder generic visitors.  Do not initialize the fields in the
	re_dfa_t that represent the transitions.
	(free_dfa_content): Use free_token.
	(re_compile_internal): Analyze before UTF-8 optimizations.  Do not
	include optimization of subexpressions.
	(create_initial_state): Fetch the DFA node index from the first node's
	bin_tree_t *.
	(optimize_utf8): Abort on unexpected nodes, including OP_DUP_QUESTION.
	Return on COMPLEX_BRACKET.
	(duplicate_node_closure): Fix comment.
	(duplicate_node): Do not initialize the fields in the
	re_dfa_t that represent the transitions.
	(calc_eclosure, calc_inveclosure): Do not handle OP_DELETED_SUBEXP.
	(create_tree): Remove final argument.  All callers adjusted.  Rewritten
	to use create_token_tree.
	(parse_reg_exp, parse_branch, parse_expression, parse_bracket_exp,
	build_charclass_op): Use create_tree or create_token_tree instead
	of re_dfa_add_tree_node.
	(parse_dup_op): Likewise.  Also free the tree using free_tree for
	"<re>{0}", and lower OP_DUP_QUESTION to OP_ALT: "a?" is equivalent
	to "a|".  Adjust invocation of mark_opt_subexp.
	(parse_sub_exp): Create a single SUBEXP node.
	* regex_internal.c (re_dfa_add_node): Remove last parameter, always
	perform as if it was 1.  Do not initialize OPT_SUBEXP and DUPLICATED,
	and initialize the DFA fields representing the transitions.
	* regex_internal.h (re_dfa_add_node): Adjust prototype.
	(re_token_type_t): Move OP_DUP_PLUS and OP_DUP_QUESTION to the tokens
	section.  Add a tree-only code SUBEXP.  Remove OP_DELETED_SUBEXP.
	(bin_tree_t): Include a full re_token_t for TOKEN.  Turn FIRST and
	NEXT into pointers to trees.  Remove ECLOSURE.

*** orig/lib/regcomp.c
--- mod/lib/regcomp.c
***************
*** 33,51 ****
  #ifdef RE_ENABLE_I18N
  static void optimize_utf8 (re_dfa_t *dfa);
  #endif
! struct subexp_optimize
! {
!   re_dfa_t *dfa;
!   re_token_t *nodes;
!   int no_sub, re_nsub;
! };
! static bin_tree_t *optimize_subexps (struct subexp_optimize *so,
!                                      bin_tree_t *node, int sidx, int depth);
! static reg_errcode_t analyze (re_dfa_t *dfa);
! static reg_errcode_t analyze_tree (re_dfa_t *dfa, bin_tree_t *node);
! static void calc_first (re_dfa_t *dfa, bin_tree_t *node);
! static void calc_next (re_dfa_t *dfa, bin_tree_t *node);
! static void calc_epsdest (re_dfa_t *dfa, bin_tree_t *node);
  static reg_errcode_t duplicate_node_closure (re_dfa_t *dfa, int top_org_node,
  					     int top_clone_node, int root_node,
  					     unsigned int constraint);
--- 33,53 ----
  #ifdef RE_ENABLE_I18N
  static void optimize_utf8 (re_dfa_t *dfa);
  #endif
! static reg_errcode_t analyze (regex_t *preg);
! static reg_errcode_t create_initial_state (re_dfa_t *dfa);
! static reg_errcode_t preorder (bin_tree_t *root,
! 			       reg_errcode_t (fn (void *, bin_tree_t *)),
! 			       void *extra);
! static reg_errcode_t postorder (bin_tree_t *root,
! 				reg_errcode_t (fn (void *, bin_tree_t *)),
! 				void *extra);
! static reg_errcode_t optimize_subexps (void *extra, bin_tree_t *node);
! static reg_errcode_t lower_subexps (void *extra, bin_tree_t *node);
! static bin_tree_t *lower_subexp (reg_errcode_t *err, regex_t *preg, 
! 				 bin_tree_t *node);
! static reg_errcode_t calc_first (void *extra, bin_tree_t *node);
! static reg_errcode_t calc_next (void *extra, bin_tree_t *node);
! static reg_errcode_t link_nfa_nodes (void *extra, bin_tree_t *node);
  static reg_errcode_t duplicate_node_closure (re_dfa_t *dfa, int top_org_node,
  					     int top_clone_node, int root_node,
  					     unsigned int constraint);
***************
*** 138,151 ****
  				       int non_match, reg_errcode_t *err);
  static bin_tree_t *create_tree (re_dfa_t *dfa,
  				bin_tree_t *left, bin_tree_t *right,
! 				re_token_type_t type, int index);
! static bin_tree_t *re_dfa_add_tree_node (re_dfa_t *dfa,
! 					 bin_tree_t *left, bin_tree_t *right,
! 					 const re_token_t *token)
!   __attribute ((noinline));
  static bin_tree_t *duplicate_tree (const bin_tree_t *src, re_dfa_t *dfa);
! static void mark_opt_subexp (const bin_tree_t *src, re_dfa_t *dfa);
! static void mark_opt_subexp_iter (const bin_tree_t *src, re_dfa_t *dfa, int idx);
  
  /* This table gives an error message for each of the error codes listed
     in regex.h.  Obviously the order here has to be same as there.
--- 139,152 ----
  				       int non_match, reg_errcode_t *err);
  static bin_tree_t *create_tree (re_dfa_t *dfa,
  				bin_tree_t *left, bin_tree_t *right,
! 				re_token_type_t type);
! static bin_tree_t *create_token_tree (re_dfa_t *dfa,
! 				      bin_tree_t *left, bin_tree_t *right,
! 				      const re_token_t *token);
  static bin_tree_t *duplicate_tree (const bin_tree_t *src, re_dfa_t *dfa);
! static void free_token (re_token_t *node);
! static reg_errcode_t free_tree (void *extra, bin_tree_t *node);
! static reg_errcode_t mark_opt_subexp (void *extra, bin_tree_t *node);
  
  /* This table gives an error message for each of the error codes listed
     in regex.h.  Obviously the order here has to be same as there.
***************
*** 598,613 ****
  
    if (dfa->nodes)
      for (i = 0; i < dfa->nodes_len; ++i)
!       {
! 	re_token_t *node = dfa->nodes + i;
! #ifdef RE_ENABLE_I18N
! 	if (node->type == COMPLEX_BRACKET && node->duplicated == 0)
! 	  free_charset (node->opr.mbcset);
! 	else
! #endif /* RE_ENABLE_I18N */
! 	  if (node->type == SIMPLE_BRACKET && node->duplicated == 0)
! 	    re_free (node->opr.sbcset);
!       }
    re_free (dfa->nexts);
    for (i = 0; i < dfa->nodes_len; ++i)
      {
--- 599,605 ----
  
    if (dfa->nodes)
      for (i = 0; i < dfa->nodes_len; ++i)
!       free_token (dfa->nodes + i);
    re_free (dfa->nexts);
    for (i = 0; i < dfa->nodes_len; ++i)
      {
***************
*** 811,839 ****
    if (BE (dfa->str_tree == NULL, 0))
      goto re_compile_internal_free_return;
  
  #ifdef RE_ENABLE_I18N
    /* If possible, do searching in single byte encoding to speed things up.  */
    if (dfa->is_utf8 && !(syntax & RE_ICASE) && preg->translate == NULL)
      optimize_utf8 (dfa);
  #endif
  
-   if (preg->re_nsub > 0)
-     {
-       struct subexp_optimize so;
- 
-       so.dfa = dfa;
-       so.nodes = dfa->nodes;
-       so.no_sub = preg->no_sub;
-       so.re_nsub = preg->re_nsub;
-       dfa->str_tree = optimize_subexps (&so, dfa->str_tree, -1, 0);
-     }
- 
-   /* Analyze the tree and collect information which is necessary to
-      create the dfa.  */
-   err = analyze (dfa);
-   if (BE (err != REG_NOERROR, 0))
-     goto re_compile_internal_free_return;
- 
    /* Then create the initial state of the dfa.  */
    err = create_initial_state (dfa);
  
--- 803,819 ----
    if (BE (dfa->str_tree == NULL, 0))
      goto re_compile_internal_free_return;
  
+   /* Analyze the tree and create the nfa.  */
+   err = analyze (preg);
+   if (BE (err != REG_NOERROR, 0))
+     goto re_compile_internal_free_return;
+ 
  #ifdef RE_ENABLE_I18N
    /* If possible, do searching in single byte encoding to speed things up.  */
    if (dfa->is_utf8 && !(syntax & RE_ICASE) && preg->translate == NULL)
      optimize_utf8 (dfa);
  #endif
  
    /* Then create the initial state of the dfa.  */
    err = create_initial_state (dfa);
  
***************
*** 998,1004 ****
  
    /* Initial states have the epsilon closure of the node which is
       the first node of the regular expression.  */
!   first = dfa->str_tree->first;
    dfa->init_node = first;
    err = re_node_set_init_copy (&init_nodes, dfa->eclosures + first);
    if (BE (err != REG_NOERROR, 0))
--- 978,984 ----
  
    /* Initial states have the epsilon closure of the node which is
       the first node of the regular expression.  */
!   first = dfa->str_tree->first->node_idx;
    dfa->init_node = first;
    err = re_node_set_init_copy (&init_nodes, dfa->eclosures + first);
    if (BE (err != REG_NOERROR, 0))
***************
*** 1104,1113 ****
        case OP_ALT:
        case END_OF_RE:
        case OP_DUP_ASTERISK:
-       case OP_DUP_QUESTION:
        case OP_OPEN_SUBEXP:
        case OP_CLOSE_SUBEXP:
  	break;
        case SIMPLE_BRACKET:
  	/* Just double check.  */
          for (i = 0x80 / UINT_BITS; i < BITSET_UINTS; ++i)
--- 1084,1094 ----
        case OP_ALT:
        case END_OF_RE:
        case OP_DUP_ASTERISK:
        case OP_OPEN_SUBEXP:
        case OP_CLOSE_SUBEXP:
  	break;
+       case COMPLEX_BRACKET:
+ 	return;
        case SIMPLE_BRACKET:
  	/* Just double check.  */
          for (i = 0x80 / UINT_BITS; i < BITSET_UINTS; ++i)
***************
*** 1115,1121 ****
  	    return;
  	break;
        default:
! 	return;
        }
  
    if (mb_chars || has_period)
--- 1096,1102 ----
  	    return;
  	break;
        default:
! 	abort ();
        }
  
    if (mb_chars || has_period)
***************
*** 1135,1224 ****
  }
  #endif
  
- static bin_tree_t *
- optimize_subexps (so, node, sidx, depth)
-      struct subexp_optimize *so;
-      bin_tree_t *node;
-      int sidx, depth;
- {
-   int idx, new_depth, new_sidx;
-   bin_tree_t *ret;
-   if (node == NULL)
-     return NULL;
- 
-   new_depth = 0;
-   new_sidx = sidx;
-   if ((depth & 1) && node->type == CONCAT
-       && node->right && node->right->type == 0
-       && so->nodes[idx = node->right->node_idx].type == OP_CLOSE_SUBEXP)
-     {
-       new_depth = depth + 1;
-       if (new_depth == 2
-           || (so->nodes[idx].opr.idx < 8 * sizeof (so->dfa->used_bkref_map)
-               && so->dfa->used_bkref_map & (1 << so->nodes[idx].opr.idx)))
-         new_sidx = so->nodes[idx].opr.idx;
-     }
-   node->left = optimize_subexps (so, node->left, new_sidx, new_depth);
-   new_depth = (depth & 1) == 0 && node->type == CONCAT
-               && node->left && node->left->type == 0
-               && so->nodes[node->left->node_idx].type == OP_OPEN_SUBEXP
-               ? depth + 1 : 0;
-   node->right = optimize_subexps (so, node->right, sidx, new_depth);
-                                      
-   if (node->type != CONCAT)
-     return node;
-   if ((depth & 1) == 0
-       && node->left
-       && node->left->type == 0
-       && so->nodes[idx = node->left->node_idx].type == OP_OPEN_SUBEXP)
-     ret = node->right;
-   else if ((depth & 1)
-            && node->right
-            && node->right->type == 0
-            && so->nodes[idx = node->right->node_idx].type == OP_CLOSE_SUBEXP)
-     ret = node->left;
-   else
-     return node;
- 
-   if (so->nodes[idx].opr.idx < 8 * sizeof (so->dfa->used_bkref_map)
-       && so->dfa->used_bkref_map & (1 << so->nodes[idx].opr.idx))
-     return node;
- 
-   if (!so->no_sub)
-     {
-       int i;
- 
-       if (depth < 2)
-         return node;
- 
-       if (so->dfa->subexp_map == NULL)
-         {
-           so->dfa->subexp_map = re_malloc (int, so->re_nsub);
-           if (so->dfa->subexp_map == NULL)
-             return node;
- 
-           for (i = 0; i < so->re_nsub; i++)
-             so->dfa->subexp_map[i] = i;
-         }
- 
-       i = so->nodes[idx].opr.idx;
-       assert (sidx < i);
-       so->dfa->subexp_map[i] = sidx;
-     }
- 
-   so->nodes[idx].type = OP_DELETED_SUBEXP;
-   ret->parent = node->parent;
-   return ret;
- }
- 
  /* Analyze the structure tree, and calculate "first", "next", "edest",
     "eclosure", and "inveclosure".  */
  
  static reg_errcode_t
! analyze (dfa)
!      re_dfa_t *dfa;
  {
!   int i;
    reg_errcode_t ret;
  
    /* Allocate arrays.  */
--- 1116,1129 ----
  }
  #endif
  
  /* Analyze the structure tree, and calculate "first", "next", "edest",
     "eclosure", and "inveclosure".  */
  
  static reg_errcode_t
! analyze (preg)
!      regex_t *preg;
  {
!   re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
    reg_errcode_t ret;
  
    /* Allocate arrays.  */
***************
*** 1230,1450 ****
    if (BE (dfa->nexts == NULL || dfa->org_indices == NULL || dfa->edests == NULL
  	  || dfa->eclosures == NULL || dfa->inveclosures == NULL, 0))
      return REG_ESPACE;
-   /* Initialize them.  */
-   for (i = 0; i < dfa->nodes_len; ++i)
-     {
-       dfa->nexts[i] = -1;
-       re_node_set_init_empty (dfa->edests + i);
-       re_node_set_init_empty (dfa->eclosures + i);
-       re_node_set_init_empty (dfa->inveclosures + i);
-     }
  
!   ret = analyze_tree (dfa, dfa->str_tree);
!   if (BE (ret == REG_NOERROR, 1))
      {
!       ret = calc_eclosure (dfa);
!       if (ret == REG_NOERROR)
! 	calc_inveclosure (dfa);
      }
    return ret;
  }
  
! /* Helper functions for analyze.
!    This function calculate "first", "next", and "edest" for the subtree
!    whose root is NODE.  */
  
  static reg_errcode_t
! analyze_tree (dfa, node)
!      re_dfa_t *dfa;
       bin_tree_t *node;
  {
!   reg_errcode_t ret;
!   if (node->first == -1)
!     calc_first (dfa, node);
!   if (node->next == -1)
!     calc_next (dfa, node);
!   calc_epsdest (dfa, node);
  
!   /* Calculate "first" etc. for the left child.  */
!   if (node->left != NULL)
      {
!       ret = analyze_tree (dfa, node->left);
!       if (BE (ret != REG_NOERROR, 0))
! 	return ret;
      }
!   /* Calculate "first" etc. for the right child.  */
!   if (node->right != NULL)
      {
!       ret = analyze_tree (dfa, node->right);
!       if (BE (ret != REG_NOERROR, 0))
! 	return ret;
      }
    return REG_NOERROR;
  }
  
! /* Calculate "first" for the node NODE.  */
! static void
! calc_first (dfa, node)
!      re_dfa_t *dfa;
       bin_tree_t *node;
  {
!   int idx, type;
!   idx = node->node_idx;
!   type = (node->type == 0) ? dfa->nodes[idx].type : node->type;
  
!   switch (type)
      {
! #ifdef DEBUG
!     case OP_OPEN_BRACKET:
!     case OP_CLOSE_BRACKET:
!     case OP_OPEN_DUP_NUM:
!     case OP_CLOSE_DUP_NUM:
!     case OP_DUP_PLUS:
!     case OP_NON_MATCH_LIST:
!     case OP_OPEN_COLL_ELEM:
!     case OP_CLOSE_COLL_ELEM:
!     case OP_OPEN_EQUIV_CLASS:
!     case OP_CLOSE_EQUIV_CLASS:
!     case OP_OPEN_CHAR_CLASS:
!     case OP_CLOSE_CHAR_CLASS:
!       /* These must not appear here.  */
!       assert (0);
! #endif
!     case END_OF_RE:
!     case CHARACTER:
!     case OP_PERIOD:
!     case OP_DUP_ASTERISK:
!     case OP_DUP_QUESTION:
! #ifdef RE_ENABLE_I18N
!     case OP_UTF8_PERIOD:
!     case COMPLEX_BRACKET:
! #endif /* RE_ENABLE_I18N */
!     case SIMPLE_BRACKET:
!     case OP_BACK_REF:
!     case ANCHOR:
!     case OP_OPEN_SUBEXP:
!     case OP_CLOSE_SUBEXP:
!       node->first = idx;
!       break;
!     case OP_ALT:
!       node->first = idx;
!       break;
!       /* else fall through */
!     default:
! #ifdef DEBUG
!       assert (node->left != NULL);
! #endif
!       if (node->left->first == -1)
! 	calc_first (dfa, node->left);
!       node->first = node->left->first;
!       break;
      }
  }
  
! /* Calculate "next" for the node NODE.  */
  
! static void
! calc_next (dfa, node)
!      re_dfa_t *dfa;
       bin_tree_t *node;
  {
!   int idx, type;
!   bin_tree_t *parent = node->parent;
!   if (parent == NULL)
      {
!       node->next = -1;
!       idx = node->node_idx;
!       if (node->type == 0)
! 	dfa->nexts[idx] = node->next;
!       return;
      }
  
!   idx = parent->node_idx;
!   type = (parent->type == 0) ? dfa->nodes[idx].type : parent->type;
!  
!   switch (type)
      {
      case OP_DUP_ASTERISK:
!       node->next = idx;
        break;
      case CONCAT:
!       if (parent->left == node)
! 	{
! 	  if (parent->right->first == -1)
! 	    calc_first (dfa, parent->right);
! 	  node->next = parent->right->first;
! 	  break;
! 	}
!       /* else fall through */
      default:
!       if (parent->next == -1)
! 	calc_next (dfa, parent);
!       node->next = parent->next;
        break;
      }
!   idx = node->node_idx;
!   if (node->type == 0)
!     dfa->nexts[idx] = node->next;
  }
  
! /* Calculate "edest" for the node NODE.  */
! 
! static void
! calc_epsdest (dfa, node)
!      re_dfa_t *dfa;
       bin_tree_t *node;
  {
!   int idx;
!   idx = node->node_idx;
!   if (node->type == 0)
      {
!       if (dfa->nodes[idx].type == OP_DUP_ASTERISK
! 	  || dfa->nodes[idx].type == OP_DUP_QUESTION)
! 	{
! 	  if (node->left->first == -1)
! 	    calc_first (dfa, node->left);
! 	  if (node->next == -1)
! 	    calc_next (dfa, node);
! 	  re_node_set_init_2 (dfa->edests + idx, node->left->first,
! 			      node->next);
! 	}
!       else if (dfa->nodes[idx].type == OP_ALT)
! 	{
! 	  int left, right;
! 	  if (node->left != NULL)
! 	    {
! 	      if (node->left->first == -1)
! 		calc_first (dfa, node->left);
! 	      left = node->left->first;
! 	    }
! 	  else
! 	    {
! 	      if (node->next == -1)
! 		calc_next (dfa, node);
! 	      left = node->next;
! 	    }
! 	  if (node->right != NULL)
! 	    {
! 	      if (node->right->first == -1)
! 		calc_first (dfa, node->right);
! 	      right = node->right->first;
! 	    }
! 	  else
! 	    {
! 	      if (node->next == -1)
! 		calc_next (dfa, node);
! 	      right = node->next;
! 	    }
! 	  re_node_set_init_2 (dfa->edests + idx, left, right);
! 	}
!       else if (dfa->nodes[idx].type == ANCHOR
! 	       || dfa->nodes[idx].type == OP_OPEN_SUBEXP
! 	       || dfa->nodes[idx].type == OP_CLOSE_SUBEXP
! 	       || dfa->nodes[idx].type == OP_BACK_REF)
! 	re_node_set_init_1 (dfa->edests + idx, node->next);
!       else
!         assert (!IS_EPSILON_NODE (dfa->nodes[idx].type));
      }
  }
  
  /* Duplicate the epsilon closure of the node ROOT_NODE.
--- 1135,1441 ----
    if (BE (dfa->nexts == NULL || dfa->org_indices == NULL || dfa->edests == NULL
  	  || dfa->eclosures == NULL || dfa->inveclosures == NULL, 0))
      return REG_ESPACE;
  
!   dfa->subexp_map = re_malloc (int, preg->re_nsub);
!   if (dfa->subexp_map != NULL)
      {
!       int i;
!       for (i = 0; i < preg->re_nsub; i++)
! 	dfa->subexp_map[i] = i;
!       preorder (dfa->str_tree, optimize_subexps, dfa);
!       for (i = 0; i < preg->re_nsub; i++)
! 	if (dfa->subexp_map[i] != i)
! 	  break;
!       if (i == preg->re_nsub)
! 	{
! 	  free (dfa->subexp_map);
! 	  dfa->subexp_map = NULL;
! 	}
      }
+ 
+   ret = postorder (dfa->str_tree, lower_subexps, preg);
+   if (BE (ret != REG_NOERROR, 0))
+     return ret;
+   ret = postorder (dfa->str_tree, calc_first, dfa);
+   if (BE (ret != REG_NOERROR, 0))
+     return ret;
+   preorder (dfa->str_tree, calc_next, dfa);
+   ret = preorder (dfa->str_tree, link_nfa_nodes, dfa);
+   if (BE (ret != REG_NOERROR, 0))
+     return ret;
+   ret = calc_eclosure (dfa);
+   if (BE (ret != REG_NOERROR, 0))
+     return ret;
+   calc_inveclosure (dfa);
    return ret;
  }
  
! /* Our parse trees are very unbalanced, so we cannot use a stack to
!    implement parse tree visits.  Instead, we use parent pointers and
!    some hairy code in these two functions.  */
! static reg_errcode_t
! postorder (root, fn, extra)
!      bin_tree_t *root;
!      reg_errcode_t (fn (void *, bin_tree_t *));
!      void *extra;
! {
!   bin_tree_t *node, *prev;
! 
!   for (node = root; ; )
!     {
!       /* Descend down the tree, preferably to the left (or to the right
! 	 if that's the only child).  */
!       while (node->left || node->right)
! 	if (node->left)
!           node = node->left;
!         else
!           node = node->right;
! 
!       do
! 	{
! 	  reg_errcode_t err = fn (extra, node);
! 	  if (BE (err != REG_NOERROR, 0))
! 	    return err;
!           if (node->parent == NULL)
! 	    return REG_NOERROR;
! 	  prev = node;
! 	  node = node->parent;
! 	}
!       /* Go up while we have a node that is reached from the right.  */
!       while (node->right == prev || node->right == NULL);
!       node = node->right;
!     }
! }
! 
! static reg_errcode_t
! preorder (root, fn, extra)
!      bin_tree_t *root;
!      reg_errcode_t (fn (void *, bin_tree_t *));
!      void *extra;
! {
!   bin_tree_t *node;
! 
!   for (node = root; ; )
!     {
!       reg_errcode_t err = fn (extra, node);
!       if (BE (err != REG_NOERROR, 0))
! 	return err;
! 
!       /* Go to the left node, or up and to the right.  */
!       if (node->left)
! 	node = node->left;
!       else
! 	{
! 	  bin_tree_t *prev = NULL;
! 	  while (node->right == prev || node->right == NULL)
! 	    {
! 	      prev = node;
! 	      node = node->parent;
! 	      if (!node)
! 	        return REG_NOERROR;
! 	    }
! 	  node = node->right;
! 	}
!     }
! }
  
+ /* Optimization pass: if a SUBEXP is entirely contained, strip it and tell
+    re_search_internal to map the inner one's opr.idx to this one's.  Adjust
+    backreferences as well.  Requires a preorder visit.  */
  static reg_errcode_t
! optimize_subexps (extra, node)
!      void *extra;
       bin_tree_t *node;
  {
!   re_dfa_t *dfa = (re_dfa_t *) extra;
  
!   if (node->token.type == OP_BACK_REF && dfa->subexp_map)
      {
!       int idx = node->token.opr.idx;
!       node->token.opr.idx = dfa->subexp_map[idx];
!       dfa->used_bkref_map |= 1 << node->token.opr.idx;
      }
! 
!   else if (node->token.type == SUBEXP
!            && node->left && node->left->token.type == SUBEXP)
      {
!       int other_idx = node->left->token.opr.idx;
! 
!       node->left = node->left->left;
!       if (node->left)
!         node->left->parent = node;
! 
!       dfa->subexp_map[other_idx] = dfa->subexp_map[node->token.opr.idx];
!       if (other_idx < 8 * sizeof (dfa->used_bkref_map))
! 	dfa->used_bkref_map &= ~(1 << other_idx);
      }
+ 
    return REG_NOERROR;
  }
  
! /* Lowering pass: Turn each SUBEXP node into the appropriate concatenation
!    of OP_OPEN_SUBEXP, the body of the SUBEXP (if any) and OP_CLOSE_SUBEXP.  */
! static reg_errcode_t
! lower_subexps (extra, node)
!      void *extra;
       bin_tree_t *node;
  {
!   regex_t *preg = (regex_t *) extra;
!   reg_errcode_t err = REG_NOERROR;
  
!   if (node->left && node->left->token.type == SUBEXP)
      {
!       node->left = lower_subexp (&err, preg, node->left);
!       if (node->left)
! 	node->left->parent = node;
      }
+   if (node->right && node->right->token.type == SUBEXP)
+     {
+       node->right = lower_subexp (&err, preg, node->right);
+       if (node->right)
+ 	node->right->parent = node;
+     }
+ 
+   return err;
  }
  
! static bin_tree_t *
! lower_subexp (err, preg, node)
!      reg_errcode_t *err;
!      regex_t *preg;
!      bin_tree_t *node;
! {
!   re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
!   bin_tree_t *body = node->left;
!   bin_tree_t *op, *cls, *tree1, *tree;
  
!   if (preg->no_sub
!       && (node->token.opr.idx >= 8 * sizeof (dfa->used_bkref_map)
! 	  || !(dfa->used_bkref_map & (1 << node->token.opr.idx))))
!     return node->left;
! 
!   /* Convert the SUBEXP node to the concatenation of an
!      OP_OPEN_SUBEXP, the contents, and an OP_CLOSE_SUBEXP.  */
!   op = create_tree (dfa, NULL, NULL, OP_OPEN_SUBEXP);
!   cls = create_tree (dfa, NULL, NULL, OP_CLOSE_SUBEXP);
!   tree1 = body ? create_tree (dfa, body, cls, CONCAT) : cls;
!   tree = create_tree (dfa, op, tree1, CONCAT);
!   if (BE (tree == NULL || tree1 == NULL || op == NULL || cls == NULL, 0))
!     {
!       *err = REG_ESPACE;
!       return NULL;
!     }
! 
!   op->token.opr.idx = cls->token.opr.idx = node->token.opr.idx;
!   op->token.opt_subexp = cls->token.opt_subexp = node->token.opt_subexp;
!   return tree;
! }
! 
! /* Pass 1 in building the NFA: compute FIRST and create unlinked automaton
!    nodes.  Requires a postorder visit.  */
! static reg_errcode_t
! calc_first (extra, node)
!      void *extra;
       bin_tree_t *node;
  {
!   re_dfa_t *dfa = (re_dfa_t *) extra;
!   if (node->token.type == CONCAT)
!     {
!       node->first = node->left->first;
!       node->node_idx = node->left->node_idx;
!     }
!   else
      {
!       node->first = node;
!       node->node_idx = re_dfa_add_node (dfa, node->token);
!       if (BE (node->node_idx == -1, 0))
!         return REG_ESPACE;
      }
+   return REG_NOERROR;
+ }
  
! /* Pass 2: compute NEXT on the tree.  Preorder visit.  */
! static reg_errcode_t
! calc_next (extra, node)
!      void *extra;
!      bin_tree_t *node;
! {
!   switch (node->token.type)
      {
      case OP_DUP_ASTERISK:
!       node->left->next = node;
        break;
      case CONCAT:
!       node->left->next = node->right->first;
!       node->right->next = node->next;
!       break;
      default:
!       if (node->left)
! 	node->left->next = node->next;
!       if (node->right)
!         node->right->next = node->next;
        break;
      }
!   return REG_NOERROR;
  }
  
! /* Pass 3: link all DFA nodes to their NEXT node (any order will do).  */
! static reg_errcode_t
! link_nfa_nodes (extra, node)
!      void *extra;
       bin_tree_t *node;
  {
!   re_dfa_t *dfa = (re_dfa_t *) extra;
!   int idx = node->node_idx;
!   reg_errcode_t err = REG_NOERROR;
! 
!   switch (node->token.type)
      {
!     case CONCAT:
!       break;
!       
!     case END_OF_RE:
!       assert (node->next == NULL);
!       break;
!       
!     case OP_DUP_ASTERISK:
!     case OP_ALT:
!       {
! 	int left, right;
! 	dfa->has_plural_match = 1;
! 	if (node->left != NULL)
! 	  left = node->left->first->node_idx;
! 	else
! 	  left = node->next->node_idx;
! 	if (node->right != NULL)
! 	  right = node->right->first->node_idx;
! 	else
! 	  right = node->next->node_idx;
! 	assert (left > -1);
! 	assert (right > -1);
! 	err = re_node_set_init_2 (dfa->edests + idx, left, right);
!       }
!       break;
!       
!     case ANCHOR:
!     case OP_OPEN_SUBEXP:
!     case OP_CLOSE_SUBEXP:
!       err = re_node_set_init_1 (dfa->edests + idx, node->next->node_idx);
!       break;
!       
!     case OP_BACK_REF:
!       dfa->nexts[idx] = node->next->node_idx;
!       if (node->token.type == OP_BACK_REF)
! 	re_node_set_init_1 (dfa->edests + idx, dfa->nexts[idx]);
!       break;
!       
!     default:
!       assert (!IS_EPSILON_NODE (node->token.type));
!       dfa->nexts[idx] = node->next->node_idx;
!       break;
      }
+   
+   return err;
  }
  
  /* Duplicate the epsilon closure of the node ROOT_NODE.
***************
*** 1520,1526 ****
        else /* dfa->edests[org_node].nelem == 2 */
  	{
  	  /* In case of the node can epsilon-transit, and it has two
! 	     destinations. E.g. '|', '*', '+', '?'.   */
  	  org_dest = dfa->edests[org_node].elems[0];
  	  re_node_set_empty (dfa->edests + clone_node);
  	  /* Search for a duplicated node which satisfies the constraint.  */
--- 1514,1520 ----
        else /* dfa->edests[org_node].nelem == 2 */
  	{
  	  /* In case of the node can epsilon-transit, and it has two
! 	     destinations. In the bin_tree_t and DFA, that's '|' and '*'.   */
  	  org_dest = dfa->edests[org_node].elems[0];
  	  re_node_set_empty (dfa->edests + clone_node);
  	  /* Search for a duplicated node which satisfies the constraint.  */
***************
*** 1591,1606 ****
       int *new_idx, org_idx;
       unsigned int constraint;
  {
!   int dup_idx = re_dfa_add_node (dfa, dfa->nodes[org_idx], 1);
    if (BE (dup_idx == -1, 0))
      return REG_ESPACE;
    dfa->nodes[dup_idx].constraint = constraint;
    if (dfa->nodes[org_idx].type == ANCHOR)
      dfa->nodes[dup_idx].constraint |= dfa->nodes[org_idx].opr.ctx_type;
    dfa->nodes[dup_idx].duplicated = 1;
-   re_node_set_init_empty (dfa->edests + dup_idx);
-   re_node_set_init_empty (dfa->eclosures + dup_idx);
-   re_node_set_init_empty (dfa->inveclosures + dup_idx);
  
    /* Store the index of the original node.  */
    dfa->org_indices[dup_idx] = org_idx;
--- 1585,1597 ----
       int *new_idx, org_idx;
       unsigned int constraint;
  {
!   int dup_idx = re_dfa_add_node (dfa, dfa->nodes[org_idx]);
    if (BE (dup_idx == -1, 0))
      return REG_ESPACE;
    dfa->nodes[dup_idx].constraint = constraint;
    if (dfa->nodes[org_idx].type == ANCHOR)
      dfa->nodes[dup_idx].constraint |= dfa->nodes[org_idx].opr.ctx_type;
    dfa->nodes[dup_idx].duplicated = 1;
  
    /* Store the index of the original node.  */
    dfa->org_indices[dup_idx] = org_idx;
***************
*** 1615,1622 ****
    int src, idx, dest;
    for (src = 0; src < dfa->nodes_len; ++src)
      {
-       if (dfa->nodes[src].type == OP_DELETED_SUBEXP)
-         continue;
        for (idx = 0; idx < dfa->eclosures[src].nelem; ++idx)
  	{
  	  dest = dfa->eclosures[src].elems[idx];
--- 1606,1611 ----
***************
*** 1652,1659 ****
  #ifdef DEBUG
        assert (dfa->eclosures[node_idx].nelem != -1);
  #endif
-       if (dfa->nodes[node_idx].type == OP_DELETED_SUBEXP)
-         continue;
  
        /* If we have already calculated, skip it.  */
        if (dfa->eclosures[node_idx].nelem != 0)
--- 1641,1646 ----
***************
*** 2124,2132 ****
    tree = parse_reg_exp (regexp, preg, &current_token, syntax, 0, err);
    if (BE (*err != REG_NOERROR && tree == NULL, 0))
      return NULL;
!   eor = re_dfa_add_tree_node (dfa, NULL, NULL, &current_token);
    if (tree != NULL)
!     root = create_tree (dfa, tree, eor, CONCAT, 0);
    else
      root = eor;
    if (BE (eor == NULL || root == NULL, 0))
--- 2111,2119 ----
    tree = parse_reg_exp (regexp, preg, &current_token, syntax, 0, err);
    if (BE (*err != REG_NOERROR && tree == NULL, 0))
      return NULL;
!   eor = create_tree (dfa, NULL, NULL, END_OF_RE);
    if (tree != NULL)
!     root = create_tree (dfa, tree, eor, CONCAT);
    else
      root = eor;
    if (BE (eor == NULL || root == NULL, 0))
***************
*** 2163,2169 ****
  
    while (token->type == OP_ALT)
      {
-       re_token_t alt_token = *token;
        fetch_token (token, regexp, syntax | RE_CARET_ANCHORS_HERE);
        if (token->type != OP_ALT && token->type != END_OF_RE
  	  && (nest == 0 || token->type != OP_CLOSE_SUBEXP))
--- 2150,2155 ----
***************
*** 2174,2186 ****
  	}
        else
  	branch = NULL;
!       tree = re_dfa_add_tree_node (dfa, tree, branch, &alt_token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
  	  return NULL;
  	}
-       dfa->has_plural_match = 1;
      }
    return tree;
  }
--- 2160,2171 ----
  	}
        else
  	branch = NULL;
!       tree = create_tree (dfa, tree, branch, OP_ALT);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
  	  return NULL;
  	}
      }
    return tree;
  }
***************
*** 2219,2225 ****
  	}
        if (tree != NULL && exp != NULL)
  	{
! 	  tree = create_tree (dfa, tree, exp, CONCAT, 0);
  	  if (tree == NULL)
  	    {
  	      *err = REG_ESPACE;
--- 2204,2210 ----
  	}
        if (tree != NULL && exp != NULL)
  	{
! 	  tree = create_tree (dfa, tree, exp, CONCAT);
  	  if (tree == NULL)
  	    {
  	      *err = REG_ESPACE;
***************
*** 2253,2259 ****
    switch (token->type)
      {
      case CHARACTER:
!       tree = re_dfa_add_tree_node (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
--- 2238,2244 ----
    switch (token->type)
      {
      case CHARACTER:
!       tree = create_token_tree (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
***************
*** 2267,2274 ****
  	    {
  	      bin_tree_t *mbc_remain;
  	      fetch_token (token, regexp, syntax);
! 	      mbc_remain = re_dfa_add_tree_node (dfa, NULL, NULL, token);
! 	      tree = create_tree (dfa, tree, mbc_remain, CONCAT, 0);
  	      if (BE (mbc_remain == NULL || tree == NULL, 0))
  		{
  		  *err = REG_ESPACE;
--- 2252,2259 ----
  	    {
  	      bin_tree_t *mbc_remain;
  	      fetch_token (token, regexp, syntax);
! 	      mbc_remain = create_token_tree (dfa, NULL, NULL, token);
! 	      tree = create_tree (dfa, tree, mbc_remain, CONCAT);
  	      if (BE (mbc_remain == NULL || tree == NULL, 0))
  		{
  		  *err = REG_ESPACE;
***************
*** 2295,2301 ****
  	  return NULL;
  	}
        dfa->used_bkref_map |= 1 << token->opr.idx;
!       tree = re_dfa_add_tree_node (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
--- 2280,2286 ----
  	  return NULL;
  	}
        dfa->used_bkref_map |= 1 << token->opr.idx;
!       tree = create_token_tree (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
***************
*** 2340,2346 ****
        token->type = CHARACTER;
        /* mb_partial and word_char bits should be initialized already
  	 by peek_token.  */
!       tree = re_dfa_add_tree_node (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
--- 2325,2331 ----
        token->type = CHARACTER;
        /* mb_partial and word_char bits should be initialized already
  	 by peek_token.  */
!       tree = create_token_tree (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
***************
*** 2356,2366 ****
  	{
  	  bin_tree_t *tree_first, *tree_last;
  	  token->opr.ctx_type = WORD_FIRST;
! 	  tree_first = re_dfa_add_tree_node (dfa, NULL, NULL, token);
  	  token->opr.ctx_type = WORD_LAST;
! 	  tree_last = re_dfa_add_tree_node (dfa, NULL, NULL, token);
! 	  token->type = OP_ALT;
! 	  tree = re_dfa_add_tree_node (dfa, tree_first, tree_last, token);
  	  if (BE (tree_first == NULL || tree_last == NULL || tree == NULL, 0))
  	    {
  	      *err = REG_ESPACE;
--- 2341,2350 ----
  	{
  	  bin_tree_t *tree_first, *tree_last;
  	  token->opr.ctx_type = WORD_FIRST;
! 	  tree_first = create_token_tree (dfa, NULL, NULL, token);
  	  token->opr.ctx_type = WORD_LAST;
! 	  tree_last = create_token_tree (dfa, NULL, NULL, token);
! 	  tree = create_tree (dfa, tree_first, tree_last, OP_ALT);
  	  if (BE (tree_first == NULL || tree_last == NULL || tree == NULL, 0))
  	    {
  	      *err = REG_ESPACE;
***************
*** 2369,2375 ****
  	}
        else
  	{
! 	  tree = re_dfa_add_tree_node (dfa, NULL, NULL, token);
  	  if (BE (tree == NULL, 0))
  	    {
  	      *err = REG_ESPACE;
--- 2353,2359 ----
  	}
        else
  	{
! 	  tree = create_token_tree (dfa, NULL, NULL, token);
  	  if (BE (tree == NULL, 0))
  	    {
  	      *err = REG_ESPACE;
***************
*** 2383,2389 ****
        fetch_token (token, regexp, syntax);
        return tree;
      case OP_PERIOD:
!       tree = re_dfa_add_tree_node (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
--- 2367,2373 ----
        fetch_token (token, regexp, syntax);
        return tree;
      case OP_PERIOD:
!       tree = create_token_tree (dfa, NULL, NULL, token);
        if (BE (tree == NULL, 0))
  	{
  	  *err = REG_ESPACE;
***************
*** 2439,2445 ****
  	  *err = REG_BADRPT;
  	  return NULL;
  	}
-       dfa->has_plural_match = 1;
      }
  
    return tree;
--- 2423,2428 ----
***************
*** 2462,2478 ****
       reg_errcode_t *err;
  {
    re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
!   bin_tree_t *tree, *left_par, *right_par;
    size_t cur_nsub;
    cur_nsub = preg->re_nsub++;
  
-   left_par = re_dfa_add_tree_node (dfa, NULL, NULL, token);
-   if (BE (left_par == NULL, 0))
-     {
-       *err = REG_ESPACE;
-       return NULL;
-     }
-   dfa->nodes[left_par->node_idx].opr.idx = cur_nsub;
    fetch_token (token, regexp, syntax | RE_CARET_ANCHORS_HERE);
  
    /* The subexpression may be a null string.  */
--- 2445,2454 ----
       reg_errcode_t *err;
  {
    re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
!   bin_tree_t *tree;
    size_t cur_nsub;
    cur_nsub = preg->re_nsub++;
  
    fetch_token (token, regexp, syntax | RE_CARET_ANCHORS_HERE);
  
    /* The subexpression may be a null string.  */
***************
*** 2481,2506 ****
    else
      {
        tree = parse_reg_exp (regexp, preg, token, syntax, nest, err);
!       if (BE (*err != REG_NOERROR && tree == NULL, 0))
  	return NULL;
      }
-   if (BE (token->type != OP_CLOSE_SUBEXP, 0))
-     {
-       *err = REG_EPAREN;
-       return NULL;
-     }
-   right_par = re_dfa_add_tree_node (dfa, NULL, NULL, token);
    dfa->completed_bkref_map |= 1 << cur_nsub;
!   tree = ((tree == NULL) ? right_par
! 	  : create_tree (dfa, tree, right_par, CONCAT, 0));
!   tree = create_tree (dfa, left_par, tree, CONCAT, 0);
!   if (BE (right_par == NULL || tree == NULL, 0))
      {
        *err = REG_ESPACE;
        return NULL;
      }
!   dfa->nodes[right_par->node_idx].opr.idx = cur_nsub;
! 
    return tree;
  }
  
--- 2457,2476 ----
    else
      {
        tree = parse_reg_exp (regexp, preg, token, syntax, nest, err);
!       if (BE (*err == REG_NOERROR && token->type != OP_CLOSE_SUBEXP, 0))
!         *err = REG_EPAREN;
!       if (BE (*err != REG_NOERROR, 0))
  	return NULL;
      }
    dfa->completed_bkref_map |= 1 << cur_nsub;
! 
!   tree = create_tree (dfa, tree, NULL, SUBEXP);
!   if (BE (tree == NULL, 0))
      {
        *err = REG_ESPACE;
        return NULL;
      }
!   tree->token.opr.idx = cur_nsub;
    return tree;
  }
  
***************
*** 2515,2521 ****
       reg_syntax_t syntax;
       reg_errcode_t *err;
  {
-   re_token_t dup_token;
    bin_tree_t *tree = NULL, *old_tree = NULL;
    int i, start, end, start_idx = re_string_cur_idx (regexp);
    re_token_t start_token = *token;
--- 2485,2490 ----
***************
*** 2578,2586 ****
  
    fetch_token (token, regexp, syntax);
  
!   /* Treat "<re>{0}*" etc. as "<re>{0}".  */
!   if (BE (elem == NULL || (start == 0 && end == 0), 0))
      return NULL;
  
    /* Extract "<re>{n,m}" to "<re><re>...<re><re>{0,<m-n>}".  */
    if (BE (start > 0, 0))
--- 2547,2559 ----
  
    fetch_token (token, regexp, syntax);
  
!   if (BE (elem == NULL, 0))
      return NULL;
+   if (BE (start == 0 && end == 0, 0))
+     {
+       postorder (elem, free_tree, NULL);
+       return NULL;
+     }
  
    /* Extract "<re>{n,m}" to "<re><re>...<re><re>{0,<m-n>}".  */
    if (BE (start > 0, 0))
***************
*** 2589,2595 ****
        for (i = 2; i <= start; ++i)
  	{
  	  elem = duplicate_tree (elem, dfa);
! 	  tree = create_tree (dfa, tree, elem, CONCAT, 0);
  	  if (BE (elem == NULL || tree == NULL, 0))
  	    goto parse_dup_op_espace;
  	}
--- 2562,2568 ----
        for (i = 2; i <= start; ++i)
  	{
  	  elem = duplicate_tree (elem, dfa);
! 	  tree = create_tree (dfa, tree, elem, CONCAT);
  	  if (BE (elem == NULL || tree == NULL, 0))
  	    goto parse_dup_op_espace;
  	}
***************
*** 2604,2612 ****
    else
      old_tree = NULL;
  
!   mark_opt_subexp (elem, dfa);
!   dup_token.type = (end == -1 ? OP_DUP_ASTERISK : OP_DUP_QUESTION);
!   tree = re_dfa_add_tree_node (dfa, elem, NULL, &dup_token);
    if (BE (tree == NULL, 0))
      goto parse_dup_op_espace;
  
--- 2577,2586 ----
    else
      old_tree = NULL;
  
!   if (elem->token.type == SUBEXP)
!     postorder (elem, mark_opt_subexp, (void *) (long) elem->token.opr.idx);
! 
!   tree = create_tree (dfa, elem, NULL, (end == -1 ? OP_DUP_ASTERISK : OP_ALT));
    if (BE (tree == NULL, 0))
      goto parse_dup_op_espace;
  
***************
*** 2616,2632 ****
    for (i = start + 2; i <= end; ++i)
      {
        elem = duplicate_tree (elem, dfa);
!       tree = create_tree (dfa, tree, elem, CONCAT, 0);
        if (BE (elem == NULL || tree == NULL, 0))
          goto parse_dup_op_espace;
  
!       tree = re_dfa_add_tree_node (dfa, tree, NULL, &dup_token);
        if (BE (tree == NULL, 0))
          goto parse_dup_op_espace;
      }
  
    if (old_tree)
!     tree = create_tree (dfa, old_tree, tree, CONCAT, 0);
  
    return tree;
  
--- 2590,2606 ----
    for (i = start + 2; i <= end; ++i)
      {
        elem = duplicate_tree (elem, dfa);
!       tree = create_tree (dfa, tree, elem, CONCAT);
        if (BE (elem == NULL || tree == NULL, 0))
          goto parse_dup_op_espace;
  
!       tree = create_tree (dfa, tree, NULL, OP_ALT);
        if (BE (tree == NULL, 0))
          goto parse_dup_op_espace;
      }
  
    if (old_tree)
!     tree = create_tree (dfa, old_tree, tree, CONCAT);
  
    return tree;
  
***************
*** 3293,3299 ****
        dfa->has_mb_node = 1;
        br_token.type = COMPLEX_BRACKET;
        br_token.opr.mbcset = mbcset;
!       mbc_tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token);
        if (BE (mbc_tree == NULL, 0))
  	goto parse_bracket_exp_espace;
        for (sbc_idx = 0; sbc_idx < BITSET_UINTS; ++sbc_idx)
--- 3267,3273 ----
        dfa->has_mb_node = 1;
        br_token.type = COMPLEX_BRACKET;
        br_token.opr.mbcset = mbcset;
!       mbc_tree = create_token_tree (dfa, NULL, NULL, &br_token);
        if (BE (mbc_tree == NULL, 0))
  	goto parse_bracket_exp_espace;
        for (sbc_idx = 0; sbc_idx < BITSET_UINTS; ++sbc_idx)
***************
*** 3303,3320 ****
  	 of having both SIMPLE_BRACKET and COMPLEX_BRACKET.  */
        if (sbc_idx < BITSET_UINTS)
  	{
-           re_token_t alt_token;
            /* Build a tree for simple bracket.  */
            br_token.type = SIMPLE_BRACKET;
            br_token.opr.sbcset = sbcset;
!           work_tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token);
            if (BE (work_tree == NULL, 0))
              goto parse_bracket_exp_espace;
  
            /* Then join them by ALT node.  */
!           alt_token.type = OP_ALT;
!           dfa->has_plural_match = 1;
!           work_tree = re_dfa_add_tree_node (dfa, work_tree, mbc_tree, &alt_token);
            if (BE (work_tree == NULL, 0))
              goto parse_bracket_exp_espace;
  	}
--- 3277,3291 ----
  	 of having both SIMPLE_BRACKET and COMPLEX_BRACKET.  */
        if (sbc_idx < BITSET_UINTS)
  	{
            /* Build a tree for simple bracket.  */
            br_token.type = SIMPLE_BRACKET;
            br_token.opr.sbcset = sbcset;
!           work_tree = create_token_tree (dfa, NULL, NULL, &br_token);
            if (BE (work_tree == NULL, 0))
              goto parse_bracket_exp_espace;
  
            /* Then join them by ALT node.  */
!           work_tree = create_tree (dfa, work_tree, mbc_tree, OP_ALT);
            if (BE (work_tree == NULL, 0))
              goto parse_bracket_exp_espace;
  	}
***************
*** 3329,3335 ****
        /* Build a tree for simple bracket.  */
        br_token.type = SIMPLE_BRACKET;
        br_token.opr.sbcset = sbcset;
!       work_tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token);
        if (BE (work_tree == NULL, 0))
          goto parse_bracket_exp_espace;
  
--- 3300,3306 ----
        /* Build a tree for simple bracket.  */
        br_token.type = SIMPLE_BRACKET;
        br_token.opr.sbcset = sbcset;
!       work_tree = create_token_tree (dfa, NULL, NULL, &br_token);
        if (BE (work_tree == NULL, 0))
          goto parse_bracket_exp_espace;
  
***************
*** 3697,3722 ****
    /* Build a tree for simple bracket.  */
    br_token.type = SIMPLE_BRACKET;
    br_token.opr.sbcset = sbcset;
!   tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token);
    if (BE (tree == NULL, 0))
      goto build_word_op_espace;
  
  #ifdef RE_ENABLE_I18N
    if (dfa->mb_cur_max > 1)
      {
-       re_token_t alt_token;
        bin_tree_t *mbc_tree;
        /* Build a tree for complex bracket.  */
        br_token.type = COMPLEX_BRACKET;
        br_token.opr.mbcset = mbcset;
        dfa->has_mb_node = 1;
!       mbc_tree = re_dfa_add_tree_node (dfa, NULL, NULL, &br_token);
        if (BE (mbc_tree == NULL, 0))
  	goto build_word_op_espace;
        /* Then join them by ALT node.  */
!       alt_token.type = OP_ALT;
!       dfa->has_plural_match = 1;
!       tree = re_dfa_add_tree_node (dfa, tree, mbc_tree, &alt_token);
        if (BE (mbc_tree != NULL, 1))
  	return tree;
      }
--- 3668,3690 ----
    /* Build a tree for simple bracket.  */
    br_token.type = SIMPLE_BRACKET;
    br_token.opr.sbcset = sbcset;
!   tree = create_token_tree (dfa, NULL, NULL, &br_token);
    if (BE (tree == NULL, 0))
      goto build_word_op_espace;
  
  #ifdef RE_ENABLE_I18N
    if (dfa->mb_cur_max > 1)
      {
        bin_tree_t *mbc_tree;
        /* Build a tree for complex bracket.  */
        br_token.type = COMPLEX_BRACKET;
        br_token.opr.mbcset = mbcset;
        dfa->has_mb_node = 1;
!       mbc_tree = create_token_tree (dfa, NULL, NULL, &br_token);
        if (BE (mbc_tree == NULL, 0))
  	goto build_word_op_espace;
        /* Then join them by ALT node.  */
!       tree = create_tree (dfa, tree, mbc_tree, OP_ALT);
        if (BE (mbc_tree != NULL, 1))
  	return tree;
      }
***************
*** 3787,3798 ****
  /* Create a tree node.  */
  
  static bin_tree_t *
! create_tree (dfa, left, right, type, index)
       re_dfa_t *dfa;
       bin_tree_t *left;
       bin_tree_t *right;
       re_token_type_t type;
!      int index;
  {
    bin_tree_t *tree;
    if (BE (dfa->str_tree_storage_idx == BIN_TREE_STORAGE_SIZE, 0))
--- 3755,3777 ----
  /* Create a tree node.  */
  
  static bin_tree_t *
! create_tree (dfa, left, right, type)
       re_dfa_t *dfa;
       bin_tree_t *left;
       bin_tree_t *right;
       re_token_type_t type;
! {
!   re_token_t t;
!   t.type = type;
!   return create_token_tree (dfa, left, right, &t);
! }
! 
! static bin_tree_t *
! create_token_tree (dfa, left, right, token)
!      re_dfa_t *dfa;
!      bin_tree_t *left;
!      bin_tree_t *right;
!      const re_token_t *token;
  {
    bin_tree_t *tree;
    if (BE (dfa->str_tree_storage_idx == BIN_TREE_STORAGE_SIZE, 0))
***************
*** 3810,3820 ****
    tree->parent = NULL;
    tree->left = left;
    tree->right = right;
!   tree->type = type;
!   tree->node_idx = index;
!   tree->first = -1;
!   tree->next = -1;
!   re_node_set_init_empty (&tree->eclosure);
  
    if (left != NULL)
      left->parent = tree;
--- 3789,3800 ----
    tree->parent = NULL;
    tree->left = left;
    tree->right = right;
!   tree->token = *token;
!   tree->token.duplicated = 0;
!   tree->token.opt_subexp = 0;
!   tree->first = NULL;
!   tree->next = NULL;
!   tree->node_idx = -1;
  
    if (left != NULL)
      left->parent = tree;
***************
*** 3823,3925 ****
    return tree;
  }
  
! /* Create both a DFA node and a tree for it.  */
  
! static bin_tree_t *
! re_dfa_add_tree_node (dfa, left, right, token)
!      re_dfa_t *dfa;
!      bin_tree_t *left;
!      bin_tree_t *right;
!      const re_token_t *token;
  {
!   int new_idx = re_dfa_add_node (dfa, *token, 0);
  
!   if (new_idx == -1)
!     return NULL;
! 
!   return create_tree (dfa, left, right, 0, new_idx);
  }
  
! /* Mark the tree SRC as an optional subexpression.  */
  
  static void
! mark_opt_subexp (src, dfa)
!      const bin_tree_t *src;
!      re_dfa_t *dfa;
  {
!   /* Pass an OPT_SUBEXP_IDX which is != 1 if the duplicated tree is
!      a subexpression.  */
!   if (src->type == CONCAT
!       && src->left->type == NON_TYPE
!       && dfa->nodes[src->left->node_idx].type == OP_OPEN_SUBEXP)
!     mark_opt_subexp_iter (src, dfa, dfa->nodes[src->left->node_idx].opr.idx);
  }
  
  
! /* Recursive tree walker for mark_opt_subexp.  */
! 
! static void
! mark_opt_subexp_iter (src, dfa, idx)
!      const bin_tree_t *src;
!      re_dfa_t *dfa;
!      int idx;
  {
!   int node_idx;
! 
!   if (src->type == NON_TYPE)
!     {
!       node_idx = src->node_idx;
!       if ((dfa->nodes[node_idx].type == OP_OPEN_SUBEXP
! 	   || dfa->nodes[node_idx].type == OP_CLOSE_SUBEXP)
! 	  && dfa->nodes[node_idx].opr.idx == idx)
! 	dfa->nodes[node_idx].opt_subexp = 1;
!      }
! 
!   if (src->left != NULL)
!     mark_opt_subexp_iter (src->left, dfa, idx);
! 
!   if (src->right != NULL)
!     mark_opt_subexp_iter (src->right, dfa, idx);
  }
  
  
! /* Duplicate the node SRC, and return new node.  */
  
  static bin_tree_t *
! duplicate_tree (src, dfa)
!      const bin_tree_t *src;
       re_dfa_t *dfa;
  {
!   bin_tree_t *left = NULL, *right = NULL, *new_tree;
!   int new_node_idx;
!   /* Since node indies must be according to Post-order of the tree,
!      we must duplicate the left at first.  */
!   if (src->left != NULL)
!     {
!       left = duplicate_tree (src->left, dfa);
!       if (left == NULL)
! 	return NULL;
!     }
  
!   /* Secondaly, duplicate the right.  */
!   if (src->right != NULL)
      {
!       right = duplicate_tree (src->right, dfa);
!       if (right == NULL)
  	return NULL;
!     }
  
!   /* At last, duplicate itself.  */
!   if (src->type == NON_TYPE)
!     {
!       new_node_idx = re_dfa_add_node (dfa, dfa->nodes[src->node_idx], 0);
!       dfa->nodes[new_node_idx].duplicated = 1;
!       if (BE (new_node_idx == -1, 0))
! 	return NULL;
      }
-   else
-     new_node_idx = src->type;
- 
-   new_tree = create_tree (dfa, left, right, src->type, new_node_idx);
-   return new_tree;
  }
--- 3803,3891 ----
    return tree;
  }
  
! /* Mark the tree SRC as an optional subexpression.
!    To be called from preorder or postorder.  */
  
! static reg_errcode_t
! mark_opt_subexp (extra, node)
!      void *extra;
!      bin_tree_t *node;
  {
!   int idx = (int) (long) extra;
!   if (node->token.type == SUBEXP && node->token.opr.idx == idx)
!     node->token.opt_subexp = 1;
  
!   return REG_NOERROR;
  }
  
! /* Free the allocated memory inside NODE. */
  
  static void
! free_token (re_token_t *node)
  {
! #ifdef RE_ENABLE_I18N
!   if (node->type == COMPLEX_BRACKET && node->duplicated == 0)
!     free_charset (node->opr.mbcset);
!   else
! #endif /* RE_ENABLE_I18N */
!     if (node->type == SIMPLE_BRACKET && node->duplicated == 0)
!       re_free (node->opr.sbcset);
  }
  
+ /* Worker function for tree walking.  Free the allocated memory inside NODE
+    and its children. */
  
! static reg_errcode_t
! free_tree (void *extra, bin_tree_t *node)
  {
!   free_token (&node->token);
!   return REG_NOERROR;
  }
  
  
! /* Duplicate the node SRC, and return new node.  This is a preorder
!    visit similar to the one implemented by the generic visitor, but
!    we need more infrastructure to maintain two parallel trees --- so,
!    it's easier to duplicate.  */
  
  static bin_tree_t *
! duplicate_tree (root, dfa)
!      const bin_tree_t *root;
       re_dfa_t *dfa;
  {
!   const bin_tree_t *node;
!   bin_tree_t *dup_root;
!   bin_tree_t **p_new = &dup_root, *dup_node = root->parent;
  
!   for (node = root; ; )
      {
!       /* Create a new tree and link it back to the current parent.  */
!       *p_new = create_token_tree (dfa, NULL, NULL, &node->token);
!       if (*p_new == NULL)
  	return NULL;
!       (*p_new)->parent = dup_node;
!       (*p_new)->token.duplicated = 1;
!       dup_node = *p_new;
  
!       /* Go to the left node, or up and to the right.  */
!       if (node->left)
! 	{
! 	  node = node->left;
! 	  p_new = &dup_node->left;
! 	}
!       else
! 	{
! 	  const bin_tree_t *prev = NULL;
! 	  while (node->right == prev || node->right == NULL)
! 	    {
! 	      prev = node;
! 	      node = node->parent;
! 	      dup_node = dup_node->parent;
! 	      if (!node)
! 	        return dup_root;
! 	    }
! 	  node = node->right;
! 	  p_new = &dup_node->right;
! 	}
      }
  }


*** orig/lib/regex.c
--- mod/lib/regex.c
***************
*** 80,85 ****
--- 80,87 ----
     #undefs RE_DUP_MAX and sets it to the right value.  */
  #include <limits.h>
  
+ #define DEBUG 1
+ 
  #include <regex.h>
  #include "regex_internal.h"
  


*** orig/lib/regex_internal.c
--- mod/lib/regex_internal.c
***************
*** 1330,1381 ****
     Or return -1, if an error will be occured.  */
  
  static int
! re_dfa_add_node (dfa, token, mode)
       re_dfa_t *dfa;
       re_token_t token;
-      int mode;
  {
    int type = token.type;
    if (BE (dfa->nodes_len >= dfa->nodes_alloc, 0))
      {
        int new_nodes_alloc = dfa->nodes_alloc * 2;
        re_token_t *new_array = re_realloc (dfa->nodes, re_token_t,
  					  new_nodes_alloc);
        if (BE (new_array == NULL, 0))
  	return -1;
        dfa->nodes = new_array;
!       if (mode)
! 	{
! 	  int *new_nexts, *new_indices;
! 	  re_node_set *new_edests, *new_eclosures, *new_inveclosures;
! 
! 	  new_nexts = re_realloc (dfa->nexts, int, new_nodes_alloc);
! 	  new_indices = re_realloc (dfa->org_indices, int, new_nodes_alloc);
! 	  new_edests = re_realloc (dfa->edests, re_node_set, new_nodes_alloc);
! 	  new_eclosures = re_realloc (dfa->eclosures, re_node_set,
! 				      new_nodes_alloc);
! 	  new_inveclosures = re_realloc (dfa->inveclosures, re_node_set,
! 					 new_nodes_alloc);
! 	  if (BE (new_nexts == NULL || new_indices == NULL
! 		  || new_edests == NULL || new_eclosures == NULL
! 		  || new_inveclosures == NULL, 0))
! 	    return -1;
! 	  dfa->nexts = new_nexts;
! 	  dfa->org_indices = new_indices;
! 	  dfa->edests = new_edests;
! 	  dfa->eclosures = new_eclosures;
! 	  dfa->inveclosures = new_inveclosures;
! 	}
        dfa->nodes_alloc = new_nodes_alloc;
      }
    dfa->nodes[dfa->nodes_len] = token;
-   dfa->nodes[dfa->nodes_len].opt_subexp = 0;
-   dfa->nodes[dfa->nodes_len].duplicated = 0;
    dfa->nodes[dfa->nodes_len].constraint = 0;
  #ifdef RE_ENABLE_I18N
    dfa->nodes[dfa->nodes_len].accept_mb =
      (type == OP_PERIOD && dfa->mb_cur_max > 1) || type == COMPLEX_BRACKET;
  #endif
    return dfa->nodes_len++;
  }
  
--- 1330,1378 ----
     Or return -1, if an error will be occured.  */
  
  static int
! re_dfa_add_node (dfa, token)
       re_dfa_t *dfa;
       re_token_t token;
  {
    int type = token.type;
    if (BE (dfa->nodes_len >= dfa->nodes_alloc, 0))
      {
        int new_nodes_alloc = dfa->nodes_alloc * 2;
+       int *new_nexts, *new_indices;
+       re_node_set *new_edests, *new_eclosures, *new_inveclosures;
+ 
        re_token_t *new_array = re_realloc (dfa->nodes, re_token_t,
  					  new_nodes_alloc);
        if (BE (new_array == NULL, 0))
  	return -1;
        dfa->nodes = new_array;
!       new_nexts = re_realloc (dfa->nexts, int, new_nodes_alloc);
!       new_indices = re_realloc (dfa->org_indices, int, new_nodes_alloc);
!       new_edests = re_realloc (dfa->edests, re_node_set, new_nodes_alloc);
!       new_eclosures = re_realloc (dfa->eclosures, re_node_set, new_nodes_alloc);
!       new_inveclosures = re_realloc (dfa->inveclosures, re_node_set,
! 				     new_nodes_alloc);
!       if (BE (new_nexts == NULL || new_indices == NULL
! 	      || new_edests == NULL || new_eclosures == NULL
! 	      || new_inveclosures == NULL, 0))
! 	return -1;
!       dfa->nexts = new_nexts;
!       dfa->org_indices = new_indices;
!       dfa->edests = new_edests;
!       dfa->eclosures = new_eclosures;
!       dfa->inveclosures = new_inveclosures;
        dfa->nodes_alloc = new_nodes_alloc;
      }
    dfa->nodes[dfa->nodes_len] = token;
    dfa->nodes[dfa->nodes_len].constraint = 0;
  #ifdef RE_ENABLE_I18N
    dfa->nodes[dfa->nodes_len].accept_mb =
      (type == OP_PERIOD && dfa->mb_cur_max > 1) || type == COMPLEX_BRACKET;
  #endif
+   dfa->nexts[dfa->nodes_len] = -1;
+   re_node_set_init_empty (dfa->edests + dfa->nodes_len);
+   re_node_set_init_empty (dfa->eclosures + dfa->nodes_len);
+   re_node_set_init_empty (dfa->inveclosures + dfa->nodes_len);
    return dfa->nodes_len++;
  }
  


*** orig/lib/regex_internal.h
--- mod/lib/regex_internal.h
***************
*** 186,201 ****
    OP_CLOSE_SUBEXP = EPSILON_BIT | 1,
    OP_ALT = EPSILON_BIT | 2,
    OP_DUP_ASTERISK = EPSILON_BIT | 3,
!   OP_DUP_PLUS = EPSILON_BIT | 4,
!   OP_DUP_QUESTION = EPSILON_BIT | 5,
!   ANCHOR = EPSILON_BIT | 6,
!   OP_DELETED_SUBEXP = EPSILON_BIT | 7,
  
    /* Tree type, these are used only by tree. */
    CONCAT = 16,
  
    /* Token type, these are used only by token.  */
!   OP_OPEN_BRACKET = 17,
    OP_CLOSE_BRACKET,
    OP_CHARSET_RANGE,
    OP_OPEN_DUP_NUM,
--- 186,201 ----
    OP_CLOSE_SUBEXP = EPSILON_BIT | 1,
    OP_ALT = EPSILON_BIT | 2,
    OP_DUP_ASTERISK = EPSILON_BIT | 3,
!   ANCHOR = EPSILON_BIT | 4,
  
    /* Tree type, these are used only by tree. */
    CONCAT = 16,
+   SUBEXP = 17,
  
    /* Token type, these are used only by token.  */
!   OP_DUP_PLUS = 18,
!   OP_DUP_QUESTION,
!   OP_OPEN_BRACKET,
    OP_CLOSE_BRACKET,
    OP_CHARSET_RANGE,
    OP_OPEN_DUP_NUM,
***************
*** 428,442 ****
    struct bin_tree_t *parent;
    struct bin_tree_t *left;
    struct bin_tree_t *right;
  
    /* `node_idx' is the index in dfa->nodes, if `type' == 0.
       Otherwise `type' indicate the type of this node.  */
-   re_token_type_t type;
    int node_idx;
- 
-   int first;
-   int next;
-   re_node_set eclosure;
  };
  typedef struct bin_tree_t bin_tree_t;
  
--- 428,441 ----
    struct bin_tree_t *parent;
    struct bin_tree_t *left;
    struct bin_tree_t *right;
+   struct bin_tree_t *first;
+   struct bin_tree_t *next;
+ 
+   re_token_t token;
  
    /* `node_idx' is the index in dfa->nodes, if `type' == 0.
       Otherwise `type' indicate the type of this node.  */
    int node_idx;
  };
  typedef struct bin_tree_t bin_tree_t;
  
***************
*** 676,682 ****
    (re_node_set_remove_at (set, re_node_set_contains (set, id) - 1))
  #define re_node_set_empty(p) ((p)->nelem = 0)
  #define re_node_set_free(set) re_free ((set)->elems)
! static int re_dfa_add_node (re_dfa_t *dfa, re_token_t token, int mode) internal_function;
  static re_dfastate_t *re_acquire_state (reg_errcode_t *err, re_dfa_t *dfa,
  					const re_node_set *nodes) internal_function;
  static re_dfastate_t *re_acquire_state_context (reg_errcode_t *err,
--- 675,681 ----
    (re_node_set_remove_at (set, re_node_set_contains (set, id) - 1))
  #define re_node_set_empty(p) ((p)->nelem = 0)
  #define re_node_set_free(set) re_free ((set)->elems)
! static int re_dfa_add_node (re_dfa_t *dfa, re_token_t token) internal_function;
  static re_dfastate_t *re_acquire_state (reg_errcode_t *err, re_dfa_t *dfa,
  					const re_node_set *nodes) internal_function;
  static re_dfastate_t *re_acquire_state_context (reg_errcode_t *err,




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]