This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] posix: Sync gnulib regex implementation

From: Paul Eggert <eggert at cs dot ucla dot edu>
To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
Date: Thu, 28 Jun 2018 13:21:50 -0700
Subject: Re: [PATCH] posix: Sync gnulib regex implementation
References: <1530204275-31537-1-git-send-email-adhemerval.zanella@linaro.org>

Adhemerval Zanella wrote:

This patch syncs regex code with gnulib (commit 16aa5a2).

Thanks for doing all that work. I merged some of the changes embodied in thatpatch back to Gnulib; see<https://lists.gnu.org/r/bug-gnulib/2018-06/msg00095.html>. Here are somecomments on the changes that I had questions about:

   * regcomp: add a __libc_lock_init init for dfa lock.

Why is this needed? The lock has already been initialized by the call tolock_init about 15 lines earlier. Please see attached patch, which reverts thischange.

   * regexec: instead of use intprops.h and its macros, this patch
     the check_add_overflow_idx function to check for addition
     overflow.  The function need to be defined on the file because
     the file will be build multiple times wit 'Idx' type being
     redefined (I think intprops.h addition should be done in a
     separate patch).

When used with Gnulib, that implementation won't work with ICC 17.0.4 20170411,which defines __GNUC__ to be 5 but does not support __builtin_add_overflow.Also, the __GNUC__ < 5 code doesn't look right for multiple reasons: (1) itdoesn't set *r when returning false, contrary to the comment describing the code(presumably the comment needs to be fixed?), (2) it incorrectly uses SSIZE_MAXinstead of INT_MAX when _REGEX_LARGE_OFFSETS is not defined, and (3) it has anunnecessary test for a > 0 (you need to test only whether a < 0).

Although these issues could be fixed individually, why waste time? Let's justuse intprops.h and avoid the hassle of hacking on the bugs of a partial replacement.

Attached please see an intprops.h addition that is done as a separate patch. Thesecond attached patch goes back to using INT_ADD_WRAPV instead ofcheck_add_overflow.

   * regex.h: Remove _Restrict_ and _Restrict_arr_ definition based
     on __STDC_VERSION__ because its usage leads to failures on
     posix/check-installed-headers-c{xx}.

That implementation assumes glibc, so won't work in Gnulib. I attempted to workaround the check-installed-headers issue in the attached patch; if this doesn'twork please let me know what the issue is.

   * regex_internal.h: Define lock_fini to empty macro because setting
     to 0 lead to build issues (error: statement with no effect
     [-Werror=unused-value]).

Making it empty is problematic, since lock_fini is supposed to be usablewherever a function call could appear. Instead, let's define it to ((void) 0).That should fix the -Wunused-value issue in a better way. See attached patch.

   2. posix/PCRE.tests: the test '(a)|\1' uses a backreference along
      with group creation and I am not sure if it is the correct
      behavior to accept it with regcomp (REG_EXTENDED).  The GNU grep
      accepts it with ERE option though.

POSIX allows recomp and grep to accept this pattern as an extension. I dunnowhat it means, though, and no doubt regcomp and grep should report an errorinstead of accepting it silently. That would be a different bugfix, though.

One other thing: let's use https: intead of http: in URLs, as per Gnulib style.We should be using these everywhere in Glibc, of course, but one step at a time.

See attached patches for what the above comments boil down to. They are intendedto be applied after the patch you posted. The patches could all be squashed.

From 961eb26b900a859d2a69e6c427dac91c8b1d1eb8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 28 Jun 2018 12:39:30 -0700
Subject: [PATCH 1/2] posix: add intprops.h to prepare for regex sync

* include/intprops.h: New file, taken from Gnulib except
with reworded header comments.  This will be used in a
followup regex patch.
---
 ChangeLog          |   7 +
 include/intprops.h | 457 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 464 insertions(+)
 create mode 100644 include/intprops.h

diff --git a/ChangeLog b/ChangeLog
index 1f35f29d77..27793d1456 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2018-06-28  Paul Eggert  <eggert@cs.ucla.edu>
+
+	posix: add intprops.h to prepare for regex sync
+	* include/intprops.h: New file, taken from Gnulib except
+	with reworded header comments.  This will be used in a
+	followup regex patch.
+
 2018-06-28  Szabolcs Nagy  <szabolcs.nagy@arm.com>
 
 	* manual/llio.texi: Remove spurious space.
diff --git a/include/intprops.h b/include/intprops.h
new file mode 100644
index 0000000000..a2241a2fc4
--- /dev/null
+++ b/include/intprops.h
@@ -0,0 +1,457 @@
+/* Properties of integer types.
+   Copyright (C) 2001-2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Written by Paul Eggert.  */
+
+/* This filed is shared between glibc internals and Gnulib, and uses
+   Gnulib's _GL prefix for private macros.  */
+
+#ifndef _GL_INTPROPS_H
+#define _GL_INTPROPS_H
+
+#include <limits.h>
+
+/* Return a value with the common real type of E and V and the value of V.  */
+#define _GL_INT_CONVERT(e, v) (0 * (e) + (v))
+
+/* Act like _GL_INT_CONVERT (E, -V) but work around a bug in IRIX 6.5 cc; see
+   <https://lists.gnu.org/r/bug-gnulib/2011-05/msg00406.html>.  */
+#define _GL_INT_NEGATE_CONVERT(e, v) (0 * (e) - (v))
+
+/* The extra casts in the following macros work around compiler bugs,
+   e.g., in Cray C 5.0.3.0.  */
+
+/* True if the arithmetic type T is an integer type.  bool counts as
+   an integer.  */
+#define TYPE_IS_INTEGER(t) ((t) 1.5 == 1)
+
+/* True if the real type T is signed.  */
+#define TYPE_SIGNED(t) (! ((t) 0 < (t) -1))
+
+/* Return 1 if the real expression E, after promotion, has a
+   signed or floating type.  */
+#define EXPR_SIGNED(e) (_GL_INT_NEGATE_CONVERT (e, 1) < 0)
+
+
+/* Minimum and maximum values for integer types and expressions.  */
+
+/* The width in bits of the integer type or expression T.
+   Padding bits are not supported; this is checked at compile-time below.  */
+#define TYPE_WIDTH(t) (sizeof (t) * CHAR_BIT)
+
+/* The maximum and minimum values for the integer type T.  */
+#define TYPE_MINIMUM(t) ((t) ~ TYPE_MAXIMUM (t))
+#define TYPE_MAXIMUM(t)                                                 \
+  ((t) (! TYPE_SIGNED (t)                                               \
+        ? (t) -1                                                        \
+        : ((((t) 1 << (TYPE_WIDTH (t) - 2)) - 1) * 2 + 1)))
+
+/* The maximum and minimum values for the type of the expression E,
+   after integer promotion.  E should not have side effects.  */
+#define _GL_INT_MINIMUM(e)                                              \
+  (EXPR_SIGNED (e)                                                      \
+   ? ~ _GL_SIGNED_INT_MAXIMUM (e)                                       \
+   : _GL_INT_CONVERT (e, 0))
+#define _GL_INT_MAXIMUM(e)                                              \
+  (EXPR_SIGNED (e)                                                      \
+   ? _GL_SIGNED_INT_MAXIMUM (e)                                         \
+   : _GL_INT_NEGATE_CONVERT (e, 1))
+#define _GL_SIGNED_INT_MAXIMUM(e)                                       \
+  (((_GL_INT_CONVERT (e, 1) << (TYPE_WIDTH ((e) + 0) - 2)) - 1) * 2 + 1)
+
+/* Work around OpenVMS incompatibility with C99.  */
+#if !defined LLONG_MAX && defined __INT64_MAX
+# define LLONG_MAX __INT64_MAX
+# define LLONG_MIN __INT64_MIN
+#endif
+
+/* This include file assumes that signed types are two's complement without
+   padding bits; the above macros have undefined behavior otherwise.
+   If this is a problem for you, please let us know how to fix it for your host.
+   This assumption is tested by the intprops-tests module.  */
+
+/* Does the __typeof__ keyword work?  This could be done by
+   'configure', but for now it's easier to do it by hand.  */
+#if (2 <= __GNUC__ \
+     || (1210 <= __IBMC__ && defined __IBM__TYPEOF__) \
+     || (0x5110 <= __SUNPRO_C && !__STDC__))
+# define _GL_HAVE___TYPEOF__ 1
+#else
+# define _GL_HAVE___TYPEOF__ 0
+#endif
+
+/* Return 1 if the integer type or expression T might be signed.  Return 0
+   if it is definitely unsigned.  This macro does not evaluate its argument,
+   and expands to an integer constant expression.  */
+#if _GL_HAVE___TYPEOF__
+# define _GL_SIGNED_TYPE_OR_EXPR(t) TYPE_SIGNED (__typeof__ (t))
+#else
+# define _GL_SIGNED_TYPE_OR_EXPR(t) 1
+#endif
+
+/* Bound on length of the string representing an unsigned integer
+   value representable in B bits.  log10 (2.0) < 146/485.  The
+   smallest value of B where this bound is not tight is 2621.  */
+#define INT_BITS_STRLEN_BOUND(b) (((b) * 146 + 484) / 485)
+
+/* Bound on length of the string representing an integer type or expression T.
+   Subtract 1 for the sign bit if T is signed, and then add 1 more for
+   a minus sign if needed.
+
+   Because _GL_SIGNED_TYPE_OR_EXPR sometimes returns 0 when its argument is
+   signed, this macro may overestimate the true bound by one byte when
+   applied to unsigned types of size 2, 4, 16, ... bytes.  */
+#define INT_STRLEN_BOUND(t)                                     \
+  (INT_BITS_STRLEN_BOUND (TYPE_WIDTH (t) - _GL_SIGNED_TYPE_OR_EXPR (t)) \
+   + _GL_SIGNED_TYPE_OR_EXPR (t))
+
+/* Bound on buffer size needed to represent an integer type or expression T,
+   including the terminating null.  */
+#define INT_BUFSIZE_BOUND(t) (INT_STRLEN_BOUND (t) + 1)
+
+
+/* Range overflow checks.
+
+   The INT_<op>_RANGE_OVERFLOW macros return 1 if the corresponding C
+   operators might not yield numerically correct answers due to
+   arithmetic overflow.  They do not rely on undefined or
+   implementation-defined behavior.  Their implementations are simple
+   and straightforward, but they are a bit harder to use than the
+   INT_<op>_OVERFLOW macros described below.
+
+   Example usage:
+
+     long int i = ...;
+     long int j = ...;
+     if (INT_MULTIPLY_RANGE_OVERFLOW (i, j, LONG_MIN, LONG_MAX))
+       printf ("multiply would overflow");
+     else
+       printf ("product is %ld", i * j);
+
+   Restrictions on *_RANGE_OVERFLOW macros:
+
+   These macros do not check for all possible numerical problems or
+   undefined or unspecified behavior: they do not check for division
+   by zero, for bad shift counts, or for shifting negative numbers.
+
+   These macros may evaluate their arguments zero or multiple times,
+   so the arguments should not have side effects.  The arithmetic
+   arguments (including the MIN and MAX arguments) must be of the same
+   integer type after the usual arithmetic conversions, and the type
+   must have minimum value MIN and maximum MAX.  Unsigned types should
+   use a zero MIN of the proper type.
+
+   These macros are tuned for constant MIN and MAX.  For commutative
+   operations such as A + B, they are also tuned for constant B.  */
+
+/* Return 1 if A + B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  */
+#define INT_ADD_RANGE_OVERFLOW(a, b, min, max)          \
+  ((b) < 0                                              \
+   ? (a) < (min) - (b)                                  \
+   : (max) - (b) < (a))
+
+/* Return 1 if A - B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  */
+#define INT_SUBTRACT_RANGE_OVERFLOW(a, b, min, max)     \
+  ((b) < 0                                              \
+   ? (max) + (b) < (a)                                  \
+   : (a) < (min) + (b))
+
+/* Return 1 if - A would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  */
+#define INT_NEGATE_RANGE_OVERFLOW(a, min, max)          \
+  ((min) < 0                                            \
+   ? (a) < - (max)                                      \
+   : 0 < (a))
+
+/* Return 1 if A * B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Avoid && and || as they tickle
+   bugs in Sun C 5.11 2010/08/13 and other compilers; see
+   <https://lists.gnu.org/r/bug-gnulib/2011-05/msg00401.html>.  */
+#define INT_MULTIPLY_RANGE_OVERFLOW(a, b, min, max)     \
+  ((b) < 0                                              \
+   ? ((a) < 0                                           \
+      ? (a) < (max) / (b)                               \
+      : (b) == -1                                       \
+      ? 0                                               \
+      : (min) / (b) < (a))                              \
+   : (b) == 0                                           \
+   ? 0                                                  \
+   : ((a) < 0                                           \
+      ? (a) < (min) / (b)                               \
+      : (max) / (b) < (a)))
+
+/* Return 1 if A / B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Do not check for division by zero.  */
+#define INT_DIVIDE_RANGE_OVERFLOW(a, b, min, max)       \
+  ((min) < 0 && (b) == -1 && (a) < - (max))
+
+/* Return 1 if A % B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Do not check for division by zero.
+   Mathematically, % should never overflow, but on x86-like hosts
+   INT_MIN % -1 traps, and the C standard permits this, so treat this
+   as an overflow too.  */
+#define INT_REMAINDER_RANGE_OVERFLOW(a, b, min, max)    \
+  INT_DIVIDE_RANGE_OVERFLOW (a, b, min, max)
+
+/* Return 1 if A << B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Here, MIN and MAX are for A only, and B need
+   not be of the same type as the other arguments.  The C standard says that
+   behavior is undefined for shifts unless 0 <= B < wordwidth, and that when
+   A is negative then A << B has undefined behavior and A >> B has
+   implementation-defined behavior, but do not check these other
+   restrictions.  */
+#define INT_LEFT_SHIFT_RANGE_OVERFLOW(a, b, min, max)   \
+  ((a) < 0                                              \
+   ? (a) < (min) >> (b)                                 \
+   : (max) >> (b) < (a))
+
+/* True if __builtin_add_overflow (A, B, P) works when P is non-null.  */
+#if 5 <= __GNUC__ && !defined __ICC
+# define _GL_HAS_BUILTIN_OVERFLOW 1
+#else
+# define _GL_HAS_BUILTIN_OVERFLOW 0
+#endif
+
+/* True if __builtin_add_overflow_p (A, B, C) works.  */
+#define _GL_HAS_BUILTIN_OVERFLOW_P (7 <= __GNUC__)
+
+/* The _GL*_OVERFLOW macros have the same restrictions as the
+   *_RANGE_OVERFLOW macros, except that they do not assume that operands
+   (e.g., A and B) have the same type as MIN and MAX.  Instead, they assume
+   that the result (e.g., A + B) has that type.  */
+#if _GL_HAS_BUILTIN_OVERFLOW_P
+# define _GL_ADD_OVERFLOW(a, b, min, max)                               \
+   __builtin_add_overflow_p (a, b, (__typeof__ ((a) + (b))) 0)
+# define _GL_SUBTRACT_OVERFLOW(a, b, min, max)                          \
+   __builtin_sub_overflow_p (a, b, (__typeof__ ((a) - (b))) 0)
+# define _GL_MULTIPLY_OVERFLOW(a, b, min, max)                          \
+   __builtin_mul_overflow_p (a, b, (__typeof__ ((a) * (b))) 0)
+#else
+# define _GL_ADD_OVERFLOW(a, b, min, max)                                \
+   ((min) < 0 ? INT_ADD_RANGE_OVERFLOW (a, b, min, max)                  \
+    : (a) < 0 ? (b) <= (a) + (b)                                         \
+    : (b) < 0 ? (a) <= (a) + (b)                                         \
+    : (a) + (b) < (b))
+# define _GL_SUBTRACT_OVERFLOW(a, b, min, max)                           \
+   ((min) < 0 ? INT_SUBTRACT_RANGE_OVERFLOW (a, b, min, max)             \
+    : (a) < 0 ? 1                                                        \
+    : (b) < 0 ? (a) - (b) <= (a)                                         \
+    : (a) < (b))
+# define _GL_MULTIPLY_OVERFLOW(a, b, min, max)                           \
+   (((min) == 0 && (((a) < 0 && 0 < (b)) || ((b) < 0 && 0 < (a))))       \
+    || INT_MULTIPLY_RANGE_OVERFLOW (a, b, min, max))
+#endif
+#define _GL_DIVIDE_OVERFLOW(a, b, min, max)                             \
+  ((min) < 0 ? (b) == _GL_INT_NEGATE_CONVERT (min, 1) && (a) < - (max)  \
+   : (a) < 0 ? (b) <= (a) + (b) - 1                                     \
+   : (b) < 0 && (a) + (b) <= (a))
+#define _GL_REMAINDER_OVERFLOW(a, b, min, max)                          \
+  ((min) < 0 ? (b) == _GL_INT_NEGATE_CONVERT (min, 1) && (a) < - (max)  \
+   : (a) < 0 ? (a) % (b) != ((max) - (b) + 1) % (b)                     \
+   : (b) < 0 && ! _GL_UNSIGNED_NEG_MULTIPLE (a, b, max))
+
+/* Return a nonzero value if A is a mathematical multiple of B, where
+   A is unsigned, B is negative, and MAX is the maximum value of A's
+   type.  A's type must be the same as (A % B)'s type.  Normally (A %
+   -B == 0) suffices, but things get tricky if -B would overflow.  */
+#define _GL_UNSIGNED_NEG_MULTIPLE(a, b, max)                            \
+  (((b) < -_GL_SIGNED_INT_MAXIMUM (b)                                   \
+    ? (_GL_SIGNED_INT_MAXIMUM (b) == (max)                              \
+       ? (a)                                                            \
+       : (a) % (_GL_INT_CONVERT (a, _GL_SIGNED_INT_MAXIMUM (b)) + 1))   \
+    : (a) % - (b))                                                      \
+   == 0)
+
+/* Check for integer overflow, and report low order bits of answer.
+
+   The INT_<op>_OVERFLOW macros return 1 if the corresponding C operators
+   might not yield numerically correct answers due to arithmetic overflow.
+   The INT_<op>_WRAPV macros also store the low-order bits of the answer.
+   These macros work correctly on all known practical hosts, and do not rely
+   on undefined behavior due to signed arithmetic overflow.
+
+   Example usage, assuming A and B are long int:
+
+     if (INT_MULTIPLY_OVERFLOW (a, b))
+       printf ("result would overflow\n");
+     else
+       printf ("result is %ld (no overflow)\n", a * b);
+
+   Example usage with WRAPV flavor:
+
+     long int result;
+     bool overflow = INT_MULTIPLY_WRAPV (a, b, &result);
+     printf ("result is %ld (%s)\n", result,
+             overflow ? "after overflow" : "no overflow");
+
+   Restrictions on these macros:
+
+   These macros do not check for all possible numerical problems or
+   undefined or unspecified behavior: they do not check for division
+   by zero, for bad shift counts, or for shifting negative numbers.
+
+   These macros may evaluate their arguments zero or multiple times, so the
+   arguments should not have side effects.
+
+   The WRAPV macros are not constant expressions.  They support only
+   +, binary -, and *.  The result type must be signed.
+
+   These macros are tuned for their last argument being a constant.
+
+   Return 1 if the integer expressions A * B, A - B, -A, A * B, A / B,
+   A % B, and A << B would overflow, respectively.  */
+
+#define INT_ADD_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_ADD_OVERFLOW)
+#define INT_SUBTRACT_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_SUBTRACT_OVERFLOW)
+#if _GL_HAS_BUILTIN_OVERFLOW_P
+# define INT_NEGATE_OVERFLOW(a) INT_SUBTRACT_OVERFLOW (0, a)
+#else
+# define INT_NEGATE_OVERFLOW(a) \
+   INT_NEGATE_RANGE_OVERFLOW (a, _GL_INT_MINIMUM (a), _GL_INT_MAXIMUM (a))
+#endif
+#define INT_MULTIPLY_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_MULTIPLY_OVERFLOW)
+#define INT_DIVIDE_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_DIVIDE_OVERFLOW)
+#define INT_REMAINDER_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_REMAINDER_OVERFLOW)
+#define INT_LEFT_SHIFT_OVERFLOW(a, b) \
+  INT_LEFT_SHIFT_RANGE_OVERFLOW (a, b, \
+                                 _GL_INT_MINIMUM (a), _GL_INT_MAXIMUM (a))
+
+/* Return 1 if the expression A <op> B would overflow,
+   where OP_RESULT_OVERFLOW (A, B, MIN, MAX) does the actual test,
+   assuming MIN and MAX are the minimum and maximum for the result type.
+   Arguments should be free of side effects.  */
+#define _GL_BINARY_OP_OVERFLOW(a, b, op_result_overflow)        \
+  op_result_overflow (a, b,                                     \
+                      _GL_INT_MINIMUM (0 * (b) + (a)),          \
+                      _GL_INT_MAXIMUM (0 * (b) + (a)))
+
+/* Store the low-order bits of A + B, A - B, A * B, respectively, into *R.
+   Return 1 if the result overflows.  See above for restrictions.  */
+#define INT_ADD_WRAPV(a, b, r) \
+  _GL_INT_OP_WRAPV (a, b, r, +, __builtin_add_overflow, INT_ADD_OVERFLOW)
+#define INT_SUBTRACT_WRAPV(a, b, r) \
+  _GL_INT_OP_WRAPV (a, b, r, -, __builtin_sub_overflow, INT_SUBTRACT_OVERFLOW)
+#define INT_MULTIPLY_WRAPV(a, b, r) \
+  _GL_INT_OP_WRAPV (a, b, r, *, __builtin_mul_overflow, INT_MULTIPLY_OVERFLOW)
+
+/* Nonzero if this compiler has GCC bug 68193 or Clang bug 25390.  See:
+   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68193
+   https://llvm.org/bugs/show_bug.cgi?id=25390
+   For now, assume all versions of GCC-like compilers generate bogus
+   warnings for _Generic.  This matters only for older compilers that
+   lack __builtin_add_overflow.  */
+#if __GNUC__
+# define _GL__GENERIC_BOGUS 1
+#else
+# define _GL__GENERIC_BOGUS 0
+#endif
+
+/* Store the low-order bits of A <op> B into *R, where OP specifies
+   the operation.  BUILTIN is the builtin operation, and OVERFLOW the
+   overflow predicate.  Return 1 if the result overflows.  See above
+   for restrictions.  */
+#if _GL_HAS_BUILTIN_OVERFLOW
+# define _GL_INT_OP_WRAPV(a, b, r, op, builtin, overflow) builtin (a, b, r)
+#elif 201112 <= __STDC_VERSION__ && !_GL__GENERIC_BOGUS
+# define _GL_INT_OP_WRAPV(a, b, r, op, builtin, overflow) \
+   (_Generic \
+    (*(r), \
+     signed char: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                        signed char, SCHAR_MIN, SCHAR_MAX), \
+     short int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                        short int, SHRT_MIN, SHRT_MAX), \
+     int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                        int, INT_MIN, INT_MAX), \
+     long int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \
+                        long int, LONG_MIN, LONG_MAX), \
+     long long int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long long int, \
+                        long long int, LLONG_MIN, LLONG_MAX)))
+#else
+# define _GL_INT_OP_WRAPV(a, b, r, op, builtin, overflow) \
+   (sizeof *(r) == sizeof (signed char) \
+    ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                       signed char, SCHAR_MIN, SCHAR_MAX) \
+    : sizeof *(r) == sizeof (short int) \
+    ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                       short int, SHRT_MIN, SHRT_MAX) \
+    : sizeof *(r) == sizeof (int) \
+    ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                       int, INT_MIN, INT_MAX) \
+    : _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow))
+# ifdef LLONG_MAX
+#  define _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow) \
+    (sizeof *(r) == sizeof (long int) \
+     ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \
+                        long int, LONG_MIN, LONG_MAX) \
+     : _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long long int, \
+                        long long int, LLONG_MIN, LLONG_MAX))
+# else
+#  define _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow) \
+    _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \
+                     long int, LONG_MIN, LONG_MAX)
+# endif
+#endif
+
+/* Store the low-order bits of A <op> B into *R, where the operation
+   is given by OP.  Use the unsigned type UT for calculation to avoid
+   overflow problems.  *R's type is T, with extrema TMIN and TMAX.
+   T must be a signed integer type.  Return 1 if the result overflows.  */
+#define _GL_INT_OP_CALC(a, b, r, op, overflow, ut, t, tmin, tmax) \
+  (sizeof ((a) op (b)) < sizeof (t) \
+   ? _GL_INT_OP_CALC1 ((t) (a), (t) (b), r, op, overflow, ut, t, tmin, tmax) \
+   : _GL_INT_OP_CALC1 (a, b, r, op, overflow, ut, t, tmin, tmax))
+#define _GL_INT_OP_CALC1(a, b, r, op, overflow, ut, t, tmin, tmax) \
+  ((overflow (a, b) \
+    || (EXPR_SIGNED ((a) op (b)) && ((a) op (b)) < (tmin)) \
+    || (tmax) < ((a) op (b))) \
+   ? (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t), 1) \
+   : (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t), 0))
+
+/* Return the low-order bits of A <op> B, where the operation is given
+   by OP.  Use the unsigned type UT for calculation to avoid undefined
+   behavior on signed integer overflow, and convert the result to type T.
+   UT is at least as wide as T and is no narrower than unsigned int,
+   T is two's complement, and there is no padding or trap representations.
+   Assume that converting UT to T yields the low-order bits, as is
+   done in all known two's-complement C compilers.  E.g., see:
+   https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
+
+   According to the C standard, converting UT to T yields an
+   implementation-defined result or signal for values outside T's
+   range.  However, code that works around this theoretical problem
+   runs afoul of a compiler bug in Oracle Studio 12.3 x86.  See:
+   https://lists.gnu.org/r/bug-gnulib/2017-04/msg00049.html
+   As the compiler bug is real, don't try to work around the
+   theoretical problem.  */
+
+#define _GL_INT_OP_WRAPV_VIA_UNSIGNED(a, b, op, ut, t) \
+  ((t) ((ut) (a) op (ut) (b)))
+
+#endif /* _GL_INTPROPS_H */
-- 
2.17.1

From 86022a73a20f701ec54eb9baa5d963d3c665051d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 28 Jun 2018 12:49:07 -0700
Subject: [PATCH 2/2] posix: finish merge with Gnulib regex

* posix/regex_internal.h: Include intprops.h.
* posix/regcomp.c (re_compile_internal): Omit unnecessary call to
__libc_lock_init, as the previous lock_init has already
initialized the lock.
* posix/regex.h (_Restrict_, _Restrict_arr_):
Port to Gnulib case, where __restrict and __restrict_arr
might not work.  Do this in a better way than Gnulib
did until today.
* posix/regexec.c (check_add_overflow_idx):
Remove, as it was not portable to ICC, and had some problems
when __GNUC__ < 5.  Use replaced by intprops.h's INT_ADD_WRAPV.
---
 ChangeLog              | 13 +++++++++++++
 posix/regcomp.c        |  4 +---
 posix/regex.c          |  2 +-
 posix/regex.h          | 27 ++++++++++++++++++++-------
 posix/regex_internal.c |  2 +-
 posix/regex_internal.h |  6 ++++--
 posix/regexec.c        | 21 ++-------------------
 7 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 27793d1456..38935253b0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,18 @@
 2018-06-28  Paul Eggert  <eggert@cs.ucla.edu>
 
+	posix: finish merge with Gnulib regex
+	* posix/regex_internal.h: Include intprops.h.
+	* posix/regcomp.c (re_compile_internal): Omit unnecessary call to
+	__libc_lock_init, as the previous lock_init has already
+	initialized the lock.
+	* posix/regex.h (_Restrict_, _Restrict_arr_):
+	Port to Gnulib case, where __restrict and __restrict_arr
+	might not work.  Do this in a better way than Gnulib
+	did until today.
+	* posix/regexec.c (check_add_overflow_idx):
+	Remove, as it was not portable to ICC, and had some problems
+	when __GNUC__ < 5.  Use replaced by intprops.h's INT_ADD_WRAPV.
+
 	posix: add intprops.h to prepare for regex sync
 	* include/intprops.h: New file, taken from Gnulib except
 	with reworded header comments.  This will be used in a
diff --git a/posix/regcomp.c b/posix/regcomp.c
index 8431268b1c..6aaa54f66d 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -15,7 +15,7 @@
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifdef _LIBC
 # include <locale/weight.h>
@@ -784,8 +784,6 @@ re_compile_internal (regex_t *preg, const char * pattern, size_t length,
   strncpy (dfa->re_str, pattern, length + 1);
 #endif
 
-  __libc_lock_init (dfa->lock);
-
   err = re_string_construct (&regexp, pattern, length, preg->translate,
 			     (syntax & RE_ICASE) != 0, dfa);
   if (BE (err != REG_NOERROR, 0))
diff --git a/posix/regex.c b/posix/regex.c
index cfadc6b096..d6591e8670 100644
--- a/posix/regex.c
+++ b/posix/regex.c
@@ -15,7 +15,7 @@
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifndef _LIBC
 # include <config.h>
diff --git a/posix/regex.h b/posix/regex.h
index eef262981c..32933bc6d5 100644
--- a/posix/regex.h
+++ b/posix/regex.h
@@ -15,7 +15,7 @@
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifndef _REGEX_H
 #define _REGEX_H 1
@@ -607,19 +607,32 @@ extern int re_exec (const char *);
 # endif
 #endif
 
-/* GCC 2.95 and later have "__restrict"; C99 compilers have
+/* For plain 'restrict', use glibc's __restrict if defined.
+   Otherwise, GCC 2.95 and later have "__restrict"; C99 compilers have
    "restrict", and "configure" may have defined "restrict".
    Other compilers use __restrict, __restrict__, and _Restrict, and
    'configure' might #define 'restrict' to those words, so pick a
    different name.  */
 #ifndef _Restrict_
-# define _Restrict_ __restrict
+# if defined __restrict || 2 < __GNUC__ + (95 <= __GNUC_MINOR__)
+#  define _Restrict_ __restrict
+# elif 199901L <= __STDC_VERSION__ || defined restrict
+#  define _Restrict_ restrict
+# else
+#  define _Restrict_
+# endif
 #endif
-/* gcc 3.1 and up support the [restrict] syntax.  Don't trust
-   sys/cdefs.h's definition of __restrict_arr, though, as it
-   mishandles gcc -ansi -pedantic.  */
+/* For [restrict], use glibc's __restrict_arr if available.
+   Otherwise, GCC 3.1 (not in C++ mode) and C99 support [restrict].  */
 #ifndef _Restrict_arr_
-# define _Restrict_arr_ __restrict_arr
+# ifdef __restrict_arr
+#  define _Restrict_arr_ __restrict_arr
+# elif ((199901L <= __STDC_VERSION__ || 3 < __GNUC__ + (1 <= __GNUC_MINOR__)) \
+        && !defined __GNUG__)
+#  define _Restrict_arr_ _Restrict_
+# else
+#  define _Restrict_arr_
+# endif
 #endif
 
 /* POSIX compatibility.  */
diff --git a/posix/regex_internal.c b/posix/regex_internal.c
index 2a0c01556e..7f0083b918 100644
--- a/posix/regex_internal.c
+++ b/posix/regex_internal.c
@@ -15,7 +15,7 @@
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 static void re_string_construct_common (const char *str, Idx len,
 					re_string_t *pstr,
diff --git a/posix/regex_internal.h b/posix/regex_internal.h
index e621b3eead..b27afbda2a 100644
--- a/posix/regex_internal.h
+++ b/posix/regex_internal.h
@@ -15,7 +15,7 @@
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifndef _REGEX_INTERNAL_H
 #define _REGEX_INTERNAL_H 1
@@ -33,11 +33,13 @@
 #include <stdbool.h>
 #include <stdint.h>
 
+#include "intprops.h"
+
 #ifdef _LIBC
 # include <libc-lock.h>
 # define lock_define(name) __libc_lock_define (, name)
 # define lock_init(lock) (__libc_lock_init (lock), 0)
-# define lock_fini(lock)
+# define lock_fini(lock) ((void) 0)
 # define lock_lock(lock) __libc_lock_lock (lock)
 # define lock_unlock(lock) __libc_lock_unlock (lock)
 #elif defined GNULIB_LOCK && !defined USE_UNLOCKED_IO
diff --git a/posix/regexec.c b/posix/regexec.c
index 573fc877e1..63aef97535 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -15,7 +15,7 @@
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 static reg_errcode_t match_ctx_init (re_match_context_t *cache, int eflags,
 				     Idx n);
@@ -317,23 +317,6 @@ re_search_2 (struct re_pattern_buffer *bufp, const char *string1, Idx length1,
 weak_alias (__re_search_2, re_search_2)
 #endif
 
-/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
-   to overflow; in this case, *R is the low order bits of the correct
-   answer.  */
-static inline bool
-check_add_overflow_idx (Idx a, Idx b, Idx *r)
-{
-#if __GNUC__ >= 5
-  return __builtin_add_overflow (a, b, r);
-#else
-  if ((a > 0 && b > SSIZE_MAX - a)
-      || (a < 0 && b < SSIZE_MIN -a)
-    return false;
-  *r = a + b;
-  return true;
-#endif
-}
-
 static regoff_t
 re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
 		  Idx length1, const char *string2, Idx length2, Idx start,
@@ -346,7 +329,7 @@ re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
   char *s = NULL;
 
   if (BE ((length1 < 0 || length2 < 0 || stop < 0
-           || check_add_overflow_idx (length1, length2, &len)),
+           || INT_ADD_WRAPV (length1, length2, &len)),
           0))
     return -2;
 
-- 
2.17.1

Follow-Ups:
- Re: [PATCH] posix: Sync gnulib regex implementation
  - From: Adhemerval Zanella

References:
- [PATCH] posix: Sync gnulib regex implementation
  - From: Adhemerval Zanella

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]