This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] manual: Clarify the documentation of strverscmp [BZ #20524]


On 08/30/2016 12:06 AM, Michael Kerrisk wrote:

s/version comparison/version-comparison/

+implementation is based on a finite state machine, whose behavior is

s/finite state machine/finite-state machine/

Thank you for your corrections.

 @item
-fractional/fractional: the things become a bit more complex.
-If the common prefix contains only leading zeroes, the longest part is less
-than the other one; else the comparison behaves normally.
+Corresponding non-digit sequences in both strings are compared
+lexicographically.  If their lengths differ, the shorter non-digit

Should this be "If their lengths differ, and the shorter string is
equal to the corresponding prefix in the longer string..."?

I don't think it matters. If the shorter sequence is not a prefix of the other sequence, the lexicographic ordering will find a difference before the extension character.

What about this?

@item
Corresponding non-digit sequences in both strings are compared
lexicographically if their lengths are equal.  If the lengths differ,
the shorter non-digit sequence is extended with the input string
character immediately following it (which can be the null terminator),
the other sequence is truncated to be of the same (extended) length, and
these two sequences are compared lexicographically.  In this last case,
the sequence comparison determines the result of the function because
the extension character (or some character before it) is necessarily
different from the character at the same offset in the other input
string.


+@item
+If the two digit sequences have no leading zeros, they are compared as
+integers, that is, the string with the longer digit sequence is deemed
+larger, and if both sequences are of equal length, they are compared
+lexicographically.
+
+@item
+If both digit sequences have an equal, positive number of leading zeros,

Why is the word "positive" here?

It's used as a synonym for “non-zero” (to discriminate this case from the previous one).

+they are compared lexicographically.  If their length differs, another
+character is added to to the shorter sequence,

"another character is added to to the shorter sequence" is vague. You
want wording like you used above.

Is this better?

@item
If both digit sequences have an equal, positive number of leading zeros,
they are compared lexicographically if their lengths are the same.  If
the lengths differ, the shorter sequence is extended with the following
character in its input string, and the other sequence is truncated to
the same length, and both sequences are compared lexicographically
(similar to the non-digit sequence case above).

I'm attaching a new patch.

I've not completely checked all of the details, but what you write
certainly matches what I did check, and is a great deal better than
the existing text. But, the algorithm it describes *is* strange.

The description is based on fuzz-strverscmp-alt.c, which is intended to run under a fuzzer to show that both implementations are equivalent. fuzz-strverscmp.c was a previous attempt at resolving the state machine, but it was still not very clear. fuzz-strverscmp3.c tests that the comparison is indeed a linear order. fuzz-strverscmp3-old2.c demonstrates that this works because with a start file of \003\0031.11.21.3, it quickly finds the ordering violations in the old implementation (from commit 4546646233574f321f9deedff928b980d82f4fc7).

Thanks,
Florian
manual: Clarify the documentation of strverscmp [BZ #20524]

2016-08-30  Florian Weimer  <fweimer@redhat.com>

	[BZ #20524]
	* manual/string.texi (String/Array Comparison): Clarify the
	strverscmp behavior.

diff --git a/manual/string.texi b/manual/string.texi
index bce81a7..193f786 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -1374,46 +1374,75 @@ The @code{strverscmp} function compares the string @var{s1} against
 @var{s2}, considering them as holding indices/version numbers.  The
 return value follows the same conventions as found in the
 @code{strcmp} function.  In fact, if @var{s1} and @var{s2} contain no
-digits, @code{strverscmp} behaves like @code{strcmp}.
+digits, @code{strverscmp} behaves like @code{strcmp}
+(in the sense that the sign of the result is the same).
 
-Basically, we compare strings normally (byte by byte), until
-we find a digit in each string - then we enter a special comparison
-mode, where each sequence of digits is taken as a whole.  If we reach the
-end of these two parts without noticing a difference, we return to the
-standard comparison mode.  There are two types of numeric parts:
-"integral" and "fractional" (those  begin with a '0').  The types
-of the numeric parts affect the way we sort them:
+The comparison algorithm which the @code{strverscmp} function implements
+differs slightly from other version-comparison algorithms.  The
+implementation is based on a finite-state machine, whose behavior is
+approximated below.
 
 @itemize @bullet
 @item
-integral/integral: we compare values as you would expect.
+The input strings are each split into sequences of non-digits and
+digits.  These sequences can be empty at the beginning and end of the
+string.  Digits are determined by the @code{isdigit} function and are
+thus subject to the current locale.
 
 @item
-fractional/integral: the fractional part is less than the integral one.
-Again, no surprise.
+Comparison starts with a (possibly empty) non-digit sequence.  The first
+non-equal sequences of non-digits or digits determines the outcome of
+the comparison.
 
 @item
-fractional/fractional: the things become a bit more complex.
-If the common prefix contains only leading zeroes, the longest part is less
-than the other one; else the comparison behaves normally.
+Corresponding non-digit sequences in both strings are compared
+lexicographically if their lengths are equal.  If the lengths differ,
+the shorter non-digit sequence is extended with the input string
+character immediately following it (which can be the null terminator),
+the other sequence is truncated to be of the same (extended) length, and
+these two sequences are compared lexicographically.  In this last case,
+the sequence comparison determines the result of the function because
+the extension character (or some character before it) is necessarily
+different from the character at the same offset in the other input
+string.
+
+@item
+For two sequences of digits, the number of leading zeros is counted (which
+can be zero).  If the count differs, the string with more leading zeros
+in the digit sequence is considered smaller than the other string.
+
+@item
+If the two sequences of digits have no leading zeros, they are compared
+as integers, that is, the string with the longer digit sequence is
+deemed larger, and if both sequences are of equal length, they are
+compared lexicographically.
+
+@item
+If both digit sequences have an equal, positive number of leading zeros,
+they are compared lexicographically if their lengths are the same.  If
+the lengths differ, the shorter sequence is extended with the following
+character in its input string, and the other sequence is truncated to
+the same length, and both sequences are compared lexicographically
+(similar to the non-digit sequence case above).
 @end itemize
 
+The treatment of leading zeros and the tie-breaking extension characters
+(which in effect propagate across non-digit/digit sequence boundaries)
+differs from other version-comparison algorithms.
+
 @smallexample
 strverscmp ("no digit", "no digit")
     @result{} 0    /* @r{same behavior as strcmp.} */
 strverscmp ("item#99", "item#100")
     @result{} <0   /* @r{same prefix, but 99 < 100.} */
 strverscmp ("alpha1", "alpha001")
-    @result{} >0   /* @r{fractional part inferior to integral one.} */
+    @result{} >0   /* @r{different number of leading zeros (0 and 2).} */
 strverscmp ("part1_f012", "part1_f01")
-    @result{} >0   /* @r{two fractional parts.} */
+    @result{} >0   /* @r{lexicographical comparison with leading zeros.} */
 strverscmp ("foo.009", "foo.0")
-    @result{} <0   /* @r{idem, but with leading zeroes only.} */
+    @result{} <0   /* @r{different number of leading zeros (2 and 1).} */
 @end smallexample
 
-This function is especially useful when dealing with filename sorting,
-because filenames frequently hold indices/version numbers.
-
 @code{strverscmp} is a GNU extension.
 @end deftypefun
 
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#if 0
# define DEBUG(...) printf (__VA_ARGS__)
#else
# define DEBUG(...) (void)0
#endif

static int verscmp_gnu (const char *, const char *);

static int sgn(int i)
{
  if (i == 0)
    return 0;
  else if (i < 0)
    return -1;
  else
    return 1;
}

static int
cmp_for_qsort (const void *a, const void *b)
{
  const char *const *a1 = a;
  const char *const *b1 = b;
  return verscmp_gnu (*a1, *b1);
}

static void
report (const char *left, const char *right, int expected, int actual)
{
  printf ("left:     [[%s]]\n", left);
  printf ("right:    [[%s]]\n", right);
  printf ("expected: %d\n", expected);
  printf ("actual:   %d\n", actual);
  abort ();
}

enum {count = 3};

static void
compare (char **array, int *indices)
{
  for (int i = 0; i < count; ++i)
    for (int j = 0; j < count; ++j)
      {
	int expected = sgn (indices[i] - indices[j]);
	int actual = sgn (verscmp_gnu (array[i], array[j]));
	if (expected != actual)
	  report (array[i], array[j], expected, actual);
      }
}

int main (void)
{
  unsigned char input[1024];
  ssize_t length = read (STDIN_FILENO, input, sizeof (input));
  if (length < 0)
    {
      printf ("error: read: %m\n");
      abort ();
    }
  if (length < 2)
    return 0;

  size_t a_length = input[0];
  size_t b_length = input[1];
  if (2 + a_length + b_length > length)
    return 0;
  size_t c_length = length - 2 - a_length - b_length;

  char a[sizeof (input)];
  memcpy (a, input + 2, a_length);
  a[a_length] = '\0';

  char b[sizeof (input)];
  memcpy (b, input + 2 + a_length, b_length);
  b[b_length] = '\0';

  char c[sizeof (input)];
  memcpy (c, input + 2 + a_length + b_length, c_length);
  c[c_length] = '\0';

  char *array[count] = {a, b, c};
  qsort (array, count, sizeof (char *), cmp_for_qsort);
  int indices[count];
  {
    indices[0] = 0;
    for (int i = 1; i < count; ++i)
      if (verscmp_gnu (array[i - 1], array[i]) == 0)
	indices[i] = indices[i - 1];
      else
	indices[i] = indices[i - 1] + 1;
  }

  compare (array, indices);
}

static bool
myisdigit (int ch)
{
  return '0' <= ch && ch <= '9';
}

#define isdigit myisdigit

/* Compare strings while treating digits characters numerically.
   Copyright (C) 1997, 2002, 2009 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Jean-François Bignolles <bignolle@ecoledoc.ibp.fr>, 1997.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, write to the Free
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307 USA.  */

#include <stdint.h>
#include <string.h>

/* states: S_N: normal, S_I: comparing integral part, S_F: comparing
           fractionnal parts, S_Z: idem but with leading Zeroes only */
#define  S_N    0x0
#define  S_I    0x3
#define  S_F    0x6
#define  S_Z    0x9

/* result_type: CMP: return diff; LEN: compare using len_diff/diff */
#define  CMP    2
#define  LEN    3


/* Compare S1 and S2 as strings holding indices/version numbers,
   returning less than, equal to or greater than zero if S1 is less than,
   equal to or greater than S2 (for more info, see the texinfo doc).
*/

static int
verscmp_gnu (s1, s2)
     const char *s1;
     const char *s2;
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  /* Symbol(s)    0       [1-9]   others
     Transition   (10) 0  (01) d  (00) x   */
  static const uint8_t next_state[] =
  {
      /* state    x    d    0  */
      /* S_N */  S_N, S_I, S_Z,
      /* S_I */  S_N, S_I, S_I,
      /* S_F */  S_N, S_F, S_F,
      /* S_Z */  S_N, S_F, S_Z
  };

  static const int8_t result_type[] =
  {
      /* state   x/x  x/d  x/0  d/x  d/d  d/0  0/x  0/d  0/0  */

      /* S_N */  CMP, CMP, CMP, CMP, LEN, CMP, CMP, CMP, CMP,
      /* S_I */  CMP, -1,  -1,  +1,  LEN, LEN, +1,  LEN, LEN,
      /* S_F */  CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
      /* S_Z */  CMP, +1,  +1,  -1,  CMP, CMP, -1,  CMP, CMP
  };

  if (p1 == p2)
    return 0;

  unsigned char c1 = *p1++;
  unsigned char c2 = *p2++;
  /* Hint: '0' is a digit too.  */
  int state = S_N | ((c1 == '0') + (isdigit (c1) != 0));

  int diff;
  while ((diff = c1 - c2) == 0)
    {
      if (c1 == '\0')
	return diff;

      state = next_state[state];
      c1 = *p1++;
      c2 = *p2++;
      state |= (c1 == '0') + (isdigit (c1) != 0);
    }

  state = result_type[state * 3 | (((c2 == '0') + (isdigit (c2) != 0)))];

  switch (state)
  {
    case CMP:
      return diff;

    case LEN:
      while (isdigit (*p1++))
	if (!isdigit (*p2++))
	  return 1;

      return isdigit (*p2) ? -1 : diff;

    default:
      return state;
  }
}
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#if 0
# define DEBUG(...) printf (__VA_ARGS__)
#else
# define DEBUG(...) (void)0
#endif

static int verscmp_mine (const char *, const char *);
static int verscmp_gnu (const char *, const char *);

static int sgn(int i)
{
  if (i == 0)
    return 0;
  else if (i < 0)
    return -1;
  else
    return 1;
}

int main (void)
{
  unsigned char input[4096];
  ssize_t length = read (STDIN_FILENO, input, sizeof (input));
  if (length < 0)
    {
      printf ("error: read: %m\n");
      abort ();
    }
  if (length == 0)
    return 0;

  size_t left_length = input[0];
  if (left_length >= length)
    return 0;
  size_t right_length = length - 1 - left_length;

  char left[sizeof (input)];
  memcpy (left, input + 1, left_length);
  left[left_length] = '\0';

  char right[sizeof (input)];
  memcpy (right, input + 1 + left_length, right_length);
  right[right_length] = '\0';

  int gnu = sgn (verscmp_gnu (left, right));
  int mine = sgn (verscmp_mine (left, right));
  if (gnu != mine)
    {
      printf ("left:  [[%s]]\nright: [[%s]]\ngnu:  %d\nmine: %d\n",
              left, right, gnu, mine);
      abort ();
    }
}

enum mystate { non_numeric, only_zero, zero_prefix, digits };

static enum mystate
char_to_state (char ch)
{
  if (ch == '0')
    return only_zero;
  else if ('1' <= ch && ch <= '9')
    return digits;
  else
    return non_numeric;
}

static enum mystate
join_state (enum mystate previous, enum mystate current)
{
  switch (previous)
    {
    case non_numeric:
      return current;
    case only_zero:
    case zero_prefix:
      switch (current)
	{
	case non_numeric:
	  return non_numeric;
	case only_zero:
	  return previous;
	case zero_prefix:
	case digits:
	  return zero_prefix;
	}
    case digits:
      switch (current)
        {
        case non_numeric:
          return non_numeric;
        case zero_prefix:
        case only_zero:
	case digits:
	  return digits;
        }
    }
  abort ();
}

static bool
positive (int ch)
{
  return '1' <= ch && ch <= '9';
}

static bool
myisdigit (int ch)
{
  return '0' <= ch && ch <= '9';
}

static size_t
digit_length (const char *start)
{
  const char *p;
  for (p = start; myisdigit (*p); ++p)
    ;
  return p - start;
}

static int
size_cmp (size_t left, size_t right)
{
  if (left < right)
    return -1;
  else if (left == right)
    return 0;
  else
    return 1;
}

static int
verscmp_mine (const char *left, const char *right)
{
  /* Skip the common prefix.  */
  enum mystate state = non_numeric;
  while (*left == *right)
    {
      if (*left == '\0')
        return 0;
      state = join_state (state, char_to_state (*left));
      ++left;
      ++right;
    }

  switch (state)
    {
    case non_numeric:
      DEBUG ("mystate: non_numeric\n");
      break;
    case zero_prefix:
      DEBUG ("mystate: zero_prefix\n");
      break;
    case only_zero:
      DEBUG ("mystate: only_zero\n");
      break;
    case digits:
      DEBUG ("mystate: digits\n");
      break;
    }
  
  switch (state)
    {
    case non_numeric:
      if (positive (*left) && positive (*right))
	{
	  int diff = size_cmp (digit_length (left), digit_length (right));
	  if (diff != 0)
	    return diff;
	}
    lex:
      {
        int l = *left & 0xff;
        int r = *right & 0xff;
        return l - r;
      }
    case zero_prefix:
      goto lex;
    case only_zero:
      {
	int diff = myisdigit (*left) - myisdigit (*right);
	if (diff != 0)
	  return -diff;
	goto lex;
      }
    case digits:
    num:
      {
	int diff = size_cmp (digit_length (left), digit_length (right));
	if (diff != 0)
	  return diff;
	goto lex;
      }
    }
  abort ();
}

#define isdigit myisdigit



/* Compare strings while treating digits characters numerically.
   Copyright (C) 1997-2016 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Jean-François Bignolles <bignolle@ecoledoc.ibp.fr>, 1997.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#include <stdint.h>

/* states: S_N: normal, S_I: comparing integral part, S_F: comparing
           fractionnal parts, S_Z: idem but with leading Zeroes only */
#define  S_N    0x0
#define  S_I    0x3
#define  S_F    0x6
#define  S_Z    0x9

/* result_type: CMP: return diff; LEN: compare using len_diff/diff */
#define  CMP    2
#define  LEN    3


/* Compare S1 and S2 as strings holding indices/version numbers,
   returning less than, equal to or greater than zero if S1 is less than,
   equal to or greater than S2 (for more info, see the texinfo doc).
*/

int
verscmp_gnu (const char *s1, const char *s2)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  /* Symbol(s)    0       [1-9]   others
     Transition   (10) 0  (01) d  (00) x   */
  static const uint8_t next_state[] =
  {
      /* state    x    d    0  */
      /* S_N */  S_N, S_I, S_Z,
      /* S_I */  S_N, S_I, S_I,
      /* S_F */  S_N, S_F, S_F,
      /* S_Z */  S_N, S_F, S_Z
  };

  static const int8_t result_type[] =
  {
      /* state   x/x  x/d  x/0  d/x  d/d  d/0  0/x  0/d  0/0  */

      /* S_N */  CMP, CMP, CMP, CMP, LEN, CMP, CMP, CMP, CMP,
      /* S_I */  CMP, -1,  -1,  +1,  LEN, LEN, +1,  LEN, LEN,
      /* S_F */  CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
      /* S_Z */  CMP, +1,  +1,  -1,  CMP, CMP, -1,  CMP, CMP
  };

  if (p1 == p2)
    return 0;

  unsigned char c1 = *p1++;
  unsigned char c2 = *p2++;
  /* Hint: '0' is a digit too.  */
  int state = S_N + ((c1 == '0') + (isdigit (c1) != 0));

  int diff;
  while ((diff = c1 - c2) == 0)
    {
      if (c1 == '\0')
	return diff;

      state = next_state[state];
      c1 = *p1++;
      c2 = *p2++;
      state += (c1 == '0') + (isdigit (c1) != 0);
    }

  state = result_type[state * 3 + (((c2 == '0') + (isdigit (c2) != 0)))];

  switch (state)
  {
    case CMP:
      return diff;

    case LEN:
      while (isdigit (*p1++))
	if (!isdigit (*p2++))
	  return 1;

      return isdigit (*p2) ? -1 : diff;

    default:
      return state;
  }
}
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#if 0
# define DEBUG(...) printf (__VA_ARGS__)
#else
# define DEBUG(...) (void)0
#endif

static int verscmp_mine (const char *, const char *);
static int verscmp_gnu (const char *, const char *);

static int sgn(int i)
{
  if (i == 0)
    return 0;
  else if (i < 0)
    return -1;
  else
    return 1;
}

int main (void)
{
  unsigned char input[4096];
  ssize_t length = read (STDIN_FILENO, input, sizeof (input));
  if (length < 0)
    {
      printf ("error: read: %m\n");
      abort ();
    }
  if (length == 0)
    return 0;

  size_t left_length = input[0];
  if (left_length >= length)
    return 0;
  size_t right_length = length - 1 - left_length;

  char left[sizeof (input)];
  memcpy (left, input + 1, left_length);
  left[left_length] = '\0';

  char right[sizeof (input)];
  memcpy (right, input + 1 + left_length, right_length);
  right[right_length] = '\0';

  int gnu = sgn (verscmp_gnu (left, right));
  int mine = sgn (verscmp_mine (left, right));
  if (gnu != mine)
    {
      printf ("left:  [[%s]]\nright: [[%s]]\ngnu:  %d\nmine: %d\n",
              left, right, gnu, mine);
      abort ();
    }
}

static bool
myisdigit (int ch)
{
  return '0' <= ch && ch <= '9';
}

static size_t
digit_length (const char *start)
{
  const char *p;
  for (p = start; myisdigit (*p); ++p)
    ;
  return p - start;
}

static size_t
nondigit_length (const char *start)
{
  const char *p;
  for (p = start; *p != '\0' && !myisdigit (*p); ++p)
    ;
  return p - start;
}

static int
size_cmp (size_t left, size_t right)
{
  if (left < right)
    return -1;
  else if (left == right)
    return 0;
  else
    return 1;
}

static int
char_cmp (unsigned char left, unsigned char right)
{
  return size_cmp (left, right);
}

static int
lexicographically_compare (const char *left, size_t left_length,
			   const char *right, size_t right_length)
{
  size_t to_compare;
  if (left_length < right_length)
    to_compare = left_length;
  else
    to_compare = right_length;
  int ret = memcmp (left, right, to_compare);
  if (ret != 0)
    return ret;
  return size_cmp (left_length, right_length);
}

static int
numerically_compare (const char *left, size_t left_length,
		     const char *right, size_t right_length)
{
  while (left_length > 0 && *left == '0')
    {
      ++left;
      --left_length;
    }
  while (right_length > 0 && *right == '0')
    {
      ++right;
      --right_length;
    }
  if (left_length != right_length)
    return size_cmp (left_length, right_length);
  return memcmp (left, right, left_length);
}

static bool
zeros_only (const char *s, size_t len)
{
  for (size_t i = 0; i < len; ++i)
    if (s[i] != '0')
      return false;
  return true;
}

static size_t
count_leading_zeros (const char *str)
{
  const char *p;
  for (p = str; *p == '0'; ++p)
    ;
  return p - str;
}

static int
verscmp_mine (const char *left, const char *right)
{
  while (*left != '\0' && *right != '\0')
    {
      printf ("left (1):  [[%s]]\n", left);
      printf ("right (1): [[%s]]\n", right);
      /* Process non-digits. */
      {
	size_t left_length = nondigit_length (left);
	size_t right_length = nondigit_length (right);

	size_t to_compare;
	if (left_length < right_length)
	  to_compare = left_length;
	else
	  to_compare = right_length;
	int ret = memcmp (left, right, to_compare);
	if (ret == 0)
	  {
	    if (left_length != right_length)
	      /* The following character acts as a tie-breaker.  */
	      return char_cmp (left[to_compare], right[to_compare]);
	  }
	if (ret != 0)
	  return ret;
	left += left_length;
	right += right_length;
      }

      if (*left == '\0' || *right == '\0')
	break;

      printf ("left (2):  [[%s]]\n", left);
      printf ("right (2): [[%s]]\n", right);
      
      /* Process digits.  */
      {
	size_t left_zeros = count_leading_zeros (left);
	size_t right_zeros = count_leading_zeros (right);
	int ret = -size_cmp (left_zeros, right_zeros);
	if (ret != 0)
	  return ret;

	size_t left_length = digit_length (left);
	size_t right_length = digit_length (right);
	if (*left == '0' && *right == '0')
	  {
	    ret = zeros_only (left, left_length) - zeros_only (right, right_length);
	    if (ret != 0)
	      return ret;
	    /* The following character acts as a tie-breaker.  */
	    ret = lexicographically_compare (left, left_length + 1,
					     right, right_length + 1);
	  }
	else
	  ret = numerically_compare (left, left_length,
				     right, right_length);
	if (ret != 0)
	  return ret;
	left += left_length;
	right += right_length;
      }
    }

  return (*left != '\0') - (*right != '\0');
}

#define isdigit myisdigit



/* Compare strings while treating digits characters numerically.
   Copyright (C) 1997-2016 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Jean-François Bignolles <bignolle@ecoledoc.ibp.fr>, 1997.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#include <stdint.h>

/* states: S_N: normal, S_I: comparing integral part, S_F: comparing
           fractionnal parts, S_Z: idem but with leading Zeroes only */
#define  S_N    0x0
#define  S_I    0x3
#define  S_F    0x6
#define  S_Z    0x9

/* result_type: CMP: return diff; LEN: compare using len_diff/diff */
#define  CMP    2
#define  LEN    3


/* Compare S1 and S2 as strings holding indices/version numbers,
   returning less than, equal to or greater than zero if S1 is less than,
   equal to or greater than S2 (for more info, see the texinfo doc).
*/

int
verscmp_gnu (const char *s1, const char *s2)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  /* Symbol(s)    0       [1-9]   others
     Transition   (10) 0  (01) d  (00) x   */
  static const uint8_t next_state[] =
  {
      /* state    x    d    0  */
      /* S_N */  S_N, S_I, S_Z,
      /* S_I */  S_N, S_I, S_I,
      /* S_F */  S_N, S_F, S_F,
      /* S_Z */  S_N, S_F, S_Z
  };

  static const int8_t result_type[] =
  {
      /* state   x/x  x/d  x/0  d/x  d/d  d/0  0/x  0/d  0/0  */

      /* S_N */  CMP, CMP, CMP, CMP, LEN, CMP, CMP, CMP, CMP,
      /* S_I */  CMP, -1,  -1,  +1,  LEN, LEN, +1,  LEN, LEN,
      /* S_F */  CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
      /* S_Z */  CMP, +1,  +1,  -1,  CMP, CMP, -1,  CMP, CMP
  };

  if (p1 == p2)
    return 0;

  unsigned char c1 = *p1++;
  unsigned char c2 = *p2++;
  /* Hint: '0' is a digit too.  */
  int state = S_N + ((c1 == '0') + (isdigit (c1) != 0));

  int diff;
  while ((diff = c1 - c2) == 0)
    {
      if (c1 == '\0')
	return diff;

      state = next_state[state];
      c1 = *p1++;
      c2 = *p2++;
      state += (c1 == '0') + (isdigit (c1) != 0);
    }

  state = result_type[state * 3 + (((c2 == '0') + (isdigit (c2) != 0)))];

  switch (state)
  {
    case CMP:
      return diff;

    case LEN:
      while (isdigit (*p1++))
	if (!isdigit (*p2++))
	  return 1;

      return isdigit (*p2) ? -1 : diff;

    default:
      return state;
  }
}
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#if 0
# define DEBUG(...) printf (__VA_ARGS__)
#else
# define DEBUG(...) (void)0
#endif

static int verscmp_mine (const char *, const char *);

static int sgn(int i)
{
  if (i == 0)
    return 0;
  else if (i < 0)
    return -1;
  else
    return 1;
}

static int
cmp_for_qsort (const void *a, const void *b)
{
  const char *const *a1 = a;
  const char *const *b1 = b;
  return verscmp_mine (*a1, *b1);
}

static void
report (const char *left, const char *right, int expected, int actual)
{
  printf ("left:     [[%s]]\n", left);
  printf ("right:    [[%s]]\n", right);
  printf ("expected: %d\n", expected);
  printf ("actual:   %d\n", actual);
  abort ();
}

enum {count = 3};

static void
compare (char **array, int *indices)
{
  for (int i = 0; i < count; ++i)
    for (int j = 0; j < count; ++j)
      {
	int expected = sgn (indices[i] - indices[j]);
	int actual = sgn (verscmp_mine (array[i], array[j]));
	if (expected != actual)
	  report (array[i], array[j], expected, actual);
      }
}

int main (void)
{
  unsigned char input[1024];
  ssize_t length = read (STDIN_FILENO, input, sizeof (input));
  if (length < 0)
    {
      printf ("error: read: %m\n");
      abort ();
    }
  if (length < 2)
    return 0;

  size_t a_length = input[0];
  size_t b_length = input[1];
  if (2 + a_length + b_length > length)
    return 0;
  size_t c_length = length - 2 - a_length - b_length;

  char a[sizeof (input)];
  memcpy (a, input + 2, a_length);
  a[a_length] = '\0';

  char b[sizeof (input)];
  memcpy (b, input + 2 + a_length, b_length);
  b[b_length] = '\0';

  char c[sizeof (input)];
  memcpy (c, input + 2 + a_length + b_length, c_length);
  c[c_length] = '\0';

  char *array[count] = {a, b, c};
  qsort (array, count, sizeof (char *), cmp_for_qsort);
  int indices[count];
  {
    indices[0] = 0;
    for (int i = 1; i < count; ++i)
      if (verscmp_mine (array[i - 1], array[i]) == 0)
	indices[i] = indices[i - 1];
      else
	indices[i] = indices[i - 1] + 1;
  }

  compare (array, indices);
}

enum mystate { non_numeric, only_zero, zero_prefix, digits };

static enum mystate
char_to_state (char ch)
{
  if (ch == '0')
    return only_zero;
  else if ('1' <= ch && ch <= '9')
    return digits;
  else
    return non_numeric;
}

static enum mystate
join_state (enum mystate previous, enum mystate current)
{
  switch (previous)
    {
    case non_numeric:
      return current;
    case only_zero:
    case zero_prefix:
      switch (current)
	{
	case non_numeric:
	  return non_numeric;
	case only_zero:
	  return previous;
	case zero_prefix:
	case digits:
	  return zero_prefix;
	}
    case digits:
      switch (current)
        {
        case non_numeric:
          return non_numeric;
        case zero_prefix:
        case only_zero:
	case digits:
	  return digits;
        }
    }
  abort ();
}

static bool
positive (int ch)
{
  return '1' <= ch && ch <= '9';
}

static bool
myisdigit (int ch)
{
  return '0' <= ch && ch <= '9';
}

static size_t
digit_length (const char *start)
{
  const char *p;
  for (p = start; myisdigit (*p); ++p)
    ;
  return p - start;
}

static int
size_cmp (size_t left, size_t right)
{
  if (left < right)
    return -1;
  else if (left == right)
    return 0;
  else
    return 1;
}

static int
verscmp_mine (const char *left, const char *right)
{
  /* Skip the common prefix.  */
  enum mystate state = non_numeric;
  while (*left == *right)
    {
      if (*left == '\0')
        return 0;
      state = join_state (state, char_to_state (*left));
      ++left;
      ++right;
    }

  switch (state)
    {
    case non_numeric:
      DEBUG ("mystate: non_numeric\n");
      break;
    case zero_prefix:
      DEBUG ("mystate: zero_prefix\n");
      break;
    case only_zero:
      DEBUG ("mystate: only_zero\n");
      break;
    case digits:
      DEBUG ("mystate: digits\n");
      break;
    }
  
  switch (state)
    {
    case non_numeric:
      if (positive (*left) && positive (*right))
	{
	  int diff = size_cmp (digit_length (left), digit_length (right));
	  if (diff != 0)
	    return diff;
	}
    lex:
      {
        int l = *left & 0xff;
        int r = *right & 0xff;
        return l - r;
      }
    case zero_prefix:
      goto lex;
    case only_zero:
      {
	int diff = myisdigit (*left) - myisdigit (*right);
	if (diff != 0)
	  return -diff;
	goto lex;
      }
    case digits:
    num:
      {
	int diff = size_cmp (digit_length (left), digit_length (right));
	if (diff != 0)
	  return diff;
	goto lex;
      }
    }
  abort ();
}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]