[PATCH 4/6] generated character data for libc/ctype

Thomas Wolff towo@towo.net
Mon Mar 26 09:53:00 GMT 2018


On 26.03.2018 at 10:01, Corinna Vinschen wrote:
> On Mar 25 11:02, Thomas Wolff wrote:
>> On 23.03.2018 at 22:02, Thomas Wolff wrote:
>>> On 23.03.2018 at 20:43, Corinna Vinschen wrote:
>>>> On Mar 23 20:28, Thomas Wolff wrote:
>>>>> While meditating, I noticed that the bit packing of the case conversion
>>>>> entries could use some documentation.
>>>>> The attached patch adds that (and some tweaking for consistent
>>>>> indentation);
>>>>> no code changes.
>>>>> Thomas
>>>>>   From f8f4784437d319ad3ac2e3c629335fd0f50bee69 Mon Sep 17 00:00:00 2001
>>>>> From: Thomas Wolff <towo@towo.net>
>>>>> Date: Fri, 23 Mar 2018 20:07:22 +0100
>>>>> Subject: [PATCH] comments to document struct caseconv_entry
>>>>>
>>>>> explain design of compact (packed) struct caseconv_entry,
>>>>> in case it needs to be modified for future Unicode versions
>>>> ...
>>> ... we can reduce the patch to the documentation, of course.
>> as attached
>> Thomas
> Thanks, but the patch is broken.  The last line in the patch is the
> start of another patch hunk, which then is missing.  Can you fix that, please?
Yeah, I tried to save some git fiddling by editing the patch by hand,
and that broke it.
(When I tried to re-sync with the current repository, git insisted on a
merge, and I do not know how to resolve that; fixing the file manually,
git pull -f... nothing helped (error: Pulling is not possible because
you have unmerged files). I know I should eventually consult the howto
you kindly pointed me to...)
So here's an updated patch based on a fresh git clone; it also fixes one
remaining minor layout glitch.
Thomas
-------------- next part --------------
From 07fa10556a8fa0ecaf402268244abfdd25f6325c Mon Sep 17 00:00:00 2001
From: Thomas Wolff <towo@towo.net>
Date: Mon, 26 Mar 2018 11:46:40 +0200
Subject: [PATCH] comments to document struct caseconv_entry

explain design of compact (packed) struct caseconv_entry,
in case it needs to be modified for future Unicode versions
---
 newlib/libc/ctype/towctrans_l.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/newlib/libc/ctype/towctrans_l.c b/newlib/libc/ctype/towctrans_l.c
index ca7e89f..9759cf7 100644
--- a/newlib/libc/ctype/towctrans_l.c
+++ b/newlib/libc/ctype/towctrans_l.c
@@ -4,8 +4,36 @@
 //#include <errno.h>
 #include "local.h"
 
-enum {EVENCAP, ODDCAP};
+/*
+   struct caseconv_entry describes the case conversion behaviour
+   of a range of Unicode characters.
+   It was designed to be compact for a minimal table size.
+   The range is first...first + diff.
+   Conversion behaviour for a character c in the respective range:
+     mode == TOLO	towlower (c) = c + delta
+     mode == TOUP	towupper (c) = c + delta
+     mode == TOBOTH	(titling case characters)
+			towlower (c) = c + 1
+			towupper (c) = c - 1
+     mode == TO1	capital/small letters are alternating
+	delta == EVENCAP	even codes are capital
+	delta == ODDCAP		odd codes are capital
+			(this correlates with an even/odd first range value
+			as of Unicode 10.0 but we do not rely on this)
+   As of Unicode 10.0, the following field lengths are sufficient
+	first: 17 bits
+	diff: 8 bits
+	delta: 17 bits
+	mode: 2 bits
+   The reserve of 4 bits (to limit the struct to 6 bytes)
+   is currently added to the 'first' field;
+   should a future Unicode version make it necessary to expand the others,
+   the 'first' field could be reduced as needed, or larger ranges could
+   be split up (reduce limit max=255 e.g. to max=127 or max=63 in
+   script mkcaseconv, check increasing table size).
+ */
 enum {TO1, TOLO, TOUP, TOBOTH};
+enum {EVENCAP, ODDCAP};
 static struct caseconv_entry {
   unsigned int first: 21;
   unsigned short diff: 8;
@@ -71,6 +99,7 @@ toulower (wint_t c)
 	default:
 	  break;
       }
+
   return c;
 }
 
@@ -102,9 +131,10 @@ touupper (wint_t c)
 	  default:
 	    break;
 	  }
-	default:
-	  break;
+      default:
+	break;
       }
+
   return c;
 }
 
-- 
2.16.2
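
For readers following the comment added by the patch, here is a minimal sketch of how entries in such a packed table could be interpreted. It is an illustration under stated assumptions, not the code in towctrans_l.c: the names sketch_entry, sketch_table, sketch_find, sketch_tolower and sketch_toupper are hypothetical, the four table rows are hand-written examples rather than output of the mkcaseconv script, the field order after 'diff' is assumed, and a linear search stands in for whatever lookup the real file uses. Only the mode names, the EVENCAP/ODDCAP values and the field widths are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

enum {TO1, TOLO, TOUP, TOBOTH};
enum {EVENCAP, ODDCAP};

/* Hypothetical stand-in for struct caseconv_entry.  The widths follow
   the patch comment: 17 + 4 reserve bits for first, 8 for diff,
   2 for mode, 17 for delta (48 bits in total).  */
struct sketch_entry {
  unsigned int first: 21;   /* first code point of the range */
  unsigned int diff: 8;     /* range is first...first + diff */
  unsigned int mode: 2;     /* TO1, TOLO, TOUP or TOBOTH */
  signed int delta: 17;     /* offset, or EVENCAP/ODDCAP for TO1 */
};

/* Illustrative rows only, NOT the generated table.  */
static const struct sketch_entry sketch_table[] = {
  {0x0041, 25, TOLO, 32},       /* A..Z: towlower (c) = c + 32 */
  {0x0061, 25, TOUP, -32},      /* a..z: towupper (c) = c - 32 */
  {0x0100, 47, TO1, EVENCAP},   /* U+0100..U+012F: even codes capital */
  {0x01C5, 0, TOBOTH, 0},       /* U+01C5: titling case character */
};

/* Linear search for the range containing c (assumption: chosen for
   brevity; a sorted table would normally be searched by bisection).  */
static const struct sketch_entry *
sketch_find (long c)
{
  size_t n = sizeof sketch_table / sizeof sketch_table[0];
  for (size_t i = 0; i < n; i++)
    {
      long lo = sketch_table[i].first;
      long hi = lo + sketch_table[i].diff;
      if (c >= lo && c <= hi)
        return &sketch_table[i];
    }
  return NULL;
}

/* Nonzero if c is the capital member of an alternating (TO1) pair.  */
static int
sketch_is_capital (const struct sketch_entry *e, long c)
{
  return (c & 1) == (e->delta == ODDCAP ? 1 : 0);
}

long
sketch_tolower (long c)
{
  const struct sketch_entry *e = sketch_find (c);
  if (e == NULL)
    return c;
  switch (e->mode)
    {
    case TOLO:
      return c + e->delta;
    case TOBOTH:
      return c + 1;
    case TO1:
      return sketch_is_capital (e, c) ? c + 1 : c;
    default:
      return c;
    }
}

long
sketch_toupper (long c)
{
  const struct sketch_entry *e = sketch_find (c);
  if (e == NULL)
    return c;
  switch (e->mode)
    {
    case TOUP:
      return c + e->delta;
    case TOBOTH:
      return c - 1;
    case TO1:
      return sketch_is_capital (e, c) ? c : c - 1;
    default:
      return c;
    }
}
```

With these example rows, sketch_tolower (0x0100) yields 0x0101 through the TO1/EVENCAP rule, and U+01C5 maps to U+01C6 or U+01C4 through TOBOTH. The sketch makes no claim about sizeof (struct sketch_entry), because bit-field layout is implementation-defined; reaching the 6 bytes mentioned in the patch comment depends on how the compiler packs the struct.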


