[PATCH] implicit Unicode data tables generation

Tue Mar 13 18:24:00 GMT 2018

On 13.03.2018 at 16:43, Corinna Vinschen wrote:
> On Mar 13 14:56, Thomas Wolff wrote:
>> On 13.03.2018 11:46, Corinna Vinschen wrote:
>>> On Mar 12 20:32, Thomas Wolff wrote:
>>>> On 12.03.2018 at 15:21, Corinna Vinschen wrote:
>>>>> A build dependency to unicode-ucd should not be required.  Why not have
>>>>> automatic build rules which are simply skipped if the file doesn't
>>>>> exist?  So somebody can download the file and the rules do the rest?
>>>>> It's not *that* important, but a nice feature.
>>>> The easiest way would be that the mkunidata scripts just ignore the missing
>>>> files, like in the patch attached.
>>>> On the other hand, in my original scripts I had the additional fallback
>>>> option to download them from unicode.org on demand.
>>>> If that's acceptable in a build process, I could add that back in.
>>>> Or just drop those rules and stay explicit?
>>> Implicit would be nice, it's not actually a hard requirement.
>>>
>>> We must not rely on system-installed files outside the source tree,
>>> build tree or outside the toolchain.
>> That's why my patch yesterday would ignore missing files.
>>> Please provide a patch removing any ln to files under /usr/share, etc.
>> ...
> ...
> You can also never make assumptions about the age of a file provided by
> your build OS.  There's a good chance the OS is providing an older
> Unicode data file version than the one you actually want to build the
> dependent files from.  Worst case, older than the currently supported
> Unicode version.
>
> The bottom line is, the Unicode data file has been either downloaded into
> the expected location in libc/ctype, or it's not available.
>
>>> The explicit rules add a rule to download the Unicode.txt file into the
>>> src tree.  That's the only file we may rely on.  If it doesn't exist,
>>> an implicit rule should not break the build.
>> Well, my mentioned patch wouldn't break it anymore.
Actually it was broken and would still abort if unicode-ucd isn't installed.
And the contributed version of string/mkunidata is broken, too; it does 
nothing because it contains an "exit" as a test artefact.
Sorry for this embarrassing incident.

>> I can tweak the script further to your preference, ...
The attached patch should cover the requirements discussed above. See
the extended commit description.
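
For illustration, the skip behaviour can be sketched roughly as below.
This is only a sketch: the function names have_unicode_data and
generate_tables are hypothetical and do not appear in the patch, and the
real scripts do the check inline and use "exit" rather than "return".

```shell
#! /bin/sh
# Sketch only: have_unicode_data and generate_tables are hypothetical
# helper names, not part of the attached patch.

# Return 0 if every named file is readable in the current directory;
# otherwise report the first missing file and return 1.
have_unicode_data () {
	for data in "$@"
	do	if [ ! -r "$data" ]
		then	echo "$data not available, skipping table generation"
			return 1
		fi
	done
	return 0
}

# Caller side: a missing file means "skip", not "fail", so the status
# stays 0 and a Makefile invoking the script carries on.
generate_tables () {
	have_unicode_data "$@" || return 0
	echo "generating tables from $*"
}
```

Called in a directory without UnicodeData.txt, generate_tables
UnicodeData.txt prints the "not available" message and still returns 0.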
Thomas
-------------- next part --------------
From 4f7df495bad81fffa858c4f27c5fbd709e40b7b3 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <mintty@users.noreply.github.com>
Date: Tue, 13 Mar 2018 18:26:19 +0100
Subject: [PATCH] fix/enhance Unicode table generation scripts

Scripts do not try to acquire Unicode data by best-effort magic anymore.
Options supported:
-h for help
-i to copy Unicode data from /usr/share/unicode/ucd first
-u to download Unicode data from unicode.org first
If (despite -i or -u, if given) the necessary Unicode files are not
available locally, table generation is skipped, but no error code is
returned, so as not to obstruct the build process if called from a Makefile.
---
 newlib/libc/ctype/mkunidata  | 34 ++++++++++++++++++++++++++--------
 newlib/libc/string/mkunidata | 37 +++++++++++++++++++++++++++----------
 2 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/newlib/libc/ctype/mkunidata b/newlib/libc/ctype/mkunidata
index ea18e67..4bdf3bc 100755
--- a/newlib/libc/ctype/mkunidata
+++ b/newlib/libc/ctype/mkunidata
@@ -1,6 +1,6 @@
 #! /bin/sh
 
-echo generating Unicode character properties data for newlib/libc/ctype
+echo Generating Unicode character properties data for newlib/libc/ctype
 
 cd `dirname $0`
 
@@ -8,23 +8,41 @@ cd `dirname $0`
 # checks and (with option -u) download
 
 case "$1" in
+-h)	echo "Usage: $0 [-h|-u|-i]"
+	echo "Generate case conversion table caseconv.t and character category table categories.t"
+	echo "from local Unicode file UnicodeData.txt."
+	echo ""
+	echo "Options:"
+	echo "  -u    download file from unicode.org first"
+	echo "  -i    copy file from /usr/share/unicode/ucd first"
+	echo "  -h    show this"
+	exit
+	;;
 -u)
-	#WGET=wget -N -t 1 --timeout=55
-	WGET=curl -R -O --connect-timeout 55
-	WGET+=-z $@
+	wget () {
+		curl -R -O --connect-timeout 55 -z "`basename $1`" "$1"
+	}
 
 	echo downloading data from unicode.org
 	for data in UnicodeData.txt
-	do	$WGET http://unicode.org/Public/UNIDATA/$data
+	do	wget http://unicode.org/Public/UNIDATA/$data
 	done
 	;;
-*)	echo checking package unicode-ucd
-	grep unicode-ucd /etc/setup/installed.db || exit 9
+-i)
+	echo copying data from /usr/share/unicode/ucd
+	for data in UnicodeData.txt
+	do	cp /usr/share/unicode/ucd/$data .
+	done
 	;;
 esac
 
+echo checking Unicode data file
 for data in UnicodeData.txt
-do	test -r $data || ln -s /usr/share/unicode/ucd/$data . || exit 9
+do	if [ -r $data ]
+	then	true
+	else	echo $data not available, skipping table generation
+		exit
+	fi
 done
 
 #############################################################################
diff --git a/newlib/libc/string/mkunidata b/newlib/libc/string/mkunidata
index c0bf5de..7ebebeb 100755
--- a/newlib/libc/string/mkunidata
+++ b/newlib/libc/string/mkunidata
@@ -1,6 +1,6 @@
 #! /bin/sh
 
-echo generating Unicode width data for newlib/libc/string/wcwidth.c
+echo Generating Unicode width data for newlib/libc/string/wcwidth.c
 
 cd `dirname $0`
 PATH="$PATH":.	# ensure access to uniset tool
@@ -9,34 +9,51 @@ PATH="$PATH":.	# ensure access to uniset tool
 # checks and (with option -u) downloads
 
 case "$1" in
+-h)	echo "Usage: $0 [-h|-u|-i]"
+	echo "Generate width data tables ambiguous.t, combining.t, wide.t"
+	echo "from local Unicode files UnicodeData.txt, Blocks.txt, EastAsianWidth.txt."
+	echo ""
+	echo "Options:"
+	echo "  -u    download files from unicode.org first, download uniset tool"
+	echo "  -i    copy files from /usr/share/unicode/ucd first"
+	echo "  -h    show this"
+	exit
+	;;
 -u)
-	#WGET=wget -N -t 1 --timeout=55
-	WGET=curl -R -O --connect-timeout 55
-	WGET+=-z $@
+	wget () {
+		curl -R -O --connect-timeout 55 -z "`basename $1`" "$1"
+	}
 
 	echo downloading uniset tool
-	$WGET http://www.cl.cam.ac.uk/~mgk25/download/uniset.tar.gz
+	wget http://www.cl.cam.ac.uk/~mgk25/download/uniset.tar.gz
 	gzip -dc uniset.tar.gz | tar xvf - uniset
 
 	echo downloading data from unicode.org
 	for data in UnicodeData.txt Blocks.txt EastAsianWidth.txt
-	do	$WGET http://unicode.org/Public/UNIDATA/$data
+	do	wget http://unicode.org/Public/UNIDATA/$data
 	done
 	;;
-*)	echo checking package unicode-ucd
-	grep unicode-ucd /etc/setup/installed.db || exit 9
+-i)
+	echo copying data from /usr/share/unicode/ucd
+	for data in UnicodeData.txt Blocks.txt EastAsianWidth.txt
+	do	cp /usr/share/unicode/ucd/$data .
+	done
 	;;
 esac
 
 echo checking uniset tool
 type uniset || exit 9
 
+echo checking Unicode data files
 for data in UnicodeData.txt Blocks.txt EastAsianWidth.txt
-do	test -r $data || ln -s /usr/share/unicode/ucd/$data . || exit 9
+do	if [ -r $data ]
+	then	true
+	else	echo $data not available, skipping table generation
+		exit
+	fi
 done
 
 echo generating from Unicode version `sed -e 's,[^.0-9],,g' -e 1q Blocks.txt`
-exit
 
 #############################################################################
 # table generation
-- 
2.16.2