[PATCH] implicit Unicode data tables generation
Thomas Wolff
towo@towo.net
Tue Mar 13 18:24:00 GMT 2018
On 13.03.2018 16:43, Corinna Vinschen wrote:
> On Mar 13 14:56, Thomas Wolff wrote:
>> On 13.03.2018 11:46, Corinna Vinschen wrote:
>>> On Mar 12 20:32, Thomas Wolff wrote:
>>>> On 12.03.2018 15:21, Corinna Vinschen wrote:
>>>>> A build dependency to unicode-ucd should not be required. Why not have
>>>>> automatic build rules which are simply skipped if the file doesn't
>>>>> exist? So somebody can download the file and the rules do the rest?
>>>>> It's not *that* important, but a nice feature.
>>>> The easiest way would be for the mkunidata scripts to simply ignore
>>>> missing files, as in the attached patch.
>>>> On the other hand, in my original scripts I had the additional fallback
>>>> option to download them from unicode.org on demand.
>>>> If that's acceptable in a build process, I could add that back in.
>>>> Or just drop those rules and stay explicit?
>>> Implicit would be nice, it's not actually a hard requirement.
>>>
>>> We must not rely on system-installed files outside the source tree,
>>> build tree or outside the toolchain.
>> That's why my patch yesterday would tolerate the files being missing.
>>> Please provide a patch removing any ln to files under /usr/share, etc.
>> ...
> ...
> You can also never make assumptions about the age of a file provided by
> your build OS. There's a good chance the OS is providing an older
> Unicode data file version than the one you actually want to build the
> dependent files from. Worst case, older than the currently supported
> Unicode version.
>
> The bottom line is, the Unicode data file has either been downloaded into
> the expected location in libc/ctype, or it's not available.
>
>>> The explicit rules add a rule to download the Unicode.txt file into the
>>> src tree. That's the only file we may rely on. If it doesn't exist,
>>> an implicit rule should not break the build.
>> Well, the patch I mentioned wouldn't break it anymore.
Actually it was broken and would still abort if unicode-ucd isn't installed.
And the contributed version of string/mkunidata is broken, too; it does
nothing because it contains an "exit" left over as a test artefact.
Sorry for this embarrassing incident.
>> I can tweak the script further to your preference, ...
The attached patch should cover the various requirements; see the
extended commit description.
Thomas
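For context, the check-and-skip behaviour the patch introduces can be sketched standalone. The file name matches the patch; the check_data helper is a hypothetical wrapper for illustration only:

```shell
# check_data FILE...: succeed quietly, printing a notice, if any file is
# missing -- mirroring the patched scripts, which exit with status 0 so a
# calling Makefile rule does not fail the build.
check_data () {
  for data in "$@"
  do if [ -r "$data" ]
     then true
     else echo "$data not available, skipping table generation"
          return 0
     fi
  done
  echo "all data files present, generating tables"
}

cd "$(mktemp -d)"            # empty directory: the data file is absent
check_data UnicodeData.txt   # prints the skip notice, exits successfully
```

The point of returning success in the missing-file case is that an implicit Makefile rule invoking the script never breaks the build, per the discussion above.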
-------------- next part --------------
From 4f7df495bad81fffa858c4f27c5fbd709e40b7b3 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <mintty@users.noreply.github.com>
Date: Tue, 13 Mar 2018 18:26:19 +0100
Subject: [PATCH] fix/enhance Unicode table generation scripts
Scripts no longer try to acquire Unicode data by best-effort magic.
Options supported:
-h for help
-i to copy Unicode data from /usr/share/unicode/ucd first
-u to download Unicode data from unicode.org first
If the necessary Unicode files are not available locally (even after -i or
-u, if given), table generation is skipped, but no error code is returned,
so as not to obstruct the build process when called from a Makefile.
---
newlib/libc/ctype/mkunidata | 34 ++++++++++++++++++++++++++--------
newlib/libc/string/mkunidata | 37 +++++++++++++++++++++++++++----------
2 files changed, 53 insertions(+), 18 deletions(-)
diff --git a/newlib/libc/ctype/mkunidata b/newlib/libc/ctype/mkunidata
index ea18e67..4bdf3bc 100755
--- a/newlib/libc/ctype/mkunidata
+++ b/newlib/libc/ctype/mkunidata
@@ -1,6 +1,6 @@
#! /bin/sh
-echo generating Unicode character properties data for newlib/libc/ctype
+echo Generating Unicode character properties data for newlib/libc/ctype
cd `dirname $0`
@@ -8,23 +8,41 @@ cd `dirname $0`
# checks and (with option -u) download
case "$1" in
+-h) echo "Usage: $0 [-h|-u|-i]"
+ echo "Generate case conversion table caseconv.t and character category table categories.t"
+ echo "from local Unicode file UnicodeData.txt."
+ echo ""
+ echo "Options:"
+ echo " -u download file from unicode.org first"
+ echo " -i copy file from /usr/share/unicode/ucd first"
+ echo " -h show this"
+ exit
+ ;;
-u)
- #WGET=wget -N -t 1 --timeout=55
- WGET=curl -R -O --connect-timeout 55
- WGET+=-z $@
+ wget () {
+ curl -R -O --connect-timeout 55 -z "`basename $1`" "$1"
+ }
echo downloading data from unicode.org
for data in UnicodeData.txt
- do $WGET http://unicode.org/Public/UNIDATA/$data
+ do wget http://unicode.org/Public/UNIDATA/$data
done
;;
-*) echo checking package unicode-ucd
- grep unicode-ucd /etc/setup/installed.db || exit 9
+-i)
+ echo copying data from /usr/share/unicode/ucd
+ for data in UnicodeData.txt
+ do cp /usr/share/unicode/ucd/$data .
+ done
;;
esac
+echo checking Unicode data file
for data in UnicodeData.txt
-do test -r $data || ln -s /usr/share/unicode/ucd/$data . || exit 9
+do if [ -r $data ]
+ then true
+ else echo $data not available, skipping table generation
+ exit
+ fi
done
#############################################################################
diff --git a/newlib/libc/string/mkunidata b/newlib/libc/string/mkunidata
index c0bf5de..7ebebeb 100755
--- a/newlib/libc/string/mkunidata
+++ b/newlib/libc/string/mkunidata
@@ -1,6 +1,6 @@
#! /bin/sh
-echo generating Unicode width data for newlib/libc/string/wcwidth.c
+echo Generating Unicode width data for newlib/libc/string/wcwidth.c
cd `dirname $0`
PATH="$PATH":. # ensure access to uniset tool
@@ -9,34 +9,51 @@ PATH="$PATH":. # ensure access to uniset tool
# checks and (with option -u) downloads
case "$1" in
+-h) echo "Usage: $0 [-h|-u|-i]"
+ echo "Generate width data tables ambiguous.t, combining.t, wide.t"
+ echo "from local Unicode files UnicodeData.txt, Blocks.txt, EastAsianWidth.txt."
+ echo ""
+ echo "Options:"
+ echo " -u download files from unicode.org first, download uniset tool"
+ echo " -i copy files from /usr/share/unicode/ucd first"
+ echo " -h show this"
+ exit
+ ;;
-u)
- #WGET=wget -N -t 1 --timeout=55
- WGET=curl -R -O --connect-timeout 55
- WGET+=-z $@
+ wget () {
+ curl -R -O --connect-timeout 55 -z "`basename $1`" "$1"
+ }
echo downloading uniset tool
- $WGET http://www.cl.cam.ac.uk/~mgk25/download/uniset.tar.gz
+ wget http://www.cl.cam.ac.uk/~mgk25/download/uniset.tar.gz
gzip -dc uniset.tar.gz | tar xvf - uniset
echo downloading data from unicode.org
for data in UnicodeData.txt Blocks.txt EastAsianWidth.txt
- do $WGET http://unicode.org/Public/UNIDATA/$data
+ do wget http://unicode.org/Public/UNIDATA/$data
done
;;
-*) echo checking package unicode-ucd
- grep unicode-ucd /etc/setup/installed.db || exit 9
+-i)
+ echo copying data from /usr/share/unicode/ucd
+ for data in UnicodeData.txt Blocks.txt EastAsianWidth.txt
+ do cp /usr/share/unicode/ucd/$data .
+ done
;;
esac
echo checking uniset tool
type uniset || exit 9
+echo checking Unicode data files
for data in UnicodeData.txt Blocks.txt EastAsianWidth.txt
-do test -r $data || ln -s /usr/share/unicode/ucd/$data . || exit 9
+do if [ -r $data ]
+ then true
+ else echo $data not available, skipping table generation
+ exit
+ fi
done
echo generating from Unicode version `sed -e 's,[^.0-9],,g' -e 1q Blocks.txt`
-exit
#############################################################################
# table generation
--
2.16.2
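A note on the -u branch: defining a shell function named wget shadows any external wget command for the rest of the script, which is how the patch routes downloads through curl without changing the call sites. The mechanism can be demonstrated offline; the curl flags in the comment are the patch's, while the echo stand-in is illustrative only:

```shell
# A shell function shadows an external command of the same name for
# subsequent calls in this script -- the trick the patched -u branch uses.
wget () {
  # The real patch body is: curl -R -O --connect-timeout 55 -z "`basename $1`" "$1"
  echo "would fetch $1 as $(basename "$1")"
}

wget http://unicode.org/Public/UNIDATA/UnicodeData.txt
# prints: would fetch http://unicode.org/Public/UNIDATA/UnicodeData.txt as UnicodeData.txt
```

In the real script, curl's -z flag against the already-present local copy makes the fetch conditional on the remote file being newer, so repeated runs do not re-download unchanged data.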
More information about the Newlib mailing list