[PATCH/RFA] Extended wctomb/mbtowc conversion and more stuff
Jeff Johnston
jjohnstn@redhat.com
Mon Mar 23 17:52:00 GMT 2009
Corinna Vinschen wrote:
> Ok,
>
> this is the new patch about the extended wctomb_r/mbtowc_r stuff.
>
> It got more complicated because of various requirements in Cygwin.
> One of them is the requirement to be able to call mbtowc for a charset
> other than the current locale charset.
>
> I guess the best I can do is to explain what this patch is doing and
> go into the details along the way.
>
> - Set the default charset to "ASCII", rather than ISO-8859-1.
>
> This change has two reasons. First of all, POSIX requires that
> the default setting for all applications which don't explicitly
> call setlocale is the "POSIX" or "C" locale. In this locale,
> only ASCII characters are supported. This is also (correctly) the case
> in the ctype functions in newlib. Only the charset was wrongly
> set to "ISO-8859-1": wrong in POSIX terms, and wrong because
> ISO-8859-1 isn't actually supported by default.
>
> - Add support for correct ISO-8859-x multibyte<->wide char conversion.
>
> - If the input to setlocale is "C" or "POSIX", set the charset to
> "ASCII" now.
>
> - Add support for all default ANSI and OEM codepages used on Windows,
> CP437, CP720, CP737, CP775, CP850, CP852, CP855, CP857, CP858, CP862,
> CP866, CP874, CP1125, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255,
> CP1256, CP1257, CP1258.
>
> This new charset support requires a couple of new character conversion
> tables, which I put into a new file called libc/stdlib/sb_charsets.c
> and which are only built on _MB_CAPABLE systems. The tables are now
> guarded by the defines we talked about, _MB_EXTENDED_CHARSETS_ISO and
> _MB_EXTENDED_CHARSETS_DOS. Maybe the latter would better be renamed
> to _MB_EXTENDED_CHARSETS_WINDOWS, though.
>
> - On Cygwin, add support for the charsets GBK, CP949 (Korean unified Hangul),
> and BIG5. My current implementation of these charset conversions requires
> OS support, so Cygwin needs to be able to set them in setlocale(), but
> I have no implementation for newlib so far.
>
> - On Cygwin, if no explicit charset is defined as input to setlocale,
> search for the current ANSI codepage and set it as current charset,
> if it's one of the supported charsets, otherwise default to ISO-8859-1.
>
> The change to the former patch is that the function
> __set_charset_from_codepage is now defined in Cygwin, not in newlib.
>
> - Also on Cygwin, call a function __set_ctype, also defined only in Cygwin
> for now. This allows switching the ctype tables for the various charsets.
>
> The idea is that this function can also be defined in newlib at some
> point. We just have to discuss the implementation. In Cygwin the
> ctype data is copied over into the standard ctype array. This is the
> only way to keep the behaviour backward compatible with existing
> applications, because the isXXX functions are mostly used as the
> macros defined in ctype.h.
>
> - Allow "eucJP" in addition to "EUCJP", and "Big5" in addition to "BIG5",
> to support the typical spellings of these charsets on other systems.
>
> - The functions _wctomb_r and _mbtowc_r are now split into multiple
> functions for each supported charset, rather than having to call
> strcmp multiple times to determine which charset is used.
>
> To do that, the setlocale() function sets the function pointers
> __wctomb/__mbtowc according to the current charset. On systems which
> are not _MB_CAPABLE, only two such functions exist, __ascii_wctomb and
> __ascii_mbtowc.
>
> The difference from the former implementation is that the charset
> is one of the parameters to these functions. That's necessary to
> allow Cygwin to call the __iso_mbtowc and __cp_mbtowc functions with
> an alternate charset.
>
> - On Cygwin, don't use the newlib implementation of SJIS, JIS, and EUCJP
> mbtowc/wctomb. The reason is that newlib's implementations don't
> convert the input multibyte chars to UTF wchars; rather, they convert
> them to a simple self-made form of wchars. This doesn't work well
> on Cygwin, because the underlying OS always requires wchars to be UTF-16.
> Therefore Cygwin has its own implementations of __sjis_mbtowc, etc.
>
> - Along the same lines, the function __jp2uc now does not convert the
> incoming character at all on Cygwin, because the incoming char is
> already UTF on Cygwin.
>
> - All iswXXX and towXXX functions have been changed so that on
> _MB_CAPABLE systems all wchar_t input is either SJIS/JIS/EUCJP, which
> requires converting the character to Unicode first, or the input is
> already Unicode. Unicode is the wchar_t representation for all other
> charsets anyway, and the only wchar_t representation on Cygwin, as
> outlined above.
>
> - The _MB_EXTENDED_CHARSETS_ISO and _MB_EXTENDED_CHARSETS_DOS macros are
> defined in libc/include/sys/config.h. I also added a define
> _MB_EXTENDED_CHARSETS_ALL, which is currently only set on Cygwin.
> It enables the other two, and I expect it to also enable the still
> missing _MB_EXTENDED_CHARSETS_GBK, _MB_EXTENDED_CHARSETS_KOR,
> and _MB_EXTENDED_CHARSETS_BIG5, as soon as they are available.
>
> - In libc/include/sys/reent.h, I marked the struct _reent members
> _current_category and _current_locale as unused. They are unused, because
> they were only (incorrectly) used by the old setlocale implementation.
> I don't want to remove them, so that the size of struct _reent stays the
> same for backward compatibility with existing code.
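The function-pointer scheme described above (pick the converter once in setlocale, then pass the charset to it so a caller can also convert with a non-current charset) can be sketched in a few lines. This is a minimal standalone illustration under stated assumptions, not newlib code: `mbtowc_fn`, `ascii_mbtowc`, `latin1_mbtowc`, `select_charset` and `my_mbtowc` are hypothetical names, there is no struct _reent or mbstate_t, errno is not set, and only two charsets are wired up.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical converter type: as in the patch, the charset is an explicit
   parameter, so a caller could convert with a non-current charset. */
typedef int (*mbtowc_fn) (wchar_t *pwc, const char *s, size_t n,
                          const char *charset);

static int
ascii_mbtowc (wchar_t *pwc, const char *s, size_t n, const char *charset)
{
  wchar_t dummy;
  unsigned char c;

  (void) charset;               /* not needed for ASCII */
  if (pwc == NULL)
    pwc = &dummy;
  if (s == NULL)
    return 0;                   /* no shift state */
  if (n == 0)
    return -2;                  /* incomplete input */
  c = (unsigned char) *s;
  if (c >= 0x80)
    return -1;                  /* not ASCII; real code also sets EILSEQ */
  *pwc = c;
  return c == '\0' ? 0 : 1;
}

static int
latin1_mbtowc (wchar_t *pwc, const char *s, size_t n, const char *charset)
{
  wchar_t dummy;
  unsigned char c;

  (void) charset;
  if (pwc == NULL)
    pwc = &dummy;
  if (s == NULL)
    return 0;
  if (n == 0)
    return -2;
  c = (unsigned char) *s;
  *pwc = c;                     /* ISO-8859-1 maps 1:1 onto U+0000..U+00FF */
  return c == '\0' ? 0 : 1;
}

/* Chosen once per locale change instead of strcmp on every call. */
static mbtowc_fn cur_mbtowc = ascii_mbtowc;
static const char *cur_charset = "ASCII";

static int
select_charset (const char *charset)
{
  if (strcmp (charset, "ASCII") == 0)
    {
      cur_mbtowc = ascii_mbtowc;
      cur_charset = "ASCII";
    }
  else if (strcmp (charset, "ISO-8859-1") == 0)
    {
      cur_mbtowc = latin1_mbtowc;
      cur_charset = "ISO-8859-1";
    }
  else
    return -1;                  /* unknown charset: keep the old setting */
  return 0;
}

/* Public entry point: just forwards through the pointer. */
static int
my_mbtowc (wchar_t *pwc, const char *s, size_t n)
{
  return cur_mbtowc (pwc, s, n, cur_charset);
}
```

With the default "ASCII" selection, `my_mbtowc (&wc, "\xe9", 1)` returns -1; after `select_charset ("ISO-8859-1")` the same call returns 1 and stores 0xe9. The string comparisons happen once per locale change rather than on every conversion.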
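Why __set_ctype copies data into the existing ctype array, instead of switching a pointer, can be shown with a toy version of the classic table-plus-macro design: code compiled with the macro indexes the array directly, so only a change to the array's contents is visible to it. All names here (`my_ctype`, `MY_ALPHA`, `my_isalpha`, `my_set_ctype`) are hypothetical, and the letter sets are simplified; newlib's real tables and flag bits differ.

```c
#include <string.h>

/* Hypothetical flag bit and table; the macro reads the array directly,
   the way isXXX macros compiled into existing applications do. */
#define MY_ALPHA 0x01

static unsigned char my_ctype[256];

#define my_isalpha(c) (my_ctype[(unsigned char) (c)] & MY_ALPHA)

/* "C" locale: only the ASCII letters. */
static void
fill_ascii (unsigned char *tab)
{
  int c;

  memset (tab, 0, 256);
  for (c = 'A'; c <= 'Z'; c++)
    tab[c] = tab[c + ('a' - 'A')] = MY_ALPHA;
}

/* Simplified Latin-1: ASCII letters plus 0xC0..0xFF except the
   multiplication sign 0xD7 and the division sign 0xF7. */
static void
fill_latin1 (unsigned char *tab)
{
  int c;

  fill_ascii (tab);
  for (c = 0xc0; c <= 0xff; c++)
    if (c != 0xd7 && c != 0xf7)
      tab[c] = MY_ALPHA;
}

/* Switch charsets by copying the new data INTO the existing array, so
   every user of my_isalpha sees the change without recompiling. */
static void
my_set_ctype (const char *charset)
{
  unsigned char tmp[256];

  if (strcmp (charset, "ISO-8859-1") == 0)
    fill_latin1 (tmp);
  else
    fill_ascii (tmp);
  memcpy (my_ctype, tmp, sizeof my_ctype);
}
```

After `my_set_ctype ("ASCII")`, `my_isalpha (0xe9)` is false; after `my_set_ctype ("ISO-8859-1")` it is true, while the macro itself never changed.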
>
> Again, the patch is split in two: the first one contains all changes
> except those in ctype, the second one contains the ctype changes.
>
> I have a rather big patch to Cygwin which requires this functionality
> to go in first. I hope the patch is basically ok to apply.
>
> I have split up the long ChangeLog entry for better readability.
>
>
Please put the _mbtowc_r and _wctomb_r functions, plus the default ASCII
versions, at the top of the files so people don't have to wade through to
the bottom. I don't think the change of the default charset name is going
to affect anybody. I am ok with you checking in the patch.
-- Jeff J.
> Corinna
>
>
> * libc/ctype/iswalpha.c: Handle all wchar_t as unicode on
> _MB_CAPABLE systems.
> * libc/ctype/iswblank.c: Ditto.
> * libc/ctype/iswcntrl.c: Ditto.
> * libc/ctype/iswprint.c: Ditto.
> * libc/ctype/iswpunct.c: Ditto.
> * libc/ctype/iswspace.c: Ditto.
> * libc/ctype/towlower.c: Ditto.
> * libc/ctype/towupper.c: Ditto.
> * libc/ctype/jp2uc.c (__jp2uc): On Cygwin, just return c.
> Explain why.
>
> * libc/include/sys/config.h: Define _MB_EXTENDED_CHARSETS_ISO
> and _MB_EXTENDED_CHARSETS_DOS if _MB_EXTENDED_CHARSETS_ALL is
> defined. Define _MB_EXTENDED_CHARSETS_ALL on Cygwin only for now.
> * libc/include/sys/reent.h (struct _reent): Mark _current_category
> and _current_locale as unused.
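The config.h change uses a common umbrella-macro pattern: one platform define switches on several feature defines, while each feature stays individually selectable. A hypothetical miniature, with `MY_EXT_ALL`, `MY_EXT_ISO` and `MY_EXT_DOS` standing in for the `_MB_EXTENDED_CHARSETS_*` macros and two flag variables standing in for the guarded tables:

```c
/* Hypothetical stand-ins for _MB_EXTENDED_CHARSETS_ALL/_ISO/_DOS. */
#define MY_EXT_ALL 1            /* e.g. set in a platform block, as for Cygwin */

/* The umbrella define turns on every individual feature. */
#ifdef MY_EXT_ALL
#define MY_EXT_ISO 1
#define MY_EXT_DOS 1
#endif

/* Each group of tables/functions is guarded by its own define, so a
   target can still enable just one group by defining it alone. */
#ifdef MY_EXT_ISO
static const int have_iso_tables = 1;
#else
static const int have_iso_tables = 0;
#endif

#ifdef MY_EXT_DOS
static const int have_cp_tables = 1;
#else
static const int have_cp_tables = 0;
#endif
```

Adding the planned GBK, KOR and BIG5 defines would then only mean adding lines inside the `MY_EXT_ALL` block.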
>
> * libc/locale/locale.c: Add new charset support to documentation.
> Include ../stdio/local.h from here.
> (lc_ctype_charset): Set to "ASCII" by default.
> (lc_message_charset): Ditto.
> (_setlocale_r): Don't set _current_category and _current_locale.
> (loadlocale): Add Cygwin codepage support. On _MB_CAPABLE
> systems, set __mbtowc and __wctomb function pointers to the functions
> corresponding to the current charset. Don't allow non-existent
> ISO-8859-12 charset. Add support for Windows singlebyte codepages.
> On Cygwin, add support for GBK, CP949, and BIG5. On Cygwin,
> call __set_ctype() in case the category is LC_CTYPE. Don't set
> _current_category and _current_locale.
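The stricter name checks in loadlocale (ISO-8859-1 through ISO-8859-16 except the non-existent ISO-8859-12, plus the "eucJP" alias stored under its canonical name) can be exercised in isolation. `check_charset` is a hypothetical helper, not part of newlib; it handles only these two cases and uses plain `strtol` where the patch uses `_strtol_r`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 and copies the canonical charset name
   into OUT if IN is accepted, 0 otherwise.  Only EUCJP and ISO-8859-x
   are handled here. */
static int
check_charset (const char *in, char *out, size_t outlen)
{
  if (strcmp (in, "EUCJP") == 0 || strcmp (in, "eucJP") == 0)
    in = "EUCJP";               /* store the canonical spelling */
  else if (strncmp (in, "ISO-8859-", 9) == 0)
    {
      char *end;
      long val = strtol (in + 9, &end, 10);

      /* The whole suffix must be a number in 1..16; there is no
         ISO-8859-12. */
      if (*end != '\0' || val < 1 || val > 16 || val == 12)
        return 0;
    }
  else
    return 0;                   /* other charsets omitted from this sketch */

  if (strlen (in) >= outlen)
    return 0;
  strcpy (out, in);
  return 1;
}
```

For example, "eucJP" is accepted and stored as "EUCJP", "ISO-8859-16" is accepted, and "ISO-8859-12", "ISO-8859-17" and "ISO-8859-1x" are rejected.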
>
> * libc/stdlib/Makefile.am (GENERAL_SOURCES): Add sb_charsets.c.
> * libc/stdlib/Makefile.in: Regenerate.
> * libc/stdlib/local.h: Add prototype for __locale_charset.
> Add prototypes for __mbtowc and __wctomb pointers.
> Add prototypes for charset-specific _wctomb_r and _mbtowc_r
> functions.
> Declare tables and functions from sb_charsets.c.
> * libc/stdlib/mbtowc_r.c (__mbtowc): Define. Set to __ascii_mbtowc
> by default.
> (__iso_mbtowc): New function.
> (__cp_mbtowc): New function.
> (__utf8_mbtowc): New function.
> (__sjis_mbtowc): New function. Disable on Cygwin.
> (__eucjp_mbtowc): New function. Disable on Cygwin.
> (__jis_mbtowc): New function. Disable on Cygwin.
> (__ascii_mbtowc): New function.
> (_mbtowc_r): Just call __mbtowc from here.
> * libc/stdlib/sb_charsets.c: New file, adding singlebyte to UTF
> conversion tables for all ISO and CP charsets.
> (__iso_8859_index): New function.
> (__cp_index): New function.
> * libc/stdlib/wctomb_r.c (__wctomb): Define. Set to __ascii_wctomb
> by default.
> (__utf8_wctomb): New function.
> (__sjis_wctomb): New function. Disable on Cygwin.
> (__eucjp_wctomb): New function. Disable on Cygwin.
> (__jis_wctomb): New function. Disable on Cygwin.
> (__iso_wctomb): New function.
> (__cp_wctomb): New function.
> (__ascii_wctomb): New function.
> (_wctomb_r): Just call __wctomb from here.
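The table-driven conversion recorded for sb_charsets.c and __cp_mbtowc can be sketched for a single codepage. This is a standalone approximation, not the newlib table: it covers only CP1252, stores just the 0x80..0x9F range (CP1252 agrees with ISO-8859-1 from 0xA0 upwards, so those bytes map by identity), and sets the global `errno` instead of going through a struct _reent. As in the patch, a zero table entry marks a byte with no assigned character.

```c
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* CP1252 bytes 0x80..0x9F; 0 marks a byte with no assigned character. */
static const wchar_t cp1252_80_9f[0x20] = {
  0x20ac, 0,      0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017d, 0,
  0,      0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0,      0x017e, 0x0178
};

/* Same shape as __cp_mbtowc, but for one hard-wired codepage. */
static int
cp1252_mbtowc (wchar_t *pwc, const char *s, size_t n)
{
  wchar_t dummy;
  unsigned char c;

  if (pwc == NULL)
    pwc = &dummy;
  if (s == NULL)
    return 0;                   /* single-byte charset: no shift state */
  if (n == 0)
    return -2;

  c = (unsigned char) *s;
  if (c >= 0x80 && c <= 0x9f)
    {
      *pwc = cp1252_80_9f[c - 0x80];
      if (*pwc == 0)            /* invalid character */
        {
          errno = EILSEQ;
          return -1;
        }
      return 1;
    }
  *pwc = (wchar_t) c;           /* ASCII and 0xA0..0xFF map 1:1 */
  return c == '\0' ? 0 : 1;
}
```

Byte 0x80 yields U+20AC (the euro sign), byte 0x81 fails with EILSEQ, and byte 0xE9 yields U+00E9.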
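When `wchar_t` is 16 bits wide, as on Cygwin, __utf8_mbtowc has to hand out a code point above U+FFFF as a UTF-16 surrogate pair. The arithmetic from the patch (`0xd800 | (((tmp - 0x10000) >> 10) & 0x3ff)` for the first half, `0xdc00 | ((tmp - 0x10000) & 0x3ff)` for the second) is isolated below in a hypothetical helper `to_utf16`; the patch's stateful, two-call delivery through `mbstate_t` is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.  Returns the
   number of 16-bit units written (1 or 2), or 0 for a value that is not
   a Unicode scalar value (above U+10FFFF or a lone surrogate). */
static int
to_utf16 (uint32_t cp, uint16_t out[2])
{
  if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
    return 0;
  if (cp <= 0xffff)
    {
      out[0] = (uint16_t) cp;   /* fits in a single unit */
      return 1;
    }
  cp -= 0x10000;                /* 20 bits remain */
  out[0] = (uint16_t) (0xd800 | ((cp >> 10) & 0x3ff));  /* high surrogate */
  out[1] = (uint16_t) (0xdc00 | (cp & 0x3ff));          /* low surrogate */
  return 2;
}
```

For U+1F600 the helper produces 0xD83D followed by 0xDE00; U+20AC needs only one unit.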
>
>
> Index: libc/include/sys/config.h
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/include/sys/config.h,v
> retrieving revision 1.50
> diff -u -p -r1.50 config.h
> --- libc/include/sys/config.h 20 Mar 2009 20:44:14 -0000 1.50
> +++ libc/include/sys/config.h 22 Mar 2009 16:25:07 -0000
> @@ -179,6 +179,7 @@
> #if defined(__CYGWIN__)
> #include <cygwin/config.h>
> #define __LINUX_ERRNO_EXTENSIONS__ 1
> +#define _MB_EXTENDED_CHARSETS_ALL 1
> #endif
>
> #if defined(__rtems__)
> @@ -211,4 +212,12 @@
> #endif
> #endif
>
> +/* If _MB_EXTENDED_CHARSETS_ALL is set, we want all of the extended
> + charsets. The extended charsets add a few functions and a couple
> + of tables of a few K each. */
> +#ifdef _MB_EXTENDED_CHARSETS_ALL
> +#define _MB_EXTENDED_CHARSETS_ISO 1
> +#define _MB_EXTENDED_CHARSETS_DOS 1
> +#endif
> +
> #endif /* __SYS_CONFIG_H__ */
> Index: libc/include/sys/reent.h
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/include/sys/reent.h,v
> retrieving revision 1.45
> diff -u -p -r1.45 reent.h
> --- libc/include/sys/reent.h 10 Dec 2008 23:43:12 -0000 1.45
> +++ libc/include/sys/reent.h 22 Mar 2009 16:25:07 -0000
> @@ -371,8 +371,8 @@ struct _reent
>
> int __sdidinit; /* 1 means stdio has been init'd */
>
> - int _current_category; /* used by setlocale */
> - _CONST char *_current_locale;
> + int _current_category; /* unused */
> + _CONST char *_current_locale; /* unused */
>
> struct _mprec *_mp;
>
> Index: libc/locale/locale.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/locale/locale.c,v
> retrieving revision 1.9
> diff -u -p -r1.9 locale.c
> --- libc/locale/locale.c 3 Mar 2009 09:28:45 -0000 1.9
> +++ libc/locale/locale.c 22 Mar 2009 16:25:07 -0000
> @@ -47,11 +47,18 @@ and <<"C">> values for <[locale]>; strin
> honored unless _MB_CAPABLE is defined in which case POSIX locale strings
> are allowed, plus five extensions supported for backward compatibility with
> older implementations using newlib: <<"C-UTF-8">>, <<"C-JIS">>, <<"C-EUCJP">>,
> -<<"C-SJIS">>, or <<"C-ISO-8859-x">> with 1 <= x <= 15. Even when using
> -POSIX locale strings, the only charsets allowed are <<"UTF-8">>, <<"JIS">>,
> -<<"EUCJP">>, <<"SJIS">>, or <<"ISO-8859-x">> with 1 <= x <= 15. (<<"">> is
> -also accepted; if given, the settings are read from the corresponding
> -LC_* environment variables and $LANG according to POSIX rules.
> +<<"C-SJIS">>, <<"C-ISO-8859-x">> with 1 <= x <= 15, or <<"C-CPxxx">> with
> +xxx in [437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 874, 1125, 1250,
> +1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258]. Even when using POSIX
> +locale strings, the only charsets allowed are <<"UTF-8">>, <<"JIS">>,
> +<<"EUCJP">>, <<"SJIS">>, <<"ISO-8859-x">> with 1 <= x <= 15, or
> +<<"CPxxx">> with xxx in [437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866,
> +874, 1125, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258].
> +(<<"">> is also accepted; if given, the settings are read from the
> +corresponding LC_* environment variables and $LANG according to POSIX rules.
> +
> +Under Cygwin, this implementation additionally supports the charsets <<"GBK">>,
> +<<"CP949">>, and <<"BIG5">>.
>
> If you use <<NULL>> as the <[locale]> argument, <<setlocale>> returns
> a pointer to the string representing the current locale (always
> @@ -85,6 +92,9 @@ PORTABILITY
> ANSI C requires <<setlocale>>, but the only locale required across all
> implementations is the C locale.
>
> +NOTES
> +There is no ISO-8859-12 codepage. It's also refused by this implementation.
> +
> No supporting OS subroutines are required.
> */
>
> @@ -129,6 +139,11 @@ No supporting OS subroutines are require
> #include <limits.h>
> #include <reent.h>
> #include <stdlib.h>
> +#include <wchar.h>
> +#include "../stdlib/local.h"
> +#ifdef __CYGWIN__
> +#include <windows.h>
> +#endif
>
> #define _LC_LAST 7
> #define ENCODING_LEN 31
> @@ -190,8 +205,8 @@ static const char *__get_locale_env(stru
>
> #endif
>
> -static char lc_ctype_charset[ENCODING_LEN + 1] = "ISO-8859-1";
> -static char lc_message_charset[ENCODING_LEN + 1] = "ISO-8859-1";
> +static char lc_ctype_charset[ENCODING_LEN + 1] = "ASCII";
> +static char lc_message_charset[ENCODING_LEN + 1] = "ASCII";
>
> char *
> _DEFUN(_setlocale_r, (p, category, locale),
> @@ -205,8 +220,6 @@ _DEFUN(_setlocale_r, (p, category, local
> if (strcmp (locale, "POSIX") && strcmp (locale, "C")
> && strcmp (locale, ""))
> return NULL;
> - p->_current_category = category;
> - p->_current_locale = locale;
> }
> return "C";
> #else
> @@ -361,6 +374,11 @@ currentlocale()
> #endif
>
> #ifdef _MB_CAPABLE
> +#ifdef __CYGWIN__
> +extern void *__set_charset_from_codepage (unsigned int, char *charset);
> +extern void __set_ctype (const char *charset);
> +#endif /* __CYGWIN__ */
> +
> static char *
> loadlocale(struct _reent *p, int category)
> {
> @@ -382,7 +400,7 @@ loadlocale(struct _reent *p, int categor
> if (!strcmp (locale, "POSIX"))
> strcpy (locale, "C");
> if (!strcmp (locale, "C")) /* Default "C" locale */
> - strcpy (charset, "ISO-8859-1");
> + strcpy (charset, "ASCII");
> else if (locale[0] == 'C' && locale[1] == '-') /* Old newlib style */
> strcpy (charset, locale + 2);
> else /* POSIX style */
> @@ -414,7 +432,11 @@ loadlocale(struct _reent *p, int categor
> }
> else if (c[0] == '\0' || c[0] == '@')
> /* End of string or just a modifier */
> +#ifdef __CYGWIN__
> + __set_charset_from_codepage (GetACP (), charset);
> +#else
> strcpy (charset, "ISO-8859-1");
> +#endif
> else
> /* Invalid string */
> return NULL;
> @@ -426,42 +448,155 @@ loadlocale(struct _reent *p, int categor
> if (strcmp (charset, "UTF-8"))
> return NULL;
> mbc_max = 6;
> +#ifdef _MB_CAPABLE
> + __wctomb = __utf8_wctomb;
> + __mbtowc = __utf8_mbtowc;
> +#endif
> break;
> case 'J':
> if (strcmp (charset, "JIS"))
> return NULL;
> mbc_max = 8;
> +#ifdef _MB_CAPABLE
> + __wctomb = __jis_wctomb;
> + __mbtowc = __jis_mbtowc;
> +#endif
> break;
> case 'E':
> - if (strcmp (charset, "EUCJP"))
> + if (strcmp (charset, "EUCJP") && strcmp (charset, "eucJP"))
> return NULL;
> + strcpy (charset, "EUCJP");
> mbc_max = 2;
> +#ifdef _MB_CAPABLE
> + __wctomb = __eucjp_wctomb;
> + __mbtowc = __eucjp_mbtowc;
> +#endif
> break;
> case 'S':
> if (strcmp (charset, "SJIS"))
> return NULL;
> mbc_max = 2;
> +#ifdef _MB_CAPABLE
> + __wctomb = __sjis_wctomb;
> + __mbtowc = __sjis_mbtowc;
> +#endif
> break;
> case 'I':
> - default:
> - /* Must be exactly one of ISO-8859-1, [...] ISO-8859-15. */
> + /* Must be exactly one of ISO-8859-1, [...] ISO-8859-16, except for
> + ISO-8859-12. */
> if (strncmp (charset, "ISO-8859-", 9))
> return NULL;
> - val = strtol (charset + 9, &end, 10);
> - if (val < 1 || val > 15 || *end)
> + val = _strtol_r (p, charset + 9, &end, 10);
> + if (val < 1 || val > 16 || val == 12 || *end)
> + return NULL;
> + mbc_max = 1;
> +#ifdef _MB_CAPABLE
> +#ifdef _MB_EXTENDED_CHARSETS_ISO
> + __wctomb = __iso_wctomb;
> + __mbtowc = __iso_mbtowc;
> +#else /* !_MB_EXTENDED_CHARSETS_ISO */
> + __wctomb = __ascii_wctomb;
> + __mbtowc = __ascii_mbtowc;
> +#endif /* _MB_EXTENDED_CHARSETS_ISO */
> +#endif
> + break;
> + case 'C':
> + if (charset[1] != 'P')
> + return NULL;
> + val = _strtol_r (p, charset + 2, &end, 10);
> + if (*end)
> + return NULL;
> + switch (val)
> + {
> + case 437:
> + case 720:
> + case 737:
> + case 775:
> + case 850:
> + case 852:
> + case 855:
> + case 857:
> + case 858:
> + case 862:
> + case 866:
> + case 874:
> + case 1125:
> + case 1250:
> + case 1251:
> + case 1252:
> + case 1253:
> + case 1254:
> + case 1255:
> + case 1256:
> + case 1257:
> + case 1258:
> + mbc_max = 1;
> +#ifdef _MB_CAPABLE
> +#ifdef _MB_EXTENDED_CHARSETS_DOS
> + __wctomb = __cp_wctomb;
> + __mbtowc = __cp_mbtowc;
> +#else /* !_MB_EXTENDED_CHARSETS_DOS */
> + __wctomb = __ascii_wctomb;
> + __mbtowc = __ascii_mbtowc;
> +#endif /* _MB_EXTENDED_CHARSETS_DOS */
> +#endif
> + break;
> +#ifdef __CYGWIN__
> + case 949:
> + mbc_max = 2;
> +#ifdef _MB_CAPABLE
> + __wctomb = __kr_wctomb;
> + __mbtowc = __kr_mbtowc;
> +#endif
> + break;
> +#endif
> + default:
> + return NULL;
> + }
> + break;
> + case 'A':
> + if (strcmp (charset, "ASCII"))
> return NULL;
> mbc_max = 1;
> +#ifdef _MB_CAPABLE
> + __wctomb = __ascii_wctomb;
> + __mbtowc = __ascii_mbtowc;
> +#endif
> + break;
> +#ifdef __CYGWIN__
> + case 'G':
> + if (strcmp (charset, "GBK"))
> + return NULL;
> + mbc_max = 2;
> +#ifdef _MB_CAPABLE
> + __wctomb = __gbk_wctomb;
> + __mbtowc = __gbk_mbtowc;
> +#endif
> break;
> + case 'B':
> + if (strcmp (charset, "BIG5") && strcmp (charset, "Big5"))
> + return NULL;
> + strcpy (charset, "BIG5");
> + mbc_max = 2;
> +#ifdef _MB_CAPABLE
> + __wctomb = __big5_wctomb;
> + __mbtowc = __big5_mbtowc;
> +#endif
> + break;
> +#endif /* __CYGWIN__ */
> + default:
> + return NULL;
> }
> if (category == LC_CTYPE)
> {
> strcpy (lc_ctype_charset, charset);
> __mb_cur_max = mbc_max;
> +#ifdef __CYGWIN__
> + __set_ctype (charset);
> +#endif
> }
> else if (category == LC_MESSAGES)
> strcpy (lc_message_charset, charset);
> - p->_current_category = category;
> - p->_current_locale = locale;
> return strcpy(current_categories[category], new_categories[category]);
> }
>
> Index: libc/stdlib/Makefile.am
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/stdlib/Makefile.am,v
> retrieving revision 1.28
> diff -u -p -r1.28 Makefile.am
> --- libc/stdlib/Makefile.am 25 Feb 2009 21:33:17 -0000 1.28
> +++ libc/stdlib/Makefile.am 22 Mar 2009 16:25:07 -0000
> @@ -48,6 +48,7 @@ GENERAL_SOURCES = \
> rand_r.c \
> realloc.c \
> reallocf.c \
> + sb_charsets.c \
> strtod.c \
> strtol.c \
> strtoul.c \
> Index: libc/stdlib/local.h
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/stdlib/local.h,v
> retrieving revision 1.1.1.1
> diff -u -p -r1.1.1.1 local.h
> --- libc/stdlib/local.h 17 Feb 2000 19:39:47 -0000 1.1.1.1
> +++ libc/stdlib/local.h 22 Mar 2009 16:25:07 -0000
> @@ -5,4 +5,61 @@
>
> char * _EXFUN(_gcvt,(struct _reent *, double , int , char *, char, int));
>
> +char *__locale_charset ();
> +
> +#ifndef __mbstate_t_defined
> +#include <wchar.h>
> +#endif
> +
> +int (*__wctomb) (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +int __ascii_wctomb (struct _reent *, char *, wchar_t, const char *,
> + mbstate_t *);
> +#ifdef _MB_CAPABLE
> +int __utf8_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +int __sjis_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +int __eucjp_wctomb (struct _reent *, char *, wchar_t, const char *,
> + mbstate_t *);
> +int __jis_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +int __iso_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +int __cp_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +#ifdef __CYGWIN__
> +int __gbk_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +int __kr_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +int __big5_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
> +#endif
> +#endif
> +
> +int (*__mbtowc) (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __ascii_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +#ifdef _MB_CAPABLE
> +int __utf8_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __sjis_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __eucjp_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __jis_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __iso_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __cp_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +#ifdef __CYGWIN__
> +int __gbk_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __kr_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +int __big5_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *);
> +#endif
> +#endif
> +
> +wchar_t __iso_8859_conv[14][0x60];
> +int __iso_8859_index (const char *);
> +
> +wchar_t __cp_conv[12][0x80];
> +int __cp_index (const char *);
> +
> #endif
> Index: libc/stdlib/mbtowc_r.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/stdlib/mbtowc_r.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 mbtowc_r.c
> --- libc/stdlib/mbtowc_r.c 19 Mar 2009 19:47:52 -0000 1.11
> +++ libc/stdlib/mbtowc_r.c 22 Mar 2009 16:25:07 -0000
> @@ -5,10 +5,13 @@
> #include <wchar.h>
> #include <string.h>
> #include <errno.h>
> +#include "local.h"
>
> -#ifdef _MB_CAPABLE
> -extern char *__locale_charset ();
> +int (*__mbtowc) (struct _reent *, wchar_t *, const char *, size_t,
> + const char *, mbstate_t *)
> + = __ascii_mbtowc;
>
> +#ifdef _MB_CAPABLE
> typedef enum { ESCAPE, DOLLAR, BRACKET, AT, B, J,
> NUL, JIS_CHAR, OTHER, JIS_C_NUM } JIS_CHAR_TYPE;
> typedef enum { ASCII, JIS, A_ESC, A_ESC_DL, JIS_1, J_ESC, J_ESC_BR,
> @@ -43,17 +46,18 @@ static JIS_ACTION JIS_action_table[JIS_S
> /* J_ESC */ { ERROR, ERROR, NOOP, ERROR, ERROR, ERROR, ERROR, ERROR, ERROR },
> /* J_ESC_BR */{ ERROR, ERROR, ERROR, ERROR, MAKE_A, MAKE_A, ERROR, ERROR, ERROR },
> };
> -#endif /* _MB_CAPABLE */
>
> /* we override the mbstate_t __count field for more complex encodings and use it store a state value */
> #define __state __count
>
> +#ifdef _MB_EXTENDED_CHARSETS_ISO
> int
> -_DEFUN (_mbtowc_r, (r, pwc, s, n, state),
> - struct _reent *r _AND
> - wchar_t *pwc _AND
> - const char *s _AND
> - size_t n _AND
> +_DEFUN (__iso_mbtowc, (r, pwc, s, n, charset, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + const char *charset _AND
> mbstate_t *state)
> {
> wchar_t dummy;
> @@ -62,190 +66,384 @@ _DEFUN (_mbtowc_r, (r, pwc, s, n, state)
> if (pwc == NULL)
> pwc = &dummy;
>
> - if (s != NULL && n == 0)
> + if (s == NULL)
> + return 0;
> +
> + if (n == 0)
> return -2;
>
> -#ifdef _MB_CAPABLE
> - if (strlen (__locale_charset ()) <= 1)
> - { /* fall-through */ }
> - else if (!strcmp (__locale_charset (), "UTF-8"))
> - {
> - int ch;
> - int i = 0;
> -
> - if (s == NULL)
> - return 0; /* UTF-8 character encodings are not state-dependent */
> -
> - if (state->__count == 4)
> - {
> - /* Create the second half of the surrogate pair. For a description
> - see the comment below. */
> - wint_t tmp = (wchar_t)((state->__value.__wchb[0] & 0x07) << 18)
> - | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 12)
> - | (wchar_t)((state->__value.__wchb[2] & 0x3f) << 6)
> - | (wchar_t)(state->__value.__wchb[3] & 0x3f);
> - state->__count = 0;
> - *pwc = 0xdc00 | ((tmp - 0x10000) & 0x3ff);
> - return 2;
> - }
> - if (state->__count == 0)
> - ch = t[i++];
> - else
> + if (*t >= 0xa0)
> + {
> + int iso_idx = __iso_8859_index (charset + 9);
> + if (iso_idx >= 0)
> {
> - if (n < (size_t)-1)
> - ++n;
> - ch = state->__value.__wchb[0];
> + *pwc = __iso_8859_conv[iso_idx][*t - 0xa0];
> + if (*pwc == 0) /* Invalid character */
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + return 1;
> }
> + }
> +
> + *pwc = (wchar_t) *t;
> +
> + if (*t == '\0')
> + return 0;
> +
> + return 1;
> +}
> +#endif /* _MB_EXTENDED_CHARSETS_ISO */
> +
> +#ifdef _MB_EXTENDED_CHARSETS_DOS
> +int
> +_DEFUN (__cp_mbtowc, (r, pwc, s, n, charset, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wchar_t dummy;
> + unsigned char *t = (unsigned char *)s;
> +
> + if (pwc == NULL)
> + pwc = &dummy;
> +
> + if (s == NULL)
> + return 0;
> +
> + if (n == 0)
> + return -2;
>
> - if (ch == '\0')
> + if (*t >= 0x80)
> + {
> + int cp_idx = __cp_index (charset + 2);
> + if (cp_idx >= 0)
> {
> - *pwc = 0;
> - state->__count = 0;
> - return 0; /* s points to the null character */
> + *pwc = __cp_conv[cp_idx][*t - 0x80];
> + if (*pwc == 0) /* Invalid character */
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + return 1;
> }
> + }
> +
> + *pwc = (wchar_t)*t;
> +
> + if (*t == '\0')
> + return 0;
> +
> + return 1;
> +}
> +#endif /* _MB_EXTENDED_CHARSETS_DOS */
> +
> +int
> +_DEFUN (__utf8_mbtowc, (r, pwc, s, n, charset, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wchar_t dummy;
> + unsigned char *t = (unsigned char *)s;
> + int ch;
> + int i = 0;
> +
> + if (pwc == NULL)
> + pwc = &dummy;
> +
> + if (s == NULL)
> + return 0;
> +
> + if (n == 0)
> + return -2;
> +
> + if (state->__count == 4)
> + {
> + /* Create the second half of the surrogate pair. For a description
> + see the comment below. */
> + wint_t tmp = (wchar_t)((state->__value.__wchb[0] & 0x07) << 18)
> + | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 12)
> + | (wchar_t)((state->__value.__wchb[2] & 0x3f) << 6)
> + | (wchar_t)(state->__value.__wchb[3] & 0x3f);
> + state->__count = 0;
> + *pwc = 0xdc00 | ((tmp - 0x10000) & 0x3ff);
> + return 2;
> + }
> + if (state->__count == 0)
> + ch = t[i++];
> + else
> + {
> + if (n < (size_t)-1)
> + ++n;
> + ch = state->__value.__wchb[0];
> + }
> +
> + if (ch == '\0')
> + {
> + *pwc = 0;
> + state->__count = 0;
> + return 0; /* s points to the null character */
> + }
>
> - if (ch >= 0x0 && ch <= 0x7f)
> + if (ch >= 0x0 && ch <= 0x7f)
> + {
> + /* single-byte sequence */
> + state->__count = 0;
> + *pwc = ch;
> + return 1;
> + }
> + if (ch >= 0xc0 && ch <= 0xdf)
> + {
> + /* two-byte sequence */
> + state->__value.__wchb[0] = ch;
> + state->__count = 1;
> + if (n < 2)
> + return -2;
> + ch = t[i++];
> + if (ch < 0x80 || ch > 0xbf)
> {
> - /* single-byte sequence */
> - state->__count = 0;
> - *pwc = ch;
> - return 1;
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + if (state->__value.__wchb[0] < 0xc2)
> + {
> + /* overlong UTF-8 sequence */
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + state->__count = 0;
> + *pwc = (wchar_t)((state->__value.__wchb[0] & 0x1f) << 6)
> + | (wchar_t)(ch & 0x3f);
> + return i;
> + }
> + if (ch >= 0xe0 && ch <= 0xef)
> + {
> + /* three-byte sequence */
> + wchar_t tmp;
> + state->__value.__wchb[0] = ch;
> + if (state->__count == 0)
> + state->__count = 1;
> + else if (n < (size_t)-1)
> + ++n;
> + if (n < 2)
> + return -2;
> + ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
> + if (state->__value.__wchb[0] == 0xe0 && ch < 0xa0)
> + {
> + /* overlong UTF-8 sequence */
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + if (ch < 0x80 || ch > 0xbf)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + state->__value.__wchb[1] = ch;
> + state->__count = 2;
> + if (n < 3)
> + return -2;
> + ch = t[i++];
> + if (ch < 0x80 || ch > 0xbf)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + state->__count = 0;
> + tmp = (wchar_t)((state->__value.__wchb[0] & 0x0f) << 12)
> + | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 6)
> + | (wchar_t)(ch & 0x3f);
> +
> + if (tmp >= 0xd800 && tmp <= 0xdfff)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + *pwc = tmp;
> + return i;
> + }
> + if (ch >= 0xf0 && ch <= 0xf7)
> + {
> + /* four-byte sequence */
> + wint_t tmp;
> + state->__value.__wchb[0] = ch;
> + if (state->__count == 0)
> + state->__count = 1;
> + else if (n < (size_t)-1)
> + ++n;
> + if (n < 2)
> + return -2;
> + ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
> + if (state->__value.__wchb[0] == 0xf0 && ch < 0x90)
> + {
> + /* overlong UTF-8 sequence */
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + if (ch < 0x80 || ch > 0xbf)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> }
> - else if (ch >= 0xc0 && ch <= 0xdf)
> + state->__value.__wchb[1] = ch;
> + if (state->__count == 1)
> + state->__count = 2;
> + else if (n < (size_t)-1)
> + ++n;
> + if (n < 3)
> + return -2;
> + ch = (state->__count == 2) ? t[i++] : state->__value.__wchb[2];
> + if (ch < 0x80 || ch > 0xbf)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + state->__value.__wchb[2] = ch;
> + state->__count = 3;
> + if (n < 4)
> + return -2;
> + ch = t[i++];
> + if (ch < 0x80 || ch > 0xbf)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + tmp = (wint_t)((state->__value.__wchb[0] & 0x07) << 18)
> + | (wint_t)((state->__value.__wchb[1] & 0x3f) << 12)
> + | (wint_t)((state->__value.__wchb[2] & 0x3f) << 6)
> + | (wint_t)(ch & 0x3f);
> + if (tmp > 0xffff && sizeof(wchar_t) == 2)
> + {
> + /* On systems which have wchar_t being UTF-16 values, the value
> + doesn't fit into a single wchar_t in this case. So what we
> + do here is to store the state with a special value of __count
> + and return the first half of a surrogate pair. As return
> + value we choose to return the half of the actual UTF-8 char.
> + The second half is returned in case we recognize the special
> + __count value above. */
> + state->__value.__wchb[3] = ch;
> + state->__count = 4;
> + *pwc = 0xd800 | (((tmp - 0x10000) >> 10) & 0x3ff);
> + return 2;
> + }
> + *pwc = tmp;
> + state->__count = 0;
> + return i;
> + }
> +
> + r->_errno = EILSEQ;
> + return -1;
> +}
> +
> +/* Cygwin defines its own doublebyte charset conversion functions
> + because the underlying OS requires wchar_t == UTF-16. */
> +#ifndef __CYGWIN__
> +int
> +_DEFUN (__sjis_mbtowc, (r, pwc, s, n, charset, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wchar_t dummy;
> + unsigned char *t = (unsigned char *)s;
> + int ch;
> + int i = 0;
> +
> + if (pwc == NULL)
> + pwc = &dummy;
> +
> + if (s == NULL)
> + return 0; /* not state-dependent */
> +
> + if (n == 0)
> + return -2;
> +
> + ch = t[i++];
> + if (state->__count == 0)
> + {
> + if (_issjis1 (ch))
> {
> - /* two-byte sequence */
> state->__value.__wchb[0] = ch;
> state->__count = 1;
> - if (n < 2)
> + if (n <= 1)
> return -2;
> ch = t[i++];
> - if (ch < 0x80 || ch > 0xbf)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - if (state->__value.__wchb[0] < 0xc2)
> - {
> - /* overlong UTF-8 sequence */
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - state->__count = 0;
> - *pwc = (wchar_t)((state->__value.__wchb[0] & 0x1f) << 6)
> - | (wchar_t)(ch & 0x3f);
> - return i;
> }
> - else if (ch >= 0xe0 && ch <= 0xef)
> + }
> + if (state->__count == 1)
> + {
> + if (_issjis2 (ch))
> {
> - /* three-byte sequence */
> - wchar_t tmp;
> - state->__value.__wchb[0] = ch;
> - if (state->__count == 0)
> - state->__count = 1;
> - else if (n < (size_t)-1)
> - ++n;
> - if (n < 2)
> - return -2;
> - ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
> - if (state->__value.__wchb[0] == 0xe0 && ch < 0xa0)
> - {
> - /* overlong UTF-8 sequence */
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - if (ch < 0x80 || ch > 0xbf)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - state->__value.__wchb[1] = ch;
> - state->__count = 2;
> - if (n < 3)
> - return -2;
> - ch = t[i++];
> - if (ch < 0x80 || ch > 0xbf)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> + *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
> state->__count = 0;
> - tmp = (wchar_t)((state->__value.__wchb[0] & 0x0f) << 12)
> - | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 6)
> - | (wchar_t)(ch & 0x3f);
> -
> - if (tmp >= 0xd800 && tmp <= 0xdfff)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - *pwc = tmp;
> return i;
> }
> - else if (ch >= 0xf0 && ch <= 0xf7)
> + else
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + }
> +
> + *pwc = (wchar_t)*t;
> +
> + if (*t == '\0')
> + return 0;
> +
> + return 1;
> +}
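A minimal sketch of how __sjis_mbtowc carries a lead byte across calls when n is too small. It assumes that newlib's _issjis1/_issjis2 accept the usual Shift-JIS lead (0x81..0x9f, 0xe0..0xef) and trail (0x40..0x7e, 0x80..0xfc) ranges. is_sjis_lead, is_sjis_trail, struct sjis_state and sjis_mbtowc_sketch are hypothetical stand-ins, not the patch's code.

```c
#include <stddef.h>

/* Assumed Shift-JIS byte ranges (stand-ins for _issjis1/_issjis2). */
static int is_sjis_lead (int c)  { return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xef); }
static int is_sjis_trail (int c) { return (c >= 0x40 && c <= 0x7e) || (c >= 0x80 && c <= 0xfc); }

/* Minimal conversion state: count == 1 means a lead byte is buffered. */
struct sjis_state { int count; unsigned char lead; };

/* mbtowc-style return values: bytes consumed, 0 for NUL,
   -1 for an invalid trail byte, -2 for an incomplete character. */
int sjis_mbtowc_sketch (unsigned *pwc, const unsigned char *s, size_t n,
                        struct sjis_state *st)
{
  int i = 0, ch;

  if (n == 0)
    return -2;
  ch = s[i++];
  if (st->count == 0 && is_sjis_lead (ch))
    {
      st->lead = (unsigned char) ch;
      st->count = 1;
      if (n <= 1)
        return -2;                 /* lead byte saved, trail byte still needed */
      ch = s[i++];
    }
  if (st->count == 1)
    {
      if (!is_sjis_trail (ch))
        return -1;
      *pwc = ((unsigned) st->lead << 8) + (unsigned) ch;   /* raw SJIS code */
      st->count = 0;
      return i;                    /* 2, or 1 when resuming a split character */
    }
  *pwc = (unsigned) ch;            /* single-byte character */
  return ch == 0 ? 0 : 1;
}
```

As in the patch, the result is the raw SJIS code (0x82a0 for the bytes 0x82 0xa0), not a Unicode code point. When the same two bytes arrive in separate calls, the first call returns -2 and the second call returns 1.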
> +
> +int
> +_DEFUN (__eucjp_mbtowc, (r, pwc, s, n, charset, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wchar_t dummy;
> + unsigned char *t = (unsigned char *)s;
> + int ch;
> + int i = 0;
> +
> + if (pwc == NULL)
> + pwc = &dummy;
> +
> + if (s == NULL)
> + return 0;
> +
> + if (n == 0)
> + return -2;
> +
> + ch = t[i++];
> + if (state->__count == 0)
> + {
> + if (_iseucjp (ch))
> {
> - /* four-byte sequence */
> - wint_t tmp;
> state->__value.__wchb[0] = ch;
> - if (state->__count == 0)
> - state->__count = 1;
> - else if (n < (size_t)-1)
> - ++n;
> - if (n < 2)
> - return -2;
> - ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
> - if (state->__value.__wchb[0] == 0xf0 && ch < 0x90)
> - {
> - /* overlong UTF-8 sequence */
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - if (ch < 0x80 || ch > 0xbf)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - state->__value.__wchb[1] = ch;
> - if (state->__count == 1)
> - state->__count = 2;
> - else if (n < (size_t)-1)
> - ++n;
> - if (n < 3)
> - return -2;
> - ch = (state->__count == 2) ? t[i++] : state->__value.__wchb[2];
> - if (ch < 0x80 || ch > 0xbf)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - state->__value.__wchb[2] = ch;
> - state->__count = 3;
> - if (n < 4)
> + state->__count = 1;
> + if (n <= 1)
> return -2;
> ch = t[i++];
> - if (ch < 0x80 || ch > 0xbf)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - tmp = (wint_t)((state->__value.__wchb[0] & 0x07) << 18)
> - | (wint_t)((state->__value.__wchb[1] & 0x3f) << 12)
> - | (wint_t)((state->__value.__wchb[2] & 0x3f) << 6)
> - | (wint_t)(ch & 0x3f);
> - if (tmp > 0xffff && sizeof(wchar_t) == 2)
> - {
> - /* On systems which have wchar_t being UTF-16 values, the value
> - doesn't fit into a single wchar_t in this case. So what we
> - do here is to store the state with a special value of __count
> - and return the first half of a surrogate pair. As return
> - value we choose to return the half of the actual UTF-8 char.
> - The second half is returned in case we recognize the special
> - __count value above. */
> - state->__value.__wchb[3] = ch;
> - state->__count = 4;
> - *pwc = 0xd800 | (((tmp - 0x10000) >> 10) & 0x3ff);
> - return 2;
> - }
> - *pwc = tmp;
> + }
> + }
> + if (state->__count == 1)
> + {
> + if (_iseucjp (ch))
> + {
> + *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
> state->__count = 0;
> return i;
> }
> @@ -254,165 +452,141 @@ _DEFUN (_mbtowc_r, (r, pwc, s, n, state)
> r->_errno = EILSEQ;
> return -1;
> }
> - }
> - else if (!strcmp (__locale_charset (), "SJIS"))
> + }
> +
> + *pwc = (wchar_t)*t;
> +
> + if (*t == '\0')
> + return 0;
> +
> + return 1;
> +}
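The EUC-JP converter has the same shape as the SJIS one, so this sketch shows the caller's side instead: walking a buffer one character at a time and using the return value to advance. It assumes that _iseucjp accepts 0xa1..0xfe. is_eucjp, eucjp_decode and eucjp_count are hypothetical names, and unlike the patch this decoder does not keep a lead byte across calls.

```c
#include <stddef.h>

/* Assumed stand-in for newlib's _iseucjp. */
static int is_eucjp (int c) { return c >= 0xa1 && c <= 0xfe; }

/* Decode one character at s[0..n): bytes consumed, 0 at NUL,
   -1 on an invalid trail byte, -2 if the buffer ends after a lead byte.
   Two-byte characters become the raw code (b0 << 8) + b1, as in the patch. */
static int eucjp_decode (const unsigned char *s, size_t n, unsigned *out)
{
  if (n == 0)
    return -2;
  if (!is_eucjp (s[0]))
    {
      *out = s[0];
      return s[0] == 0 ? 0 : 1;
    }
  if (n < 2)
    return -2;
  if (!is_eucjp (s[1]))
    return -1;
  *out = ((unsigned) s[0] << 8) + s[1];
  return 2;
}

/* Count characters in the first n bytes, stopping at NUL; -1 on error. */
long eucjp_count (const unsigned char *s, size_t n)
{
  long count = 0;
  unsigned wc;

  while (n > 0)
    {
      int r = eucjp_decode (s, n, &wc);
      if (r == 0)
        break;                     /* reached the terminating NUL */
      if (r < 0)
        return -1;                 /* invalid or truncated sequence */
      s += r;
      n -= (size_t) r;
      count++;
    }
  return count;
}
```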
> +
> +int
> +_DEFUN (__jis_mbtowc, (r, pwc, s, n, charset, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wchar_t dummy;
> + unsigned char *t = (unsigned char *)s;
> + JIS_STATE curr_state;
> + JIS_ACTION action;
> + JIS_CHAR_TYPE ch;
> + unsigned char *ptr;
> + unsigned int i;
> + int curr_ch;
> +
> + if (pwc == NULL)
> + pwc = &dummy;
> +
> + if (s == NULL)
> {
> - int ch;
> - int i = 0;
> - if (s == NULL)
> - return 0; /* not state-dependent */
> - ch = t[i++];
> - if (state->__count == 0)
> - {
> - if (_issjis1 (ch))
> - {
> - state->__value.__wchb[0] = ch;
> - state->__count = 1;
> - if (n <= 1)
> - return -2;
> - ch = t[i++];
> - }
> - }
> - if (state->__count == 1)
> - {
> - if (_issjis2 (ch))
> - {
> - *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
> - state->__count = 0;
> - return i;
> - }
> - else
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - }
> + state->__state = ASCII;
> + return 1; /* state-dependent */
> }
> - else if (!strcmp (__locale_charset (), "EUCJP"))
> +
> + if (n == 0)
> + return -2;
> +
> + curr_state = state->__state;
> + ptr = t;
> +
> + for (i = 0; i < n; ++i)
> {
> - int ch;
> - int i = 0;
> - if (s == NULL)
> - return 0; /* not state-dependent */
> - ch = t[i++];
> - if (state->__count == 0)
> + curr_ch = t[i];
> + switch (curr_ch)
> {
> - if (_iseucjp (ch))
> - {
> - state->__value.__wchb[0] = ch;
> - state->__count = 1;
> - if (n <= 1)
> - return -2;
> - ch = t[i++];
> - }
> - }
> - if (state->__count == 1)
> - {
> - if (_iseucjp (ch))
> - {
> - *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
> - state->__count = 0;
> - return i;
> - }
> + case ESC_CHAR:
> + ch = ESCAPE;
> + break;
> + case '$':
> + ch = DOLLAR;
> + break;
> + case '@':
> + ch = AT;
> + break;
> + case '(':
> + ch = BRACKET;
> + break;
> + case 'B':
> + ch = B;
> + break;
> + case 'J':
> + ch = J;
> + break;
> + case '\0':
> + ch = NUL;
> + break;
> + default:
> + if (_isjis (curr_ch))
> + ch = JIS_CHAR;
> else
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> + ch = OTHER;
> + }
> +
> + action = JIS_action_table[curr_state][ch];
> + curr_state = JIS_state_table[curr_state][ch];
> +
> + switch (action)
> + {
> + case NOOP:
> + break;
> + case EMPTY:
> + state->__state = ASCII;
> + *pwc = (wchar_t)0;
> + return 0;
> + case COPY_A:
> + state->__state = ASCII;
> + *pwc = (wchar_t)*ptr;
> + return (i + 1);
> + case COPY_J1:
> + state->__value.__wchb[0] = t[i];
> + break;
> + case COPY_J2:
> + state->__state = JIS;
> + *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)(t[i]);
> + return (i + 1);
> + case MAKE_A:
> + ptr = (unsigned char *)(t + i + 1);
> + break;
> + case ERROR:
> + default:
> + r->_errno = EILSEQ;
> + return -1;
> }
> +
> }
> - else if (!strcmp (__locale_charset (), "JIS"))
> - {
> - JIS_STATE curr_state;
> - JIS_ACTION action;
> - JIS_CHAR_TYPE ch;
> - unsigned char *ptr;
> - unsigned int i;
> - int curr_ch;
> -
> - if (s == NULL)
> - {
> - state->__state = ASCII;
> - return 1; /* state-dependent */
> - }
> -
> - curr_state = state->__state;
> - ptr = t;
> -
> - for (i = 0; i < n; ++i)
> - {
> - curr_ch = t[i];
> - switch (curr_ch)
> - {
> - case ESC_CHAR:
> - ch = ESCAPE;
> - break;
> - case '$':
> - ch = DOLLAR;
> - break;
> - case '@':
> - ch = AT;
> - break;
> - case '(':
> - ch = BRACKET;
> - break;
> - case 'B':
> - ch = B;
> - break;
> - case 'J':
> - ch = J;
> - break;
> - case '\0':
> - ch = NUL;
> - break;
> - default:
> - if (_isjis (curr_ch))
> - ch = JIS_CHAR;
> - else
> - ch = OTHER;
> - }
>
> - action = JIS_action_table[curr_state][ch];
> - curr_state = JIS_state_table[curr_state][ch];
> -
> - switch (action)
> - {
> - case NOOP:
> - break;
> - case EMPTY:
> - state->__state = ASCII;
> - *pwc = (wchar_t)0;
> - return 0;
> - case COPY_A:
> - state->__state = ASCII;
> - *pwc = (wchar_t)*ptr;
> - return (i + 1);
> - case COPY_J1:
> - state->__value.__wchb[0] = t[i];
> - break;
> - case COPY_J2:
> - state->__state = JIS;
> - *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)(t[i]);
> - return (i + 1);
> - case MAKE_A:
> - ptr = (unsigned char *)(t + i + 1);
> - break;
> - case ERROR:
> - default:
> - r->_errno = EILSEQ;
> - return -1;
> - }
> + state->__state = curr_state;
> + return -2; /* n < bytes needed */
> +}
> +#endif /* !__CYGWIN__*/
> +#endif /* _MB_CAPABLE */
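The JIS converter walks a table-driven state machine (JIS_state_table / JIS_action_table). This hand-rolled simplification may make the escape handling easier to follow. It is stateless across calls, knows only the four escapes listed in its comment, and jis_decode is a hypothetical name rather than anything in the patch.

```c
#include <stddef.h>

enum jis_mode { MODE_ASCII, MODE_JIS };

/* Simplified ISO-2022-JP ("JIS") decoder.  Recognized escapes:
     ESC $ B, ESC $ @   switch to two-byte JIS X 0208 mode
     ESC ( B, ESC ( J   switch back to single-byte mode
   In two-byte mode each pair of bytes in 0x21..0x7e becomes
   (b0 << 8) | b1, like the COPY_J1/COPY_J2 actions.
   Stops at NUL or after n bytes.  Returns the number of characters
   stored in out, or -1 on a malformed sequence or a full buffer. */
long jis_decode (const unsigned char *s, size_t n, unsigned *out, size_t max)
{
  enum jis_mode mode = MODE_ASCII;
  size_t i = 0, k = 0;

  while (i < n && s[i] != '\0')
    {
      if (s[i] == 0x1b)                     /* ESC */
        {
          if (i + 2 >= n)
            return -1;                      /* truncated escape sequence */
          if (s[i + 1] == '$' && (s[i + 2] == 'B' || s[i + 2] == '@'))
            mode = MODE_JIS;
          else if (s[i + 1] == '(' && (s[i + 2] == 'B' || s[i + 2] == 'J'))
            mode = MODE_ASCII;
          else
            return -1;                      /* unknown escape */
          i += 3;
          continue;
        }
      if (k == max)
        return -1;                          /* output buffer full */
      if (mode == MODE_ASCII)
        out[k++] = s[i++];
      else
        {
          if (i + 1 >= n || s[i] < 0x21 || s[i] > 0x7e
              || s[i + 1] < 0x21 || s[i + 1] > 0x7e)
            return -1;                      /* incomplete or invalid pair */
          out[k++] = ((unsigned) s[i] << 8) | s[i + 1];
          i += 2;
        }
    }
  return (long) k;
}
```

With the input "A" ESC $ B 0x24 0x22 ESC ( B "B", this yields 'A', 0x2422 and 'B'.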
>
> - }
> +int
> +_DEFUN (__ascii_mbtowc, (r, pwc, s, n, charset, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wchar_t dummy;
> + unsigned char *t = (unsigned char *)s;
>
> - state->__state = curr_state;
> - return -2; /* n < bytes needed */
> - }
> -#endif /* _MB_CAPABLE */
> + if (pwc == NULL)
> + pwc = &dummy;
>
> - /* otherwise this must be the "C" locale or unknown locale */
> if (s == NULL)
> - return 0; /* not state-dependent */
> + return 0;
> +
> + if (n == 0)
> + return -2;
>
> *pwc = (wchar_t)*t;
>
> @@ -421,3 +595,14 @@ _DEFUN (_mbtowc_r, (r, pwc, s, n, state)
>
> return 1;
> }
> +
> +int
> +_DEFUN (_mbtowc_r, (r, pwc, s, n, state),
> + struct _reent *r _AND
> + wchar_t *pwc _AND
> + const char *s _AND
> + size_t n _AND
> + mbstate_t *state)
> +{
> + return __mbtowc (r, pwc, s, n, __locale_charset (), state);
> +}
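_mbtowc_r now only forwards to __mbtowc with __locale_charset(), so the charset name travels with every call. This is what lets Cygwin convert for a charset other than the locale's. The following is a rough sketch of that idea, converting for an explicitly named charset. conv_fn, convs and mbtowc_charset are hypothetical, and the sketch does not describe how __mbtowc itself is selected, since that is not part of this hunk.

```c
#include <stddef.h>
#include <string.h>

typedef int (*conv_fn) (unsigned *pwc, const unsigned char *s, size_t n);

/* Strict 7-bit ASCII: bytes >= 0x80 are rejected.  This is a choice made
   for the sketch; the patch's __ascii_mbtowc copies the byte unchanged. */
static int ascii_conv (unsigned *pwc, const unsigned char *s, size_t n)
{
  if (n == 0)
    return -2;
  if (*s >= 0x80)
    return -1;
  *pwc = *s;
  return *s ? 1 : 0;
}

/* ISO-8859-1: every byte maps to the code point with the same value. */
static int latin1_conv (unsigned *pwc, const unsigned char *s, size_t n)
{
  if (n == 0)
    return -2;
  *pwc = *s;
  return *s ? 1 : 0;
}

/* Hypothetical charset-name-to-converter table. */
static const struct { const char *name; conv_fn fn; } convs[] = {
  { "ASCII", ascii_conv },
  { "ISO-8859-1", latin1_conv },
};

/* Convert one character for an explicitly named charset, independent of
   the current locale; unknown names fall back to ASCII. */
int mbtowc_charset (const char *charset, unsigned *pwc,
                    const unsigned char *s, size_t n)
{
  size_t i;

  for (i = 0; i < sizeof convs / sizeof convs[0]; i++)
    if (strcmp (convs[i].name, charset) == 0)
      return convs[i].fn (pwc, s, n);
  return ascii_conv (pwc, s, n);
}
```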
> Index: libc/stdlib/sb_charsets.c
> ===================================================================
> RCS file: libc/stdlib/sb_charsets.c
> diff -N libc/stdlib/sb_charsets.c
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ libc/stdlib/sb_charsets.c 22 Mar 2009 16:25:07 -0000
> @@ -0,0 +1,697 @@
> +#include <newlib.h>
> +#include <wchar.h>
> +
> +#ifdef _MB_CAPABLE
> +extern char *__locale_charset ();
> +
> +#ifdef _MB_EXTENDED_CHARSETS_ISO
> +/* Tables for the ISO-8859-x to UTF conversion. The first index into the
> + table is a value computed from the value x (function __iso_8859_index),
> + the second index is the value of the incoming character - 0xa0.
> + Values < 0xa0 don't have to be converted anyway. */
> +wchar_t __iso_8859_conv[14][0x60] = {
> + /* ISO-8859-2 */
> + { 0xa0, 0x104, 0x2d8, 0x141, 0xa4, 0x13d, 0x15a, 0xa7,
> + 0xa8, 0x160, 0x15e, 0x164, 0x179, 0xad, 0x17d, 0x17b,
> + 0xb0, 0x105, 0x2db, 0x142, 0xb4, 0x13e, 0x15b, 0x2c7,
> + 0xb8, 0x161, 0x15f, 0x165, 0x17a, 0x2dd, 0x17e, 0x17c,
> + 0x154, 0xc1, 0xc2, 0x102, 0xc4, 0x139, 0x106, 0xc7,
> + 0x10c, 0xc9, 0x118, 0xcb, 0x11a, 0xcd, 0xce, 0x10e,
> + 0x110, 0x143, 0x147, 0xd3, 0xd4, 0x150, 0xd6, 0xd7,
> + 0x158, 0x16e, 0xda, 0x170, 0xdc, 0xdd, 0x162, 0xdf,
> + 0x155, 0xe1, 0xe2, 0x103, 0xe4, 0x13a, 0x107, 0xe7,
> + 0x10d, 0xe9, 0x119, 0xeb, 0x11b, 0xed, 0xee, 0x10f,
> + 0x111, 0x144, 0x148, 0xf3, 0xf4, 0x151, 0xf6, 0xf7,
> + 0x159, 0x16f, 0xfa, 0x171, 0xfc, 0xfd, 0x163, 0x2d9 },
> + /* ISO-8859-3 */
> + { 0xa0, 0x126, 0x2d8, 0xa3, 0xa4, 0x0, 0x124, 0xa7,
> + 0xa8, 0x130, 0x15e, 0x11e, 0x134, 0xad, 0x0, 0x17b,
> + 0xb0, 0x127, 0xb2, 0xb3, 0xb4, 0xb5, 0x125, 0xb7,
> + 0xb8, 0x131, 0x15f, 0x11f, 0x135, 0xbd, 0x0, 0x17c,
> + 0xc0, 0xc1, 0xc2, 0x0, 0xc4, 0x10a, 0x108, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
> + 0x0, 0xd1, 0xd2, 0xd3, 0xd4, 0x120, 0xd6, 0xd7,
> + 0x11c, 0xd9, 0xda, 0xdb, 0xdc, 0x16c, 0x15c, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0x0, 0xe4, 0x10b, 0x109, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
> + 0x0, 0xf1, 0xf2, 0xf3, 0xf4, 0x121, 0xf6, 0xf7,
> + 0x11d, 0xf9, 0xfa, 0xfb, 0xfc, 0x16d, 0x15d, 0x2d9 },
> + /* ISO-8859-4 */
> + { 0xa0, 0x104, 0x138, 0x156, 0xa4, 0x128, 0x13b, 0xa7,
> + 0xa8, 0x160, 0x112, 0x122, 0x166, 0xad, 0x17d, 0xaf,
> + 0xb0, 0x105, 0x2db, 0x157, 0xb4, 0x129, 0x13c, 0x2c7,
> + 0xb8, 0x161, 0x113, 0x123, 0x167, 0x14a, 0x17e, 0x14b,
> + 0x100, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0x12e,
> + 0x10c, 0xc9, 0x118, 0xcb, 0x116, 0xcd, 0xce, 0x12a,
> + 0x110, 0x145, 0x14c, 0x136, 0xd4, 0xd5, 0xd6, 0xd7,
> + 0xd8, 0x172, 0xda, 0xdb, 0xdc, 0x168, 0x16a, 0xdf,
> + 0x101, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0x12f,
> + 0x10d, 0xe9, 0x119, 0xeb, 0x117, 0xed, 0xee, 0x12b,
> + 0x111, 0x146, 0x14d, 0x137, 0xf4, 0xf5, 0xf6, 0xf7,
> + 0xf8, 0x173, 0xfa, 0xfb, 0xfc, 0x169, 0x16b, 0x2d9 },
> + /* ISO-8859-5 */
> + { 0xa0, 0x401, 0x402, 0x403, 0x404, 0x405, 0x406, 0x407,
> + 0x408, 0x409, 0x40a, 0x40b, 0x40c, 0xad, 0x40e, 0x40f,
> + 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
> + 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
> + 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
> + 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
> + 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
> + 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
> + 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
> + 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f,
> + 0x2116, 0x451, 0x452, 0x453, 0x454, 0x455, 0x456, 0x457,
> + 0x458, 0x459, 0x45a, 0x45b, 0x45c, 0xa7, 0x45e, 0x45f },
> + /* ISO-8859-6 */
> + { 0xa0, 0x0, 0x0, 0x0, 0xa4, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0, 0x60c, 0xad, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x61b, 0x0, 0x0, 0x0, 0x61f,
> + 0x0, 0x621, 0x622, 0x623, 0x624, 0x625, 0x626, 0x627,
> + 0x628, 0x629, 0x62a, 0x62b, 0x62c, 0x62d, 0x62e, 0x62f,
> + 0x630, 0x631, 0x632, 0x633, 0x634, 0x635, 0x636, 0x637,
> + 0x638, 0x639, 0x63a, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0x640, 0x641, 0x642, 0x643, 0x644, 0x645, 0x646, 0x647,
> + 0x648, 0x649, 0x64a, 0x64b, 0x64c, 0x64d, 0x64e, 0x64f,
> + 0x650, 0x651, 0x652, 0x64b, 0xf4, 0xf5, 0xf6, 0xf7,
> + 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff },
> + /* ISO-8859-7 */
> + { 0xa0, 0x2018, 0x2019, 0xa3, 0x20ac, 0x20af, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0x37a, 0xab, 0xac, 0xad, 0x0, 0x2015,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0x384, 0x385, 0x386, 0xb7,
> + 0x388, 0x389, 0x38a, 0xbb, 0x38c, 0xbd, 0x38e, 0x38f,
> + 0x390, 0x391, 0x392, 0x393, 0x394, 0x395, 0x396, 0x397,
> + 0x398, 0x399, 0x39a, 0x39b, 0x39c, 0x39d, 0x39e, 0x39f,
> + 0x3a0, 0x3a1, 0x0, 0x3a3, 0x3a4, 0x3a5, 0x3a6, 0x3a7,
> + 0x3a8, 0x3a9, 0x3aa, 0x3ab, 0x3ac, 0x3ad, 0x3ae, 0x3af,
> + 0x3b0, 0x3b1, 0x3b2, 0x3b3, 0x3b4, 0x3b5, 0x3b6, 0x3b7,
> + 0x3b8, 0x3b9, 0x3ba, 0x3bb, 0x3bc, 0x3bd, 0x3be, 0x3bf,
> + 0x3c0, 0x3c1, 0x3c2, 0x3c3, 0x3c4, 0x3c5, 0x3c6, 0x3c7,
> + 0x3c8, 0x3c9, 0x3ca, 0x3cb, 0x3cc, 0x3cd, 0x3ce, 0xff },
> + /* ISO-8859-8 */
> + { 0xa0, 0x0, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0xd7, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0xb9, 0xf7, 0xbb, 0xbc, 0xbd, 0xbe, 0x0,
> + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2017,
> + 0x5d0, 0x5d1, 0x5d2, 0x5d3, 0x5d4, 0x5d5, 0x5d6, 0x5d7,
> + 0x5d8, 0x5d9, 0x5da, 0x5db, 0x5dc, 0x5dd, 0x5de, 0x5df,
> + 0x5e0, 0x5e1, 0x5e2, 0x5e3, 0x5e4, 0x5e5, 0x5e6, 0x5e7,
> + 0x5e8, 0x5e9, 0x5ea, 0x0, 0x0, 0x200e, 0x200f, 0x200e },
> + /* ISO-8859-9 */
> + { 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
> + 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
> + 0x11e, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
> + 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x130, 0x15e, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
> + 0x11f, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
> + 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x131, 0x15f, 0xff },
> + /* ISO-8859-10 */
> + { 0xa0, 0x104, 0x112, 0x122, 0x12a, 0x128, 0x136, 0xa7,
> + 0x13b, 0x110, 0x160, 0x166, 0x17d, 0xad, 0x16a, 0x14a,
> + 0xb0, 0x105, 0x113, 0x123, 0x12b, 0x129, 0x137, 0xb7,
> + 0x13c, 0x111, 0x161, 0x167, 0x17e, 0x2015, 0x16b, 0x14b,
> + 0x100, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0x12e,
> + 0x10c, 0xc9, 0x118, 0xcb, 0x116, 0xcd, 0xce, 0xcf,
> + 0xd0, 0x145, 0x14c, 0xd3, 0xd4, 0xd5, 0xd6, 0x168,
> + 0xd8, 0x172, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
> + 0x101, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0x12f,
> + 0x10d, 0xe9, 0x119, 0xeb, 0x117, 0xed, 0xee, 0xef,
> + 0xf0, 0x146, 0x14d, 0xf3, 0xf4, 0xf5, 0xf6, 0x169,
> + 0xf8, 0x173, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0x138 },
> + /* ISO-8859-11 */
> + { 0xa0, 0xe01, 0xe02, 0xe03, 0xe04, 0xe05, 0xe06, 0xe07,
> + 0xe08, 0xe09, 0xe0a, 0xe0b, 0xe0c, 0xe0d, 0xe0e, 0xe0f,
> + 0xe10, 0xe11, 0xe12, 0xe13, 0xe14, 0xe15, 0xe16, 0xe17,
> + 0xe18, 0xe19, 0xe1a, 0xe1b, 0xe1c, 0xe1d, 0xe1e, 0xe1f,
> + 0xe20, 0xe21, 0xe22, 0xe23, 0xe24, 0xe25, 0xe26, 0xe27,
> + 0xe28, 0xe29, 0xe2a, 0xe2b, 0xe2c, 0xe2d, 0xe2e, 0xe2f,
> + 0xe30, 0xe31, 0xe32, 0xe33, 0xe34, 0xe35, 0xe36, 0xe37,
> + 0xe38, 0xe39, 0xe3a, 0x0, 0x0, 0x0, 0x0, 0xe3f,
> + 0xe40, 0xe41, 0xe42, 0xe43, 0xe44, 0xe45, 0xe46, 0xe47,
> + 0xe48, 0xe49, 0xe4a, 0xe4b, 0xe4c, 0xe4d, 0xe4e, 0xe4f,
> + 0xe50, 0xe51, 0xe52, 0xe53, 0xe54, 0xe55, 0xe56, 0xe57,
> + 0xe58, 0xe59, 0xe5a, 0xe5b, 0xe31, 0xe34, 0xe47, 0xff },
> + /* ISO-8859-12 doesn't exist. The below code decrements the index
> + into the table by one for ISO numbers > 12. */
> + /* ISO-8859-13 */
> + { 0xa0, 0x201d, 0xa2, 0xa3, 0xa4, 0x201e, 0xa6, 0xa7,
> + 0xd8, 0xa9, 0x156, 0xab, 0xac, 0xad, 0xae, 0xc6,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0x201c, 0xb5, 0xb6, 0xb7,
> + 0xf8, 0xb9, 0x157, 0xbb, 0xbc, 0xbd, 0xbe, 0xe6,
> + 0x104, 0x12e, 0x100, 0x106, 0xc4, 0xc5, 0x118, 0x112,
> + 0x10c, 0xc9, 0x179, 0x116, 0x122, 0x136, 0x12a, 0x13b,
> + 0x160, 0x143, 0x145, 0xd3, 0x14c, 0xd5, 0xd6, 0xd7,
> + 0x172, 0x141, 0x15a, 0x16a, 0xdc, 0x17b, 0x17d, 0xdf,
> + 0x105, 0x12f, 0x101, 0x107, 0xe4, 0xe5, 0x119, 0x113,
> + 0x10d, 0xe9, 0x17a, 0x117, 0x123, 0x137, 0x12b, 0x13c,
> + 0x161, 0x144, 0x146, 0xf3, 0x14d, 0xf5, 0xf6, 0xf7,
> + 0x173, 0x142, 0x15b, 0x16b, 0xfc, 0x17c, 0x17e, 0x2019 },
> + /* ISO-8859-14 */
> + { 0xa0, 0x1e02, 0x1e03, 0xa3, 0x10a, 0x10b, 0x1e0a, 0xa7,
> + 0x1e80, 0xa9, 0x1e82, 0x1e0b, 0x1ef2, 0xad, 0xae, 0x178,
> + 0x1e1e, 0x1e1f, 0x120, 0x121, 0x1e40, 0x1e41, 0xb6, 0x1e56,
> + 0x1e81, 0x1e57, 0x1e83, 0x1e60, 0x1ef3, 0x1e84, 0x1e85, 0x1e61,
> + 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
> + 0x174, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0x1e6a,
> + 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0x176, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
> + 0x175, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0x1e6b,
> + 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0x177, 0xff },
> + /* ISO-8859-15 */
> + { 0xa0, 0xa1, 0xa2, 0xa3, 0x20ac, 0xa5, 0x160, 0xa7,
> + 0x161, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0x17d, 0xb5, 0xb6, 0xb7,
> + 0x17e, 0xb9, 0xba, 0xbb, 0x152, 0x153, 0x178, 0xbf,
> + 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
> + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
> + 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
> + 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
> + 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff },
> + /* ISO-8859-16 */
> + { 0xa0, 0x104, 0x105, 0x141, 0x20ac, 0x201e, 0x160, 0xa7,
> + 0x161, 0xa9, 0x218, 0xab, 0x179, 0xad, 0x17a, 0x17b,
> + 0xb0, 0xb1, 0x10c, 0x142, 0x17d, 0x201d, 0xb6, 0xb7,
> + 0x17e, 0x10d, 0x219, 0xbb, 0x152, 0x153, 0x178, 0x17c,
> + 0xc0, 0xc1, 0xc2, 0x102, 0xc4, 0x106, 0xc6, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
> + 0x110, 0x143, 0xd2, 0xd3, 0xd4, 0x150, 0xd6, 0x15a,
> + 0x170, 0xd9, 0xda, 0xdb, 0xdc, 0x118, 0x21a, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0x103, 0xe4, 0x107, 0xe6, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
> + 0x111, 0x144, 0xf2, 0xf3, 0xf4, 0x151, 0xf6, 0x15b,
> + 0x171, 0xf9, 0xfa, 0xfb, 0xfc, 0x119, 0x21b, 0xff }
> +};
> +#endif /* _MB_EXTENDED_CHARSETS_ISO */
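The comment at the top of this table describes a two-level lookup. It assumes the row layout it states: ISO-8859-2 in row 0, no row for ISO-8859-12, and bytes below 0xa0 left unconverted. Under that layout, the index arithmetic could look like the sketch below. iso_8859_row and iso_8859_to_ucs are hypothetical stand-ins, not the patch's __iso_8859_index. Treating a 0x0 entry as "unassigned" is also an assumption on my part.

```c
#include <stddef.h>

/* Row in a 14-row table laid out like __iso_8859_conv.  Parts 2..11 map
   to rows 0..9; ISO-8859-12 does not exist, so 13..16 map to rows 10..13.
   Returns -1 for parts without a row. */
int iso_8859_row (int part)
{
  if (part < 2 || part > 16 || part == 12)
    return -1;
  return part < 12 ? part - 2 : part - 3;
}

/* Map one byte to a code point.  Bytes below 0xa0 are the same in every
   part, and ISO-8859-1 needs no table at all.  A 0 entry is treated as
   unassigned (assumption).  Returns 0 on success, -1 on failure. */
int iso_8859_to_ucs (wchar_t table[][0x60], int part,
                     unsigned char c, wchar_t *out)
{
  int row;

  if (c < 0xa0 || part == 1)
    {
      *out = c;
      return 0;
    }
  row = iso_8859_row (part);
  if (row < 0 || table[row][c - 0xa0] == 0)
    return -1;
  *out = table[row][c - 0xa0];
  return 0;
}
```

With the table above, for example, ISO-8859-15 lands in row 12, and byte 0xa4 in that row maps to 0x20ac (the euro sign).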
> +
> +#ifdef _MB_EXTENDED_CHARSETS_DOS
> +/* Tables for the Windows default singlebyte ANSI and OEM codepage conversion.
> + The first index into the table is a value computed from the codepage
> + value (function __cp_index), the second index is the value of the
> + incoming character - 0x80.
> + Values < 0x80 don't have to be converted anyway. */
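Unlike the ISO tables, these rows are keyed by codepage number, so a codepage-to-row lookup is needed. A hypothetical sketch follows; cp_row and cp_to_ucs are illustrative names, not __cp_index. The row order comes from the /* CPxxx */ comments in the table for CP437 through CP1251. For CP1252 through CP1258 it is assumed to follow the order of the list in the patch description. As in the ISO sketch, a 0x0 entry is assumed to mean "unassigned".

```c
#include <stddef.h>

/* Codepages in assumed __cp_conv row order (22 rows). */
static const int cp_rows[22] = {
  437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866,
  874, 1125, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258
};

/* Row for a codepage, or -1 if there is no table for it. */
int cp_row (int cp)
{
  size_t i;

  for (i = 0; i < sizeof cp_rows / sizeof cp_rows[0]; i++)
    if (cp_rows[i] == cp)
      return (int) i;
  return -1;
}

/* Bytes below 0x80 are plain ASCII; the rest index the row at c - 0x80.
   Returns 0 on success, -1 for unknown codepages or unassigned bytes. */
int cp_to_ucs (wchar_t table[][0x80], int cp, unsigned char c, wchar_t *out)
{
  int row;

  if (c < 0x80)
    {
      *out = c;
      return 0;
    }
  row = cp_row (cp);
  if (row < 0 || table[row][c - 0x80] == 0)
    return -1;
  *out = table[row][c - 0x80];
  return 0;
}
```

A linear search is sufficient for 22 entries. A switch statement on the codepage number would also work.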
> +wchar_t __cp_conv[22][0x80] = {
> + /* CP437 */
> + { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
> + 0xea, 0xeb, 0xe8, 0xef, 0xee, 0xec, 0xc4, 0xc5,
> + 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
> + 0xff, 0xd6, 0xdc, 0xa2, 0xa3, 0xa5, 0x20a7, 0x192,
> + 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
> + 0xbf, 0x2310, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
> + 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
> + 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
> + 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
> + 0x3b1, 0xdf, 0x393, 0x3c0, 0x3a3, 0x3c3, 0xb5, 0x3c4,
> + 0x3a6, 0x398, 0x3a9, 0x3b4, 0x221e, 0x3c6, 0x3b5, 0x2229,
> + 0x2261, 0xb1, 0x2265, 0x2264, 0x2320, 0x2321, 0xf7, 0x2248,
> + 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
> + /* CP720 */
> + { 0x0, 0x0, 0xe9, 0xe2, 0x0, 0xe0, 0x0, 0xe7,
> + 0xea, 0xeb, 0xe8, 0xef, 0xee, 0x0, 0x0, 0x0,
> + 0x0, 0x651, 0x652, 0xf4, 0xa4, 0x640, 0xfb, 0xf9,
> + 0x621, 0x622, 0x623, 0x624, 0xa3, 0x625, 0x626, 0x627,
> + 0x628, 0x629, 0x62a, 0x62b, 0x62c, 0x62d, 0x62e, 0x62f,
> + 0x630, 0x631, 0x632, 0x633, 0x634, 0x635, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
> + 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
> + 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
> + 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
> + 0x636, 0x637, 0x638, 0x639, 0x63a, 0x641, 0xb5, 0x642,
> + 0x643, 0x644, 0x645, 0x646, 0x647, 0x648, 0x649, 0x64a,
> + 0x2261, 0x64b, 0x64c, 0x64d, 0x64e, 0x64f, 0x650, 0x2248,
> + 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
> + /* CP737 */
> + { 0x391, 0x392, 0x393, 0x394, 0x395, 0x396, 0x397, 0x398,
> + 0x399, 0x39a, 0x39b, 0x39c, 0x39d, 0x39e, 0x39f, 0x3a0,
> + 0x3a1, 0x3a3, 0x3a4, 0x3a5, 0x3a6, 0x3a7, 0x3a8, 0x3a9,
> + 0x3b1, 0x3b2, 0x3b3, 0x3b4, 0x3b5, 0x3b6, 0x3b7, 0x3b8,
> + 0x3b9, 0x3ba, 0x3bb, 0x3bc, 0x3bd, 0x3be, 0x3bf, 0x3c0,
> + 0x3c1, 0x3c3, 0x3c2, 0x3c4, 0x3c5, 0x3c6, 0x3c7, 0x3c8,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
> + 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
> + 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
> + 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
> + 0x3c9, 0x3ac, 0x3ad, 0x3ae, 0x3ca, 0x3af, 0x3cc, 0x3cd,
> + 0x3cb, 0x3ce, 0x386, 0x388, 0x389, 0x38a, 0x38c, 0x38e,
> + 0x38f, 0xb1, 0x2265, 0x2264, 0x3aa, 0x3ab, 0xf7, 0x2248,
> + 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
> + /* CP775 */
> + { 0x106, 0xfc, 0xe9, 0x101, 0xe4, 0x123, 0xe5, 0x107,
> + 0x142, 0x113, 0x156, 0x157, 0x12b, 0x179, 0xc4, 0xc5,
> + 0xc9, 0xe6, 0xc6, 0x14d, 0xf6, 0x122, 0xa2, 0x15a,
> + 0x15b, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0xd7, 0xa4,
> + 0x100, 0x12a, 0xf3, 0x17b, 0x17c, 0x17a, 0x201d, 0xa6,
> + 0xa9, 0xae, 0xac, 0xbd, 0xbc, 0x141, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x104, 0x10c, 0x118,
> + 0x116, 0x2563, 0x2551, 0x2557, 0x255d, 0x12e, 0x160, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x172, 0x16a,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x17d,
> + 0x105, 0x10d, 0x119, 0x117, 0x12f, 0x161, 0x173, 0x16b,
> + 0x17e, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
> + 0xd3, 0xdf, 0x14c, 0x143, 0xf5, 0xd5, 0xb5, 0x144,
> + 0x136, 0x137, 0x13b, 0x13c, 0x146, 0x112, 0x145, 0x2019,
> + 0xad, 0xb1, 0x201c, 0xbe, 0xb6, 0xa7, 0xf7, 0x201e,
> + 0xb0, 0x2219, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
> + /* CP850 */
> + { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
> + 0xea, 0xeb, 0xe8, 0xef, 0xee, 0xec, 0xc4, 0xc5,
> + 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
> + 0xff, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0xd7, 0x192,
> + 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
> + 0xbf, 0xae, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0xc0,
> + 0xa9, 0x2563, 0x2551, 0x2557, 0x255d, 0xa2, 0xa5, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0xe3, 0xc3,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
> + 0xf0, 0xd0, 0xca, 0xcb, 0xc8, 0x131, 0xcd, 0xce,
> + 0xcf, 0x2518, 0x250c, 0x2588, 0x2584, 0xa6, 0xcc, 0x2580,
> + 0xd3, 0xdf, 0xd4, 0xd2, 0xf5, 0xd5, 0xb5, 0xfe,
> + 0xde, 0xda, 0xdb, 0xd9, 0xfd, 0xdd, 0xaf, 0xb4,
> + 0xad, 0xb1, 0x2017, 0xbe, 0xb6, 0xa7, 0xf7, 0xb8,
> + 0xb0, 0xa8, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
> + /* CP852 */
> + { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0x16f, 0x107, 0xe7,
> + 0x142, 0xeb, 0x150, 0x151, 0xee, 0x179, 0xc4, 0x106,
> + 0xc9, 0x139, 0x13a, 0xf4, 0xf6, 0x13d, 0x13e, 0x15a,
> + 0x15b, 0xd6, 0xdc, 0x164, 0x165, 0x141, 0xd7, 0x10d,
> + 0xe1, 0xed, 0xf3, 0xfa, 0x104, 0x105, 0x17d, 0x17e,
> + 0x118, 0x119, 0xac, 0x17a, 0x10c, 0x15f, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0x11a,
> + 0x15e, 0x2563, 0x2551, 0x2557, 0x255d, 0x17b, 0x17c, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x102, 0x103,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
> + 0x111, 0x110, 0x10e, 0xcb, 0x10f, 0x147, 0xcd, 0xce,
> + 0x11b, 0x2518, 0x250c, 0x2588, 0x2584, 0x162, 0x16e, 0x2580,
> + 0xd3, 0xdf, 0xd4, 0x143, 0x144, 0x148, 0x160, 0x161,
> + 0x154, 0xda, 0x155, 0x170, 0xfd, 0xdd, 0x163, 0xb4,
> + 0xad, 0x2dd, 0x2db, 0x2c7, 0x2d8, 0xa7, 0xf7, 0xb8,
> + 0xb0, 0xa8, 0x2d9, 0x171, 0x158, 0x159, 0x25a0, 0xa0 },
> + /* CP855 */
> + { 0x452, 0x402, 0x453, 0x403, 0x451, 0x401, 0x454, 0x404,
> + 0x455, 0x405, 0x456, 0x406, 0x457, 0x407, 0x458, 0x408,
> + 0x459, 0x409, 0x45a, 0x40a, 0x45b, 0x40b, 0x45c, 0x40c,
> + 0x45e, 0x40e, 0x45f, 0x40f, 0x44e, 0x42e, 0x44a, 0x42a,
> + 0x430, 0x410, 0x431, 0x411, 0x446, 0x426, 0x434, 0x414,
> + 0x435, 0x415, 0x444, 0x424, 0x433, 0x413, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x445, 0x425, 0x438,
> + 0x418, 0x2563, 0x2551, 0x2557, 0x255d, 0x439, 0x419, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x43a, 0x41a,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
> + 0x43b, 0x41b, 0x43c, 0x41c, 0x43d, 0x41d, 0x43e, 0x41e,
> + 0x43f, 0x2518, 0x250c, 0x2588, 0x2584, 0x41f, 0x44f, 0x2580,
> + 0x42f, 0x440, 0x420, 0x441, 0x421, 0x442, 0x422, 0x443,
> + 0x423, 0x436, 0x416, 0x432, 0x412, 0x44c, 0x42c, 0x2116,
> + 0xad, 0x44b, 0x42b, 0x437, 0x417, 0x448, 0x428, 0x44d,
> + 0x42d, 0x449, 0x429, 0x447, 0x427, 0xa7, 0x25a0, 0xa0 },
> + /* CP857 */
> + { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
> + 0xea, 0xeb, 0xe8, 0xef, 0xee, 0x131, 0xc4, 0xc5,
> + 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
> + 0x130, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0x15e, 0x15f,
> + 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0x11e, 0x11f,
> + 0xbf, 0xae, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0xc0,
> + 0xa9, 0x2563, 0x2551, 0x2557, 0x255d, 0xa2, 0xa5, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0xe3, 0xc3,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
> + 0xba, 0xaa, 0xca, 0xcb, 0xc8, 0x0, 0xcd, 0xce,
> + 0xcf, 0x2518, 0x250c, 0x2588, 0x2584, 0xa6, 0xcc, 0x2580,
> + 0xd3, 0xdf, 0xd4, 0xd2, 0xf5, 0xd5, 0xb5, 0x0,
> + 0xd7, 0xda, 0xdb, 0xd9, 0xec, 0xff, 0xaf, 0xb4,
> + 0xad, 0xb1, 0x0, 0xbe, 0xb6, 0xa7, 0xf7, 0xb8,
> + 0xb0, 0xa8, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
> + /* CP858 */
> + { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
> + 0xea, 0xeb, 0xe8, 0xef, 0xee, 0xec, 0xc4, 0xc5,
> + 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
> + 0xff, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0xd7, 0x192,
> + 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
> + 0xbf, 0xae, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0xc0,
> + 0xa9, 0x2563, 0x2551, 0x2557, 0x255d, 0xa2, 0xa5, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0xe3, 0xc3,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
> + 0xf0, 0xd0, 0xca, 0xcb, 0xc8, 0x20ac, 0xcd, 0xce,
> + 0xcf, 0x2518, 0x250c, 0x2588, 0x2584, 0xa6, 0xcc, 0x2580,
> + 0xd3, 0xdf, 0xd4, 0xd2, 0xf5, 0xd5, 0xb5, 0xfe,
> + 0xde, 0xda, 0xdb, 0xd9, 0xfd, 0xdd, 0xaf, 0xb4,
> + 0xad, 0xb1, 0x2017, 0xbe, 0xb6, 0xa7, 0xf7, 0xb8,
> + 0xb0, 0xa8, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
> + /* CP862 */
> + { 0x5d0, 0x5d1, 0x5d2, 0x5d3, 0x5d4, 0x5d5, 0x5d6, 0x5d7,
> + 0x5d8, 0x5d9, 0x5da, 0x5db, 0x5dc, 0x5dd, 0x5de, 0x5df,
> + 0x5e0, 0x5e1, 0x5e2, 0x5e3, 0x5e4, 0x5e5, 0x5e6, 0x5e7,
> + 0x5e8, 0x5e9, 0x5ea, 0xa2, 0xa3, 0xa5, 0x20a7, 0x192,
> + 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
> + 0xbf, 0x2310, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
> + 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
> + 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
> + 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
> + 0x3b1, 0xdf, 0x393, 0x3c0, 0x3a3, 0x3c3, 0xb5, 0x3c4,
> + 0x3a6, 0x398, 0x3a9, 0x3b4, 0x221e, 0x3c6, 0x3b5, 0x2229,
> + 0x2261, 0xb1, 0x2265, 0x2264, 0x2320, 0x2321, 0xf7, 0x2248,
> + 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
> + /* CP866 */
> + { 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
> + 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
> + 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
> + 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
> + 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
> + 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
> + 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
> + 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
> + 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
> + 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
> + 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f,
> + 0x401, 0x451, 0x404, 0x454, 0x407, 0x457, 0x40e, 0x45e,
> + 0xb0, 0x2219, 0xb7, 0x221a, 0x2116, 0xa4, 0x25a0, 0xa0 },
> + /* CP874 */
> + { 0x20ac, 0x0, 0x0, 0x0, 0x0, 0x2026, 0x0, 0x0,
> + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0xa0, 0xe01, 0xe02, 0xe03, 0xe04, 0xe05, 0xe06, 0xe07,
> + 0xe08, 0xe09, 0xe0a, 0xe0b, 0xe0c, 0xe0d, 0xe0e, 0xe0f,
> + 0xe10, 0xe11, 0xe12, 0xe13, 0xe14, 0xe15, 0xe16, 0xe17,
> + 0xe18, 0xe19, 0xe1a, 0xe1b, 0xe1c, 0xe1d, 0xe1e, 0xe1f,
> + 0xe20, 0xe21, 0xe22, 0xe23, 0xe24, 0xe25, 0xe26, 0xe27,
> + 0xe28, 0xe29, 0xe2a, 0xe2b, 0xe2c, 0xe2d, 0xe2e, 0xe2f,
> + 0xe30, 0xe31, 0xe32, 0xe33, 0xe34, 0xe35, 0xe36, 0xe37,
> + 0xe38, 0xe39, 0xe3a, 0x0, 0x0, 0x0, 0x0, 0xe3f,
> + 0xe40, 0xe41, 0xe42, 0xe43, 0xe44, 0xe45, 0xe46, 0xe47,
> + 0xe48, 0xe49, 0xe4a, 0xe4b, 0xe4c, 0xe4d, 0xe4e, 0xe4f,
> + 0xe50, 0xe51, 0xe52, 0xe53, 0xe54, 0xe55, 0xe56, 0xe57,
> + 0xe58, 0xe59, 0xe5a, 0xe5b, 0xfc, 0xfd, 0xfe, 0xff },
> + /* CP1125 */
> + { 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
> + 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
> + 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
> + 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
> + 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
> + 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
> + 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
> + 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
> + 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
> + 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
> + 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
> + 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
> + 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
> + 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f,
> + 0x401, 0x451, 0x490, 0x491, 0x404, 0x454, 0x406, 0x456,
> + 0x407, 0x457, 0xb7, 0x221a, 0x2116, 0xa4, 0x25a0, 0xa0 },
> + /* CP1250 */
> + { 0x20ac, 0x0, 0x201a, 0x0, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x0, 0x2030, 0x160, 0x2039, 0x15a, 0x164, 0x17d, 0x179,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x0, 0x2122, 0x161, 0x203a, 0x15b, 0x165, 0x17e, 0x17a,
> + 0xa0, 0x2c7, 0x2d8, 0x141, 0xa4, 0x104, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0x15e, 0xab, 0xac, 0xad, 0xae, 0x17b,
> + 0xb0, 0xb1, 0x2db, 0x142, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0x105, 0x15f, 0xbb, 0x13d, 0x2dd, 0x13e, 0x17c,
> + 0x154, 0xc1, 0xc2, 0x102, 0xc4, 0x139, 0x106, 0xc7,
> + 0x10c, 0xc9, 0x118, 0xcb, 0x11a, 0xcd, 0xce, 0x10e,
> + 0x110, 0x143, 0x147, 0xd3, 0xd4, 0x150, 0xd6, 0xd7,
> + 0x158, 0x16e, 0xda, 0x170, 0xdc, 0xdd, 0x162, 0xdf,
> + 0x155, 0xe1, 0xe2, 0x103, 0xe4, 0x13a, 0x107, 0xe7,
> + 0x10d, 0xe9, 0x119, 0xeb, 0x11b, 0xed, 0xee, 0x10f,
> + 0x111, 0x144, 0x148, 0xf3, 0xf4, 0x151, 0xf6, 0xf7,
> + 0x159, 0x16f, 0xfa, 0x171, 0xfc, 0xfd, 0x163, 0x2d9 },
> + /* CP1251 */
> + { 0x402, 0x403, 0x201a, 0x453, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x20ac, 0x2030, 0x409, 0x2039, 0x40a, 0x40c, 0x40b, 0x40f,
> + 0x452, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x0, 0x2122, 0x459, 0x203a, 0x45a, 0x45c, 0x45b, 0x45f,
> + 0xa0, 0x40e, 0x45e, 0x408, 0xa4, 0x490, 0xa6, 0xa7,
> + 0x401, 0xa9, 0x404, 0xab, 0xac, 0xad, 0xae, 0x407,
> + 0xb0, 0xb1, 0x406, 0x456, 0x491, 0xb5, 0xb6, 0xb7,
> + 0x451, 0x2116, 0x454, 0xbb, 0x458, 0x405, 0x455, 0x457,
> + 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
> + 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
> + 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
> + 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
> + 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
> + 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
> + 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
> + 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f },
> + /* CP1252 */
> + { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x2c6, 0x2030, 0x160, 0x2039, 0x152, 0x0, 0x17d, 0x0,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x2dc, 0x2122, 0x161, 0x203a, 0x153, 0x0, 0x17e, 0x178,
> + 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
> + 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
> + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
> + 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
> + 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
> + 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff },
> + /* CP1253 */
> + { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x0, 0x2030, 0x0, 0x2039, 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x0, 0x2122, 0x0, 0x203a, 0x0, 0x0, 0x0, 0x0,
> + 0xa0, 0x385, 0x386, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0x0, 0xab, 0xac, 0xad, 0xae, 0x2015,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0x384, 0xb5, 0xb6, 0xb7,
> + 0x388, 0x389, 0x38a, 0xbb, 0x38c, 0xbd, 0x38e, 0x38f,
> + 0x390, 0x391, 0x392, 0x393, 0x394, 0x395, 0x396, 0x397,
> + 0x398, 0x399, 0x39a, 0x39b, 0x39c, 0x39d, 0x39e, 0x39f,
> + 0x3a0, 0x3a1, 0x0, 0x3a3, 0x3a4, 0x3a5, 0x3a6, 0x3a7,
> + 0x3a8, 0x3a9, 0x3aa, 0x3ab, 0x3ac, 0x3ad, 0x3ae, 0x3af,
> + 0x3b0, 0x3b1, 0x3b2, 0x3b3, 0x3b4, 0x3b5, 0x3b6, 0x3b7,
> + 0x3b8, 0x3b9, 0x3ba, 0x3bb, 0x3bc, 0x3bd, 0x3be, 0x3bf,
> + 0x3c0, 0x3c1, 0x3c2, 0x3c3, 0x3c4, 0x3c5, 0x3c6, 0x3c7,
> + 0x3c8, 0x3c9, 0x3ca, 0x3cb, 0x3cc, 0x3cd, 0x3ce, 0xff },
> + /* CP1254 */
> + { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x2c6, 0x2030, 0x160, 0x2039, 0x152, 0x0, 0x0, 0x0,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x2dc, 0x2122, 0x161, 0x203a, 0x153, 0x0, 0x0, 0x178,
> + 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
> + 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
> + 0x11e, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
> + 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x130, 0x15e, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
> + 0x11f, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
> + 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x131, 0x15f, 0xff },
> + /* CP1255 */
> + { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x2c6, 0x2030, 0x0, 0x2039, 0x0, 0x0, 0x0, 0x0,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x2dc, 0x2122, 0x0, 0x203a, 0x0, 0x0, 0x0, 0x0,
> + 0xa0, 0xa1, 0xa2, 0xa3, 0x20aa, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0xd7, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0xb9, 0xf7, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
> + 0x5b0, 0x5b1, 0x5b2, 0x5b3, 0x5b4, 0x5b5, 0x5b6, 0x5b7,
> + 0x5b8, 0x5b9, 0x0, 0x5bb, 0x5bc, 0x5bd, 0x5be, 0x5bf,
> + 0x5c0, 0x5c1, 0x5c2, 0x5c3, 0x5f0, 0x5f1, 0x5f2, 0x5f3,
> + 0x5f4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> + 0x5d0, 0x5d1, 0x5d2, 0x5d3, 0x5d4, 0x5d5, 0x5d6, 0x5d7,
> + 0x5d8, 0x5d9, 0x5da, 0x5db, 0x5dc, 0x5dd, 0x5de, 0x5df,
> + 0x5e0, 0x5e1, 0x5e2, 0x5e3, 0x5e4, 0x5e5, 0x5e6, 0x5e7,
> + 0x5e8, 0x5e9, 0x5ea, 0x0, 0x0, 0x200e, 0x200f, 0xff },
> + /* CP1256 */
> + { 0x20ac, 0x67e, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x2c6, 0x2030, 0x679, 0x2039, 0x152, 0x686, 0x698, 0x688,
> + 0x6af, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x6a9, 0x2122, 0x691, 0x203a, 0x153, 0x200c, 0x200d, 0x6ba,
> + 0xa0, 0x60c, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0x6be, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0xb9, 0x61b, 0xbb, 0xbc, 0xbd, 0xbe, 0x61f,
> + 0x6c1, 0x621, 0x622, 0x623, 0x624, 0x625, 0x626, 0x627,
> + 0x628, 0x629, 0x62a, 0x62b, 0x62c, 0x62d, 0x62e, 0x62f,
> + 0x630, 0x631, 0x632, 0x633, 0x634, 0x635, 0x636, 0xd7,
> + 0x637, 0x638, 0x639, 0x63a, 0x640, 0x641, 0x642, 0x643,
> + 0xe0, 0x644, 0xe2, 0x645, 0x646, 0x647, 0x648, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0x649, 0x64a, 0xee, 0xef,
> + 0x64b, 0x64c, 0x64d, 0x64e, 0xf4, 0x64f, 0x650, 0xf7,
> + 0x651, 0xf9, 0x652, 0xfb, 0xfc, 0x200e, 0x200f, 0x6d2 },
> + /* CP1257 */
> + { 0x20ac, 0x0, 0x201a, 0x0, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x0, 0x2030, 0x0, 0x2039, 0x0, 0xa8, 0x2c7, 0xb8,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x0, 0x2122, 0x0, 0x203a, 0x0, 0xaf, 0x2db, 0x0,
> + 0xa0, 0x0, 0xa2, 0xa3, 0xa4, 0x0, 0xa6, 0xa7,
> + 0xd8, 0xa9, 0x156, 0xab, 0xac, 0xad, 0xae, 0xc6,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xf8, 0xb9, 0x157, 0xbb, 0xbc, 0xbd, 0xbe, 0xe6,
> + 0x104, 0x12e, 0x100, 0x106, 0xc4, 0xc5, 0x118, 0x112,
> + 0x10c, 0xc9, 0x179, 0x116, 0x122, 0x136, 0x12a, 0x13b,
> + 0x160, 0x143, 0x145, 0xd3, 0x14c, 0xd5, 0xd6, 0xd7,
> + 0x172, 0x141, 0x15a, 0x16a, 0xdc, 0x17b, 0x17d, 0xdf,
> + 0x105, 0x12f, 0x101, 0x107, 0xe4, 0xe5, 0x119, 0x113,
> + 0x10d, 0xe9, 0x17a, 0x117, 0x123, 0x137, 0x12b, 0x13c,
> + 0x161, 0x144, 0x146, 0xf3, 0x14d, 0xf5, 0xf6, 0xf7,
> + 0x173, 0x142, 0x15b, 0x16b, 0xfc, 0x17c, 0x17e, 0x2d9 },
> + /* CP1258 */
> + { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
> + 0x2c6, 0x2030, 0x0, 0x2039, 0x152, 0x0, 0x0, 0x0,
> + 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
> + 0x2dc, 0x2122, 0x0, 0x203a, 0x153, 0x0, 0x0, 0x178,
> + 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
> + 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
> + 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
> + 0xc0, 0xc1, 0xc2, 0x102, 0xc4, 0xc5, 0xc6, 0xc7,
> + 0xc8, 0xc9, 0xca, 0xcb, 0x300, 0xcd, 0xce, 0xcf,
> + 0x110, 0xd1, 0x309, 0xd3, 0xd4, 0x1a0, 0xd6, 0xd7,
> + 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x1af, 0x303, 0xdf,
> + 0xe0, 0xe1, 0xe2, 0x103, 0xe4, 0xe5, 0xe6, 0xe7,
> + 0xe8, 0xe9, 0xea, 0xeb, 0x301, 0xed, 0xee, 0xef,
> + 0x111, 0xf1, 0x323, 0xf3, 0xf4, 0x1a1, 0xf6, 0xf7,
> + 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x1b0, 0x20ab, 0xff }
> +};
> +#endif /* _MB_EXTENDED_CHARSETS_DOS */
> +
> +/* Handle one to five decimal digits. Return -1 in any other case. */
> +static int
> +__micro_atoi (const char *s)
> +{
> + int ret = 0;
> +
> + if (!*s)
> + return -1;
> + while (*s)
> + {
> + if (*s < '0' || *s > '9' || ret >= 10000)
> + return -1;
> + ret = 10 * ret + (*s++ - '0');
> + }
> + return ret;
> +}
> +
> +#ifdef _MB_EXTENDED_CHARSETS_ISO
> +int
> +__iso_8859_index (const char *charset_ext)
> +{
> + int iso_idx = __micro_atoi (charset_ext);
> + if (iso_idx >= 2 && iso_idx <= 16)
> + {
> + iso_idx -= 2;
> + if (iso_idx > 10)
> + --iso_idx;
> + return iso_idx;
> + }
> + return -1;
> +}
> +#endif /* _MB_EXTENDED_CHARSETS_ISO */
> +
> +#ifdef _MB_EXTENDED_CHARSETS_DOS
> +int
> +__cp_index (const char *charset_ext)
> +{
> + int cp_idx = __micro_atoi (charset_ext);
> + switch (cp_idx)
> + {
> + case 437:
> + cp_idx = 0;
> + break;
> + case 720:
> + cp_idx = 1;
> + break;
> + case 737:
> + cp_idx = 2;
> + break;
> + case 775:
> + cp_idx = 3;
> + break;
> + case 850:
> + cp_idx = 4;
> + break;
> + case 852:
> + cp_idx = 5;
> + break;
> + case 855:
> + cp_idx = 6;
> + break;
> + case 857:
> + cp_idx = 7;
> + break;
> + case 858:
> + cp_idx = 8;
> + break;
> + case 862:
> + cp_idx = 9;
> + break;
> + case 866:
> + cp_idx = 10;
> + break;
> + case 874:
> + cp_idx = 11;
> + break;
> + case 1125:
> + cp_idx = 12;
> + break;
> + case 1250:
> + cp_idx = 13;
> + break;
> + case 1251:
> + cp_idx = 14;
> + break;
> + case 1252:
> + cp_idx = 15;
> + break;
> + case 1253:
> + cp_idx = 16;
> + break;
> + case 1254:
> + cp_idx = 17;
> + break;
> + case 1255:
> + cp_idx = 18;
> + break;
> + case 1256:
> + cp_idx = 19;
> + break;
> + case 1257:
> + cp_idx = 20;
> + break;
> + case 1258:
> + cp_idx = 21;
> + break;
> + default:
> + cp_idx = -1;
> + break;
> + }
> + return cp_idx;
> +}
> +#endif /* _MB_EXTENDED_CHARSETS_DOS */
> +#endif /* _MB_CAPABLE */
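For reference, here is a standalone sketch of how a charset-name suffix becomes a row index into the conversion tables. micro_atoi and iso_8859_index reproduce __micro_atoi and __iso_8859_index from the hunk above. cp_index is a hypothetical table-driven rewrite of the __cp_index switch that yields the same row numbers. As written, the suffix "12" lands on the same row as "13" (ISO-8859-12 was abandoned and has no table), so this code presumably relies on the caller rejecting that name before the lookup.

```c
#include <stddef.h>

/* Same logic as __micro_atoi: one to five decimal digits,
   -1 for anything else (empty string, non-digit, more than 5 digits). */
static int
micro_atoi (const char *s)
{
  int ret = 0;

  if (!*s)
    return -1;
  while (*s)
    {
      if (*s < '0' || *s > '9' || ret >= 10000)
        return -1;
      ret = 10 * ret + (*s++ - '0');
    }
  return ret;
}

/* Same logic as __iso_8859_index: maps "2".."16" to rows 0..13,
   closing the gap where ISO-8859-12 would have been. */
static int
iso_8859_index (const char *charset_ext)
{
  int iso_idx = micro_atoi (charset_ext);
  if (iso_idx >= 2 && iso_idx <= 16)
    {
      iso_idx -= 2;
      if (iso_idx > 10)
        --iso_idx;
      return iso_idx;
    }
  return -1;
}

/* Hypothetical table-driven equivalent of the __cp_index switch:
   the row number is the position of the codepage in this list. */
static const int cp_list[] = {
  437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866,
  874, 1125, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258
};

static int
cp_index (const char *charset_ext)
{
  int cp = micro_atoi (charset_ext);
  size_t i;

  /* cp == -1 (parse failure) never matches a list entry. */
  for (i = 0; i < sizeof cp_list / sizeof cp_list[0]; ++i)
    if (cp_list[i] == cp)
      return (int) i;
  return -1;
}
```

__iso_wctomb and __cp_wctomb pass the part after the prefix, that is charset + 9 (skipping "ISO-8859-") and charset + 2 (skipping "CP").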
> Index: libc/stdlib/wctomb_r.c
> ===================================================================
> RCS file: /cvs/src/src/newlib/libc/stdlib/wctomb_r.c,v
> retrieving revision 1.12
> diff -u -p -r1.12 wctomb_r.c
> --- libc/stdlib/wctomb_r.c 19 Mar 2009 19:47:52 -0000 1.12
> +++ libc/stdlib/wctomb_r.c 22 Mar 2009 16:25:07 -0000
> @@ -4,209 +4,338 @@
> #include <wchar.h>
> #include <locale.h>
> #include "mbctype.h"
> +#include "local.h"
>
> -extern char *__locale_charset ();
> +int (*__wctomb) (struct _reent *, char *, wchar_t, const char *charset,
> + mbstate_t *)
> + = __ascii_wctomb;
>
> +#ifdef _MB_CAPABLE
> /* for some conversions, we use the __count field as a place to store a state value */
> #define __state __count
>
> int
> -_DEFUN (_wctomb_r, (r, s, wchar, state),
> - struct _reent *r _AND
> - char *s _AND
> - wchar_t _wchar _AND
> +_DEFUN (__utf8_wctomb, (r, s, wchar, charset, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + const char *charset _AND
> mbstate_t *state)
> {
> - /* Avoids compiler warnings about comparisons that are always false
> - due to limited range when sizeof(wchar_t) is 2 but sizeof(wint_t)
> - is 4, as is the case on cygwin. */
> wint_t wchar = _wchar;
>
> - if (strlen (__locale_charset ()) <= 1)
> - { /* fall-through */ }
> - else if (!strcmp (__locale_charset (), "UTF-8"))
> - {
> - if (s == NULL)
> - return 0; /* UTF-8 encoding is not state-dependent */
> + if (s == NULL)
> + return 0; /* UTF-8 encoding is not state-dependent */
>
> - if (state->__count == -4 && (wchar < 0xdc00 || wchar >= 0xdfff))
> + if (state->__count == -4 && (wchar < 0xdc00 || wchar > 0xdfff))
> + {
> + /* At this point only the second half of a surrogate pair is valid. */
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + if (wchar <= 0x7f)
> + {
> + *s = wchar;
> + return 1;
> + }
> + if (wchar >= 0x80 && wchar <= 0x7ff)
> + {
> + *s++ = 0xc0 | ((wchar & 0x7c0) >> 6);
> + *s = 0x80 | (wchar & 0x3f);
> + return 2;
> + }
> + if (wchar >= 0x800 && wchar <= 0xffff)
> + {
> + if (wchar >= 0xd800 && wchar <= 0xdfff)
> {
> - /* At this point only the second half of a surrogate pair is valid. */
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - if (wchar <= 0x7f)
> - {
> - *s = wchar;
> - return 1;
> - }
> - else if (wchar >= 0x80 && wchar <= 0x7ff)
> - {
> - *s++ = 0xc0 | ((wchar & 0x7c0) >> 6);
> - *s = 0x80 | (wchar & 0x3f);
> - return 2;
> - }
> - else if (wchar >= 0x800 && wchar <= 0xffff)
> - {
> - if (wchar >= 0xd800 && wchar <= 0xdfff)
> + wint_t tmp;
> + /* UTF-16 surrogates -- must not occur in normal UCS-4 data */
> + if (sizeof (wchar_t) != 2)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + if (wchar >= 0xdc00)
> {
> - wint_t tmp;
> - /* UTF-16 surrogates -- must not occur in normal UCS-4 data */
> - if (sizeof (wchar_t) != 2)
> + /* Second half of a surrogate pair. It's only valid if
> + we have already read the first half of a surrogate
> + pair. */
> + if (state->__count != -4)
> {
> r->_errno = EILSEQ;
> return -1;
> }
> - if (wchar >= 0xdc00)
> - {
> - /* Second half of a surrogate pair. It's not valid if
> - we don't have already read a first half of a surrogate
> - before. */
> - if (state->__count != -4)
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - /* If it's valid, reconstruct the full Unicode value and
> - return the trailing three bytes of the UTF-8 char. */
> - tmp = (state->__value.__wchb[0] << 16)
> - | (state->__value.__wchb[1] << 8)
> - | (wchar & 0x3ff);
> - state->__count = 0;
> - *s++ = 0x80 | ((tmp & 0x3f000) >> 12);
> - *s++ = 0x80 | ((tmp & 0xfc0) >> 6);
> - *s = 0x80 | (tmp & 0x3f);
> - return 3;
> - }
> - /* First half of a surrogate pair. Store the state and return
> - the first byte of the UTF-8 char. */
> - tmp = ((wchar & 0x3ff) << 10) + 0x10000;
> - state->__value.__wchb[0] = (tmp >> 16) & 0xff;
> - state->__value.__wchb[1] = (tmp >> 8) & 0xff;
> - state->__count = -4;
> - *s = (0xf0 | ((tmp & 0x1c0000) >> 18));
> - return 1;
> + /* If it's valid, reconstruct the full Unicode value and
> + return the trailing three bytes of the UTF-8 char. */
> + tmp = (state->__value.__wchb[0] << 16)
> + | (state->__value.__wchb[1] << 8)
> + | (wchar & 0x3ff);
> + state->__count = 0;
> + *s++ = 0x80 | ((tmp & 0x3f000) >> 12);
> + *s++ = 0x80 | ((tmp & 0xfc0) >> 6);
> + *s = 0x80 | (tmp & 0x3f);
> + return 3;
> }
> - *s++ = 0xe0 | ((wchar & 0xf000) >> 12);
> - *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
> - *s = 0x80 | (wchar & 0x3f);
> - return 3;
> - }
> - else if (wchar >= 0x10000 && wchar <= 0x10ffff)
> - {
> - *s++ = 0xf0 | ((wchar & 0x1c0000) >> 18);
> - *s++ = 0x80 | ((wchar & 0x3f000) >> 12);
> - *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
> - *s = 0x80 | (wchar & 0x3f);
> - return 4;
> - }
> + /* First half of a surrogate pair. Store the state and return
> + the first byte of the UTF-8 char. */
> + tmp = ((wchar & 0x3ff) << 10) + 0x10000;
> + state->__value.__wchb[0] = (tmp >> 16) & 0xff;
> + state->__value.__wchb[1] = (tmp >> 8) & 0xff;
> + state->__count = -4;
> + *s = (0xf0 | ((tmp & 0x1c0000) >> 18));
> + return 1;
> + }
> + *s++ = 0xe0 | ((wchar & 0xf000) >> 12);
> + *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
> + *s = 0x80 | (wchar & 0x3f);
> + return 3;
> + }
> + if (wchar >= 0x10000 && wchar <= 0x10ffff)
> + {
> + *s++ = 0xf0 | ((wchar & 0x1c0000) >> 18);
> + *s++ = 0x80 | ((wchar & 0x3f000) >> 12);
> + *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
> + *s = 0x80 | (wchar & 0x3f);
> + return 4;
> + }
> +
> + r->_errno = EILSEQ;
> + return -1;
> +}
> +
> +/* Cygwin defines its own doublebyte charset conversion functions
> + because the underlying OS requires wchar_t == UTF-16. */
> +#ifndef __CYGWIN__
> +int
> +_DEFUN (__sjis_wctomb, (r, s, wchar, charset, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wint_t wchar = _wchar;
> +
> + unsigned char char2 = (unsigned char)wchar;
> + unsigned char char1 = (unsigned char)(wchar >> 8);
> +
> + if (s == NULL)
> + return 0; /* not state-dependent */
> +
> + if (char1 != 0x00)
> + {
> + /* first byte is non-zero..validate multi-byte char */
> + if (_issjis1(char1) && _issjis2(char2))
> + {
> + *s++ = (char)char1;
> + *s = (char)char2;
> + return 2;
> + }
> else
> {
> r->_errno = EILSEQ;
> return -1;
> }
> }
> - else if (!strcmp (__locale_charset (), "SJIS"))
> + *s = (char) wchar;
> + return 1;
> +}
> +
> +int
> +_DEFUN (__eucjp_wctomb, (r, s, wchar, charset, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wint_t wchar = _wchar;
> + unsigned char char2 = (unsigned char)wchar;
> + unsigned char char1 = (unsigned char)(wchar >> 8);
> +
> + if (s == NULL)
> + return 0; /* not state-dependent */
> +
> + if (char1 != 0x00)
> {
> - unsigned char char2 = (unsigned char)wchar;
> - unsigned char char1 = (unsigned char)(wchar >> 8);
> + /* first byte is non-zero..validate multi-byte char */
> + if (_iseucjp (char1) && _iseucjp (char2))
> + {
> + *s++ = (char)char1;
> + *s = (char)char2;
> + return 2;
> + }
> + else
> + {
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + }
> + *s = (char) wchar;
> + return 1;
> +}
>
> - if (s == NULL)
> - return 0; /* not state-dependent */
> +int
> +_DEFUN (__jis_wctomb, (r, s, wchar, charset, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wint_t wchar = _wchar;
> + int cnt = 0;
> + unsigned char char2 = (unsigned char)wchar;
> + unsigned char char1 = (unsigned char)(wchar >> 8);
>
> - if (char1 != 0x00)
> - {
> - /* first byte is non-zero..validate multi-byte char */
> - if (_issjis1(char1) && _issjis2(char2))
> - {
> - *s++ = (char)char1;
> - *s = (char)char2;
> - return 2;
> - }
> - else
> + if (s == NULL)
> + return 1; /* state-dependent */
> +
> + if (char1 != 0x00)
> + {
> + /* first byte is non-zero..validate multi-byte char */
> + if (_isjis (char1) && _isjis (char2))
> + {
> + if (state->__state == 0)
> {
> - r->_errno = EILSEQ;
> - return -1;
> + /* must switch from ASCII to JIS state */
> + state->__state = 1;
> + *s++ = ESC_CHAR;
> + *s++ = '$';
> + *s++ = 'B';
> + cnt = 3;
> }
> - }
> + *s++ = (char)char1;
> + *s = (char)char2;
> + return cnt + 2;
> + }
> + r->_errno = EILSEQ;
> + return -1;
> }
> - else if (!strcmp (__locale_charset (), "EUCJP"))
> + if (state->__state != 0)
> {
> - unsigned char char2 = (unsigned char)wchar;
> - unsigned char char1 = (unsigned char)(wchar >> 8);
> + /* must switch from JIS to ASCII state */
> + state->__state = 0;
> + *s++ = ESC_CHAR;
> + *s++ = '(';
> + *s++ = 'B';
> + cnt = 3;
> + }
> + *s = (char)char2;
> + return cnt + 1;
> +}
> +#endif /* !__CYGWIN__ */
>
> - if (s == NULL)
> - return 0; /* not state-dependent */
> -
> - if (char1 != 0x00)
> - {
> - /* first byte is non-zero..validate multi-byte char */
> - if (_iseucjp (char1) && _iseucjp (char2))
> - {
> - *s++ = (char)char1;
> - *s = (char)char2;
> - return 2;
> - }
> - else
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - }
> +#ifdef _MB_EXTENDED_CHARSETS_ISO
> +int
> +_DEFUN (__iso_wctomb, (r, s, wchar, charset, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wint_t wchar = _wchar;
> +
> + if (s == NULL)
> + return 0;
> +
> + /* wchars <= 0x9f translate to all ISO charsets directly. */
> + if (wchar >= 0xa0)
> + {
> + int iso_idx = __iso_8859_index (charset + 9);
> + if (iso_idx >= 0)
> + {
> + unsigned char mb;
> +
> + if (s == NULL)
> + return 0;
> +
> + for (mb = 0; mb < 0x60; ++mb)
> + if (__iso_8859_conv[iso_idx][mb] == wchar)
> + {
> + *s = (char) (mb + 0xa0);
> + return 1;
> + }
> + r->_errno = EILSEQ;
> + return -1;
> + }
> }
> - else if (!strcmp (__locale_charset (), "JIS"))
> +
> + if ((size_t)wchar >= 0x100)
> {
> - int cnt = 0;
> - unsigned char char2 = (unsigned char)wchar;
> - unsigned char char1 = (unsigned char)(wchar >> 8);
> -
> - if (s == NULL)
> - return 1; /* state-dependent */
> -
> - if (char1 != 0x00)
> - {
> - /* first byte is non-zero..validate multi-byte char */
> - if (_isjis (char1) && _isjis (char2))
> - {
> - if (state->__state == 0)
> - {
> - /* must switch from ASCII to JIS state */
> - state->__state = 1;
> - *s++ = ESC_CHAR;
> - *s++ = '$';
> - *s++ = 'B';
> - cnt = 3;
> - }
> - *s++ = (char)char1;
> - *s = (char)char2;
> - return cnt + 2;
> - }
> - else
> - {
> - r->_errno = EILSEQ;
> - return -1;
> - }
> - }
> - else
> - {
> - if (state->__state != 0)
> - {
> - /* must switch from JIS to ASCII state */
> - state->__state = 0;
> - *s++ = ESC_CHAR;
> - *s++ = '(';
> - *s++ = 'B';
> - cnt = 3;
> - }
> - *s = (char)char2;
> - return cnt + 1;
> - }
> + r->_errno = EILSEQ;
> + return -1;
> + }
> +
> + *s = (char) wchar;
> + return 1;
> +}
> +#endif /* _MB_EXTENDED_CHARSETS_ISO */
> +
> +#ifdef _MB_EXTENDED_CHARSETS_DOS
> +int
> +_DEFUN (__cp_wctomb, (r, s, wchar, charset, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + wint_t wchar = _wchar;
> +
> + if (s == NULL)
> + return 0;
> +
> + if (wchar >= 0x80)
> + {
> + int cp_idx = __cp_index (charset + 2);
> + if (cp_idx >= 0)
> + {
> + unsigned char mb;
> +
> + if (s == NULL)
> + return 0;
> +
> + for (mb = 0; mb < 0x80; ++mb)
> + if (__cp_conv[cp_idx][mb] == wchar)
> + {
> + *s = (char) (mb + 0x80);
> + return 1;
> + }
> + r->_errno = EILSEQ;
> + return -1;
> + }
> + }
> +
> + if ((size_t)wchar >= 0x100)
> + {
> + r->_errno = EILSEQ;
> + return -1;
> }
>
> + *s = (char) wchar;
> + return 1;
> +}
> +#endif /* _MB_EXTENDED_CHARSETS_DOS */
> +#endif /* _MB_CAPABLE */
> +
> +int
> +_DEFUN (__ascii_wctomb, (r, s, wchar, charset, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + const char *charset _AND
> + mbstate_t *state)
> +{
> + /* Avoids compiler warnings about comparisons that are always false
> + due to limited range when sizeof(wchar_t) is 2 but sizeof(wint_t)
> + is 4, as is the case on cygwin. */
> + wint_t wchar = _wchar;
> +
> if (s == NULL)
> return 0;
>
> - /* otherwise we are dealing with a single byte character */
> if ((size_t)wchar >= 0x100)
> {
> r->_errno = EILSEQ;
> @@ -216,4 +345,13 @@ _DEFUN (_wctomb_r, (r, s, wchar, state),
> *s = (char) wchar;
> return 1;
> }
> -
> +
> +int
> +_DEFUN (_wctomb_r, (r, s, wchar, state),
> + struct _reent *r _AND
> + char *s _AND
> + wchar_t _wchar _AND
> + mbstate_t *state)
> +{
> + return __wctomb (r, s, _wchar, __locale_charset (), state);
> +}
>
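The trickiest part of __utf8_wctomb is a UTF-16 surrogate pair that arrives in two separate calls. Below is a minimal sketch of the same scheme outside newlib, under these assumptions:

- struct u8_state is a hypothetical stand-in for the __count/__value.__wchb fields of newlib's mbstate_t.
- Errors return -1 without setting errno.
- Surrogates are always accepted, as if wchar_t were 16 bits.

The high surrogate emits only the lead byte (0xF0..0xF4) and parks the remaining bits in the state. The low surrogate then emits the three continuation bytes. The check uses `wchar > 0xdfff`, so U+DFFF itself is accepted as a low surrogate.

```c
#include <stdint.h>

/* Hypothetical stand-in for the newlib-specific mbstate_t fields
   (__count and __value.__wchb) used by __utf8_wctomb. */
struct u8_state
{
  int count;              /* -4 while a high surrogate is pending */
  unsigned char wchb[2];  /* bits 16..23 and 8..15 of the pending code point */
};

/* Encode one wide character (possibly half of a UTF-16 surrogate pair)
   into s, which must hold at least 4 bytes.  Returns the number of bytes
   written, or -1 for an invalid sequence. */
static int
utf8_wctomb_sketch (char *s, uint32_t wchar, struct u8_state *st)
{
  uint32_t tmp;

  /* After a high surrogate, only a low surrogate (DC00..DFFF) is valid. */
  if (st->count == -4 && (wchar < 0xdc00 || wchar > 0xdfff))
    return -1;
  if (wchar <= 0x7f)
    {
      s[0] = (char) wchar;
      return 1;
    }
  if (wchar <= 0x7ff)
    {
      s[0] = (char) (0xc0 | (wchar >> 6));
      s[1] = (char) (0x80 | (wchar & 0x3f));
      return 2;
    }
  if (wchar >= 0xdc00 && wchar <= 0xdfff)
    {
      /* Low surrogate: combine with the stored bits, emit 3 trailing bytes. */
      if (st->count != -4)
        return -1;
      tmp = ((uint32_t) st->wchb[0] << 16) | ((uint32_t) st->wchb[1] << 8)
            | (wchar & 0x3ff);
      st->count = 0;
      s[0] = (char) (0x80 | ((tmp & 0x3f000) >> 12));
      s[1] = (char) (0x80 | ((tmp & 0xfc0) >> 6));
      s[2] = (char) (0x80 | (tmp & 0x3f));
      return 3;
    }
  if (wchar >= 0xd800 && wchar <= 0xdbff)
    {
      /* High surrogate: park the upper bits, emit only the lead byte. */
      tmp = ((wchar & 0x3ff) << 10) + 0x10000;
      st->wchb[0] = (tmp >> 16) & 0xff;
      st->wchb[1] = (tmp >> 8) & 0xff;
      st->count = -4;
      s[0] = (char) (0xf0 | ((tmp & 0x1c0000) >> 18));
      return 1;
    }
  if (wchar <= 0xffff)
    {
      s[0] = (char) (0xe0 | (wchar >> 12));
      s[1] = (char) (0x80 | ((wchar >> 6) & 0x3f));
      s[2] = (char) (0x80 | (wchar & 0x3f));
      return 3;
    }
  if (wchar <= 0x10ffff)
    {
      s[0] = (char) (0xf0 | (wchar >> 18));
      s[1] = (char) (0x80 | ((wchar >> 12) & 0x3f));
      s[2] = (char) (0x80 | ((wchar >> 6) & 0x3f));
      s[3] = (char) (0x80 | (wchar & 0x3f));
      return 4;
    }
  return -1;
}
```

Feeding 0xD83D and then 0xDE00 produces F0, then 9F 98 80: together, the UTF-8 encoding of U+1F600.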
>
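The single-byte codepages use a plain reverse search. __cp_wctomb scans the 128-entry row for the requested wide character and emits 0x80 plus its position. Here is a self-contained sketch of the same lookup, hard-wired to the CP1252 row copied from the table above. The function name is hypothetical. Unlike the patch, it writes to an unsigned char pointer and returns -1 instead of setting errno. Only values >= 0x80 are searched, so the 0x0 placeholders for undefined bytes can never match.

```c
#include <stdint.h>

/* CP1252 row copied from the __cp_conv table: entry i is the Unicode
   value of byte 0x80 + i, and 0x0 marks an undefined byte. */
static const uint16_t cp1252_upper[128] = {
  0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x2c6, 0x2030, 0x160, 0x2039, 0x152, 0x0, 0x17d, 0x0,
  0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x2dc, 0x2122, 0x161, 0x203a, 0x153, 0x0, 0x17e, 0x178,
  0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
  0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
  0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
  0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
  0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
  0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
  0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
  0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
  0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
  0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

/* Reverse lookup in the style of __cp_wctomb: ASCII passes through,
   anything else must appear in the 128-entry row.  Returns 1 and stores
   the byte in *out, or -1 if wchar has no CP1252 encoding. */
static int
cp1252_wctomb_sketch (unsigned char *out, uint32_t wchar)
{
  unsigned int mb;

  if (wchar < 0x80)
    {
      *out = (unsigned char) wchar;
      return 1;
    }
  for (mb = 0; mb < 0x80; ++mb)
    if (cp1252_upper[mb] == wchar)
      {
        *out = (unsigned char) (mb + 0x80);
        return 1;
      }
  return -1;
}
```

The scan costs up to 128 comparisons per character. That seems cheap enough for these tables; a precomputed reverse index would be the obvious alternative if it ever showed up in profiles.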
>
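__jis_wctomb is the only stateful converter in this file. It emits ESC $ B before the first two-byte JIS X 0208 character and ESC ( B before switching back to ASCII. It tracks the current mode in mbstate_t's __count field, which is aliased as __state. A standalone sketch of that state machine follows. A plain int replaces mbstate_t, and the byte-range test is written out on the assumption that newlib's _isjis accepts 0x21..0x7e. A single call can write up to five bytes: a three-byte escape sequence plus a two-byte character.

```c
#define ESC 0x1b  /* ASCII escape, the byte ESC_CHAR stands for in the patch */

/* Assumed to match newlib's _isjis: JIS X 0208 bytes lie in 0x21..0x7e. */
static int
is_jis_byte (unsigned char c)
{
  return c >= 0x21 && c <= 0x7e;
}

/* Mirrors __jis_wctomb: *state is 0 in ASCII mode and 1 in JIS X 0208
   mode.  wchar is either an ASCII value or a two-byte JIS code with the
   first byte in bits 8..15.  s must hold at least 5 bytes.  Returns the
   number of bytes written, or -1 for an invalid JIS code. */
static int
jis_wctomb_sketch (char *s, unsigned int wchar, int *state)
{
  int cnt = 0;
  unsigned char char2 = (unsigned char) wchar;
  unsigned char char1 = (unsigned char) (wchar >> 8);

  if (char1 != 0x00)
    {
      if (!is_jis_byte (char1) || !is_jis_byte (char2))
        return -1;
      if (*state == 0)
        {
          /* Switch from ASCII to JIS: ESC $ B */
          *state = 1;
          s[cnt++] = ESC;
          s[cnt++] = '$';
          s[cnt++] = 'B';
        }
      s[cnt++] = (char) char1;
      s[cnt++] = (char) char2;
      return cnt;
    }
  if (*state != 0)
    {
      /* Switch from JIS back to ASCII: ESC ( B */
      *state = 0;
      s[cnt++] = ESC;
      s[cnt++] = '(';
      s[cnt++] = 'B';
    }
  s[cnt++] = (char) char2;
  return cnt;
}
```

Encoding 0x2422 from the initial state yields ESC $ B 0x24 0x22. A following 'A' yields ESC ( B 'A' and leaves the state back in ASCII mode.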
More information about the Newlib mailing list