This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
[RFC-v4] Handle cygwin wchar_t specifics
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Monday, April 18, 2011 19:18
> To: Pierre Muller
> Cc: 'Eli Zaretskii'; jan.kratochvil@redhat.com; gdb-patches@sourceware.org
> Subject: Re: [RFA-v3] Handle cygwin wchar_t specifics
>
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
>
> Pierre> This patch also changes the intermediate_encoding for mingw hosts,
> Pierre> from "wchar_t" to "UTF-16LE", but this seems to work nicely
> Pierre> for both mingw32 and mingw64 (and only if iconv is found,
> Pierre> otherwise gdb_wchar_t is simply char and phony functions are used).
>
> Pierre> -#define INTERMEDIATE_ENCODING host_charset ()
> Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
>
> This changes the behavior if the gdb user changes the host encoding.
> This is an unusual situation, admittedly, but it seems to me that it is
> just as easy to only introduce the `intermediate_encoding' global in the
> UTF-{16,32} case.
>
> Pierre> + intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
> Pierre> +# if defined (USE_WIN32API) || defined (__CYGWIN__)
> Pierre> + if (sizeof (gdb_wchar_t) == 2)
> Pierre> + intermediate_encoding = "UTF-16LE";
> Pierre> +# endif
>
> Here, instead of a special case for __CYGWIN__, and instead of
> hard-coding the endian-ness, just use the same code for all
> __STDC_ISO_10646__ platforms. Maybe something like:
>
> intermediate_encoding = xstrprintf ("UTF-%d%s", 8 * sizeof (wchar_t),
> WORDS_BIGENDIAN ? "BE" : "LE");
Three problems here:
1) We should really use the "gdb_wchar_t" type, not "wchar_t".
2) If sizeof (gdb_wchar_t) == 1, the result would be UTF-8LE or UTF-8BE.
   I don't think those exist, do they?
   At least they are not in the "iconv -l" list for current Cygwin.
3) WORDS_BIGENDIAN is not defined at all on Cygwin,
   so your code would probably not compile.
A further question is whether UTF-32 is always supported...
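To make points 2) and 3) concrete, here is a minimal standalone sketch of how the name could be built safely. It is not part of the patch: sketch_encoding_name and SKETCH_ENDIAN_SUFFIX are hypothetical names used only for illustration, and in GDB the size argument would be sizeof (gdb_wchar_t). The suffix is chosen with #ifdef, so it compiles even when WORDS_BIGENDIAN is undefined, and a 1-byte type yields NULL so that the caller can fall back to its default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Pick the byte-order suffix at preprocessing time.  Testing with #ifdef
   instead of using the macro in a C expression means this still compiles
   when WORDS_BIGENDIAN is not defined at all.  */
#ifdef WORDS_BIGENDIAN
#define SKETCH_ENDIAN_SUFFIX "BE"
#else
#define SKETCH_ENDIAN_SUFFIX "LE"
#endif

/* Hypothetical helper, not part of GDB.  Write a "UTF-<bits><suffix>"
   name for a wide character of CHAR_SIZE bytes into BUF and return BUF.
   Return NULL if CHAR_SIZE is neither 2 nor 4 (in particular for 1,
   where no "UTF-8LE"/"UTF-8BE" name exists) or if BUF is too small.  */
static const char *
sketch_encoding_name (size_t char_size, char *buf, size_t buf_len)
{
  int n;

  assert (buf != NULL);
  if (char_size != 2 && char_size != 4)
    return NULL;

  n = snprintf (buf, buf_len, "UTF-%d" SKETCH_ENDIAN_SUFFIX,
                (int) (8 * char_size));
  if (n < 0 || (size_t) n >= buf_len)
    return NULL;
  return buf;
}
```

Whether the resulting name is actually known to the local iconv is a separate question, so a caller would still need to check it against the list of supported charsets.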
Below is yet another proposal:
it turns the INTERMEDIATE_ENCODING macro into a call to a new
intermediate_encoding function.
This function specifically handles the case where gdb_wchar_t is 2 bytes long:
it tries UTF-16XE (with X being L or B) and, if that name is not
in the list of supported charsets, tries UCS-2XE.
As there is apparently no advantage in using UTF-32 over UCS-4 (according
to Eli),
I did not extend the change to the 4-byte case.
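To illustrate the lookup order described above, here is a minimal standalone sketch (it is not the patch itself). sketch_first_supported is a hypothetical helper, and the "supported" array only stands in for GDB's list of handled charsets; both are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GDB.  Return the first entry of
   CANDIDATES that also appears in SUPPORTED, or NULL if none does.
   Both arrays must be NULL-terminated.  */
static const char *
sketch_first_supported (const char *const *candidates,
                        const char *const *supported)
{
  size_t i, j;

  assert (candidates != NULL && supported != NULL);
  /* Candidates are tried in order, so earlier names take priority.  */
  for (i = 0; candidates[i] != NULL; i++)
    for (j = 0; supported[j] != NULL; j++)
      if (strcmp (candidates[i], supported[j]) == 0)
        return candidates[i];
  return NULL;
}
```

With candidates { "UTF-16LE", "UCS-2LE", NULL }, the UTF-16 name wins whenever it is listed, the UCS-2 name is used otherwise, and a NULL result tells the caller to fall back to the default encoding.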
Comments welcome,
Pierre Muller
2011-04-19  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call.
	(intermediate_encoding): New prototype.
	* charset.c (intermediate_encoding): New function.

Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c 11 Jan 2011 15:10:01 -0000 1.43
+++ charset.c 19 Apr 2011 09:05:43 -0000
@@ -922,6 +922,50 @@ default_auto_wide_charset (void)
return GDB_DEFAULT_TARGET_WIDE_CHARSET;
}
+#ifdef WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+const char *
+intermediate_encoding (void)
+{
+  if (sizeof (gdb_wchar_t) == 2)
+    {
+      static const char *stored_result = NULL;
+      const char *result;
+      int i;
+
+      if (stored_result)
+        return stored_result;
+      result = "UTF-16" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets. */
+      for (i = 0; charset_enum[i]; i++)
+        {
+          if (strcmp (result, charset_enum[i]) == 0)
+            {
+              stored_result = result;
+              return result;
+            }
+        }
+      /* Second try, with UCS-2 type. */
+      result = "UCS-2" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets. */
+      for (i = 0; charset_enum[i]; i++)
+        {
+          if (strcmp (result, charset_enum[i]) == 0)
+            {
+              stored_result = result;
+              return result;
+            }
+        }
+    }
+  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
+     not known, use DEFAULT_INTERMEDIATE_ENCODING macro. */
+  return DEFAULT_INTERMEDIATE_ENCODING;
+}
+
void
_initialize_charset (void)
{
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6
+++ gdb_wchar.h 19 Apr 2011 09:05:43 -0000
@@ -79,12 +79,12 @@ typedef wint_t gdb_wint_t;
hosts that emit a BOM when the unadorned name is used. */
#if defined (__STDC_ISO_10646__)
#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
#endif
#elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
#else
/* This shouldn't happen, because the earlier #if should have filtered
out this case. */
@@ -115,11 +115,14 @@ typedef int gdb_wint_t;
also providing a phony iconv, we might as well just stick with
"wchar_t". */
#ifdef PHONY_ICONV
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
#else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
#endif
#endif
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
#endif /* GDB_WCHAR_H */