This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


[RFC-v5] Handle cygwin wchar_t specifics

From: "Pierre Muller" <pierre dot muller at ics-cnrs dot unistra dot fr>
To: "'Tom Tromey'" <tromey at redhat dot com>
Cc: <gdb-patches at sourceware dot org>
Date: Tue, 19 Apr 2011 15:56:14 +0200
Subject: [RFC-v5] Handle cygwin wchar_t specifics
References: <5928.31498147479$1302882967@news.gmane.org> <m3ei53cres.fsf@fleche.redhat.com> <005101cbfc50$193136b0$4b93a410$%muller@ics-cnrs.unistra.fr> <20110416162455.GA5599@host1.jankratochvil.net> <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr> <83zknpoacd.fsf@gnu.org> <21014.6501930014$1303139687@news.gmane.org> <m3zknn7a2v.fsf@fleche.redhat.com> <34716.7311156683$1303204711@news.gmane.org> <m3fwpe5qg1.fsf@fleche.redhat.com>


> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Tuesday, April 19, 2011 15:19
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFC-v4] Handle cygwin wchar_t specifics
> 
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre> 1) we should really use "gdb_wchar_t" type, not "wchar_t"
> 
> Yeah.
> 
> Pierre> 2) If sizeof(gdb_wchar_t) == 1
> Pierre> I don't think that UTF-8LE and UTF-8BE exist, do they?
> Pierre> At least they are not in the iconv -l list for current cygwin.
> 
> A platform where this is true should not define __STDC_ISO_10646__.
> You might as well just assert that the size is 2 or 4.
> 
> Pierre> 3) WORD_BIGENDIAN is not defined at all on Cygwin,
> Pierre> so that your code would probably not compile.
> 
> Yeah, I forgot, you need #if.  See config.in.
> 
> Pierre> A further question is whether UTF-32 is always supported...
> 
> If someone can find a platform where wchar_t is 4 bytes, where
> __STDC_ISO_10646__ is defined, and where UTF-32 is not understood, then
> we can complain bitterly and change the code again.
> 
> Pierre> Below is yet another proposal:
> Pierre> it transforms INTERMEDIATE_ENCODING macro into a call to
> Pierre> intermediate_encoding function.
> 
> I'd prefer it if the new code is only used in the __STDC_ISO_10646__
> case.
 Done below. 
> Pierre> +#ifdef WORDS_BIGENDIAN
> 
> #if
OK, corrected below.
> Pierre> +const char *
> Pierre> +intermediate_encoding (void)
> 
> New functions require an introductory comment.
  I wrote a minimal description, feel free to improve it.
 
> Pierre>  #ifdef PHONY_ICONV
> Pierre> -#define INTERMEDIATE_ENCODING "wchar_t"
> Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
> 
> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.

  I assumed you meant that it is not necessary when PHONY_ICONV is defined,
and this is what I changed below.
(I would personally have preferred to remove the
INTERMEDIATE_ENCODING macro completely and call the function directly.)
 
> Tom

  Thanks for your comments.
I tried to take them all into account in the new version
below.

  Checked on cygwin (where __STDC_ISO_10646__ is defined),
mingw32 (not defined) and mingw64 (no iconv at all,
and consequently no intermediate_encoding function).
All three builds at least print the version correctly.

  More comments?

Pierre


2011-04-19  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call.
	(intermediate_encoding): New prototype.
	* charset.c (ENDIAN_SUFFIX): New macro.
	(intermediate_encoding): New function.
	
Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	19 Apr 2011 13:42:54 -0000
@@ -922,6 +922,59 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+
+#ifndef PHONY_ICONV
+/* Macro used for UTF or UCS endianness suffix.  */
+#if WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+/* intermediate_encoding returns the charset used internally by
+   GDB to convert between target and host encodings.  */
+
+const char *
+intermediate_encoding (void)
+{
+#ifdef __STDC_ISO_10646__
+  if (sizeof (gdb_wchar_t) == 2)
+    {
+      static const char *stored_result = NULL;
+      const char *result;
+      int i;
+
+      if (stored_result)
+	return stored_result;
+      result = "UTF-16" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+	{
+	  if (strcmp (result, charset_enum[i]) == 0)
+	    {
+	      stored_result = result;
+	      return result;
+	    }
+	}
+      /* Second try, with UCS-2 type.  */
+      result = "UCS-2" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+	{
+	  if (strcmp (result, charset_enum[i]) == 0)
+	    {
+	      stored_result = result;
+	      return result;
+	    }
+	}
+    }
+#endif /* __STDC_ISO_10646__ */
+  /* If gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
+     not known, use the DEFAULT_INTERMEDIATE_ENCODING macro.  */
+  return DEFAULT_INTERMEDIATE_ENCODING;
+}
+#endif /* not PHONY_ICONV */
+
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	19 Apr 2011 13:42:54 -0000
@@ -79,18 +79,20 @@ typedef wint_t gdb_wint_t;
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
 #if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
 #endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case.  */
 #error "Neither __STDC_ISO_10646__ nor _LIBICONV_VERSION defined"
 #endif
 
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+
 #else
 
 /* If we got here and have wchar_t support, we might be on a system
@@ -117,9 +119,13 @@ typedef int gdb_wint_t;
 #ifdef PHONY_ICONV
 #define INTERMEDIATE_ENCODING "wchar_t"
 #else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
+#endif
+
 #endif
 
+#ifndef PHONY_ICONV
+const char *intermediate_encoding (void);
 #endif
 
 #endif /* GDB_WCHAR_H */

References:
- Re: [RFA] Handle cygwin wchar_t specifics
  - From: Tom Tromey
- Re: [RFA] Handle cygwin wchar_t specifics
  - From: Jan Kratochvil
- Re: [RFA-v2] Handle cygwin wchar_t specifics
  - From: Eli Zaretskii
- Re: [RFA-v3] Handle cygwin wchar_t specifics
  - From: Tom Tromey
- Re: [RFC-v4] Handle cygwin wchar_t specifics
  - From: Tom Tromey
