This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: Default target wide character set


>>>>> "Alexey" == Alexey Feldgendler <alexeyf@opera.com> writes:

Alexey> I got assigned part-time to contribute to gdb, mostly by fixing
Alexey> bugs that affect us, but also to implement new features.

Welcome to GDB.

I don't know your copyright assignment situation, but if you are
planning to submit patches, it doesn't hurt to get started on that
early.  Send me email off-list if you want to do this.

Alexey> A. Have the default target wide character set depend on the size of
Alexey> the type named wchar_t.

Alexey> Side question: how does gdb figure out sizeof(wchar_t)? Does it come
Alexey> from the symbol table or from elsewhere?

Yeah, look in c-lang.c for a call to lookup_typename with an argument of
"wchar_t".  The resulting type can be queried for its attributes.

Alexey> B. Have charset_for_string_type() check after calling
Alexey> target_wide_charset() whether the width of the returned character set
Alexey> matches the width of the actual string type, and use fallback similar
Alexey> to what's done for C_STRING_16 and C_STRING_32 if it doesn't.

Alexey> What do you think of options A and B? Or is there maybe another
Alexey> possibility that I'm overlooking?

Yeah, I think there is another solution.  It is pretty similar to these,
though.

The general problem is that the relevant standards put very few
constraints on wchar_t.  There is no guarantee that an implementation
uses any form of UCS for it -- and there are systems which in fact do not.

Therefore, if the user picks some random target wide charset, I think we
ought to honor his request.

Another wrinkle is that there are no good ways to determine any
characteristics of character sets.  This simply isn't part of any
standardized API (we could of course implement our own database for
this... but I was not motivated to do so).  What this means is that we
can also do very little error checking in practice -- if the target uses
UCS-4, but the user says "set target-wide-charset SJIS", well, he will
get nonsense in response, with no warning from GDB.

What I would propose doing is adding a new charset named "UCS".  If this
is selected as the target wide charset, then we would automatically pick
UCS-2 or UCS-4 depending on sizeof(target wchar_t).  This would probably
mean having a few special cases in the code (like we do for the -BE and
-LE variants).  We would then make this the default target wide charset.
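
To make the idea concrete, here is a minimal sketch of the selection
logic.  It is not existing GDB code: the helper name
resolve_target_wide_charset and its signature are made up for
illustration, and the real change would presumably get the size from the
target's wchar_t type rather than from a plain argument.  Endianness
suffixes are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GDB): map the requested target wide
   charset name to a concrete charset name.  WCHAR_SIZE stands in for
   sizeof (target wchar_t), in bytes.  Returns NULL if "UCS" was
   requested but no UCS variant matches the size.  */
const char *
resolve_target_wide_charset (const char *requested, size_t wchar_size)
{
  assert (requested != NULL);

  /* An explicit user choice is honored unchanged, even if it does not
     match the target; we have no way to check it.  */
  if (strcmp (requested, "UCS") != 0)
    return requested;

  /* The special "UCS" name picks a variant by wchar_t width.  */
  switch (wchar_size)
    {
    case 2:
      return "UCS-2";
    case 4:
      return "UCS-4";
    default:
      return NULL;
    }
}
```

With this shape, an explicit "set target-wide-charset SJIS" still passes
through untouched, and only the "UCS" default depends on the target.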

What do you think of that?

Tom

