"C" character set (again)

Andy Koppe andy.koppe@gmail.com
Sat Jan 16 16:40:00 GMT 2010


2010/1/15 Corinna Vinschen:
> Can you please review the below patch to the docs?  I would like to
> make absolutely sure that the description is comprehensive.

Here's a revised patch. I think the potentially confusing bit about
"C" as the default locale can just go since it essentially just
repeats earlier info. Also, I tried to separate the discussions of
what happens in a non-locale-aware app vs. what happens when the
environment specifies "C" or another ASCII locale.
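
To make that distinction concrete, here is a rough C sketch of the two
cases. It is only an illustration, not part of the patch: the helper
names (current_locale, in_c_locale, show_locale,
adopt_environment_locale) are made up for this example, and it assumes
a POSIX libc that provides nl_langinfo(CODESET).

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: name of the locale currently active in this
   process.  Passing NULL only queries; it changes nothing. */
static const char *current_locale(void)
{
    const char *name = setlocale(LC_ALL, NULL);
    return name != NULL ? name : "(unknown)";
}

/* Hypothetical helper: nonzero while the process is still in the
   "C" or "POSIX" locale, i.e. the non-locale-aware case. */
static int in_c_locale(void)
{
    const char *name = current_locale();
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/* Hypothetical helper: print the active locale and the charset the
   C library reports for it. */
static void show_locale(const char *label)
{
    printf("%-20s locale=%s codeset=%s\n",
           label, current_locale(), nl_langinfo(CODESET));
}

/* Hypothetical helper: what a locale-aware app does at startup.
   Returns 0 on success, -1 if the environment names a locale the
   C library does not support (the active locale is then unchanged). */
static int adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL ? 0 : -1;
}
```

Calling show_locale("at startup") first thing in main(), then
adopt_environment_locale() followed by show_locale("after setlocale"),
shows both states side by side. The first line reflects the app-level
"C" locale; the second reflects whatever LC_ALL, LC_CTYPE or LANG
specify. Neither call tells you what the Cygwin DLL uses internally
for filename conversion.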

HTH,
Andy


Index: setup2.sgml
===================================================================
RCS file: /cvs/src/src/winsup/doc/setup2.sgml,v
retrieving revision 1.31
diff -u -r1.31 setup2.sgml
--- setup2.sgml	2 Dec 2009 09:36:54 -0000	1.31
+++ setup2.sgml	16 Jan 2010 16:28:21 -0000
@@ -201,17 +201,18 @@

 <para>
 At application startup, the application's locale is set to the default
-"C" or "POSIX" locale.  Under Cygwin, this locale defaults to the UTF-8
-character set.  If you want to stick to the "C" locale and only change to
-another charset, you can define this by setting one of the locale environment
-variables to "C.charset".  For instance</para>
+"C" or "POSIX" locale.  Under Cygwin 1.7.2 and later, this locale defaults
+to the ASCII character set on the application level.  If you want to stick
+to the "C" locale and only change to another charset, you can define this
+by setting one of the locale environment variables to "C.charset".  For
+instance</para>

 <screen>
   "C.ISO-8859-1"
 </screen>

-<para>The default locale in the absence of the aforementioned locale
-environment variables is "C.UTF-8".</para>
+<note><para>The default locale in the absence of the aforementioned locale
+environment variables is "C.UTF-8".</para></note>

 <para>Windows uses the UTF-16 charset exclusively to store the names
 of any object used by the Operating System.  This is especially important
@@ -232,8 +233,8 @@
 However, even if one of the locale environment variables is set to
 some other value than "C", this does <emphasis>only</emphasis> affect
 how Cygwin itself converts filenames.  As the POSIX standard requires,
-it's the applications responsibility to activate that locale for its
-own purpose, typically by using the call</para>
+it's the application's responsibility to activate that locale for its
+own purposes, typically by using the call</para>

 <screen>
   setlocale (LC_ALL, "");
@@ -244,6 +245,18 @@
 of the important locale variables set in the environment, the locale
 is set to the default locale, which is "C.UTF-8".</para>

+<para>But what about applications which are not locale-aware?  Per POSIX,
+they are running in the "C" or "POSIX" locale, which implies the ASCII
+charset.  The Cygwin DLL itself, however, will nevertheless use the locale
+set in the environment (or the "C.UTF-8" default locale) for converting
+filenames etc.</para>
+
+<para>When the locale set in the environment specifies an ASCII charset,
+for example "C" or "en_US.ASCII", Cygwin will still use UTF-8
+under the hood to translate filenames.  This allows for easier
+interoperability with applications running in the default "C.UTF-8" locale.
+</para>
+
 <para>
 Right now the language and territory, as well as the modifier, are not
 important to Cygwin, except to fix a single problem.  There's a class of
@@ -275,11 +288,6 @@
 <itemizedlist mark="bullet">

 <listitem><para>
-The default locale is the "C" or "POSIX" locale.  Under Cygwin this locale
-defaults to the UTF-8 character set.</para>
-</listitem>
-
-<listitem><para>
 Assume that you've set one of the aforementioned environment variables to some
 valid POSIX locale value, other than "C" and "POSIX".  Assume further that
 you're living in Japan.  You might want to use the language code "ja" and the
