UTF8 support in Cygwin

Kazuhiro Fujieda fujieda@jaist.ac.jp
Wed Jul 3 05:57:00 GMT 2002

>>> On Wed, 3 Jul 2002 11:07:15 +0100
>>> "Chris January" <chris@atomice.net> said:

> My question is, does anyone have any objections to doing things this way,
> and if so, can they suggest a better way? I don't want to patch the whole of
> Cygwin and then have to re-write everything at a later date.

I'd like to propose supporting codepages other than UTF8 as well,
and applying the setting to more than just filenames.

For example, with CYGWIN=codepage:20866, suppose that
`parse_options' sets current_codepage = other_cp and
current_cpnum = (UINT)20866.
Your example would then become as follows.

  if (current_codepage == other_cp)
    {
      /* Convert the name from the selected codepage to UTF-16.  */
      WCHAR wbuf[MAX_PATH];
      if (MultiByteToWideChar (current_cpnum, 0, get_win32_name (), -1,
                               wbuf, MAX_PATH) == 0)
        {
          __seterrno ();
          goto done;
        }
      x = CreateFileW (wbuf, access, shared, &sa, creation_distribution,
                       file_attributes, 0);
    }
  else
    x = CreateFileA (get_win32_name (), access, shared, &sa, creation_distribution,
                     file_attributes, 0);

Moreover, get_cp in miscfuncs.cc would have to become as follows.

    UINT
    get_cp ()
    {
      switch (current_codepage)
        {
        case oem_cp:
          return GetOEMCP ();
        case other_cp:
          return current_cpnum;
        case ansi_cp:
        default:
          return GetACP ();
        }
    }

To use UTF8, we would set codepage:65001 or codepage:utf8.
The latter requires the parser to accept "utf8" and
translate it to CP_UTF8 (65001).
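
To make the parsing step concrete, here is a rough sketch of how the
value after "codepage:" might be handled. It is only an illustration
and not the actual `parse_options' code: the function name
parse_codepage, the UINT typedef (a stand-in so the sketch compiles
without the Windows headers), the "ansi" and "oem" spellings, and the
65535 upper bound are all my assumptions. Only ansi_cp, oem_cp,
other_cp, current_codepage and current_cpnum come from the proposal
above.

```cpp
// Hypothetical sketch; parse_codepage is an illustrative name, not Cygwin's.
#include <cerrno>
#include <cstdlib>
#include <cstring>

typedef unsigned int UINT;          // stand-in for the Windows UINT type
enum codepage_type { ansi_cp, oem_cp, other_cp };

static codepage_type current_codepage = ansi_cp;
static UINT current_cpnum = 0;      // meaningful only when other_cp is set

// Parse the text that follows "codepage:" in the CYGWIN variable.
// Returns true and updates the two globals on success; on failure the
// globals are left untouched.
static bool
parse_codepage (const char *value)
{
  if (strcmp (value, "ansi") == 0)
    {
      current_codepage = ansi_cp;
      return true;
    }
  if (strcmp (value, "oem") == 0)
    {
      current_codepage = oem_cp;
      return true;
    }
  if (strcmp (value, "utf8") == 0)
    {
      current_codepage = other_cp;
      current_cpnum = 65001;        // numeric value of CP_UTF8
      return true;
    }

  // Otherwise expect a plain decimal codepage number such as 20866.
  if (*value < '0' || *value > '9')
    return false;
  errno = 0;
  char *end;
  unsigned long n = strtoul (value, &end, 10);
  // Assumed bound: reject trailing junk and anything above 65535.
  if (errno != 0 || *end != '\0' || n > 65535)
    return false;

  current_codepage = other_cp;
  current_cpnum = (UINT) n;
  return true;
}
```

With this, parse_codepage ("20866") and parse_codepage ("utf8") both
select other_cp, so get_cp needs no UTF8-specific case; an unknown
name such as "koi8" is rejected and the previous setting is kept.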

How about this idea?
  | AIST      Kazuhiro Fujieda <fujieda@jaist.ac.jp>
  | HOKURIKU  Center for Information Science
o_/ 1990      Japan Advanced Institute of Science and Technology
