Adding mbstate_t, mbsinit(), mbrtowc(), mbrlen() etc.

Kazuhiro Fujieda fujieda@jaist.ac.jp
Thu Aug 22 18:52:00 GMT 2002


>>> On Thu, 22 Aug 2002 21:50:50 +0400
>>> egor duda <deo@logos-m.ru> said:

>   I'm preparing a patch to add restartable versions of multibyte
> conversion functions to newlib. As long as all state information is
> already handled by *_r() versions, this functions are just simple
> wrappers around foo() of foo_r() functions, depending on MB_CAPABLE.

The approach of wrapping mb*_r() in mbr*() can't provide the
behavior standardized in C99 (or C90 Amendment 1).

mbrtowc() is required to accept an incomplete multibyte
character and to store a state recording that incompleteness
for subsequent conversions, while mbtowc_r() can't accept
incomplete multibyte characters.

We have to rewrite the MB_CAPABLE version of mbtowc_r() to
realize this behavior.
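
To illustrate the required behavior, here is a minimal sketch that
feeds the two UTF-8 bytes of U+00E9 to mbrtowc() one at a time.
The helper name split_char_demo() and the list of locale names are
assumptions made for this example; which UTF-8 locales exist
depends on the system.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if mbrtowc() accepted a character split across two
   calls, 0 if it did not, and -1 if no UTF-8 locale could be
   selected (the locale names below are guesses). */
int split_char_demo(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    /* U+00E9 encoded in UTF-8 is the two bytes 0xC3 0xA9. */
    const char bytes[2] = { (char)0xC3, (char)0xA9 };
    mbstate_t st;
    wchar_t wc = 0;
    size_t i, r1, r2;
    int found = 0;

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (setlocale(LC_CTYPE, names[i]) != NULL) {
            found = 1;
            break;
        }
    }
    if (!found)
        return -1;

    memset(&st, 0, sizeof st);            /* initial conversion state */

    /* First byte alone is an incomplete character: mbrtowc() should
       return (size_t)-2 and remember the byte in st. */
    r1 = mbrtowc(&wc, &bytes[0], 1, &st);

    /* Second byte completes the character: mbrtowc() should return
       the number of bytes consumed by this call, which is 1. */
    r2 = mbrtowc(&wc, &bytes[1], 1, &st);

    setlocale(LC_CTYPE, "C");             /* restore the default */
    return r1 == (size_t)-2 && r2 == 1;
}
```

A wrapper around the current mbtowc_r() would instead report an
error on the first call, because nothing carries the first byte
over to the second call.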

> mbstate_t as struct { int; union { wchar_t; char[4] }}, while
> Microsoft's C runtime defines it as int. Would 'int' be enough for
> everything? 

No. An int may be enough, but it is inconvenient for representing
the state of an incomplete multibyte character in the JIS or UTF-8
encodings (the current mbtowc_r() supports these encodings).
The mbstate_t needs to represent the conversion state itself and
any incomplete sequence in these encodings.
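
As a rough sketch of why a struct is convenient here, the following
uses a state type modeled on the struct quoted above to decode
UTF-8 one byte at a time. The names sketch_state and
sketch_utf8_feed() are hypothetical, not newlib's; the decoder
handles only 1- to 3-byte sequences and does not reject overlong
forms or surrogates.

```c
#include <stddef.h>

/* Hypothetical state type, modeled on the struct quoted above. */
typedef struct {
    int count;                  /* continuation bytes still expected;
                                   0 means the initial state */
    union {
        wchar_t wch;            /* partially assembled character */
        unsigned char wchb[4];  /* or raw pending bytes, e.g. for a
                                   JIS escape sequence (unused here) */
    } value;
} sketch_state;

/* Feed one byte to the decoder.
   Returns 1 when *out holds a complete character, 0 when more
   bytes are needed, and -1 on invalid input. */
int sketch_utf8_feed(sketch_state *st, unsigned char b, wchar_t *out)
{
    if (st->count == 0) {
        if (b < 0x80) {                 /* single-byte character */
            *out = (wchar_t)b;
            return 1;
        }
        if ((b & 0xE0) == 0xC0) {       /* lead byte of 2-byte form */
            st->count = 1;
            st->value.wch = (wchar_t)(b & 0x1F);
            return 0;
        }
        if ((b & 0xF0) == 0xE0) {       /* lead byte of 3-byte form */
            st->count = 2;
            st->value.wch = (wchar_t)(b & 0x0F);
            return 0;
        }
        return -1;                      /* unsupported or invalid lead */
    }

    if ((b & 0xC0) != 0x80) {           /* not a continuation byte */
        st->count = 0;
        return -1;
    }

    /* Append six payload bits to the pending character. */
    st->value.wch = (wchar_t)((st->value.wch << 6) | (b & 0x3F));
    if (--st->count > 0)
        return 0;                       /* still incomplete */

    *out = st->value.wch;
    return 1;
}
```

The count field plays the role of the incompleteness indicator, and
the union holds the partial character between calls. Packing both
into a single int is possible but needs bit manipulation.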
____
  | AIST      Kazuhiro Fujieda <fujieda@jaist.ac.jp>
  | HOKURIKU  Center for Information Science
o_/ 1990      Japan Advanced Institute of Science and Technology


